Citation:
Bakhshi, B., Mangues-Bafalluy, J. & Baranda, J. (7-11 Dec. 2021). R-Learning-based admission control for service federation in multi-domain 5G networks [proceedings]. 2021 IEEE Global Communications Conference (GLOBECOM), Madrid, Spain.
ISBN:
978-1-7281-8104-2 (Electronic); 978-1-7281-8105-9 (Print on Demand (PoD))
Sponsor:
European Commission; Ministerio de Economía y Competitividad (Spain)
Acknowledgements:
This work has been partially funded by the MINECO grant TEC2017-88373-R (5G-REFINE), the EC H2020 5Growth Project (grant no. 856709), and Generalitat de Catalunya grant 2017 SGR 1195.
Project:
Government of Spain. TEC2017-88373-R; info:eu-repo/grantAgreement/EC/856709
Keywords:
Multi-domain 5G/B5G networks, Admission control, Service federation, MDP, Q-Learning, R-Learning
Abstract:
Network service federation in 5G/B5G networks enables service providers to extend their service offering by collaborating with peering providers. Realizing this vision requires interoperability among providers towards end-to-end service orchestration across multiple administrative domains. Smart admission control is fundamental to making such an extended offering profitable. Without prior knowledge of service requests, the admission controller (AC) either determines the domain in which to deploy each demand or rejects it, so as to maximize the long-term average profit. In this paper, we first obtain the optimal AC policy by formulating the problem as a Markov decision process (MDP), which is solved through the policy iteration method. This provides the theoretical performance bound under the assumption of known arrival and departure rates of demands. Then, for practical solutions deployable in real systems, where these rates are not known, we apply the Q-Learning and R-Learning algorithms to approximate the optimal policy. Extensive simulation results show that the learning approaches outperform the greedy policy and get close to optimal performance. More specifically, R-Learning always outperformed the other practical solutions, achieving an optimality gap of 3-5% independent of the system configuration, whereas Q-Learning showed lower performance that depended on discount-factor tuning.
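As context for the abstract, R-Learning is an average-reward variant of Q-Learning that optimizes long-term average reward directly, with no discount factor, which matches the long-term average-profit objective described above. The following is a minimal, self-contained sketch of a tabular R-Learning loop applied to a toy admission-control setting. The state encoding, domain capacities, per-action profits, departure dynamics, and hyperparameters are all illustrative assumptions for exposition; they do not reproduce the paper's system model or results.

import random
from collections import defaultdict

C_LOCAL, C_FED = 5, 3               # assumed domain capacities
PROFIT = {0: 0.0, 1: 1.0, 2: 0.6}   # assumed profit per action: reject / local / federate
ALPHA, BETA, EPS = 0.1, 0.01, 0.1   # assumed learning and exploration rates

Q = defaultdict(float)   # Q[(state, action)] -> action value
rho = 0.0                # running estimate of the long-term average profit

def feasible_actions(state):
    """Actions allowed by the remaining capacity of each domain."""
    local, fed = state
    acts = [0]                   # rejecting a demand is always possible
    if local < C_LOCAL:
        acts.append(1)           # deploy in the local domain
    if fed < C_FED:
        acts.append(2)           # federate to the peering domain
    return acts

def step(state, action):
    """Toy stand-in for arrival/departure dynamics: admit the demand
    per the chosen action, then let each domain release one demand
    with probability 0.5."""
    local, fed = state
    if action == 1:
        local += 1
    elif action == 2:
        fed += 1
    if local > 0 and random.random() < 0.5:
        local -= 1
    if fed > 0 and random.random() < 0.5:
        fed -= 1
    return (local, fed), PROFIT[action]

state = (0, 0)
for _ in range(50_000):
    acts = feasible_actions(state)
    greedy = max(acts, key=lambda a: Q[(state, a)])
    action = random.choice(acts) if random.random() < EPS else greedy
    next_state, reward = step(state, action)
    best_next = max(Q[(next_state, a)] for a in feasible_actions(next_state))
    # R-Learning update: average-reward TD error instead of a discounted one
    delta = reward - rho + best_next - Q[(state, action)]
    Q[(state, action)] += ALPHA * delta
    if action == greedy:
        rho += BETA * delta      # rho is updated only on greedy steps
    state = next_state

print(f"estimated long-term average profit: rho = {rho:.3f}")

The rho update on greedy steps is what distinguishes R-Learning from discounted Q-Learning; it removes the discount factor entirely, which is consistent with the abstract's observation that Q-Learning's performance depended on discount-factor tuning.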