CBL - Campus del Baix Llobregat

Defended project

Title: A Reinforcement Learning Approach for Next Generation Networks

Supervisor: GARCÍA VILLEGAS, EDUARD

Department: ENTEL

Offer start date: 09-09-2020 Offer end date: 09-09-2020

Degree programmes assigned to the project:

MU MASTEAM 2015

Type: Individual

Place of completion: Outside UPC

External supervisor: Engin Zeydan
Institution/Company: CTTC-CENTRE TECNOLÒGIC DE TELECOMUNICACIONS DE CATALUNYA
Supervisor's qualification: Researcher

Keywords:
Artificial Intelligence, 5G, Cloudification, Softwarization, Virtualization, Virtualized Network Functions, Edge Computing, Reinforcement Learning, Markov Decision Process

Content description and activity plan:
The objective of this thesis proposal is to build a reinforcement learning (RL) application with the Spark/RLlib libraries, using networking data for next generation networks. These next generation networks are SDN/NFV based and work on top of the 5G-TRANSFORMER/5Growth framework, which is currently being evolved to include AI-based decision-making. Some potential RL algorithms to be investigated within next generation networks are (i) high-throughput architectures (Distributed Prioritized Experience Replay (Ape-X), Importance Weighted Actor-Learner Architecture (IMPALA)); (ii) gradient-based algorithms (Advantage Actor-Critic (A2C, A3C), Deep Deterministic Policy Gradients (DDPG, TD3), Deep Q Networks (DQN, Rainbow), Policy Gradients, Proximal Policy Optimization (PPO)); and (iii) derivative-free algorithms (Augmented Random Search (ARS), Evolution Strategies). As the environment, the thesis is expected to use or extend DeepMind or OpenAI applications (e.g. a gym environment) to meet its requirements. The outcome of the thesis will be a demo showing the applicability and benefits of RL algorithms in the networking domain, together with a published paper presenting the results (e.g. algorithm comparisons in the networking domain within the developed demo setup). This master thesis will be related to one of the use cases we are considering (e.g., an extended version of the scaling of services based on network metrics shown for the vehicular use case in 5G-Transformer).

Overview (abstract in English):
Fifth generation (5G) mobile networks enable operators and stakeholders to enhance existing services and create new ones in response to increasing market demand. The 5G architecture provides the scalability and flexibility needed to adapt its infrastructure into a customizable communication system by means of Cloudification. Softwarization and virtualization are key concepts for upcoming industries that will require ultra-low latency, which is only possible if the infrastructure equipment traditionally centralized in the core of the communication network is physically moved closer to the user, to the network edge. The main objective of this master thesis was to implement a Reinforcement Learning algorithm (Q-Learning Temporal Difference), aimed at next generation networks, that optimally allocates Virtualized Network Functions (VNF) to 5G network Edge Computing (EC) centers. To evaluate and compare the performance of the algorithm, two further algorithms were developed to solve the problem under the same network conditions. The first one, Best Fit, was inspired by a classical network load balancing algorithm (Weighted Round Robin), whereas the second, MDP, was solved through dynamic programming (Policy Iteration), with the network dynamics posed as a finite Markov Decision Process. The tests carried out indicate that Q-Learning performs better than Best Fit and almost as well as MDP. They show that the Q-Learning algorithm is able to allocate incoming VNF demands optimally when the available resources of the EC centers are limited.
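As an illustration of the Q-Learning Temporal Difference approach named in the overview, the following is a minimal tabular sketch in Python. It is not the thesis implementation: the toy environment (two EC centers with fixed capacities, demands of one or two resource units, a reward of +1 for an accepted VNF and -1 for a rejected one), the function names `step`, `td_update` and `train`, and all hyperparameters are assumptions chosen only to keep the example small.

```python
import random
from collections import defaultdict


def step(free, demand, action):
    """Try to place `demand` units on EC center `action` (hypothetical toy model).

    Returns the new tuple of free resources and a reward:
    +1.0 if the VNF fits, -1.0 if the chosen center lacks resources.
    """
    if free[action] >= demand:
        new_free = list(free)
        new_free[action] -= demand
        return tuple(new_free), 1.0
    return tuple(free), -1.0


def td_update(q, state, action, reward, next_state, done, n_actions,
              alpha=0.1, gamma=0.9):
    """One Q-Learning temporal-difference update on the table `q`."""
    best_next = 0.0 if done else max(q[(next_state, a)] for a in range(n_actions))
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])


def train(capacities=(4, 2), demands=(1, 2), n_requests=4,
          episodes=5000, epsilon=0.1, seed=0):
    """Learn a Q-table with an epsilon-greedy policy (all values are assumptions)."""
    rng = random.Random(seed)
    q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
    n_actions = len(capacities)
    for _ in range(episodes):
        free = tuple(capacities)
        demand = rng.choice(demands)
        for t in range(n_requests):
            state = (free, demand)
            if rng.random() < epsilon:
                action = rng.randrange(n_actions)  # explore
            else:
                action = max(range(n_actions), key=lambda a: q[(state, a)])  # exploit
            free, reward = step(free, demand, action)
            demand = rng.choice(demands)  # next incoming VNF demand
            done = t == n_requests - 1
            td_update(q, state, action, reward, (free, demand), done, n_actions)
    return q


if __name__ == "__main__":
    q_table = train()
    start = ((4, 2), 2)  # both centers empty, incoming demand of 2 units
    print({a: round(q_table[(start, a)], 2) for a in range(2)})
```

The state here combines the free resources of every center with the size of the incoming demand, so the table grows quickly with the number of centers. That is acceptable for a sketch, but a realistic network would likely need a more compact state representation.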