CBL - Campus del Baix Llobregat

Defended project

Title: A Reinforcement Learning Approach for Next Generation Networks


Supervisor: GARCÍA VILLEGAS, EDUARD

Department: ENTEL


Offer start date: 09-09-2020     Offer end date: 09-09-2020



Degree programme the project is assigned to:
    MU MASTEAM 2015
Type: Individual
 
Place of work: Outside UPC
 
        External supervisor: Engin Zeydan
        Institution/Company: CTTC-CENTRE TECNOLÒGIC DE TELECOMUNICACIONS DE CATALUNYA
        Supervisor's title: Researcher
 
Keywords:
Artificial Intelligence, 5G, Cloudification, Softwarization, Virtualization, Virtualized Network Functions, Edge Computing, Reinforcement Learning, Markov Decision Process
 
Content description and activity plan:
The objective of this thesis proposal is to build a
reinforcement learning (RL) application with the Spark/RLLib
libraries, using networking data from next generation networks.
These networks are SDN/NFV based and run on top of the
5G-TRANSFORMER/5Growth framework, which is currently being
evolved to include AI-based decision-making.

Some potential RL algorithms to be investigated within next
generation networks are: (i) high-throughput architectures
(Distributed Prioritized Experience Replay (Ape-X), Importance
Weighted Actor-Learner Architecture (IMPALA)); (ii)
gradient-based algorithms (Advantage Actor-Critic (A2C, A3C),
Deep Deterministic Policy Gradients (DDPG, TD3), Deep Q Networks
(DQN, Rainbow), Policy Gradients, Proximal Policy Optimization
(PPO)); and (iii) derivative-free algorithms (Augmented Random
Search (ARS), Evolution Strategies).

The environment is expected to use or extend DeepMind or OpenAI
applications (e.g. a gym environment) to meet the requirements
of the thesis. The expected outcome of the thesis is a demo
that shows the applicability and benefits of RL algorithms in
the networking domain, together with a published paper
presenting the results (e.g. algorithm comparisons in the
networking domain within the developed demo setup).
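
As an illustration of what such an environment could look like,
the following is a minimal sketch of a VNF placement environment
that mimics the classic Gym-style reset/step interface without
importing the library. The class name VnfPlacementEnv, the state
layout, the reward (resource units served) and all parameters
are assumptions made for this example, not the setup used in the
thesis.

```python
import random


class VnfPlacementEnv:
    """Toy VNF placement environment with a Gym-like reset/step interface.

    Everything here (state layout, reward, parameters) is a hypothetical
    illustration, not the environment used in the thesis.
    """

    def __init__(self, capacities=(4, 4), demand_sizes=(1, 2), horizon=6, seed=0):
        self.capacities = tuple(capacities)      # resource units per node
        self.demand_sizes = tuple(demand_sizes)  # possible VNF sizes
        self.horizon = horizon                   # VNF requests per episode
        self.rng = random.Random(seed)
        self.reset()

    def _observe(self):
        # Observation: (free units per node, size of the pending request).
        return (tuple(self.free), self.demand)

    def reset(self):
        self.free = list(self.capacities)
        self.t = 0
        self.demand = self.rng.choice(self.demand_sizes)
        return self._observe()

    def step(self, action):
        # `action` is the index of the node chosen to host the request.
        if self.free[action] >= self.demand:
            self.free[action] -= self.demand
            reward = float(self.demand)  # assumed reward: units served
        else:
            reward = 0.0                 # request rejected
        self.t += 1
        done = self.t >= self.horizon
        self.demand = self.rng.choice(self.demand_sizes)
        return self._observe(), reward, done, {}


if __name__ == "__main__":
    env = VnfPlacementEnv()
    obs, done, total = env.reset(), False, 0.0
    while not done:
        free, _ = obs
        action = free.index(max(free))  # naive baseline: emptiest node
        obs, reward, done, _ = env.step(action)
        total += reward
    print("served resource units:", total)
```

Keeping the interface to reset and step means that the same
agent code could later be pointed at a real Gym environment with
few changes, although newer releases of that API return
additional values from step.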

This master thesis will be related to one of the use cases
under consideration (e.g. an extended version of the scaling of
services based on network metrics, as shown for the vehicular
use case in 5G-Transformer).
 
Overview (English summary):
Fifth generation (5G) mobile networks enable operators and stakeholders to enhance existing services and create new ones in response to increasing market demand. By means of Cloudification, the 5G architecture provides the scalability and flexibility needed to adapt its infrastructure into a customizable communication system.

Softwarization and virtualization are key enablers for emerging industries that will require ultra-low latency, which is only possible if the infrastructure equipment traditionally centralized in the core of the communication network is physically moved closer to the user, at the network edge.

The main objective of this master thesis was to implement a Reinforcement Learning algorithm (Q-Learning Temporal Difference) aimed at next generation networks to optimally allocate Virtualized Network Functions (VNF) to 5G network Edge Computing (EC) centers.
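
To make the temporal-difference idea concrete, the following is a minimal sketch of tabular Q-Learning on a toy allocation problem. It is not the thesis implementation: the two EC centers, their capacities, the VNF sizes, the reward (resource units served) and the hyperparameters are all assumptions chosen for illustration.

```python
import random
from collections import defaultdict

CAPACITY = (3, 3)   # hypothetical free resource units per EC center
DEMANDS = (1, 2)    # hypothetical VNF sizes
HORIZON = 4         # VNF requests per episode
ACTIONS = tuple(range(len(CAPACITY)))


def step(free, demand, action):
    """Try to place `demand` on EC center `action`; return (new_free, reward)."""
    if free[action] >= demand:
        new_free = list(free)
        new_free[action] -= demand
        return tuple(new_free), float(demand)  # assumed reward: units served
    return free, 0.0                           # request rejected


def td_update(q, s, a, r, s_next, done, alpha, gamma):
    """One Q-Learning temporal-difference update on the table `q`."""
    best_next = 0.0 if done else max(q[(s_next, b)] for b in ACTIONS)
    q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])


def train(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Epsilon-greedy tabular Q-Learning over random request sequences."""
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        free, demand = CAPACITY, rng.choice(DEMANDS)
        for t in range(HORIZON):
            s = (free, demand)
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)                     # explore
            else:
                a = max(ACTIONS, key=lambda b: q[(s, b)])   # exploit
            free, r = step(free, demand, a)
            demand = rng.choice(DEMANDS)
            done = t == HORIZON - 1
            td_update(q, s, a, r, (free, demand), done, alpha, gamma)
    return q


if __name__ == "__main__":
    table = train()
    start = (CAPACITY, 2)
    print({a: round(table[(start, a)], 3) for a in ACTIONS})
```

The update in td_update moves the stored estimate toward the observed reward plus the discounted best estimate of the next state, so no model of the request arrivals is needed, only sampled transitions.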

To evaluate the performance of the algorithm, two further algorithms were developed and run under the same network conditions for comparison. The first one, Best Fit, was inspired by a classical network load balancing algorithm (Weighted Round Robin), whereas the second, MDP, uses dynamic programming (Policy Iteration), with the network dynamics posed as a finite Markov Decision Process.
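
The Policy Iteration approach can be sketched as follows on a small finite Markov Decision Process. This is an illustrative toy model, not the one built in the thesis: the two EC centers, the capacities, the uniformly random VNF sizes, the reward (resource units served) and the discount factor are assumptions.

```python
from itertools import product

CAPACITY = (2, 2)   # hypothetical resource units per EC center
DEMANDS = (1, 2)    # hypothetical VNF sizes, assumed equally likely
GAMMA = 0.9         # assumed discount factor
ACTIONS = tuple(range(len(CAPACITY)))
# A state is (free units per EC center, size of the pending VNF request).
STATES = [(free, d)
          for free in product(*(range(c + 1) for c in CAPACITY))
          for d in DEMANDS]


def transitions(state, action):
    """Return [(probability, next_state, reward)] for one placement decision."""
    free, demand = state
    if free[action] >= demand:
        new_free = tuple(f - demand if i == action else f
                         for i, f in enumerate(free))
        reward = float(demand)        # assumed reward: units served
    else:
        new_free, reward = free, 0.0  # request rejected
    p = 1.0 / len(DEMANDS)
    return [(p, (new_free, d), reward) for d in DEMANDS]


def q_value(v, state, action):
    """Expected return of taking `action` in `state`, then following `v`."""
    return sum(p * (r + GAMMA * v[s2]) for p, s2, r in transitions(state, action))


def policy_iteration(tol=1e-9):
    policy = {s: ACTIONS[0] for s in STATES}
    v = {s: 0.0 for s in STATES}
    while True:
        # Policy evaluation: sweep until the values of `policy` settle.
        while True:
            delta = 0.0
            for s in STATES:
                new_v = q_value(v, s, policy[s])
                delta = max(delta, abs(new_v - v[s]))
                v[s] = new_v
            if delta < tol:
                break
        # Policy improvement: switch only on a clear gain to avoid cycling.
        stable = True
        for s in STATES:
            best = max(ACTIONS, key=lambda a: q_value(v, s, a))
            if q_value(v, s, best) > q_value(v, s, policy[s]) + 1e-6:
                policy[s] = best
                stable = False
        if stable:
            return policy, v


if __name__ == "__main__":
    policy, v = policy_iteration()
    for s in STATES:
        print(s, "-> EC center", policy[s], "value", round(v[s], 3))
```

Unlike Q-Learning, this method needs the full transition model (the transitions function here), which is why it can serve as a reference solution when the network dynamics are known.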

The tests carried out indicate that Q-Learning performs better than Best Fit and almost as well as the MDP approach. They show that the Q-Learning algorithm is able to optimally allocate the incoming VNF demands when the available resources of the EC centers are somewhat restricted.



© CBLTIC Campus del Baix Llobregat - UPC