Rights:
Atribución-NoComercial-SinDerivadas 3.0 España
Abstract:
Reinforcement Learning (RL), in which an agent is trained to make the most
favourable decisions in the long run, is an established technique in artificial intelligence. Its
popularity has increased in the recent past, largely due to the development of deep neural
networks spawning deep reinforcement learning algorithms such as Deep Q-Learning. The
latter have been used to solve previously insurmountable problems, such as playing the
famed game of “Go”, which earlier algorithms could not master. Many such problems suffer the
curse of dimensionality, in which the sheer number of possible states is so overwhelming
that it is impractical to explore every possible option.
While these recent techniques have been successful, they may not be strictly necessary
or practical for some applications such as cloud provisioning. In these situations, the
action space is not as vast and workload data required to train such systems is not
as widely shared, as it is considered commercially sensitive by the Application Service
Provider (ASP). Given that provisioning decisions evolve over time in sympathy with
incident workloads, they fit the sequential decision-making problem that legacy RL
was designed to solve. However, because of the high correlation of time-series data, states
are not independent of each other and the legacy Markov Decision Processes (MDPs)
have to be cleverly adapted to create robust provisioning algorithms.
As the first contribution of this thesis, we exploit the knowledge of both the application
and configuration to create an adaptive provisioning system leveraging stationary Markov
distributions. We then develop algorithms that, with neither application nor configuration
knowledge, solve the underlying Markov Decision Process (MDP) to create provisioning
systems. Our Q-Learning algorithms factor in the correlation between states and the
consequent transitions between them to create provisioning systems that not only
adapt to workloads, but can also exploit similarities between them, thereby reducing
the retraining overhead. Our algorithms also converge in fewer learning steps,
since we restructure the state and action spaces to avoid the curse of dimensionality
without the need for the function approximation approach taken by deep Q-Learning
systems.
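
To make the tabular approach concrete, the sketch below shows a standard Q-Learning update for a hypothetical provisioning agent. The discretised workload states, the scale-down/keep/scale-up actions and the toy reward are illustrative assumptions for this sketch, not the exact formulation developed in the thesis.

    # Illustrative tabular Q-Learning for a provisioning agent.
    # States, actions and reward below are hypothetical placeholders.
    import numpy as np

    n_states = 10          # e.g. discretised workload levels (assumed)
    actions = [-1, 0, 1]   # remove, keep, or add one VM (assumed)
    alpha, gamma, epsilon = 0.1, 0.9, 0.1

    Q = np.zeros((n_states, len(actions)))

    def step(state, action):
        # Placeholder environment: random workload drift, toy reward
        # that penalises provisioning changes.
        next_state = min(n_states - 1, max(0, state + np.random.randint(-1, 2)))
        reward = -abs(action)
        return next_state, reward

    state = 0
    for _ in range(1000):
        # epsilon-greedy action selection over the small, tabular action space
        a = np.random.randint(len(actions)) if np.random.rand() < epsilon \
            else int(np.argmax(Q[state]))
        next_state, reward = step(state, actions[a])
        # standard Q-Learning temporal-difference update
        Q[state, a] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, a])
        state = next_state

Because the state and action spaces stay small, the value table converges without any function approximation, which is the point the paragraph above makes.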
A crucial use-case of future networks will be the support of low-latency applications
involving highly mobile users. With these in mind, the European Telecommunications Standards Institute (ETSI) has proposed the Multi-access Edge Computing (MEC)
architecture, in which computing capabilities can be located close to the network edge,
where the data is generated. Provisioning for such applications therefore entails migrating
them to the most suitable location on the network edge as the users move. In this thesis,
we also tackle this type of provisioning by considering vehicle platooning or Cooperative
Adaptive Cruise Control (CACC) on the edge. We show that our Q-Learning algorithm
can be adapted to minimize the number of migrations required to effectively run such
an application on MEC hosts, which may also be subject to traffic from other competing
applications.
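
As a purely illustrative complement, a reward of the following shape discourages unnecessary migrations while still favouring low-latency MEC hosts; the weights and the latency term are assumptions made for this sketch rather than the thesis's actual cost model.

    def migration_aware_reward(current_host, chosen_host, latency_ms,
                               migration_cost=1.0, latency_weight=0.1):
        # Favour hosts that keep end-to-end latency low for the platoon (assumed weight).
        r = -latency_weight * latency_ms
        # Penalise moving the application to a different MEC host (assumed cost).
        if chosen_host != current_host:
            r -= migration_cost
        return r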