AI: Reinforcement Learning
The idea behind reinforcement learning is that of a learning system that wants something and adapts its behavior in order to maximize a special signal from its environment.
Reinforcement learning problems involve learning what to do, that is, how to map situations to actions, so as to maximize a numerical reward signal. In an essential way they are closed-loop problems because the learning system’s actions influence its later inputs. Moreover, the learner is not told which actions to take, as in many forms of machine learning, but instead must discover which actions yield the most reward by trying them out.
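The trial-and-error part of this description can be sketched with a small simulation. The example below is only an illustration, not something taken from the source: it assumes a stateless "multi-armed bandit" environment with made-up average rewards, and an epsilon-greedy learner (the function name `run_bandit` and all parameter values are hypothetical). Because the environment has no state, it shows how an agent discovers rewarding actions by trying them, but not the full closed loop in which actions change later inputs.

```python
import random


def run_bandit(arm_means, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy learner on a Gaussian multi-armed bandit (illustrative)."""
    rng = random.Random(seed)
    n_arms = len(arm_means)
    estimates = [0.0] * n_arms  # running estimate of each action's reward
    counts = [0] * n_arms       # how often each action has been tried
    total_reward = 0.0

    for _ in range(steps):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if rng.random() < epsilon:
            action = rng.randrange(n_arms)
        else:
            action = max(range(n_arms), key=lambda a: estimates[a])

        # The environment returns only a noisy numerical reward;
        # it never says which action would have been the right one.
        reward = rng.gauss(arm_means[action], 1.0)

        # Incremental sample-average update for the action that was tried.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward

    return estimates, counts, total_reward


if __name__ == "__main__":
    # Assumed average rewards for three actions; the last one is the best.
    estimates, counts, total = run_bandit([0.2, 0.5, 1.5])
    print("estimated rewards:", [round(e, 2) for e in estimates])
    print("times each action was tried:", counts)
```

The learner is never given a labeled "correct" action. It only sees the reward for the action it actually took, which is why some exploration (`epsilon`) is needed.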
Reinforcement learning is different from supervised learning. Supervised learning is learning from a training set of labeled examples provided by a knowledgeable external supervisor. The object of this kind of learning is for the system to extrapolate, or generalize, its responses so that it acts correctly in situations not present in the training set. This is an important kind of learning, but alone it is not adequate for learning from interaction. In interactive problems it is often impractical to obtain examples of desired behavior that are both correct and representative of all the situations in which the agent has to act.
Source: Reinforcement Learning: An Introduction – Richard S. Sutton and Andrew G. Barto
- A brief introduction to reinforcement learning
- Introduction to Deep Reinforcement Learning
- Deep Reinforcement Learning – Google DeepMind – David Silver, 2016
“At DeepMind we have pioneered the combination of these approaches – reinforcement learning and deep learning – to create the first artificial agents to achieve human-level performance across many challenging domains”.
The key idea was to use deep neural networks to represent the Q-network, and to train this Q-network to predict total reward.
- Reinforcement Learning: Qwik Start
- Reinforcement Learning Tutorial Part 2: Cloud Q-learning
- Online Reinforcement Learning for Self-adaptive Information Systems
- DRL-Cloud: Deep Reinforcement Learning-Based Resource Provisioning and Task Scheduling for Cloud Service Providers (PDF)
- AI-based algorithms and experimental evaluation for beyond 5G – MSc thesis PoliTO
- Open-AI gym for SD-WAN Link Selection
- Gymnasium is a project that provides an API for all single-agent reinforcement learning environments and includes implementations of common environments: cartpole, pendulum, mountain-car, mujoco, atari, and more.
- Deep Q-learning (DQN) Tutorial with CartPole-v0
- Deep reinforcement learning on GCP: using hyperparameter tuning and Cloud ML Engine to best OpenAI Gym games
- Beyond the Basics: Unveiling Deep Q Networks (DQN) in Reinforcement Learning
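Several of the resources above deal with Q-learning, where the agent learns to predict the total reward of each action. As a rough sketch of that idea, the example below swaps the deep Q-network for a plain lookup table (tabular Q-learning), which is much simpler than DQN. The `Corridor` environment is a hypothetical toy written for this example. Its `reset()` and `step()` methods only imitate the return shapes used by Gymnasium environments; it is not a Gymnasium class, and all hyperparameters are assumed values.

```python
import random


class Corridor:
    """Hypothetical 5-cell corridor; reaching the rightmost cell pays +1."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        # Gymnasium-style return: (observation, info)
        self.state = 0
        return self.state, {}

    def step(self, action):
        # Action 0 moves left, action 1 moves right; walls clamp the move.
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        terminated = self.state == self.length - 1
        reward = 1.0 if terminated else 0.0
        # Gymnasium-style return: (observation, reward, terminated, truncated, info)
        return self.state, reward, terminated, False, {}


def q_learning(env, episodes=500, alpha=0.1, gamma=0.9,
               epsilon=0.1, max_steps=100, seed=0):
    """Tabular Q-learning: q[state][action] estimates discounted total reward."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(env.length)]

    for _episode in range(episodes):
        state, _info = env.reset()
        for _step in range(max_steps):
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore
            else:
                # Exploit, breaking ties at random so an untrained agent still moves.
                best = max(q[state])
                action = rng.choice([a for a in (0, 1) if q[state][a] == best])

            next_state, reward, terminated, truncated, _info = env.step(action)

            # Target: immediate reward plus discounted best value of the next state.
            target = reward if terminated else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])

            state = next_state
            if terminated or truncated:
                break
    return q


if __name__ == "__main__":
    q_table = q_learning(Corridor())
    policy = ["right" if row[1] > row[0] else "left" for row in q_table[:-1]]
    print("greedy action per non-terminal cell:", policy)
```

In DQN the table `q` is replaced by a neural network that takes the state as input, so the same kind of target can be used when there are too many states to enumerate.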
Markov Decision Process (MDP)
A Markov Decision Process (MDP) is a tuple: