AI: Reinforcement Learning

The idea behind reinforcement learning is that of a learning system that wants something and adapts its behavior in order to maximize a special signal from its environment.

Reinforcement learning problems involve learning what to do (how to map situations to actions) so as to maximize a numerical reward signal. In an essential way they are closed-loop problems because the learning system’s actions influence its later inputs. Moreover, the learner is not told which actions to take, as in many forms of machine learning, but instead must discover which actions yield the most reward by trying them out.
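To make the trial-and-error idea concrete, here is a minimal sketch in Python, assuming a hypothetical multi-armed bandit as the environment. The function name `run_bandit`, the Gaussian reward noise, and the epsilon-greedy rule are illustrative choices, not something taken from the source text. The learner is never told which action is best; it only sees a numerical reward after each try. A bandit has no state, so this sketch covers only the discovery of rewarding actions, not the closed-loop effect of actions on later situations.

```python
import random


def run_bandit(true_means, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy trial-and-error learning on a hypothetical bandit.

    true_means: mean reward of each action (hidden from the learner).
    Returns the learned estimates, the pull counts, and the total reward.
    """
    rng = random.Random(seed)
    n_actions = len(true_means)
    estimates = [0.0] * n_actions  # running estimate of each action's reward
    counts = [0] * n_actions       # how often each action has been tried
    total_reward = 0.0

    for _ in range(steps):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: estimates[a])

        # The environment returns only a noisy numerical reward,
        # never the identity of the correct action.
        reward = rng.gauss(true_means[action], 1.0)

        # Incremental sample-average update of the chosen action's estimate.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward

    return estimates, counts, total_reward


if __name__ == "__main__":
    estimates, counts, total = run_bandit([0.2, 0.5, 2.0])
    print("estimates:", [round(e, 2) for e in estimates])
    print("pull counts:", counts)
```

With a clear gap between the mean rewards, the learner should end up pulling the highest-reward action far more often than the others, even though it started with no information about any of them.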

Reinforcement learning is different from supervised learning. Supervised learning is learning from a training set of labeled examples provided by a knowledgeable external supervisor. The object of this kind of learning is for the system to extrapolate, or generalize, its responses so that it acts correctly in situations not present in the training set. This is an important kind of learning, but alone it is not adequate for learning from interaction. In interactive problems it is often impractical to obtain examples of desired behavior that are both correct and representative of all the situations in which the agent has to act.

Source: Reinforcement Learning: An Introduction, by Richard S. Sutton and Andrew G. Barto


Markov Decision Process (MDP)

A Markov Decision Process (MDP) is a tuple: