1. Introduction

1.1. What is Reinforcement Learning (RL)?

  • Reinforcement learning is learning what to do in order to maximize a reward.

  • We can give a more “formal” definition.

Definition

Reinforcement learning is the computation of a function that maps situations to actions.

  • We said that we want to maximize a reward, but what is a reward?

Activity

  • Try to explain what a reward is.

  • To maximize a reward, the learner can take different actions.

  • If the learner were passive, it could not maximize anything.

  • Usually, the learner starts with no prior knowledge about which action it should take.

Activity

  • What would you do to maximize a reward if you had no idea which action you should take?

1.1.1. Key concepts

  • Reinforcement learning has two main characteristics:

    • It uses a trial-and-error type of search.

    • It has delayed rewards.

Warning

Reinforcement learning is a name that covers several different concepts:

  • It’s a type of problem.

  • It’s also a class of solution methods.

  • And it’s the field that studies the two previous points as well.

You need to understand the distinction.

1.2. What is the Reinforcement learning problem? (Simplified)

  • The reinforcement learning problem is an idea that comes from dynamical systems theory.

  • More specifically, it comes from Markov decision processes.

  • The basic ideas are:

    • A learning agent must sense the state of the environment.

    • The agent must be able to take actions that affect the state.

    • It must have a goal or goals relating to the state of the environment.

../../_images/rl.drawio.png
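
The three basic ideas above (sensing the state, acting on it, and pursuing a goal tied to the state) can be sketched as a simple interaction loop. This is only a minimal illustration: the `Corridor` environment, its `reset`/`step` methods, and the two example agents are hypothetical names chosen for this sketch, not part of any particular library.

```python
import random


class Corridor:
    """Hypothetical environment: positions 0..length-1, goal at the last one."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); the position stays inside the corridor
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0  # the goal is expressed through the state
        return self.state, reward, done


def run_episode(env, choose_action, max_steps=100):
    """Run one episode and return the total reward collected."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = choose_action(state)            # the agent senses the state and acts
        state, reward, done = env.step(action)   # the action affects the state
        total_reward += reward
        if done:
            break
    return total_reward


rng = random.Random(0)
random_agent = lambda state: rng.choice([-1, 1])  # no prior knowledge
always_right = lambda state: 1                    # walks straight to the goal

print(run_episode(Corridor(), always_right))  # 1.0
print(run_episode(Corridor(), random_agent))
```

The agent is reduced here to a function from state to action, so any learning method could be plugged in by replacing `choose_action`.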

1.3. What is a Reinforcement learning method?

  • It is any method that is well suited to solving RL problems.

Activity

  • Discuss why chess can be considered a reinforcement learning problem.

  • Try to define for chess the states, actions, and goal.

  • Why must the goal be related to the state?

1.4. What is Reinforcement learning not?

  • Supervised learning

    • Supervised learning is learning from a training set of labeled examples.

    • Each example describes a situation and the action to take in this situation.

    • The objective is to extrapolate/generalize from this set of examples.

    • However, in interactive problems it is often impractical to obtain examples that are both correct and representative.

  • Unsupervised learning

    • Finding structure hidden in unlabeled examples.

    • You could think that reinforcement learning is the same thing, because it doesn’t rely on examples of correct behavior either.

Activity

What is the difference?

1.5. The challenges of reinforcement learning

Reinforcement learning has a lot of challenges:

  • Trade-off between exploration and exploitation

    • To obtain a lot of reward, the agent must choose actions that it has already tried and found effective.

    • However, to discover such actions it needs to try previously unselected actions.

    • In the first case, we say that the agent exploits.

    • In the second, we say that the agent explores.

    • The agent cannot do only one or the other; it needs both.

  • Considering the whole problem:

    • Reinforcement learning considers the whole problem.

    • It starts with a complete, interactive, goal-seeking agent.

    • It assumes that the agent has to operate despite uncertainty about the environment.
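
To make the exploration/exploitation trade-off concrete, here is a minimal sketch of one common way to balance the two, usually called epsilon-greedy. It is shown only as an illustration, not as the method these notes prescribe. The problem setup is an assumption: a few actions with hidden average rewards (`true_means`) and Gaussian noise, and all names are hypothetical.

```python
import random


def epsilon_greedy_bandit(true_means, epsilon, steps, seed=0):
    """Repeatedly pick one of len(true_means) actions.

    With probability epsilon the agent explores (random action);
    otherwise it exploits (action with the best estimate so far).
    """
    rng = random.Random(seed)
    n_actions = len(true_means)
    estimates = [0.0] * n_actions  # estimated average reward of each action
    counts = [0] * n_actions       # how many times each action was tried
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)                            # explore
        else:
            action = max(range(n_actions), key=lambda a: estimates[a])   # exploit
        # Assumed reward signal: hidden mean of the action plus unit Gaussian noise.
        reward = rng.gauss(true_means[action], 1.0)
        counts[action] += 1
        # Incremental update of the running average for the chosen action.
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward
    return estimates, counts, total_reward


estimates, counts, total = epsilon_greedy_bandit([0.2, 0.5, 1.0], epsilon=0.1, steps=1000)
print(counts)     # how often each action was selected
print(estimates)  # what the agent believes about each action
```

Setting `epsilon=0` gives an agent that only exploits its current estimates, and `epsilon=1` gives one that only explores; comparing their total rewards on the same problem is a quick way to see why neither extreme is enough.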

1.6. Elements of reinforcement learning

Activity

  • What are the two elements we talked about that compose reinforcement learning?

  • Reinforcement learning is also composed of:

    • A policy

    • A reward function

    • A value function

    • A model of the environment (optional)

Definition

A policy is a function that maps each state to an action.

  • It defines the behavior of an agent at a given time.

  • It is the core of a reinforcement learning agent, because it is sufficient to determine its behavior.

  • It can either be:

    • Deterministic

    • Stochastic

../../_images/policy.drawio.png
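
The difference between a deterministic and a stochastic policy can be sketched with two small lookup tables. The state names, action names, and probabilities below are made up for illustration; a real policy could just as well be any function of the state.

```python
import random

ACTIONS = ["left", "right"]

# Deterministic policy: each state maps to exactly one action.
deterministic_policy = {"s0": "right", "s1": "right", "s2": "left"}

# Stochastic policy: each state maps to a probability distribution over actions.
stochastic_policy = {
    "s0": {"left": 0.1, "right": 0.9},
    "s1": {"left": 0.5, "right": 0.5},
    "s2": {"left": 1.0, "right": 0.0},
}


def act_deterministic(policy, state):
    """Return the single action the policy prescribes in this state."""
    return policy[state]


def act_stochastic(policy, state, rng):
    """Sample an action according to the policy's probabilities in this state."""
    actions = list(policy[state])
    weights = [policy[state][a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


rng = random.Random(0)
print(act_deterministic(deterministic_policy, "s0"))                   # right
print([act_stochastic(stochastic_policy, "s1", rng) for _ in range(5)])
```

A deterministic policy can be seen as the special case of a stochastic one in which a single action has probability 1, as in state `s2` above.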

Definition

A reward is a value returned by the environment at a time step \(t\).

  • It defines the goal of a reinforcement learning problem.

  • Remember that the agent’s objective is to maximize the total reward it receives.

Definition

A value function is a function that returns, for each state, the total reward the agent can expect to accumulate starting from that state.

Activity

  • What is the difference between the reward and the value function?

  • Between the following states, which one would you select?

    • \(s_1\): reward 100, value function 3

    • \(s_2\): reward 1, value function 5

  • We seek actions that lead us to states with higher value, not higher immediate reward.

  • Unfortunately, it is harder to determine the value than the reward.
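
The gap between immediate reward and value can be illustrated with a small sketch. It makes simplifying assumptions that are not in the notes: the problem is deterministic, the policy is fixed, episodes always end, and rewards are simply summed without discounting. The state names and numbers are hypothetical.

```python
# Hypothetical deterministic problem under a fixed policy:
# transitions[state] = (next_state, reward received when leaving `state`)
transitions = {
    "start_a": ("a1", 10.0),  # large immediate reward, nothing afterwards
    "a1": ("end", 0.0),
    "start_b": ("b1", 1.0),   # small immediate reward, more to come
    "b1": ("b2", 5.0),
    "b2": ("end", 20.0),
}


def value(state, transitions, terminal="end"):
    """Total reward collected from `state` until the terminal state."""
    total = 0.0
    while state != terminal:
        state, reward = transitions[state]
        total += reward
    return total


for s in ("start_a", "start_b"):
    immediate = transitions[s][1]
    print(s, "immediate reward:", immediate, "value:", value(s, transitions))
# start_a immediate reward: 10.0 value: 10.0
# start_b immediate reward: 1.0 value: 26.0
```

Here the reward is read directly from one transition, while the value requires following the whole future trajectory. This is one way to see why values are harder to determine than rewards, especially when the environment is not known in advance.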

Definition

The model of the environment is a representation of the dynamics of the problem.

  • For a given state and action, the model is able to predict the next state and next reward.

  • We consider two types of reinforcement learning methods:

    • Model-based

    • Model-free
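
As a rough illustration of what a model provides, the sketch below stores a tiny deterministic model as a table and uses it to look one step ahead before acting. The table, the state values, and the function names are all hypothetical. A model-free method would not have access to `predict` and would have to learn from experienced transitions only.

```python
# Hypothetical tabular model: model[(state, action)] = (next_state, reward)
model = {
    ("s0", "left"): ("s0", 0.0),
    ("s0", "right"): ("s1", 0.0),
    ("s1", "left"): ("s0", 0.0),
    ("s1", "right"): ("goal", 1.0),
}

# Assumed state values, e.g. produced by some earlier learning phase.
values = {"s0": 0.0, "s1": 1.0, "goal": 0.0}


def predict(model, state, action):
    """Return the next state and reward the model predicts for (state, action)."""
    return model[(state, action)]


def plan_one_step(model, state, actions, values):
    """Pick the action whose predicted reward plus next-state value is highest."""
    def score(action):
        next_state, reward = predict(model, state, action)
        return reward + values.get(next_state, 0.0)
    return max(actions, key=score)


print(predict(model, "s1", "right"))                          # ('goal', 1.0)
print(plan_one_step(model, "s0", ["left", "right"], values))  # right
```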

1.7. Limitations

  • Reinforcement learning relies heavily on the concept of state.

  • It is the input of the policy and the value function.

  • We will not discuss how we design or model the state.

  • However, the design of the state has a great impact on the efficiency of the method.