5. Monte-Carlo Methods
In this lab you will implement the Monte-Carlo algorithms seen in class (Monte-Carlo).
You will need to reuse the code that you produced during the previous labs (MDP lab, DP lab).
5.1. On-Policy
Implement the on-policy method.
Create a function
on_policy(env, eps, T)
env
: the gym environment.
eps
: the \(\epsilon\) parameter.
T
: the maximum number of steps.
The function must return the policy and the history of expected returns (a list containing the expected return at each time step).
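A possible shape for this function is first-visit on-policy MC control with an \(\epsilon\)-greedy policy. The sketch below is one way to do it, not the required solution: `ToyEnv` is a stand-in I invented for the lab's gym environment (only `reset()`/`step()` and the state/action counts are assumed), the default `gamma` is my own choice, and the history here records one return per episode, so adapt the bookkeeping to whatever the lab expects.

```python
import random
from collections import defaultdict

class ToyEnv:
    """Hypothetical stand-in for the lab's gym environment (chain world):
    action 1 moves right, action 0 moves left, reward 1 on reaching the end."""
    n_states, n_actions = 4, 2

    def reset(self):
        self.s = 0
        return self.s

    def step(self, a):
        self.s = min(self.s + 1, self.n_states - 1) if a == 1 else max(self.s - 1, 0)
        done = self.s == self.n_states - 1
        return self.s, (1.0 if done else 0.0), done, {}

def on_policy(env, eps, T, gamma=0.9):
    """First-visit on-policy MC control with an eps-greedy policy.
    Returns the greedy policy and the history of returns (one per episode)."""
    Q = defaultdict(float)       # action-value estimates Q[(state, action)]
    visits = defaultdict(int)    # first-visit counters for incremental means
    history, steps = [], 0
    while steps < T:
        # 1) generate an episode with the current eps-greedy policy
        episode, s, done = [], env.reset(), False
        while not done and steps < T:
            greedy = max(range(env.n_actions), key=lambda x: Q[(s, x)])
            a = random.randrange(env.n_actions) if random.random() < eps else greedy
            s2, r, done, _ = env.step(a)
            episode.append((s, a, r))
            s, steps = s2, steps + 1
        # 2) first-visit updates, sweeping the episode backwards
        G = 0.0
        for t in range(len(episode) - 1, -1, -1):
            st, at, rt = episode[t]
            G = gamma * G + rt
            if all((st, at) != (e[0], e[1]) for e in episode[:t]):
                visits[(st, at)] += 1
                Q[(st, at)] += (G - Q[(st, at)]) / visits[(st, at)]
        history.append(G)  # G is now the return of the whole episode
    policy = {st: max(range(env.n_actions), key=lambda x: Q[(st, x)])
              for st in range(env.n_states)}
    return policy, history
```

On the toy chain, the learned greedy policy should end up choosing "right" in the state next to the goal, since that action earns the terminal reward immediately.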
5.2. Off-Policy
Implement the off-policy method.
Create a function
off_policy(env, eps, T)
env
: the gym environment.
eps
: the \(\epsilon\) parameter.
T
: the maximum number of steps.
The function must return the policy and the history of expected returns (a list containing the expected return at each time step).
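One standard instantiation is off-policy MC control with weighted importance sampling, where the behavior policy is \(\epsilon\)-greedy with respect to \(Q\) and the learned target policy is greedy. Again this is only a sketch under my own assumptions: `ChainEnv` is a made-up stand-in for the lab's gym environment, `gamma` is my default, and the history records one discounted return per behavior episode.

```python
import random
from collections import defaultdict

class ChainEnv:
    """Hypothetical stand-in for the lab's gym environment (chain world):
    action 1 moves right, action 0 moves left, reward 1 on reaching the end."""
    n_states, n_actions = 4, 2

    def reset(self):
        self.s = 0
        return self.s

    def step(self, a):
        self.s = min(self.s + 1, self.n_states - 1) if a == 1 else max(self.s - 1, 0)
        done = self.s == self.n_states - 1
        return self.s, (1.0 if done else 0.0), done, {}

def off_policy(env, eps, T, gamma=0.9):
    """Off-policy MC control with weighted importance sampling:
    behavior policy eps-greedy w.r.t. Q, target policy greedy w.r.t. Q."""
    Q = defaultdict(float)
    C = defaultdict(float)       # cumulative importance-sampling weights
    history, steps = [], 0
    while steps < T:
        # 1) the behavior policy generates an episode
        episode, s, done = [], env.reset(), False
        while not done and steps < T:
            greedy = max(range(env.n_actions), key=lambda x: Q[(s, x)])
            a = random.randrange(env.n_actions) if random.random() < eps else greedy
            # probability that the eps-greedy behavior policy picks this action
            b = eps / env.n_actions + (1.0 - eps) * (a == greedy)
            s2, r, done, _ = env.step(a)
            episode.append((s, a, r, b))
            s, steps = s2, steps + 1
        # 2) record the discounted return of the episode
        G = 0.0
        for _, _, rt, _ in reversed(episode):
            G = gamma * G + rt
        history.append(G)
        # 3) weighted importance-sampling updates, backwards from the end
        G, W = 0.0, 1.0
        for st, at, rt, b in reversed(episode):
            G = gamma * G + rt
            C[(st, at)] += W
            Q[(st, at)] += W / C[(st, at)] * (G - Q[(st, at)])
            if at != max(range(env.n_actions), key=lambda x: Q[(st, x)]):
                break            # the greedy target gives this action probability 0
            W /= b               # target probability is 1 for the greedy action
    policy = {st: max(range(env.n_actions), key=lambda x: Q[(st, x)])
              for st in range(env.n_states)}
    return policy, history
```

The backward loop stops as soon as the behavior action disagrees with the greedy target, which is why this variant mostly learns from the tails of episodes; that is worth keeping in mind when you interpret its learning curves in the experiments.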
5.3. Experiments
For different values of \(T\):
Run both algorithms.
Plot the evolution of the expected returns for both algorithms.
Conclude on both algorithms and compare with what was said in class.
Now compare with the policy computed using dynamic programming.
Which method gives the best performance?
Why?
Compare the trade-off between performance and running time.
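Raw per-step return histories are noisy, so a moving average makes the curves easier to compare, and timing each run gives the data for the performance-versus-running-time discussion. The two helpers below are my own (names `smooth` and `timed` are not part of the lab statement), shown as a minimal stdlib-only sketch:

```python
import time

def smooth(history, window=50):
    """Moving average of a return history, for cleaner learning curves."""
    out = []
    for i in range(len(history)):
        lo = max(0, i - window + 1)
        out.append(sum(history[lo:i + 1]) / (i + 1 - lo))
    return out

def timed(fn, *args):
    """Run fn(*args) and also return its wall-clock running time in seconds."""
    t0 = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - t0
```

A typical use would be `(_, hist), dt = timed(on_policy, env, 0.1, T)` followed by plotting `smooth(hist)` for each algorithm and each \(T\), then tabulating the final smoothed return against `dt`.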