5. Monte-Carlo Methods
In this lab you will implement the Monte-Carlo algorithms seen in class (Monte-Carlo).
You will need to reuse the code that you produced during the previous labs (MDP lab, DP lab).
5.1. On-Policy
Implement the on-policy method.
Create a function
on_policy(env, eps, T)
env
: the gym environment.
eps
: the \(\epsilon\) parameter.
T
: the maximum number of steps.
The function must return the policy and the history of expected returns (a list containing the expected return at each time step).
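A possible shape for this function is first-visit on-policy MC control with an \(\epsilon\)-greedy policy. The sketch below is one way to do it, not the required solution: `ToyEnv` is a stand-in I invented for the lab's gym environment (only `reset()`/`step()` and the state/action counts are assumed), the default `gamma` is my own choice, and the history here records one return per episode, so adapt the bookkeeping to whatever the lab expects.

```python
import random
from collections import defaultdict

class ToyEnv:
    """Hypothetical stand-in for the lab's gym environment (chain world):
    action 1 moves right, action 0 moves left, reward 1 on reaching the end."""
    n_states, n_actions = 4, 2

    def reset(self):
        self.s = 0
        return self.s

    def step(self, a):
        self.s = min(self.s + 1, self.n_states - 1) if a == 1 else max(self.s - 1, 0)
        done = self.s == self.n_states - 1
        return self.s, (1.0 if done else 0.0), done, {}

def on_policy(env, eps, T, gamma=0.9):
    """First-visit on-policy MC control with an eps-greedy policy.
    Returns the greedy policy and the history of returns (one per episode)."""
    Q = defaultdict(float)       # action-value estimates Q[(state, action)]
    visits = defaultdict(int)    # first-visit counters for incremental means
    history, steps = [], 0
    while steps < T:
        # 1) generate an episode with the current eps-greedy policy
        episode, s, done = [], env.reset(), False
        while not done and steps < T:
            greedy = max(range(env.n_actions), key=lambda x: Q[(s, x)])
            a = random.randrange(env.n_actions) if random.random() < eps else greedy
            s2, r, done, _ = env.step(a)
            episode.append((s, a, r))
            s, steps = s2, steps + 1
        # 2) first-visit updates, sweeping the episode backwards
        G = 0.0
        for t in range(len(episode) - 1, -1, -1):
            st, at, rt = episode[t]
            G = gamma * G + rt
            if all((st, at) != (e[0], e[1]) for e in episode[:t]):
                visits[(st, at)] += 1
                Q[(st, at)] += (G - Q[(st, at)]) / visits[(st, at)]
        history.append(G)  # G is now the return of the whole episode
    policy = {st: max(range(env.n_actions), key=lambda x: Q[(st, x)])
              for st in range(env.n_states)}
    return policy, history
```

On the toy chain, the learned greedy policy should end up choosing "right" in the state next to the goal, since that action earns the terminal reward immediately.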
5.2. Off-Policy
Implement the off-policy method.
Create a function
off_policy(env, eps, T)
env
: the gym environment.
eps
: the \(\epsilon\) parameter.
T
: the maximum number of steps.
The function must return the policy and the history of expected returns (a list containing the expected return at each time step).
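One standard instantiation is off-policy MC control with weighted importance sampling, where the behavior policy is \(\epsilon\)-greedy with respect to \(Q\) and the learned target policy is greedy. Again this is only a sketch under my own assumptions: `ChainEnv` is a made-up stand-in for the lab's gym environment, `gamma` is my default, and the history records one discounted return per behavior episode.

```python
import random
from collections import defaultdict

class ChainEnv:
    """Hypothetical stand-in for the lab's gym environment (chain world):
    action 1 moves right, action 0 moves left, reward 1 on reaching the end."""
    n_states, n_actions = 4, 2

    def reset(self):
        self.s = 0
        return self.s

    def step(self, a):
        self.s = min(self.s + 1, self.n_states - 1) if a == 1 else max(self.s - 1, 0)
        done = self.s == self.n_states - 1
        return self.s, (1.0 if done else 0.0), done, {}

def off_policy(env, eps, T, gamma=0.9):
    """Off-policy MC control with weighted importance sampling:
    behavior policy eps-greedy w.r.t. Q, target policy greedy w.r.t. Q."""
    Q = defaultdict(float)
    C = defaultdict(float)       # cumulative importance-sampling weights
    history, steps = [], 0
    while steps < T:
        # 1) the behavior policy generates an episode
        episode, s, done = [], env.reset(), False
        while not done and steps < T:
            greedy = max(range(env.n_actions), key=lambda x: Q[(s, x)])
            a = random.randrange(env.n_actions) if random.random() < eps else greedy
            # probability that the eps-greedy behavior policy picks this action
            b = eps / env.n_actions + (1.0 - eps) * (a == greedy)
            s2, r, done, _ = env.step(a)
            episode.append((s, a, r, b))
            s, steps = s2, steps + 1
        # 2) record the discounted return of the episode
        G = 0.0
        for _, _, rt, _ in reversed(episode):
            G = gamma * G + rt
        history.append(G)
        # 3) weighted importance-sampling updates, backwards from the end
        G, W = 0.0, 1.0
        for st, at, rt, b in reversed(episode):
            G = gamma * G + rt
            C[(st, at)] += W
            Q[(st, at)] += W / C[(st, at)] * (G - Q[(st, at)])
            if at != max(range(env.n_actions), key=lambda x: Q[(st, x)]):
                break            # the greedy target gives this action probability 0
            W /= b               # target probability is 1 for the greedy action
    policy = {st: max(range(env.n_actions), key=lambda x: Q[(st, x)])
              for st in range(env.n_states)}
    return policy, history
```

The backward loop stops as soon as the behavior action disagrees with the greedy target, which is why this variant mostly learns from the tails of episodes; that is worth keeping in mind when you interpret its learning curves in the experiments.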
5.3. Experiments
For different values of \(T\):
Run both algorithms.
Plot the evolution of the expected returns for both algorithms.
Conclude on both algorithms and compare with what was said in class.
Now compare with the policy computed using dynamic programming.
Which method gives the best performance?
Why?
Compare the trade-off between performance and running time.
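Raw per-step return histories are noisy, so a moving average makes the curves easier to compare, and timing each run gives the data for the performance-versus-running-time discussion. The two helpers below are my own (names `smooth` and `timed` are not part of the lab statement), shown as a minimal stdlib-only sketch:

```python
import time

def smooth(history, window=50):
    """Moving average of a return history, for cleaner learning curves."""
    out = []
    for i in range(len(history)):
        lo = max(0, i - window + 1)
        out.append(sum(history[lo:i + 1]) / (i + 1 - lo))
    return out

def timed(fn, *args):
    """Run fn(*args) and also return its wall-clock running time in seconds."""
    t0 = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - t0
```

A typical use would be `(_, hist), dt = timed(on_policy, env, 0.1, T)` followed by plotting `smooth(hist)` for each algorithm and each \(T\), then tabulating the final smoothed return against `dt`.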