5. Monte-Carlo Methods

  • In this lab you will implement the Monte-Carlo algorithms seen in class (Monte-Carlo).

  • You will need to reuse the code that you produced during the previous labs (MDP lab, DP lab).

5.1. On-Policy

  • Implement the on-policy method.

  • Create a function on_policy(env, eps, T):

    • env: the gym environment.

    • eps: the \(\epsilon\) parameter.

    • T: the maximum number of steps.

  • The function should return the policy and the history of expected returns (a list containing the expected return at each time step). A minimal sketch is given below.
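As a starting point, the sketch below shows one possible implementation of on-policy first-visit Monte-Carlo control with an \(\epsilon\)-greedy policy. It assumes a discrete gym environment with the classic API (reset() returns a state, step() returns a 4-tuple), takes \(\gamma = 1\), interprets T as the total number of environment steps, and records the greedy value of the start state as the expected-return estimate at each step; adjust these choices to the conventions used in class.

```python
import numpy as np

def on_policy(env, eps, T):
    """On-policy first-visit Monte-Carlo control (sketch).

    Assumes a discrete gym environment (classic gym API), gamma = 1,
    and T counted as the total number of environment steps.
    """
    n_states = env.observation_space.n
    n_actions = env.action_space.n
    Q = np.zeros((n_states, n_actions))       # action-value estimates
    visits = np.zeros((n_states, n_actions))  # counters for incremental means
    history = []                               # one expected-return value per step

    step = 0
    while step < T:
        # Generate one episode under the current eps-greedy policy.
        episode = []
        state = env.reset()
        start_state = state
        done = False
        while not done and step < T:
            if np.random.rand() < eps:
                action = env.action_space.sample()
            else:
                action = int(np.argmax(Q[state]))
            next_state, reward, done, _ = env.step(action)
            episode.append((state, action, reward))
            state = next_state
            step += 1
            # Expected-return proxy: greedy value of the start state.
            history.append(float(np.max(Q[start_state])))

        # First-visit updates: only the first occurrence of each (s, a) counts.
        first_visit = {}
        for t, (s, a, _) in enumerate(episode):
            first_visit.setdefault((s, a), t)
        G = 0.0
        for t in reversed(range(len(episode))):
            s, a, r = episode[t]
            G = r + G  # undiscounted return (gamma = 1)
            if first_visit[(s, a)] == t:
                visits[s, a] += 1
                Q[s, a] += (G - Q[s, a]) / visits[s, a]

    policy = np.argmax(Q, axis=1)  # greedy policy extracted from Q
    return policy, history
```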

5.2. Off-Policy

  • Implement the off-policy method.

  • Create a function off_policy(env, eps, T):

    • env: the gym environment.

    • eps: the \(\epsilon\) parameter.

    • T: the maximum number of steps.

  • The function should return the policy and the history of expected returns (a list containing the expected return at each time step). A sketch is given below.
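Under the same assumptions as above, here is one possible sketch of off-policy Monte-Carlo control with weighted importance sampling: the behavior policy is \(\epsilon\)-greedy with respect to the current Q, and the target policy is greedy.

```python
import numpy as np

def off_policy(env, eps, T):
    """Off-policy Monte-Carlo control with weighted importance sampling (sketch)."""
    n_states = env.observation_space.n
    n_actions = env.action_space.n
    Q = np.zeros((n_states, n_actions))
    C = np.zeros((n_states, n_actions))  # cumulative importance weights
    history = []

    step = 0
    while step < T:
        # Episode generated by the eps-greedy behavior policy; we also
        # store the behavior probability of each chosen action.
        episode = []
        state = env.reset()
        start_state = state
        done = False
        while not done and step < T:
            greedy = int(np.argmax(Q[state]))
            if np.random.rand() < eps:
                action = env.action_space.sample()
            else:
                action = greedy
            b_prob = eps / n_actions + (1.0 - eps) * (action == greedy)
            next_state, reward, done, _ = env.step(action)
            episode.append((state, action, reward, b_prob))
            state = next_state
            step += 1
            history.append(float(np.max(Q[start_state])))

        # Backward pass with weighted importance sampling.
        G, W = 0.0, 1.0
        for s, a, r, b_prob in reversed(episode):
            G = r + G  # undiscounted return (gamma = 1)
            C[s, a] += W
            Q[s, a] += (W / C[s, a]) * (G - Q[s, a])
            if a != int(np.argmax(Q[s])):
                break          # the greedy target policy gives this action probability 0
            W /= b_prob        # target probability is 1 for the greedy action

    policy = np.argmax(Q, axis=1)
    return policy, history
```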

5.3. Experiments

  • For different values of \(T\):

    • Run both algorithms.

    • Plot the evolution of the expected returns for both algorithms (see the plotting sketch after this item).
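A minimal driver for these two steps might look as follows; the environment name, the \(\epsilon\) value, and the range of \(T\) values are placeholders to adapt to your setup.

```python
import gym
import matplotlib.pyplot as plt

env = gym.make("FrozenLake-v1")  # placeholder: reuse the environment from the previous labs
EPS = 0.1                        # placeholder value of eps

for T in (1_000, 10_000, 100_000):  # placeholder range of T values
    _, hist_on = on_policy(env, EPS, T)
    _, hist_off = off_policy(env, EPS, T)
    plt.figure()
    plt.plot(hist_on, label="on-policy")
    plt.plot(hist_off, label="off-policy")
    plt.xlabel("time step")
    plt.ylabel("expected return")
    plt.title(f"Expected return over time (T = {T})")
    plt.legend()
plt.show()
```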

  • Draw conclusions about both algorithms and compare them with what was said in class.

  • Now compare with the policy computed using dynamic programming in the DP lab.

    • Which method gives the best performance?

    • Why?

  • Compare each method's trade-off between performance and running time (a timing sketch is given below).
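One simple way to measure this, continuing from the snippet above, is to time each method with time.perf_counter; value_iteration below is a placeholder for whichever function your DP lab produced, and the step budget is likewise a placeholder.

```python
import time
import numpy as np

# value_iteration is a placeholder for the function written in the DP lab.
t0 = time.perf_counter()
dp_policy = value_iteration(env)
dp_time = time.perf_counter() - t0

t0 = time.perf_counter()
mc_policy, _ = on_policy(env, EPS, T=100_000)  # placeholder step budget
mc_time = time.perf_counter() - t0

print(f"dynamic programming:   {dp_time:.3f} s")
print(f"on-policy Monte-Carlo: {mc_time:.3f} s")
print("policies agree:", np.array_equal(dp_policy, mc_policy))
```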