9. Actor-Critic

  • In this lab you will implement actor-critic algorithms.

  • You will need to reuse the code that you produced during the previous labs.

9.1. One-step Actor-Critic

  • Implement the one-step actor-critic algorithm.

  • Create a function ac(env, alpha, T, E):

    • env: the gym environment.

    • alpha: the vector containing the two step-size parameters.

    • T: the maximum number of steps per episode.

    • E: the number of episodes.

  • The function needs to return the policy, the history of the sum of rewards for each episode, and the weight vector.
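The one-step actor-critic can be sketched as follows. This is a minimal illustration, not a reference solution: it assumes one-hot state features, a linear critic, and a softmax actor, and it replaces the gym environment with a tiny hypothetical `ChainEnv` stand-in so the sketch is self-contained (the helper names `ChainEnv`, `one_hot`, and `softmax` are illustrative, not part of the lab).

```python
import numpy as np

class ChainEnv:
    """Hypothetical stand-in for the gym environment: a 5-state chain.
    Action 1 moves right, action 0 moves left; reaching the rightmost
    state yields reward 1 and ends the episode."""
    n_states, n_actions = 5, 2

    def reset(self):
        self.s = 0
        return self.s

    def step(self, a):
        self.s = min(self.n_states - 1, self.s + 1) if a == 1 else max(0, self.s - 1)
        done = self.s == self.n_states - 1
        return self.s, float(done), done, {}

def one_hot(s, n):
    x = np.zeros(n)
    x[s] = 1.0
    return x

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def ac(env, alpha, T, E, gamma=0.99, seed=0):
    """One-step actor-critic: linear critic v(s) = w . x(s), softmax actor
    over preferences theta @ x(s). alpha = (alpha_w, alpha_theta) holds the
    two step sizes (critic first, actor second -- an assumed ordering)."""
    rng = np.random.default_rng(seed)
    nS, nA = env.n_states, env.n_actions
    w = np.zeros(nS)                # critic weight vector
    theta = np.zeros((nA, nS))      # actor preference weights
    history = []                    # sum of rewards per episode
    for _ in range(E):
        s, G, I = env.reset(), 0.0, 1.0
        for _ in range(T):
            x = one_hot(s, nS)
            pi = softmax(theta @ x)
            a = rng.choice(nA, p=pi)
            s2, r, done, _ = env.step(a)
            G += r
            # one-step TD error; bootstrap from v(s') unless terminal
            delta = r + (0.0 if done else gamma * (w @ one_hot(s2, nS))) - w @ x
            w += alpha[0] * delta * x                 # critic update
            grad_log = -np.outer(pi, x)               # grad of log pi(a|s)
            grad_log[a] += x
            theta += alpha[1] * I * delta * grad_log  # actor update
            I *= gamma
            s = s2
            if done:
                break
        history.append(G)
    policy = lambda state: int(np.argmax(theta @ one_hot(state, nS)))
    return policy, history, w

policy, history, w = ac(ChainEnv(), alpha=(0.1, 0.1), T=50, E=200)
```

With a real gym environment you would derive `x(s)` from the observation (e.g., tile coding from an earlier lab) instead of the one-hot encoding used here.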

9.2. Actor-Critic with Eligibility Traces

  • Implement the actor-critic algorithm with eligibility traces.

  • Create a function sarsa-AC(env, alpha, lambda, T, E):

    • env: the gym environment.

    • alpha: the vector containing the three step-size parameters.

    • lambda: the trace-decay parameter.

    • T: the maximum number of steps per episode.

    • E: the number of episodes.

  • The function needs to return the policy, the history of the sum of rewards for each episode, and the weight vector.
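A possible sketch of the trace-based variant follows. It assumes (this is an interpretation, not stated by the lab) that the three step sizes are for the average-reward estimate, the critic, and the actor, as in the continuing actor-critic with eligibility traces, and that a single lambda decays both traces; the parameter is named `lam` because `lambda` is a Python keyword. The toy `ChainEnv` and the helpers are the same illustrative stand-ins as before.

```python
import numpy as np

class ChainEnv:
    """Hypothetical stand-in for the gym environment: a 5-state chain."""
    n_states, n_actions = 5, 2

    def reset(self):
        self.s = 0
        return self.s

    def step(self, a):
        self.s = min(self.n_states - 1, self.s + 1) if a == 1 else max(0, self.s - 1)
        done = self.s == self.n_states - 1
        return self.s, float(done), done, {}

def one_hot(s, n):
    x = np.zeros(n)
    x[s] = 1.0
    return x

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def sarsa_AC(env, alpha, lam, T, E, seed=0):
    """Actor-critic with eligibility traces.
    Assumed: alpha = (alpha_rbar, alpha_w, alpha_theta), i.e. step sizes
    for the average-reward estimate, the critic, and the actor."""
    rng = np.random.default_rng(seed)
    nS, nA = env.n_states, env.n_actions
    w = np.zeros(nS)
    theta = np.zeros((nA, nS))
    r_bar = 0.0                     # running average-reward estimate
    history = []
    for _ in range(E):
        s, G = env.reset(), 0.0
        z_w = np.zeros_like(w)          # critic trace
        z_theta = np.zeros_like(theta)  # actor trace
        for _ in range(T):
            x = one_hot(s, nS)
            pi = softmax(theta @ x)
            a = rng.choice(nA, p=pi)
            s2, r, done, _ = env.step(a)
            G += r
            # TD error with average-reward baseline
            delta = r - r_bar + (0.0 if done else w @ one_hot(s2, nS)) - w @ x
            r_bar += alpha[0] * delta
            z_w = lam * z_w + x                    # accumulate critic trace
            grad_log = -np.outer(pi, x)
            grad_log[a] += x
            z_theta = lam * z_theta + grad_log     # accumulate actor trace
            w += alpha[1] * delta * z_w
            theta += alpha[2] * delta * z_theta
            s = s2
            if done:
                break
        history.append(G)
    policy = lambda state: int(np.argmax(theta @ one_hot(state, nS)))
    return policy, history, w

policy, history, w = sarsa_AC(ChainEnv(), alpha=(0.01, 0.1, 0.1), lam=0.9, T=50, E=100)
```

Traces are reset at the start of each episode; with lam = 0 the updates reduce to one-step corrections, which is a quick sanity check against the previous function.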

9.3. Experiments

  • For different \(T\) and \(E\):

    • Run both algorithms.

    • Draw the evolution of the sum of rewards for both algorithms.

  • Draw the weight vector.

  • Now compare with the policy obtained using semi-gradient Sarsa in the previous lab.
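Per-episode reward sums are noisy, so before drawing the curves it often helps to smooth each history with a short moving average (a common presentation choice, not something the lab prescribes). A small helper like the following can be applied to both histories before handing them to a plotting library such as matplotlib:

```python
import numpy as np

def moving_average(rewards, k=10):
    """Smooth a per-episode reward history with a k-episode sliding window,
    so the curves of the two algorithms are easier to compare when drawn."""
    r = np.asarray(rewards, dtype=float)
    if len(r) < k:
        return r.copy()
    # cumulative-sum trick: window sums in O(n) instead of O(n*k)
    c = np.cumsum(np.insert(r, 0, 0.0))
    return (c[k:] - c[:-k]) / k

smoothed = moving_average([0, 1, 1, 0, 1, 1, 1, 1], k=4)
```

The smoothed curve has `len(rewards) - k + 1` points; plot it against the episode index to compare how quickly each algorithm's return improves for the chosen T and E.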