8. Eligibility-traces - Algorithms

  • In this lab you will implement your first algorithm using eligibility traces (Related topic).

  • You will need to reuse the code that you produced during the previous labs.

8.1. Sarsa(\(\lambda\))

  • Implement the Sarsa(\(\lambda\)) algorithm.

  • Create a function sarsa(env, eps, alpha, lambda, T, E):

    • env: the gym environment.

    • eps: the \(\epsilon\) parameter of the \(\epsilon\)-greedy policy.

    • alpha: the step-size parameter.

    • lambda: the trace-decay parameter \(\lambda\).

    • T: the maximum number of steps per episode.

    • E: the number of episodes.

  • The function needs to return the policy, the history of the sum of rewards for each episode, and the weights vector.
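
  As a starting point, the function described above could be sketched as follows. This is only one possible shape, under several assumptions that are not part of the lab statement: the parameter is named lam because lambda is a reserved word in Python; the environment follows the Gymnasium-style API (reset() returns (state, info), step() returns five values); the action space is discrete with n_actions actions; and features(state, action) is a hypothetical helper, for example the tile coder from a previous lab, that returns the indices of the active binary features. The discount factor gamma is also an assumed extra argument. Accumulating traces are used here.

```python
import random


def sarsa(env, eps, alpha, lam, T, E, features, n_features, n_actions, gamma=1.0):
    """Sarsa(lambda) with binary features and accumulating traces (sketch).

    Assumptions: `features(state, action)` returns the indices of the active
    binary features, and `env` follows the Gymnasium-style reset()/step() API.
    """
    w = [0.0] * n_features  # weights vector
    history = []            # sum of rewards for each episode

    def q(state, action):
        # Linear action value: sum of the weights of the active features.
        return sum(w[i] for i in features(state, action))

    def greedy(state):
        values = [q(state, a) for a in range(n_actions)]
        return values.index(max(values))

    def eps_greedy(state):
        if random.random() < eps:
            return random.randrange(n_actions)
        return greedy(state)

    for _ in range(E):
        state, _info = env.reset()
        action = eps_greedy(state)
        z = [0.0] * n_features  # eligibility trace, reset at each episode
        total = 0.0
        for _ in range(T):
            next_state, reward, terminated, truncated, _info = env.step(action)
            total += reward
            delta = reward - q(state, action)
            for i in features(state, action):
                z[i] += 1.0  # accumulating trace
            if not terminated:
                # Bootstrap from the next state-action pair.
                next_action = eps_greedy(next_state)
                delta += gamma * q(next_state, next_action)
            for i in range(n_features):
                w[i] += alpha * delta * z[i]
                z[i] *= gamma * lam  # decay the trace
            if terminated or truncated:
                break
            state, action = next_state, next_action
        history.append(total)

    # The policy is returned as a function mapping a state to the greedy action.
    return greedy, history, w
```

  Returning the policy as a function of the state is a design choice of this sketch; a table or an array of greedy actions would serve equally well for a small discrete state space.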

8.2. Experiments

  • For different \(T\) and \(E\):

    • Run the algorithm.

    • Draw the evolution of the sum of rewards per episode for both algorithms (Sarsa(\(\lambda\)) and Semi-gradient Sarsa).

  • Draw the weights vector.

  • Now compare with the policy calculated using Semi-gradient Sarsa.
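
  The per-episode sum of rewards is usually noisy, which can make two curves hard to compare. A minimal sketch of a smoothing helper that can be applied to each history before drawing it is given below; the name moving_average and the choice of a trailing window are assumptions of this example, not requirements of the lab.

```python
def moving_average(values, window):
    """Average each value with up to `window - 1` preceding values."""
    smoothed = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            # Drop the value that just left the trailing window.
            running -= values[i - window]
        smoothed.append(running / min(i + 1, window))
    return smoothed


# Example: smooth a short history with a window of 2 episodes.
print(moving_average([1, 2, 3, 4], 2))  # [1.0, 1.5, 2.5, 3.5]
```

  The smoothed list has the same length as the input, so it can be drawn against the episode index in the same way as the raw history.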