8. Eligibility traces - Algorithms
In this lab you will implement your first algorithm using eligibility traces (Related topic).
You will need to reuse the code that you produced during the previous labs.
8.1. Sarsa(\(\lambda\))
Implement the Sarsa(\(\lambda\)) algorithm.
Create a function sarsa(env, eps, alpha, lambda, T, E), where:
env: the gym environment.
eps: the \(\epsilon\) parameter.
alpha: the step-size parameter.
lambda: the trace-decay parameter.
T: the maximum number of steps.
E: the number of episodes.
The function must return the policy, the history of the sum of rewards for each episode, and the weights vector.
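The function described above could be sketched as follows. This is only one possible shape, not the reference solution. It assumes a Gymnasium-style API (reset() returns (observation, info), step() returns a 5-tuple), discrete observation and action spaces, one-hot features over (state, action) pairs as a stand-in for the features built in the previous labs, and accumulating traces. The trace-decay argument is named lam because lambda is a reserved word in Python; gamma and seed are extra, hypothetical parameters.

```python
import numpy as np

def sarsa(env, eps, alpha, lam, T, E, gamma=1.0, seed=0):
    """Sarsa(lambda) sketch: linear q with one-hot features, accumulating traces."""
    rng = np.random.default_rng(seed)
    n_s, n_a = env.observation_space.n, env.action_space.n  # assumes discrete spaces
    w = np.zeros(n_s * n_a)                                  # weights vector

    def x(s, a):
        # One-hot feature vector for the pair (s, a); an assumption of this sketch.
        v = np.zeros_like(w)
        v[s * n_a + a] = 1.0
        return v

    def greedy(s):
        return int(np.argmax(w[s * n_a:(s + 1) * n_a]))

    def eps_greedy(s):
        return int(rng.integers(n_a)) if rng.random() < eps else greedy(s)

    history = []
    for episode in range(E):
        s, info = env.reset()
        a = eps_greedy(s)
        z = np.zeros_like(w)                      # eligibility trace, reset each episode
        total = 0.0
        for t in range(T):
            s2, r, terminated, truncated, info = env.step(a)
            total += r
            z = gamma * lam * z + x(s, a)         # decay, then accumulate the gradient
            delta = r - w @ x(s, a)               # TD error, terminal part
            if not terminated:
                a2 = eps_greedy(s2)
                delta += gamma * (w @ x(s2, a2))  # bootstrap from the next pair
            w += alpha * delta * z
            if terminated or truncated:
                break
            s, a = s2, a2
        history.append(total)

    policy = np.array([greedy(s) for s in range(n_s)])  # greedy action per state
    return policy, history, w
```

With accumulating traces, a pair that is revisited often within an episode can build up a trace larger than 1, so a small alpha is the safer choice; replacing traces are a common alternative.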
8.2. Experiments
For different \(T\) and \(E\):
Run the algorithm.
Draw the evolution of the sum of rewards for both algorithms (Sarsa(\(\lambda\)) and Semi-gradient Sarsa).
Draw the weights vector.
Now compare the resulting policy with the one calculated using Semi-gradient Sarsa.
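One way to organise these experiments is sketched below. The helper names run_grid and moving_average are hypothetical, and train stands for any training function that accepts T and E as keyword arguments and returns (policy, history, weights), so the same grid can be reused for Sarsa(\(\lambda\)) and for Semi-gradient Sarsa from the previous lab.

```python
import numpy as np

def moving_average(history, window=10):
    """Smooth a non-empty per-episode reward history with a sliding mean."""
    h = np.asarray(history, dtype=float)
    window = max(1, min(window, len(h)))          # never wider than the history
    return np.convolve(h, np.ones(window) / window, mode="valid")

def run_grid(train, make_env, Ts, Es, **params):
    """Run train on a fresh environment for every (T, E) pair and collect the outputs."""
    results = {}
    for T in Ts:
        for E in Es:
            policy, history, w = train(make_env(), T=T, E=E, **params)
            results[(T, E)] = {"policy": policy, "history": history, "weights": w}
    return results
```

Each smoothed history can then be drawn with matplotlib.pyplot.plot, one curve per algorithm on the same axes, and the two policy arrays for the same (T, E) can be compared element by element.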