9. Actor-Critic
In this lab you will implement actor-critic algorithms.
You will need to reuse the code that you produced during the previous labs.
9.1. One-step Actor-Critic
Implement the one-step actor-critic algorithm.
Create a function
ac(env, alpha, T, E)
env: the gym environment.
alpha: the vector containing the two step-size parameters.
T: the maximum number of steps.
E: the number of episodes.
The function needs to return the policy, the history of the sum of rewards for each episode, and the weights vector.
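One possible sketch of one-step actor-critic with tabular (one-hot) features and a softmax policy. The `ChainEnv` stub, the `softmax` helper, the discount `gamma`, and the reading of `T` as a per-episode step cap are assumptions made so the example runs standalone; in the lab you would pass your actual gym environment:

```python
import numpy as np

class ChainEnv:
    """Tiny gym-style stub: 5-state chain, actions 0 (left) / 1 (right).
    Reaching state 4 yields reward +1 and ends the episode."""
    n_states, n_actions = 5, 2
    def reset(self):
        self.s = 0
        return self.s
    def step(self, a):
        self.s = min(self.s + 1, 4) if a == 1 else max(self.s - 1, 0)
        done = self.s == 4
        return self.s, (1.0 if done else 0.0), done, {}

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def ac(env, alpha, T, E, gamma=0.99):
    """One-step actor-critic (sketch).
    alpha = (alpha_theta, alpha_w): actor and critic step sizes."""
    a_th, a_w = alpha
    theta = np.zeros((env.n_states, env.n_actions))  # policy preferences
    w = np.zeros(env.n_states)                       # state-value weights
    history = []                                     # sum of rewards per episode
    for _ in range(E):
        s = env.reset()
        I, total = 1.0, 0.0
        for _ in range(T):
            pi = softmax(theta[s])
            a = np.random.choice(env.n_actions, p=pi)
            s2, r, done, _ = env.step(a)
            total += r
            target = r + (0.0 if done else gamma * w[s2])
            delta = target - w[s]                 # one-step TD error
            w[s] += a_w * delta                   # critic update
            grad = -pi
            grad[a] += 1.0                        # grad log pi for softmax
            theta[s] += a_th * I * delta * grad   # actor update
            I *= gamma
            s = s2
            if done:
                break
        history.append(total)
    policy = np.array([softmax(theta[s]) for s in range(env.n_states)])
    return policy, history, w
```

With a function-approximation environment from the earlier labs, the one-hot lookups `w[s]` and `theta[s]` would become dot products with the feature vector.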
9.2. Actor-Critic with Eligibility Traces
Implement the actor-critic algorithm with eligibility traces.
Create a function
sarsa-AC(env, alpha, lambda, T, E)
env: the gym environment.
alpha: the vector containing the three step-size parameters.
lambda: the trace-decay parameter.
T: the maximum number of steps.
E: the number of episodes.
The function needs to return the policy, the history of the sum of rewards for each episode, and the weights vector.
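A sketch of actor-critic with eligibility traces on both the actor and the critic. The name is spelled `sarsa_AC` (a hyphen is not a valid Python identifier) and `lam` stands in for the reserved word `lambda`. Reading the three step sizes as actor, critic, and reward-baseline steps is an assumption; with the third set to 0 this reduces to the standard episodic algorithm. The `ChainEnv` stub and `softmax` helper from 9.1 are repeated so the sketch is self-contained:

```python
import numpy as np

class ChainEnv:
    """Tiny gym-style stub (as in 9.1): 5-state chain, reward +1 at state 4."""
    n_states, n_actions = 5, 2
    def reset(self):
        self.s = 0
        return self.s
    def step(self, a):
        self.s = min(self.s + 1, 4) if a == 1 else max(self.s - 1, 0)
        done = self.s == 4
        return self.s, (1.0 if done else 0.0), done, {}

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def sarsa_AC(env, alpha, lam, T, E, gamma=0.99):
    """Actor-critic with eligibility traces (sketch).
    alpha = (a_th, a_w, a_r): actor, critic and (assumed) baseline step
    sizes; lam is the shared trace-decay parameter."""
    a_th, a_w, a_r = alpha
    theta = np.zeros((env.n_states, env.n_actions))  # policy preferences
    w = np.zeros(env.n_states)                       # state-value weights
    r_bar = 0.0                                      # reward baseline (assumption)
    history = []
    for _ in range(E):
        s = env.reset()
        z_th = np.zeros_like(theta)   # actor trace
        z_w = np.zeros_like(w)        # critic trace
        total = 0.0
        for _ in range(T):
            pi = softmax(theta[s])
            a = np.random.choice(env.n_actions, p=pi)
            s2, r, done, _ = env.step(a)
            total += r
            delta = (r - r_bar) + (0.0 if done else gamma * w[s2]) - w[s]
            r_bar += a_r * delta          # baseline update (assumed role of a_r)
            z_w *= gamma * lam            # decay critic trace ...
            z_w[s] += 1.0                 # ... then accumulate grad of v(s)
            w += a_w * delta * z_w        # critic update along the trace
            grad = -pi
            grad[a] += 1.0                # grad log pi for softmax
            z_th *= gamma * lam           # decay actor trace ...
            z_th[s] += grad               # ... then accumulate grad log pi
            theta += a_th * delta * z_th  # actor update along the trace
            s = s2
            if done:
                break
        history.append(total)
    policy = np.array([softmax(theta[s]) for s in range(env.n_states)])
    return policy, history, w
```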
9.3. Experiments
For different \(T\) and \(E\):
Run both algorithms.
Draw the evolution of the sum of rewards for both algorithms.
Draw the weights vector.
Now compare with the policy calculated using Semi-gradient Sarsa.
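Raw per-episode reward sums are noisy, so the comparison curves are easier to read after smoothing. A small helper (hypothetical name `moving_average`, assuming only NumPy) could be used before plotting; the commented lines sketch how it would feed matplotlib using the `ac` and `sarsa-AC` functions from 9.1 and 9.2:

```python
import numpy as np

def moving_average(history, k=20):
    """Smooth a per-episode reward history with a sliding window of k episodes."""
    h = np.asarray(history, dtype=float)
    return np.convolve(h, np.ones(k) / k, mode="valid")

# Sketched usage (names assumed from the sections above):
#   _, hist_ac, w_ac = ac(env, alpha2, T, E)
#   _, hist_tr, w_tr = sarsa_AC(env, alpha3, lam, T, E)
#   plt.plot(moving_average(hist_ac), label="one-step AC")
#   plt.plot(moving_average(hist_tr), label="AC with traces")
#   plt.bar(range(len(w_ac)), w_ac)   # the weights vector
```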