9. Actor-Critic

  • In this lab you will implement actor-critic algorithms.

  • You will need to reuse the code that you produced during the previous labs.

9.1. One-step Actor-Critic

  • Implement the one-step actor-critic algorithm.

  • Create a function ac(env, alpha, T, E):

    • env: the gym environment.

    • alpha: the vector containing the two step-size parameters.

    • T: the maximum number of steps per episode.

    • E: the number of episodes.

  • The function needs to return the policy, the history of the sum of rewards for each episode, and the weight vector.
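The one-step actor-critic can be sketched as follows. This is a minimal illustration, not a reference solution: it assumes one-hot state features, a linear critic, and a softmax actor, and it replaces the gym environment with a tiny hypothetical `ChainEnv` stand-in so the sketch is self-contained (the helper names `ChainEnv`, `one_hot`, and `softmax` are illustrative, not part of the lab).

```python
import numpy as np

class ChainEnv:
    """Hypothetical stand-in for the gym environment: a 5-state chain.
    Action 1 moves right, action 0 moves left; reaching the rightmost
    state yields reward 1 and ends the episode."""
    n_states, n_actions = 5, 2

    def reset(self):
        self.s = 0
        return self.s

    def step(self, a):
        self.s = min(self.n_states - 1, self.s + 1) if a == 1 else max(0, self.s - 1)
        done = self.s == self.n_states - 1
        return self.s, float(done), done, {}

def one_hot(s, n):
    x = np.zeros(n)
    x[s] = 1.0
    return x

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def ac(env, alpha, T, E, gamma=0.99, seed=0):
    """One-step actor-critic: linear critic v(s) = w . x(s), softmax actor
    over preferences theta @ x(s). alpha = (alpha_w, alpha_theta) holds the
    two step sizes (critic first, actor second -- an assumed ordering)."""
    rng = np.random.default_rng(seed)
    nS, nA = env.n_states, env.n_actions
    w = np.zeros(nS)                # critic weight vector
    theta = np.zeros((nA, nS))      # actor preference weights
    history = []                    # sum of rewards per episode
    for _ in range(E):
        s, G, I = env.reset(), 0.0, 1.0
        for _ in range(T):
            x = one_hot(s, nS)
            pi = softmax(theta @ x)
            a = rng.choice(nA, p=pi)
            s2, r, done, _ = env.step(a)
            G += r
            # one-step TD error; bootstrap from v(s') unless terminal
            delta = r + (0.0 if done else gamma * (w @ one_hot(s2, nS))) - w @ x
            w += alpha[0] * delta * x                 # critic update
            grad_log = -np.outer(pi, x)               # grad of log pi(a|s)
            grad_log[a] += x
            theta += alpha[1] * I * delta * grad_log  # actor update
            I *= gamma
            s = s2
            if done:
                break
        history.append(G)
    policy = lambda state: int(np.argmax(theta @ one_hot(state, nS)))
    return policy, history, w

policy, history, w = ac(ChainEnv(), alpha=(0.1, 0.1), T=50, E=200)
```

With a real gym environment you would derive `x(s)` from the observation (e.g., tile coding from an earlier lab) instead of the one-hot encoding used here.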

9.2. Actor-Critic with Eligibility Traces

  • Implement the actor-critic algorithm with eligibility traces.

  • Create a function sarsa-AC(env, alpha, lambda, T, E):

    • env: the gym environment.

    • alpha: the vector containing the three step-size parameters.

    • lambda: the trace-decay parameter.

    • T: the maximum number of steps per episode.

    • E: the number of episodes.

  • The function needs to return the policy, the history of the sum of rewards for each episode, and the weight vector.
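A possible sketch of the trace-based variant follows. It assumes (this is an interpretation, not stated by the lab) that the three step sizes are for the average-reward estimate, the critic, and the actor, as in the continuing actor-critic with eligibility traces, and that a single lambda decays both traces; the parameter is named `lam` because `lambda` is a Python keyword. The toy `ChainEnv` and the helpers are the same illustrative stand-ins as before.

```python
import numpy as np

class ChainEnv:
    """Hypothetical stand-in for the gym environment: a 5-state chain."""
    n_states, n_actions = 5, 2

    def reset(self):
        self.s = 0
        return self.s

    def step(self, a):
        self.s = min(self.n_states - 1, self.s + 1) if a == 1 else max(0, self.s - 1)
        done = self.s == self.n_states - 1
        return self.s, float(done), done, {}

def one_hot(s, n):
    x = np.zeros(n)
    x[s] = 1.0
    return x

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def sarsa_AC(env, alpha, lam, T, E, seed=0):
    """Actor-critic with eligibility traces.
    Assumed: alpha = (alpha_rbar, alpha_w, alpha_theta), i.e. step sizes
    for the average-reward estimate, the critic, and the actor."""
    rng = np.random.default_rng(seed)
    nS, nA = env.n_states, env.n_actions
    w = np.zeros(nS)
    theta = np.zeros((nA, nS))
    r_bar = 0.0                     # running average-reward estimate
    history = []
    for _ in range(E):
        s, G = env.reset(), 0.0
        z_w = np.zeros_like(w)          # critic trace
        z_theta = np.zeros_like(theta)  # actor trace
        for _ in range(T):
            x = one_hot(s, nS)
            pi = softmax(theta @ x)
            a = rng.choice(nA, p=pi)
            s2, r, done, _ = env.step(a)
            G += r
            # TD error with average-reward baseline
            delta = r - r_bar + (0.0 if done else w @ one_hot(s2, nS)) - w @ x
            r_bar += alpha[0] * delta
            z_w = lam * z_w + x                    # accumulate critic trace
            grad_log = -np.outer(pi, x)
            grad_log[a] += x
            z_theta = lam * z_theta + grad_log     # accumulate actor trace
            w += alpha[1] * delta * z_w
            theta += alpha[2] * delta * z_theta
            s = s2
            if done:
                break
        history.append(G)
    policy = lambda state: int(np.argmax(theta @ one_hot(state, nS)))
    return policy, history, w

policy, history, w = sarsa_AC(ChainEnv(), alpha=(0.01, 0.1, 0.1), lam=0.9, T=50, E=100)
```

Traces are reset at the start of each episode; with lam = 0 the updates reduce to one-step corrections, which is a quick sanity check against the previous function.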

9.3. Experiments

  • For different \(T\) and \(E\):

    • Run both algorithms.

    • Draw the evolution of the sum of rewards for both algorithms.

  • Draw the weight vector.

  • Now compare with the policy obtained using semi-gradient Sarsa in the previous lab.
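Per-episode reward sums are noisy, so before drawing the curves it often helps to smooth each history with a short moving average (a common presentation choice, not something the lab prescribes). A small helper like the following can be applied to both histories before handing them to a plotting library such as matplotlib:

```python
import numpy as np

def moving_average(rewards, k=10):
    """Smooth a per-episode reward history with a k-episode sliding window,
    so the curves of the two algorithms are easier to compare when drawn."""
    r = np.asarray(rewards, dtype=float)
    if len(r) < k:
        return r.copy()
    # cumulative-sum trick: window sums in O(n) instead of O(n*k)
    c = np.cumsum(np.insert(r, 0, 0.0))
    return (c[k:] - c[:-k]) / k

smoothed = moving_average([0, 1, 1, 0, 1, 1, 1, 1], k=4)
```

The smoothed curve has `len(rewards) - k + 1` points; plot it against the episode index to compare how quickly each algorithm's return improves for the chosen T and E.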