2. Multi-armed bandit - Action Selection methods

  • In this lab, you will compare several other action-selection methods.

  • First, import the code written during the previous lab (Multi-armed bandit).
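The exact contents of the previous lab are not reproduced here, so the sketches below assume a minimal `Bandit` class along these lines (the class name, constructor arguments, and `step`/`optimal_action` methods are assumptions, not the previous lab's actual interface):

```python
import numpy as np

class Bandit:
    """Minimal k-armed testbed: a hypothetical stand-in for the previous lab's class."""
    def __init__(self, k=10, seed=None):
        self.rng = np.random.default_rng(seed)
        self.k = k
        self.q_star = self.rng.normal(0.0, 1.0, k)  # true action values

    def step(self, action):
        # Reward: true value of the chosen arm plus unit-variance Gaussian noise
        return self.rng.normal(self.q_star[action], 1.0)

    def optimal_action(self):
        return int(np.argmax(self.q_star))
```

If your class from the previous lab uses different names, adapt the later sketches accordingly.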

2.1. Upper-Confidence-Bound action-selection

  • Create a new function ucb_selection(bandit, c, T):

    • bandit: an instance of the bandit class created before.

    • c: is the UCB coefficient.

    • T: is the number of steps.

  • You can reuse the function e_greedy(bandit, e, T) and modify only the action selection.

  • Compare UCB and \(\epsilon\)-greedy for various parameter values:

    • One possible choice is \(c=2\) and \(\epsilon=0.1\).

    • Plot the graph comparing the two methods.
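A minimal sketch of the UCB selection rule, \(A_t = \arg\max_a \left[ Q_t(a) + c\sqrt{\ln t / N_t(a)} \right]\), is shown below. A small `Bandit` stub is repeated so the snippet is self-contained; it is an assumed stand-in for the class from the previous lab, not its actual code:

```python
import numpy as np

class Bandit:                          # hypothetical stand-in for the previous lab's class
    def __init__(self, k=10, seed=None):
        self.rng = np.random.default_rng(seed)
        self.k = k
        self.q_star = self.rng.normal(0.0, 1.0, k)
    def step(self, a):
        return self.rng.normal(self.q_star[a], 1.0)

def ucb_selection(bandit, c, T):
    """Run T steps with UCB action selection; return the reward at each step."""
    Q = np.zeros(bandit.k)             # sample-average value estimates
    N = np.zeros(bandit.k)             # pull counts per arm
    rewards = np.empty(T)
    for t in range(T):
        untried = np.flatnonzero(N == 0)
        if untried.size:               # pull each arm once before applying UCB
            a = int(untried[0])
        else:
            a = int(np.argmax(Q + c * np.sqrt(np.log(t + 1) / N)))
        r = bandit.step(a)
        N[a] += 1
        Q[a] += (r - Q[a]) / N[a]      # incremental mean update
        rewards[t] = r
    return rewards
```

Compared with \(\epsilon\)-greedy, only the action-selection line differs, which is why the rest of `e_greedy` can be reused unchanged.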

2.2. Gradient Bandit Algorithms

  • Create a new function gradient(bandit, alpha, T):

    • bandit: an instance of the bandit class created before.

    • alpha: is the step-size parameter.

    • T: is the number of steps.

  • Compare the gradient bandit algorithm with UCB and \(\epsilon\)-greedy.

    • One possible choice is \(\alpha=2\), \(c=2\), and \(\epsilon=0.1\).

    • Plot the graph of the ratio between optimal and suboptimal action selections.
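A minimal sketch of the gradient bandit algorithm follows: it keeps a preference \(H(a)\) per arm, selects actions from the softmax \(\pi(a) \propto e^{H(a)}\), and updates preferences against a running reward baseline. It returns a per-step indicator of whether the optimal arm was chosen, which is what the ratio plot needs. The `Bandit` stub is again an assumed stand-in for the previous lab's class:

```python
import numpy as np

class Bandit:                          # hypothetical stand-in for the previous lab's class
    def __init__(self, k=10, seed=None):
        self.rng = np.random.default_rng(seed)
        self.k = k
        self.q_star = self.rng.normal(0.0, 1.0, k)
    def step(self, a):
        return self.rng.normal(self.q_star[a], 1.0)

def gradient(bandit, alpha, T):
    """Gradient bandit: softmax over preferences H, updated by
    H += alpha * (R - baseline) * (one_hot(a) - pi)."""
    H = np.zeros(bandit.k)             # action preferences
    baseline = 0.0                     # running average reward
    optimal = int(np.argmax(bandit.q_star))
    hits = np.empty(T)                 # 1.0 when the optimal arm was chosen
    for t in range(1, T + 1):
        pi = np.exp(H - H.max())       # numerically stable softmax
        pi /= pi.sum()
        a = int(bandit.rng.choice(bandit.k, p=pi))
        r = bandit.step(a)
        baseline += (r - baseline) / t # incremental mean of all rewards so far
        one_hot = np.zeros(bandit.k)
        one_hot[a] = 1.0
        H += alpha * (r - baseline) * (one_hot - pi)
        hits[t - 1] = float(a == optimal)
    return hits
```

Averaging `hits` over many independent bandit runs gives the fraction of optimal-action selections at each step, which can then be plotted against the same statistic for UCB and \(\epsilon\)-greedy.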