2. Multi-armed bandit - Action Selection methods
In this lab, you will compare additional action-selection methods.
First, import the code written during the previous lab (Multi-armed bandit).
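If you do not have the previous lab's code at hand, the examples below assume a bandit class like the following minimal sketch. The names Bandit, pull, and optimal are assumptions for illustration; your own class may differ.

```python
import numpy as np

class Bandit:
    """Minimal stand-in for the k-armed bandit class from the previous lab.

    Assumes a stationary Gaussian testbed: each arm's true value is drawn
    from N(0, 1) and rewards are the true value plus unit-variance noise.
    """
    def __init__(self, k=10, seed=None):
        self.rng = np.random.default_rng(seed)
        self.k = k
        self.q_star = self.rng.normal(0.0, 1.0, k)  # true action values
        self.optimal = int(np.argmax(self.q_star))  # index of the best arm

    def pull(self, a):
        # Reward for arm a: true value plus Gaussian noise
        return self.q_star[a] + self.rng.normal()
```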
2.1. Upper-Confidence-Bound action-selection
Create a new function ucb_selection(bandit, c, T):
- bandit: the bandit class created before.
- c: the UCB coefficient.
- T: the number of steps.
You can reuse the function e_greedy(bandit, e, T) and modify only the action selection.
Compare UCB and \(\epsilon\)-greedy for various parameters; one possible choice is \(c=2\) and \(\epsilon=0.1\).
Plot the graph.
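As a starting point, here is a sketch of UCB action selection. It assumes a bandit object exposing k, pull(a), and the index of the optimal arm (these names come from the stand-in sketch above, not from a required API), and selects \(A_t = \arg\max_a \left[ Q_t(a) + c\sqrt{\ln t / N_t(a)} \right]\), trying every arm once first.

```python
import numpy as np

def ucb_selection(bandit, c, T):
    """Upper-Confidence-Bound action selection (sketch)."""
    k = bandit.k
    Q = np.zeros(k)           # sample-average value estimates
    N = np.zeros(k, dtype=int)  # pull counts per arm
    rewards = np.empty(T)
    optimal = np.zeros(T)     # 1.0 when the optimal arm was chosen
    for t in range(T):
        if np.any(N == 0):
            a = int(np.argmin(N))  # try each arm once before using UCB
        else:
            ucb = Q + c * np.sqrt(np.log(t + 1) / N)
            a = int(np.argmax(ucb))
        r = bandit.pull(a)
        N[a] += 1
        Q[a] += (r - Q[a]) / N[a]  # incremental sample-average update
        rewards[t] = r
        optimal[t] = (a == bandit.optimal)
    return rewards, optimal
```

Running it alongside e_greedy on the same bandit and plotting the two reward curves gives the requested comparison.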
2.2. Gradient Bandit Algorithms
Create a new function gradient(bandit, alpha, T):
- bandit: the bandit class created before.
- alpha: the step-size parameter.
- T: the number of steps.
Compare the gradient bandit algorithm with UCB and \(\epsilon\)-greedy.
One possible choice is \(\alpha=2\), \(c=2\), and \(\epsilon=0.1\).
Plot the graph of the ratio of optimal to suboptimal action selections over time.
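A sketch of the gradient bandit algorithm follows, again assuming the bandit interface used above (k, pull(a), optimal are illustrative names). It maintains action preferences \(H_t(a)\), selects actions from a softmax distribution, and updates preferences with a running average reward as baseline.

```python
import numpy as np

def gradient(bandit, alpha, T):
    """Gradient bandit algorithm with an average-reward baseline (sketch)."""
    k = bandit.k
    H = np.zeros(k)        # action preferences
    baseline = 0.0         # running average of observed rewards
    rewards = np.empty(T)
    optimal = np.zeros(T)  # 1.0 when the optimal arm was chosen
    rng = np.random.default_rng()
    for t in range(T):
        expH = np.exp(H - H.max())   # numerically stable softmax
        pi = expH / expH.sum()
        a = rng.choice(k, p=pi)
        r = bandit.pull(a)
        baseline += (r - baseline) / (t + 1)
        # Push the chosen arm's preference up (down) when the reward is
        # above (below) the baseline; other arms move the opposite way.
        one_hot = np.zeros(k)
        one_hot[a] = 1.0
        H += alpha * (r - baseline) * (one_hot - pi)
        rewards[t] = r
        optimal[t] = (a == bandit.optimal)
    return rewards, optimal
```

The returned optimal array makes the requested plot straightforward: accumulate optimal and suboptimal counts over time and plot their ratio for each method.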