2. Multi-armed bandit - Action Selection methods

  • In this lab, you will compare several other action-selection methods.

  • First, import the code written during the previous lab (Multi-armed bandit).
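The exact contents of the previous lab are not reproduced here, so the sketches below assume a minimal `Bandit` class along these lines (the class name, constructor arguments, and `step`/`optimal_action` methods are assumptions, not the previous lab's actual interface):

```python
import numpy as np

class Bandit:
    """Minimal k-armed testbed: a hypothetical stand-in for the previous lab's class."""
    def __init__(self, k=10, seed=None):
        self.rng = np.random.default_rng(seed)
        self.k = k
        self.q_star = self.rng.normal(0.0, 1.0, k)  # true action values

    def step(self, action):
        # Reward: true value of the chosen arm plus unit-variance Gaussian noise
        return self.rng.normal(self.q_star[action], 1.0)

    def optimal_action(self):
        return int(np.argmax(self.q_star))
```

If your class from the previous lab uses different names, adapt the later sketches accordingly.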

2.1. Upper-Confidence-Bound action-selection

  • Create a new function ucb_selection(bandit, c, T):

    • bandit: an instance of the bandit class created before.

    • c: is the UCB coefficient.

    • T: is the number of steps.

  • You can reuse the function e_greedy(bandit, e, T) and modify only the action selection.

  • Compare UCB and \(\epsilon\)-greedy for various parameter values:

    • One possible choice is \(c=2\) and \(\epsilon=0.1\).

    • Plot the graph comparing the two methods.
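A minimal sketch of the UCB selection rule, \(A_t = \arg\max_a \left[ Q_t(a) + c\sqrt{\ln t / N_t(a)} \right]\), is shown below. A small `Bandit` stub is repeated so the snippet is self-contained; it is an assumed stand-in for the class from the previous lab, not its actual code:

```python
import numpy as np

class Bandit:                          # hypothetical stand-in for the previous lab's class
    def __init__(self, k=10, seed=None):
        self.rng = np.random.default_rng(seed)
        self.k = k
        self.q_star = self.rng.normal(0.0, 1.0, k)
    def step(self, a):
        return self.rng.normal(self.q_star[a], 1.0)

def ucb_selection(bandit, c, T):
    """Run T steps with UCB action selection; return the reward at each step."""
    Q = np.zeros(bandit.k)             # sample-average value estimates
    N = np.zeros(bandit.k)             # pull counts per arm
    rewards = np.empty(T)
    for t in range(T):
        untried = np.flatnonzero(N == 0)
        if untried.size:               # pull each arm once before applying UCB
            a = int(untried[0])
        else:
            a = int(np.argmax(Q + c * np.sqrt(np.log(t + 1) / N)))
        r = bandit.step(a)
        N[a] += 1
        Q[a] += (r - Q[a]) / N[a]      # incremental mean update
        rewards[t] = r
    return rewards
```

Compared with \(\epsilon\)-greedy, only the action-selection line differs, which is why the rest of `e_greedy` can be reused unchanged.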

2.2. Gradient Bandit Algorithms

  • Create a new function gradient(bandit, alpha, T):

    • bandit: an instance of the bandit class created before.

    • alpha: is the step-size parameter.

    • T: is the number of steps.

  • Compare the gradient bandit algorithm with UCB and \(\epsilon\)-greedy.

    • One possible choice is \(\alpha=2\), \(c=2\), and \(\epsilon=0.1\).

    • Plot the graph of the ratio between optimal and suboptimal action selections.
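A minimal sketch of the gradient bandit algorithm follows: it keeps a preference \(H(a)\) per arm, selects actions from the softmax \(\pi(a) \propto e^{H(a)}\), and updates preferences against a running reward baseline. It returns a per-step indicator of whether the optimal arm was chosen, which is what the ratio plot needs. The `Bandit` stub is again an assumed stand-in for the previous lab's class:

```python
import numpy as np

class Bandit:                          # hypothetical stand-in for the previous lab's class
    def __init__(self, k=10, seed=None):
        self.rng = np.random.default_rng(seed)
        self.k = k
        self.q_star = self.rng.normal(0.0, 1.0, k)
    def step(self, a):
        return self.rng.normal(self.q_star[a], 1.0)

def gradient(bandit, alpha, T):
    """Gradient bandit: softmax over preferences H, updated by
    H += alpha * (R - baseline) * (one_hot(a) - pi)."""
    H = np.zeros(bandit.k)             # action preferences
    baseline = 0.0                     # running average reward
    optimal = int(np.argmax(bandit.q_star))
    hits = np.empty(T)                 # 1.0 when the optimal arm was chosen
    for t in range(1, T + 1):
        pi = np.exp(H - H.max())       # numerically stable softmax
        pi /= pi.sum()
        a = int(bandit.rng.choice(bandit.k, p=pi))
        r = bandit.step(a)
        baseline += (r - baseline) / t # incremental mean of all rewards so far
        one_hot = np.zeros(bandit.k)
        one_hot[a] = 1.0
        H += alpha * (r - baseline) * (one_hot - pi)
        hits[t - 1] = float(a == optimal)
    return hits
```

Averaging `hits` over many independent bandit runs gives the fraction of optimal-action selections at each step, which can then be plotted against the same statistic for UCB and \(\epsilon\)-greedy.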