1. Multi-armed Bandit

  • In this lab, you will test the impact of \(\epsilon\)-greedy action selection compared to greedy action selection.

1.1. Setting up

1.1.1. Virtual environment

  • First, you will set up a virtual environment for Python.

  • Create a folder for the labs.

  • Inside your new folder, open a terminal and enter the following command:

python3 -m virtualenv CSCI_531
  • It should create a new folder CSCI_531.

  • Activate the virtual environment by entering the following command:

source ./CSCI_531/bin/activate

Important

For every lab, you will need to activate the virtual environment before using Python!

1.1.2. Python libraries

  • Now you need a few libraries.

  • The main library we will use is Gymnasium.

  • To install it use:

pip install gymnasium
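
  • Later in this lab you will also sample from normal distributions and plot results; assuming you use numpy and matplotlib for that (other choices are possible), you can install them with:

pip install numpy matplotlib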

1.2. Gym

1.2.1. What is a Gym environment

  • Gymnasium provides tools to create Reinforcement Learning environments.

  • The environments are all implemented using the same structure:

    • This allows you to write algorithms that work for any Gym environment.

    • It also helps you organize and debug your environment.

1.2.2. Custom environment

  • One thing we want to do is create our own custom environment.

1.2.2.1. Environment

  • Each custom environment class inherits from gymnasium.Env.

  • Then we need to specify self.action_space:

    • It is the mathematical representation of the actions the agent can execute.

    • For example, if the agent can move left or right:

      • We have two actions.

      • The actions are discrete.

      • We will define the action space as Discrete(2) # {0, 1}.

      • Action 0 will be left and 1 will be right.

Example

  • An example of a custom environment with an action space of two discrete actions.

import gymnasium as gym
from gymnasium.spaces import Discrete

class MyEnv(gym.Env):

   def __init__(self):
      self.action_space = Discrete(2)

1.2.2.2. Step

  • Once the environment is created, we need to define how the agent will interact with it.

  • It is done with the method step(self, action).

    • action: the action selected by the user (or the AI).

    • Use this method to update the environment if necessary (for example, the position of a robot).

  • It returns 5 values:

    • observation

    • reward

    • terminated

    • truncated

    • info

  • For now, we only care about the reward.

Example

  • An example of a custom environment with the step function defined.

import gymnasium as gym
from gymnasium.spaces import Discrete

class MyEnv(gym.Env):

   def __init__(self):
      self.action_space = Discrete(2)

   def step(self, action):
      reward = 0 # Replace 0 with how your reward is calculated.
      return [], reward, False, False, {} # Only reward is important for now, put everything else as default value.

1.3. 10-armed bandit

  • You need to implement the 10-armed bandit.

  • Create a class tenBandit as a Gym environment.

  • Define the action space.

  • Initialize each arm.

    • Each arm \(a\) has an optimal value \(q^*(a)\) that you will sample in a normal distribution of mean \(0\) and variance \(1\).

  • Create a function step() that simulates an arm being selected.

    • When the user selects arm \(a\), it returns a reward sampled from a normal distribution with mean \(q^*(a)\) and variance \(1\) (see the example below).
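
Example

  • A possible sketch of the tenBandit environment, assuming numpy is used for the sampling; the attribute names q_star and optimal are illustrative choices, not requirements.

import numpy as np
import gymnasium as gym
from gymnasium.spaces import Discrete

class tenBandit(gym.Env):

   def __init__(self):
      self.action_space = Discrete(10)          # 10 arms
      self.q_star = np.random.normal(0, 1, 10)  # optimal value q*(a) of each arm
      self.optimal = np.argmax(self.q_star)     # index of the best arm

   def step(self, action):
      # Reward sampled around the true value of the selected arm.
      reward = np.random.normal(self.q_star[action], 1)
      return [], reward, False, False, {}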

1.4. Simple algorithm

  • Now you need to implement the algorithm seen in class (Multi-armed bandit).

  • Create a function e_greedy(bandit, e, T)

    • bandit: an instance of the bandit environment created before.

    • e: the value of \(\epsilon\).

    • T: the number of steps.

  • The function should return:

    • The evolution of the expected rewards.

    • The evolution of the percentage of optimal actions (see the example below).
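
Example

  • A possible skeleton for e_greedy, assuming numpy and the tenBandit sketch above (in particular its optimal attribute). The incremental update of the estimates is one way to do it; averaging the returned arrays over many independent runs gives the curves to plot.

import numpy as np

def e_greedy(bandit, e, T):
   Q = np.zeros(bandit.action_space.n)  # estimated value of each arm
   N = np.zeros(bandit.action_space.n)  # number of pulls of each arm
   rewards = np.zeros(T)                # reward obtained at each step
   optimal = np.zeros(T)                # 1 if the optimal arm was pulled

   for t in range(T):
      if np.random.random() < e:
         action = bandit.action_space.sample()  # explore
      else:
         action = int(np.argmax(Q))             # exploit
      _, reward, _, _, _ = bandit.step(action)
      N[action] += 1
      Q[action] += (reward - Q[action]) / N[action]  # incremental sample average
      rewards[t] = reward
      optimal[t] = 1 if action == bandit.optimal else 0

   return rewards, optimal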

1.5. Experiments

  • Combine everything to run different experiments.

    • Compare greedy action selection (\(\epsilon = 0\)) with \(\epsilon\)-greedy action selection (\(\epsilon = 0.1\)).

    • Compare different values of \(\epsilon\) to show its impact.

  • Plot the results (see the example below).
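
  • A possible sketch of the experiment loop, assuming matplotlib for the plots and the tenBandit and e_greedy sketches above; the number of runs (200) and steps (1000) are illustrative choices.

import numpy as np
import matplotlib.pyplot as plt

runs, T = 200, 1000
for e in [0, 0.1]:
   avg_rewards = np.zeros(T)
   for _ in range(runs):
      bandit = tenBandit()
      rewards, optimal = e_greedy(bandit, e, T)
      avg_rewards += rewards / runs  # average the rewards over independent runs
   plt.plot(avg_rewards, label=f"epsilon = {e}")

plt.xlabel("Steps")
plt.ylabel("Average reward")
plt.legend()
plt.show()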

Example

  • An example with \(\epsilon = 0.1\).

[Figure: ../../_images/results.png]