Cliff Walking#

Important

Due date: TBD

In this assignment, you will solve a Markov decision problem using value iteration.

Context#

You will solve a modified version of the Cliff Walking problem using the value iteration algorithm.

Env#

The environment was modified to be more consistent with the implementation of value iteration seen in class. The code is provided below and should not be modified.

However, one modification affects the algorithm: self.P[s][a] returns 4 values (probability, state, reward, done) instead of 3 (probability, state, reward).

The environment takes one parameter, is_slippery: bool. If is_slippery = False, the problem is deterministic. If is_slippery = True, an agent walking close to the cliff may slip and fall.
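As a rough illustration of what the extra done value means for a single backup, the sketch below computes the expected return of one state-action pair from a table laid out like self.P. The two-state table toy_P, the helper q_value, and the discount gamma=0.9 are all hypothetical and are not part of the provided environment. Treating done=True as "do not add the value of the next state" is an assumption; check it against the implementation seen in class.

```python
# Hypothetical two-state table laid out like env.P:
# toy_P[s][a] is a list of (probability, next_state, reward, done) tuples.
toy_P = {
    0: {
        0: [(1.0, 0, -1.0, False)],
        1: [(0.8, 1, -1.0, True), (0.2, 0, -100.0, False)],
    },
    1: {
        0: [(1.0, 1, 0.0, True)],
        1: [(1.0, 1, 0.0, True)],
    },
}


def q_value(P, V, s, a, gamma=0.9):
    """Expected return of taking action a in state s, given state values V."""
    total = 0.0
    # Each transition now unpacks into 4 values instead of 3.
    for prob, next_s, reward, done in P[s][a]:
        # Assumption: a transition flagged done ends the episode,
        # so the value of next_s is not added.
        future = 0.0 if done else gamma * V[next_s]
        total += prob * (reward + future)
    return total


V = [2.0, 5.0]
q = q_value(toy_P, V, s=0, a=1)
```

Here the done branch contributes 0.8 * (-1) and the other branch contributes 0.2 * (-100 + 0.9 * 2), so q is about -20.44.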

Additional information#

You are provided with a function that runs the policy on the environment and returns the cumulative reward at the end of the run.

def run(env, pi):
    """
    Run the policy on the environment and returns the cumulative reward.
    :param: env: The environment
    :param: pi: The policy calculated by value iteration
    :return: Cumulative reward
    """
    s = env.reset()
    done = False
    sum_r = 0
    while not done:
        a = pi[s]
        s, r, done = env.step(a)
        sum_r += r
    return sum_r
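To make the interface that run relies on concrete, here is a minimal sketch with a stand-in environment. ChainEnv is hypothetical and is not the assignment environment; it only mimics the two methods run calls: reset() returning a state, and step(a) returning three values (state, reward, done). The run function is repeated unchanged so that the snippet is self-contained.

```python
class ChainEnv:
    """Hypothetical corridor of n_states cells; the last cell ends the episode."""

    def __init__(self, n_states=4):
        self.n_states = n_states
        self.s = 0

    def reset(self):
        self.s = 0
        return self.s

    def step(self, a):
        # Action 1 moves one cell to the right; any other action stays put.
        if a == 1:
            self.s = min(self.s + 1, self.n_states - 1)
        done = self.s == self.n_states - 1
        return self.s, -1, done  # every step costs -1


def run(env, pi):
    # Same logic as the provided function, copied for self-containment.
    s = env.reset()
    done = False
    sum_r = 0
    while not done:
        a = pi[s]
        s, r, done = env.step(a)
        sum_r += r
    return sum_r


# pi maps each state index to an action; here "always move right".
total = run(ChainEnv(), [1, 1, 1, 1])
print(total)  # -3: three steps are needed to reach the last cell
```

Note that run loops until done is True, so a policy that never reaches a terminal state would not terminate on a deterministic environment like this one.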

Assignment#

  1. Implement value iteration.

    • Account for the change to self.P (4 values instead of 3).

  2. Create a function evaluate(env, pi, N).

    • The function should execute run(env, pi) N times and return the average cumulative reward.

  3. Calculate two policies:

    • One policy with is_slippery = False, called pi_false.

    • One policy with is_slippery = True, called pi_true.

  4. Policy evaluation

    • Evaluate pi_false and pi_true.

    • Write a short paragraph explaining the difference between the two policies and why the values are different.
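For reference, step 1 can be sketched on a toy problem as follows. This is a generic value iteration loop, not necessarily the version seen in class. The three-state table toy_P, the function names, gamma=0.9, and the stopping threshold theta are all hypothetical, and skipping the next-state value when done is True is an assumption about how the flag should be used.

```python
# Hypothetical 3-state corridor in the (probability, next_state, reward, done)
# layout. Action 0 moves left (or stays at the wall), action 1 moves right.
# State 2 is terminal.
toy_P = {
    0: {0: [(1.0, 0, -1.0, False)], 1: [(1.0, 1, -1.0, False)]},
    1: {0: [(1.0, 0, -1.0, False)], 1: [(1.0, 2, -1.0, True)]},
    2: {0: [(1.0, 2, 0.0, True)], 1: [(1.0, 2, 0.0, True)]},
}


def backup(P, V, s, a, gamma):
    """Expected return of action a in state s under the current values V."""
    return sum(
        prob * (reward + (0.0 if done else gamma * V[next_s]))
        for prob, next_s, reward, done in P[s][a]
    )


def value_iteration(P, n_states, n_actions, gamma=0.9, theta=1e-8):
    V = [0.0] * n_states
    while True:
        delta = 0.0
        for s in range(n_states):
            best = max(backup(P, V, s, a, gamma) for a in range(n_actions))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:  # stop once a full sweep changes no value by theta or more
            break
    # Greedy policy: in each state, pick the action with the highest backup.
    pi = [
        max(range(n_actions), key=lambda a: backup(P, V, s, a, gamma))
        for s in range(n_states)
    ]
    return V, pi


V, pi = value_iteration(toy_P, n_states=3, n_actions=2)
```

On this toy table the values converge to roughly [-1.9, -1.0, 0.0], and the greedy policy moves right in states 0 and 1.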

Submission#

Submit your code on Moodle as a Python file. The filename should be your last name followed by asn-cliffwalking. The code file should contain:

  • At the top of the file, your student ID as # Student ID: XXXXXX.

  • At the bottom of the file, the paragraph (in comments) with your explanation of both policies.

Important

Any modification of the provided code will result in a grade of 0.

Academic Integrity#

  • Any cheating or plagiarism will result in a grade of zero and an automatic report.

  • No exceptions will be made.

  • You can find the academic integrity policy here: Academic integrity.