1. Spaceships rental

  • Due date: October 13th, 2022 – 11:55pm

  • In this assignment, you will model a problem as a Markov decision Problem.

1.1. Context

  • You are the manager of two locations for the largest spaceship rental company of the United Federation of Planets.

  • Every day, some customers arrive at each location to rent a spaceship.

    • If you have a spaceship available, you rent it out and you receive 10 federation credits from the main company.

    • If no spaceships are available, you get fired.

  • Spaceships are available for renting the day after they are returned. (Your team needs time to remove the space dust.)

  • To make sure, you have enough spaceships at both locations, you can move spaceships between the two spaceports overnight. (They are equipped with warp speed engine.)

    • It cost you 2 federation credits.

  • Your supercomputer tells you that the number of spaceships requested and returned at each spaceport follow a Poisson distribution.

    • Meaning that the probability that \(n\) spaceships are rented (or returned) is calculated by \(\frac{\lambda^n}{n!}e^{-\lambda}\), where \(\lambda\) is the expected number.

  • Your supercomputer gives you the following information:

    • First spaceport: \(\lambda = 3\) for rental requests and \(\lambda=3\) for returns.

    • Second spaceport: \(\lambda = 4\) for rental requests and \(\lambda=2\) for returns.

  • The capacity of the spaceport being limited (spaceships are huge), you can only store 20 at the same spaceport. After reaching this amount, you send them back to the company space station and don’t consider them anymore.

  • Also, you only have 5 crews at each spaceport, so you can only move 5 spaceships from one location to the other. (Autopilot is not a thing yet for spaceships.)

../../_images/spaceship.drawio.png

1.2. Assignment

  1. Model this problem as a MDP;

    • Define the state space,

    • Define the action space,

    • Define the transition function,

    • Define the reward function.

  2. Implement this problem as a Gym environment as seen in labs.

    • Inherit from the Gym environment class.

    • Implement all the necessary function of the class (step, reset, etc.)

  3. Implement Value Iteration.

  4. Calculate the optimal policy and plot the evolution of the expected value function.

1.3. Submission

You need to submit on Moodle the following:

  • A latex document with the problem model as an MDP.

  • Your code in a python file. The filename should be your last name followed by asn-mdp.

  • Your plots.

1.4. Academic Integrity

  • Any cheating/plagiarism will be sanctioned by a zero and an automatic report.

  • No exception will be allowed.

  • You can find the academic integrity policy here: Academic integrity.

  • A list of non-exhaustive things that are considered cheating/plagiarism:

    • Submitting someone else code. Even with citations!

    • Asking someone else to do the code or write the report.

    • Submitting someone else report.

    • Etc.