Project

  • Worth: 20% of the final grade

Important

This is an individual project spread over the whole semester, so the amount of work expected is high.

Introduction

  • This project is an individual project.

  • The topic of the project is the Sokoban game.

[Figure: animation of a Sokoban game (Sokoban_ani.gif)]
  • You will need to define the Markov Decision Process (MDP) of the Sokoban game.

  • Then you will need to define and use a reinforcement learning (RL) algorithm to solve it.

  • At the end of the project you will submit your code and a report explaining your model, your algorithm and its performance.
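
As a reminder of what "defining the MDP" involves, the generic form is sketched below. The notation (state set, action set, transition function, reward function, discount factor) is the usual textbook convention and is an assumption here, not something prescribed by the project; choosing what these objects are for Sokoban is the modelling work you are asked to do.

```latex
\[
  \mathcal{M} = (\mathcal{S}, \mathcal{A}, P, R, \gamma), \qquad
  P(s' \mid s, a) = \Pr(s_{t+1} = s' \mid s_t = s,\ a_t = a), \qquad
  \pi^{*} \in \arg\max_{\pi} \; \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t) \right]
\]
```

Here $\mathcal{S}$ is the set of states, $\mathcal{A}$ the set of actions, $R$ the reward function, and $\gamma \in [0, 1)$ the discount factor.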

Implementation details

  • The code must be written in Python.

  • The Sokoban game must be implemented as a Gym environment.

  • You can find more details about the API here: Gym API.

  • Your RL algorithm must work with the Gym environment that you implemented and with any other Gym environment.

  • It must be possible to save and load a policy.

  • It must be possible to load a Sokoban map/level and use it.
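
The last two points can be sketched with the standard library alone. The example below assumes the common plain-text Sokoban level notation (`#` wall, `@` player, `$` box, `.` goal, `*` box on goal, `+` player on goal) and a tabular policy stored as a dictionary and serialized with `pickle`. The function names, the level format and the policy representation are illustrative assumptions, not requirements of the project.

```python
import pickle
import tempfile
from pathlib import Path

# Assumed level notation (a common Sokoban text format, not mandated here):
# '#' wall, '@' player, '$' box, '.' goal, '*' box on goal, '+' player on goal.


def parse_level(text):
    """Parse a text level into walls, boxes, goals and the player position."""
    walls, boxes, goals = set(), set(), set()
    player = None
    for row, line in enumerate(text.splitlines()):
        for col, char in enumerate(line):
            pos = (row, col)
            if char == "#":
                walls.add(pos)
            if char in "$*":
                boxes.add(pos)
            if char in ".*+":
                goals.add(pos)
            if char in "@+":
                player = pos
    if player is None:
        raise ValueError("level has no player")
    return {"walls": walls, "boxes": boxes, "goals": goals, "player": player}


def load_level(path):
    """Read a level file from disk and parse it."""
    return parse_level(Path(path).read_text())


def save_policy(policy, path):
    """Serialize a policy object (here: a plain dict) to a file."""
    with open(path, "wb") as f:
        pickle.dump(policy, f)


def load_policy(path):
    """Load a policy saved with save_policy (only unpickle files you trust)."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Usage: parse a tiny one-box level and round-trip a toy policy.
LEVEL_TEXT = "#####\n#@$.#\n#####\n"
level = parse_level(LEVEL_TEXT)

policy = {"example-state": "right"}  # hypothetical state -> action mapping
with tempfile.TemporaryDirectory() as tmp:
    policy_path = Path(tmp) / "policy.pkl"
    save_policy(policy, policy_path)
    restored = load_policy(policy_path)
```

A policy learned with function approximation (for example a neural network) would need a different serialization scheme; the dictionary form only fits tabular methods.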

Final report

  • The final report must be organized as follows:

    • Introduction

    • Markov Decision Process definition

    • Reinforcement Learning algorithm

    • Experiments

    • Results

    • Conclusion

  • The report must be written in LaTeX.

  • You need to submit both the LaTeX source file and the PDF file.

Project steps

The project is divided into steps, each with its own due date.

Step                 Due date        Worth
Project due          November 30th   80%
Final presentation   Last day        20%

Presentation instructions

  • The presentation lasts 15 minutes.

  • No overtime will be accepted.

  • Common rules about presentations:

    • You must stand in front of the class.

    • You should not read your slides.

    • The slides must not contain too much text.

Marking schemes

  • The detailed marking scheme for the project can be found here: Project

  • The detailed marking scheme for the final presentation can be found here: Project

Important

  • To receive more than 60% for the project, you must meet all of the following conditions:

    • The code runs without crashes or bugs.

    • The MDP is implemented as a Gym environment.

    • The RL algorithm works and learns a policy.

    • It is possible to execute the learned policy.

    • A policy was submitted and can be executed.

    • A Sokoban map was submitted and can be loaded.

  • If any of these conditions is not met, the maximum achievable grade is 60%.

Academic Integrity

  • Any cheating/plagiarism will result in a grade of zero and an automatic report.

  • No exceptions will be made.

  • You can find the academic integrity policy here: Academic integrity.

  • A non-exhaustive list of things that are considered cheating/plagiarism:

    • Submitting someone else's code, even with citations!

    • Asking someone else to write the code or the report.

    • Submitting someone else's report.

    • Etc.