Frozen Lake#
Important
Due date: TBD
Context#
In this assignment, you will implement TD(0) and Monte Carlo (MC) learning, apply both methods to the Frozen Lake problem, and analyze the results by comparing their convergence and policy performance.
Assignment#
Part 1: Implement Temporal Difference (TD) Learning#
Task: Implement the TD(0) and Monte Carlo (MC) algorithms; a minimal sketch of one possible structure appears after the deliverables list below.
Environment Setup:
Frozen Lake: two versions, \(4\times 4\) and \(8\times 8\) (a setup snippet is sketched below).
Use a discount factor of \(\gamma = 0.9\).
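A minimal setup snippet, assuming the Gymnasium `FrozenLake-v1` environment (the older OpenAI Gym package exposes the same environment); the `is_slippery` flag is not specified in the assignment and is shown here only as an example:

```python
import gymnasium as gym  # `import gym` also works with classic OpenAI Gym

GAMMA = 0.9  # discount factor required by the assignment

# `is_slippery` is an assumption; toggle it if you want deterministic dynamics.
env_4x4 = gym.make("FrozenLake-v1", map_name="4x4", is_slippery=True)
env_8x8 = gym.make("FrozenLake-v1", map_name="8x8", is_slippery=True)
```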
Deliverables for Part 1:
Implementations of both algorithms, TD(0) and MC.
A plot showing the value function convergence over episodes.
Analysis of how different parameters affect convergence, for example:
learning rate (\(\alpha\)).
etc.
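One possible skeleton for the two algorithms, sketched here as state-value prediction under a fixed (random) behavior policy. The function names, default hyperparameters, and the Gymnasium API (5-tuple `step`, 2-tuple `reset`) are assumptions, not a required structure:

```python
import numpy as np
import gymnasium as gym  # `import gym` works similarly for older OpenAI Gym versions


def td0_prediction(env, policy, gamma=0.9, alpha=0.1, n_episodes=5000):
    """TD(0) state-value prediction for a fixed policy."""
    V = np.zeros(env.observation_space.n)
    for _ in range(n_episodes):
        s, _ = env.reset()
        done = False
        while not done:
            a = policy(s)
            s_next, r, terminated, truncated, _ = env.step(a)
            done = terminated or truncated
            # TD(0) update: V(s) <- V(s) + alpha * [r + gamma * V(s') - V(s)]
            V[s] += alpha * (r + gamma * V[s_next] - V[s])
            s = s_next
    return V


def mc_prediction(env, policy, gamma=0.9, n_episodes=5000):
    """First-visit Monte Carlo state-value prediction for a fixed policy."""
    V = np.zeros(env.observation_space.n)
    visit_counts = np.zeros(env.observation_space.n)
    for _ in range(n_episodes):
        s, _ = env.reset()
        episode = []
        done = False
        while not done:
            a = policy(s)
            s_next, r, terminated, truncated, _ = env.step(a)
            episode.append((s, r))
            done = terminated or truncated
            s = s_next
        # Work backwards through the episode, accumulating the return G.
        states = [s_t for s_t, _ in episode]
        G = 0.0
        for t in reversed(range(len(episode))):
            s_t, r_t = episode[t]
            G = gamma * G + r_t
            if s_t not in states[:t]:  # first visit to s_t in this episode
                visit_counts[s_t] += 1
                V[s_t] += (G - V[s_t]) / visit_counts[s_t]  # incremental average
    return V


if __name__ == "__main__":
    env = gym.make("FrozenLake-v1", map_name="4x4")
    rng = np.random.default_rng(0)
    random_policy = lambda s: rng.integers(env.action_space.n)
    print("TD(0):", td0_prediction(env, random_policy))
    print("MC:   ", mc_prediction(env, random_policy))
```

Recording a snapshot of `V` after every episode (rather than only returning the final estimate) makes the required convergence plot straightforward to produce.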
Part 2: Comparison and Analysis#
Task: Compare the performance and behavior of both methods (TD(0) and MC).
Required Analysis:
Discuss which method converges faster and why.
Analyze under what conditions one method outperforms the other.
Provide a table comparing key metrics, such as:
Number of episodes for convergence.
Sensitivity to hyperparameters (e.g., learning rate, episode length).
Stability of results.
Deliverables:
A report with comparisons and insights.
Plots showing side-by-side performance (e.g., convergence speed, value estimates); a plotting sketch follows this list.
An explanation of the potential trade-offs between the two methods.
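A minimal plotting sketch for the side-by-side convergence comparison. It assumes you extend your prediction functions to also record a per-episode history of value estimates (`td_history`, `mc_history`) and that you have a reference value function `v_ref` (e.g., from a very long run); these names are illustrative, not part of the assignment:

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_convergence(td_history, mc_history, v_ref, title="TD(0) vs. MC convergence"):
    """Plot per-episode RMS error of the value estimates against a reference.

    td_history / mc_history: lists of V arrays, one snapshot per episode.
    v_ref: reference value function (e.g., from a very long run).
    """
    def rms(history):
        return [np.sqrt(np.mean((v - v_ref) ** 2)) for v in history]

    plt.figure(figsize=(8, 4))
    plt.plot(rms(td_history), label="TD(0)")
    plt.plot(rms(mc_history), label="MC")
    plt.xlabel("Episode")
    plt.ylabel("RMS error vs. reference V")
    plt.title(title)
    plt.legend()
    plt.tight_layout()
    plt.show()


# Example call (assuming the prediction functions were extended to return histories):
# plot_convergence(td_history, mc_history, v_ref)
```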
Evaluation Criteria#
Correctness of Implementations: 40%
TD(0) and MC methods are implemented correctly and produce expected results.
Analysis and Insights: 30%
Depth of analysis in comparing the methods.
Clear presentation of convergence behaviors and performance metrics.
Code Quality: 20%
Well-commented and organized code.
Proper use of libraries and good programming practices.
Presentation: 10%
Clear plots, tables, and visualizations.
Well-written report with proper formatting and insightful observations.
Submission Guidelines#
Code files (.py) uploaded to Moodle.
Report in PDF format with relevant plots and analysis.
Resources#
Sutton & Barto, “Reinforcement Learning: An Introduction”
Documentation for environments like OpenAI Gym (optional but useful).