CSCI 531
1.0
Notes
1. Introduction
2. Multi-armed Bandit
3. Multi-armed Bandit - Action Selection Methods
4. Markov Decision Process
5. Policies and Value Function
6. Dynamic Programming
7. Monte Carlo Methods
8. Temporal-Difference Learning
9. Approximation - On-Policy
10. Eligibility Traces
11. Policy Gradient Methods
Labs
1. Multi-armed Bandit
2. Multi-armed Bandit - Action Selection Methods
3. Markov Decision Process
4. Dynamic Programming
5. Monte Carlo Methods
6. TD Learning
7. Semi-gradient Sarsa
8. Eligibility Traces - Algorithms
9. Actor-Critic
Assignments
1. Spaceships rental
2. Maze Escape
Project
Outline
Computer Science 531 - Reinforcement Learning
Index