This repository contains the lab and homework assignments for the Reinforcement Learning course offered in the M.Sc. Artificial Intelligence programme at the University of Amsterdam.
Lab 1 - Dynamic Programming, Policy Evaluation, Policy Iteration, Value Iteration, MC Control, TD Learning
In this lab we get familiar with the basic concepts of Dynamic Programming and use them to implement Policy Evaluation, Policy Iteration and Value Iteration for GridWorldEnv. We also implement Monte Carlo prediction and Monte Carlo control with an epsilon-greedy policy on BlackjackEnv, and explore Temporal Difference learning.
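
As a rough illustration of the Value Iteration part of the lab, here is a minimal sketch. It does not use the lab's GridWorldEnv; the 4-state corridor, the transition table format `P[s][a] = [(prob, next_state, reward, done), ...]` and the function names are assumptions made for this example only.

```python
import numpy as np

def one_step_lookahead(P, V, s, n_actions, gamma):
    """Expected return of every action from state s under the value estimate V."""
    q = np.zeros(n_actions)
    for a in range(n_actions):
        for prob, s_next, reward, done in P[s][a]:
            # Terminal transitions contribute no bootstrapped value.
            q[a] += prob * (reward + (0.0 if done else gamma * V[s_next]))
    return q

def value_iteration(P, n_states, n_actions, gamma=1.0, theta=1e-8):
    """Sweep the Bellman optimality backup until the largest change is below theta."""
    V = np.zeros(n_states)
    while True:
        delta = 0.0
        for s in range(n_states):
            best = one_step_lookahead(P, V, s, n_actions, gamma).max()
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            break
    # Greedy policy with respect to the converged value function.
    policy = np.array([
        int(np.argmax(one_step_lookahead(P, V, s, n_actions, gamma)))
        for s in range(n_states)
    ])
    return policy, V

# Hypothetical toy MDP: a corridor 0-1-2-3 where state 3 is terminal.
# Action 0 moves left, action 1 moves right, and every step costs -1.
n_states, n_actions = 4, 2
P = {s: {a: [] for a in range(n_actions)} for s in range(n_states)}
for s in range(n_states):
    for a in range(n_actions):
        if s == n_states - 1:
            P[s][a] = [(1.0, s, 0.0, True)]  # terminal state loops on itself
        else:
            s_next = max(s - 1, 0) if a == 0 else s + 1
            P[s][a] = [(1.0, s_next, -1.0, s_next == n_states - 1)]

policy, V = value_iteration(P, n_states, n_actions)
print(V)       # state values of the toy corridor
print(policy)  # greedy action per state
```

In this toy problem every non-terminal state should prefer moving right, and each state's value is minus the number of steps left to the terminal state.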

Approximate state-value functions for the blackjack policy that sticks only on 20
or 21, computed by Monte Carlo policy evaluation.

The optimal policy and state-value function for blackjack, found by Monte Carlo control with an epsilon-greedy policy.
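
For reference, the epsilon-greedy action selection used in Monte Carlo control can be sketched as follows. This is an illustrative snippet rather than the code from the lab notebooks; the function name `epsilon_greedy_probs` and the example Q-values are hypothetical.

```python
import numpy as np

def epsilon_greedy_probs(q_values, epsilon):
    """Action probabilities: epsilon spread uniformly, the rest on the greedy action."""
    q_values = np.asarray(q_values, dtype=float)
    n_actions = q_values.size
    probs = np.full(n_actions, epsilon / n_actions)
    probs[np.argmax(q_values)] += 1.0 - epsilon
    return probs

# Hypothetical Q-values for the two blackjack actions (stick, hit) in one state.
rng = np.random.default_rng(0)
probs = epsilon_greedy_probs([0.2, -0.5], epsilon=0.1)
action = rng.choice(len(probs), p=probs)  # sample an action from the policy
print(probs, action)
```

With epsilon = 0.1 and two actions, the greedy action is chosen with probability 0.95 and the other with probability 0.05, so every action keeps being explored.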
Problem and solution can be found under ipynb files here.
The homework was done in collaboration with Dhruba Pujary. Problems and solutions to the homework assignments can be found here.