My implementation of the Q-learning and SARSA algorithms for a simple grid-world (cliff-walking) environment. The code includes utility functions for visualizing reward convergence, the paths taken by the SARSA and Q-learning agents, and heat maps of the agent's action-value function.
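As a minimal sketch of the two update rules implemented here (not the code from cliff_walking.py itself; the table layout, exploration scheme, and parameter names are assumptions):

```python
import numpy as np

def epsilon_greedy(Q, state, n_actions, epsilon, rng):
    """Pick a random action with probability epsilon, otherwise the greedy action."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[state]))

def q_learning_update(Q, s, a, r, s_next, alpha, gamma):
    """Off-policy: bootstrap from the greedy value of the next state."""
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    """On-policy: bootstrap from the action actually taken in the next state."""
    Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])
```

The only difference between the two methods is the bootstrap target: Q-learning uses the maximum over next-state actions, while SARSA uses the action the behavior policy actually selects, which is why their learned paths along the cliff differ.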
cliff_walking.py: Q-learning, SARSA, and visualization functions
cliff_walking_report.pdf: Analysis of the Q-learning and SARSA algorithms
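The actual training loop lives in cliff_walking.py. As an illustration only, a loop of the following shape could collect the per-episode returns used for a reward-convergence plot; the use of gymnasium's CliffWalking-v0 and the hyperparameter values are assumptions, not the repo's setup, and it reuses epsilon_greedy and sarsa_update from the sketch above.

```python
import numpy as np
import gymnasium as gym

def train_sarsa(episodes=500, alpha=0.5, gamma=1.0, epsilon=0.1, seed=0):
    """Run SARSA on CliffWalking-v0; return the Q-table and per-episode returns."""
    env = gym.make("CliffWalking-v0")  # assumed environment, not necessarily the repo's
    rng = np.random.default_rng(seed)
    Q = np.zeros((env.observation_space.n, env.action_space.n))
    returns = []
    for _ in range(episodes):
        s, _ = env.reset(seed=int(rng.integers(1 << 31)))
        a = epsilon_greedy(Q, s, env.action_space.n, epsilon, rng)
        total, done = 0.0, False
        while not done:
            s_next, r, terminated, truncated, _ = env.step(a)
            a_next = epsilon_greedy(Q, s_next, env.action_space.n, epsilon, rng)
            sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma)
            s, a = s_next, a_next
            total += r
            done = terminated or truncated
        returns.append(total)  # one point on the reward-convergence curve
    return Q, returns
```

Swapping sarsa_update for q_learning_update (and dropping the pre-selected a_next from the update) gives the corresponding Q-learning curve for comparison.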