These implementations are based on Shiyu Zhao's book *Mathematical Foundations of Reinforcement Learning*.
This repository contains a comprehensive set of reinforcement learning algorithms for solving the Grid World environment.
- Value Iteration - iterative Bellman optimality updates (a value update plus a greedy policy update each sweep)
- Policy Iteration - full policy evaluation followed by policy improvement
- Monte Carlo Control - first-visit and every-visit variants, with exploring starts and epsilon-soft policies
- SARSA - on-policy TD control
- Q-Learning - off-policy TD control (tabular sketch below)
- Deep Q-Networks (DQN) - neural-network-based Q-learning
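To make the tabular TD methods concrete, here is a minimal Q-learning sketch. The environment interface (`env.reset()`, `env.step()` returning `(next_state, reward, done)`) and integer state encoding are assumptions for illustration, not this repository's actual API.

```python
import numpy as np

# Minimal tabular Q-learning sketch (illustrative only; the environment
# interface and state encoding are assumptions, not this repo's API).
# States are assumed to be integer indices 0..n_states-1.
def q_learning(env, n_states, n_actions, n_episodes=500, episode_len=200,
               alpha=0.1, gamma=0.9, epsilon=0.1):
    Q = np.zeros((n_states, n_actions))
    for _ in range(n_episodes):
        s = env.reset()
        for _ in range(episode_len):
            # epsilon-greedy behavior policy
            if np.random.rand() < epsilon:
                a = np.random.randint(n_actions)
            else:
                a = int(np.argmax(Q[s]))
            s_next, r, done = env.step(a)
            # off-policy TD target: greedy value of the next state
            Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
            s = s_next
            if done:
                break
    return Q
```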
```
├── configs/                  # YAML configuration files
│   ├── run_dqn.yaml
│   ├── run_monte_carlo.yaml
│   ├── run_policy_iteration.yaml
│   ├── run_qlearning.yaml
│   ├── run_sarsa.yaml
│   └── run_value_iteration.yaml
├── src/                      # Core source code
│   ├── environment.py        # Grid World implementation
│   └── visualizer.py         # Visualization utilities
├── solvers/                  # RL algorithm implementations
│   ├── value_iteration.py
│   ├── policy_iteration.py
│   ├── monte_carlo.py
│   ├── temporal_difference.py
│   ├── q_learning.py
│   └── deep_q_learning.py
├── reference/                # Reference implementations
├── utils/                    # Helper functions
└── main.py                   # Main entry point
```
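For orientation, `src/environment.py` implements the Grid World. A stripped-down environment of this kind might look like the sketch below; the class and method names are hypothetical and only mirror the fields of the example config shown later (size, forbidden states, target state, reward terms).

```python
# Hypothetical Grid World sketch; not the repo's actual class.
# For tabular solvers, a (row, col) state can be flattened to row * size + col.
class GridWorld:
    ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1), (0, 0)]  # up, down, left, right, stay

    def __init__(self, size=5, initial_state=(0, 0), target_state=(3, 2),
                 forbidden_states=(), reward_target=0.0, reward_forbidden=-10.0,
                 reward_boundary=-10.0, reward_other=-1.0):
        self.size = size
        self.initial_state = tuple(initial_state)
        self.target_state = tuple(target_state)
        self.forbidden_states = {tuple(s) for s in forbidden_states}
        self.rewards = dict(target=reward_target, forbidden=reward_forbidden,
                            boundary=reward_boundary, other=reward_other)
        self.state = self.initial_state

    def reset(self):
        self.state = self.initial_state
        return self.state

    def step(self, action):
        dr, dc = self.ACTIONS[action]
        r, c = self.state[0] + dr, self.state[1] + dc
        if not (0 <= r < self.size and 0 <= c < self.size):
            # attempted move off the grid: stay in place, boundary penalty
            return self.state, self.rewards["boundary"], False
        self.state = (r, c)
        if self.state == self.target_state:
            return self.state, self.rewards["target"], True
        if self.state in self.forbidden_states:
            return self.state, self.rewards["forbidden"], False
        return self.state, self.rewards["other"], False
```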
Run an algorithm:
```bash
# DQN
python main.py --config configs/run_dqn.yaml
```
Configure algorithms via YAML files or command line:
```yaml
# Example configs/run_qlearning.yaml
name: "q_learning_small_gamma"
algorithm: "q_learning"

# Environment params
env:
  log_history: 1
  size: 5
  initial_state: [0, 0]
  forbidden_states:
    - [1, 1]
    - [1, 2]
    - [2, 2]
    - [3, 1]
    - [4, 1]
    - [3, 3]
  target_state: [3, 2]
  reward_target: 0.0
  reward_forbidden: -10.0
  reward_boundary: -10.0
  reward_other: -1.0

qlearning_config:
  n_episodes: 500
  episode_len: 200
  epsilon_decay: "exponential"
  epsilon: 0.9
  min_epsilon: 0.05
  alpha: 0.1
```
- Modular Design: Each algorithm in separate, reusable modules
- Visualization: Real-time grid visualization with Pygame
- Extensible: Easy to add new algorithms or environments
- Configurable: YAML-based configuration for experiments
- Benchmarking: Compare different algorithms on same problems
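The YAML configs shown above could be consumed along these lines. This is a minimal sketch using PyYAML with the keys from the example config; the dispatch logic is illustrative and not necessarily how `main.py` actually works.

```python
import argparse
import yaml  # PyYAML; assumed here, not part of the listed requirements

# Illustrative config loading; mirrors the example YAML above.
def load_config(path):
    with open(path) as f:
        return yaml.safe_load(f)

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", required=True)
    args = parser.parse_args()

    cfg = load_config(args.config)
    print(cfg["name"], cfg["algorithm"])        # e.g. q_learning
    env_cfg = cfg["env"]                        # size, forbidden_states, rewards, ...
    algo_cfg = cfg.get("qlearning_config", {})  # per-algorithm hyperparameters
    print(env_cfg["size"], algo_cfg.get("n_episodes"))
```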
Each algorithm generates:
- Convergence plots (value/policy convergence)
- Episode reward trends
- Final optimal policy visualization
- Performance metrics (steps per episode, total reward)
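An episode-reward trend like the one listed above can be produced generically with Matplotlib; the helper below is an illustrative sketch, not the repo's `visualizer.py`.

```python
import numpy as np
import matplotlib.pyplot as plt

# Generic episode-reward plot with a moving average (illustrative sketch).
def plot_rewards(episode_rewards, window=50):
    rewards = np.asarray(episode_rewards, dtype=float)
    plt.plot(rewards, alpha=0.3, label="per-episode reward")
    if len(rewards) >= window:
        smoothed = np.convolve(rewards, np.ones(window) / window, mode="valid")
        plt.plot(np.arange(window - 1, len(rewards)), smoothed,
                 label=f"{window}-episode mean")
    plt.xlabel("Episode")
    plt.ylabel("Total reward")
    plt.legend()
    plt.show()
```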
Example: `python main.py --config configs/run_sarsa.yaml` runs tabular SARSA (no function approximation).
While training, the performance is plotted:

Progress for 4300 episodes
At the end, the final policy is shown; for SARSA this only reveals an "optimal path", since Q-values are updated only along the state-action pairs the epsilon-greedy policy actually visits:

Optimal policy learned
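For comparison with the earlier Q-learning sketch, SARSA differs only in its TD target, which uses the action actually taken in the next state. Same assumed environment interface; illustrative only.

```python
import numpy as np

# On-policy SARSA sketch (assumed interface as in the Q-learning example above).
def sarsa(env, n_states, n_actions, n_episodes=500, episode_len=200,
          alpha=0.1, gamma=0.9, epsilon=0.1):
    def epsilon_greedy(s):
        if np.random.rand() < epsilon:
            return np.random.randint(n_actions)
        return int(np.argmax(Q[s]))

    Q = np.zeros((n_states, n_actions))
    for _ in range(n_episodes):
        s = env.reset()
        a = epsilon_greedy(s)
        for _ in range(episode_len):
            s_next, r, done = env.step(a)
            a_next = epsilon_greedy(s_next)
            # on-policy TD target: uses a_next, not the max over actions
            Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])
            s, a = s_next, a_next
            if done:
                break
    return Q
```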
- Python 3.8+
- NumPy
- Matplotlib
- PyTorch
