Reinforcement Learning Algorithms

These implementations are based on Shiyu Zhao's book Mathematical Foundations of Reinforcement Learning.

This repository is a comprehensive implementation of reinforcement learning algorithms applied to the Grid World environment.

Algorithms Implemented

Dynamic Programming

  • Value Iteration - Repeated Bellman optimality updates until the value function converges, followed by greedy policy extraction (see the sketch below)
  • Policy Iteration - Alternating policy evaluation and greedy policy improvement
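
A minimal sketch of tabular value iteration, assuming the transition model P[s][a] is given as (probability, next_state, reward) triples; these names are illustrative and do not match the repository's environment API:

import numpy as np

# Tabular value iteration sketch. P[s][a] is a list of
# (prob, next_state, reward) triples -- an assumed interface,
# not the repository's actual environment API.
def value_iteration(P, n_states, n_actions, gamma=0.9, tol=1e-6):
    V = np.zeros(n_states)
    while True:
        delta = 0.0
        for s in range(n_states):
            # Bellman optimality update: best one-step lookahead over actions
            q = [sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                 for a in range(n_actions)]
            delta = max(delta, abs(max(q) - V[s]))
            V[s] = max(q)
        if delta < tol:
            break
    # Extract the greedy policy from the converged values
    policy = [max(range(n_actions),
                  key=lambda a: sum(p * (r + gamma * V[s2])
                                    for p, s2, r in P[s][a]))
              for s in range(n_states)]
    return V, policy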

Monte Carlo Methods

  • First-visit and every-visit Monte Carlo control
  • Exploring starts and epsilon-soft policies (see the sketch below)
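
A first-visit Monte Carlo control sketch with an epsilon-soft policy; the env.reset()/env.step() interface used here is an assumption, not the repository's actual environment.py:

import random
from collections import defaultdict

# First-visit Monte Carlo control with an epsilon-soft policy (sketch).
# Assumes env.reset() -> state and env.step(a) -> (state, reward, done);
# this interface is illustrative, not the repository's environment.py.
def mc_control(env, n_actions, n_episodes=5000, gamma=0.9, epsilon=0.1):
    Q = defaultdict(lambda: [0.0] * n_actions)
    n = defaultdict(lambda: [0] * n_actions)
    for _ in range(n_episodes):
        # Generate one episode with the current epsilon-soft policy
        episode, s, done = [], env.reset(), False
        while not done:
            if random.random() < epsilon:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda i: Q[s][i])
            s2, r, done = env.step(a)
            episode.append((s, a, r))
            s = s2
        # Index of the first occurrence of each (state, action) pair
        first = {}
        for t, (s, a, _) in enumerate(episode):
            first.setdefault((s, a), t)
        # Walk backwards accumulating returns; update on first visits only
        G = 0.0
        for t in reversed(range(len(episode))):
            s, a, r = episode[t]
            G = gamma * G + r
            if first[(s, a)] == t:
                n[s][a] += 1
                Q[s][a] += (G - Q[s][a]) / n[s][a]  # incremental mean
    return Q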

Temporal Difference Learning

  • SARSA - On-policy TD control
  • Q-Learning - Off-policy TD control (both one-step updates are sketched below)
  • Deep Q-Networks (DQN) - Q-learning with neural-network function approximation
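
The two tabular update rules differ only in their bootstrap target (DQN keeps the Q-learning target but replaces the table with a network). A minimal sketch, assuming Q maps each state to a list of action values:

# One-step tabular TD updates (sketch). Q maps each state to a list of
# action values; alpha is the learning rate, gamma the discount factor.

def sarsa_update(Q, s, a, r, s2, a2, alpha=0.1, gamma=0.9, done=False):
    # On-policy: bootstrap from the action a2 the policy actually takes next
    target = r + (0.0 if done else gamma * Q[s2][a2])
    Q[s][a] += alpha * (target - Q[s][a])

def q_learning_update(Q, s, a, r, s2, alpha=0.1, gamma=0.9, done=False):
    # Off-policy: bootstrap from the greedy action in the next state
    target = r + (0.0 if done else gamma * max(Q[s2]))
    Q[s][a] += alpha * (target - Q[s][a])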

Project Structure

├── configs/              # YAML configuration files
│   ├── run_dqn.yaml
│   ├── run_monte_carlo.yaml
│   ├── run_policy_iteration.yaml
│   ├── run_qlearning.yaml
│   ├── run_sarsa.yaml
│   └── run_value_iteration.yaml
├── src/                  # Core source code
│   ├── environment.py    # Grid World implementation
│   └── visualizer.py     # Visualization utilities
├── solvers/              # RL algorithm implementations
│   ├── value_iteration.py
│   ├── policy_iteration.py
│   ├── monte_carlo.py
│   ├── temporal_difference.py
│   ├── q_learning.py
│   └── deep_q_learning.py
├── reference/            # Reference implementations
├── utils/                # Helper functions
└── main.py               # Main entry point

Usage

Run an algorithm:

# DQN
python main.py --config configs/run_dqn.yaml
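
The other algorithms run the same way; each YAML file in configs/ selects its corresponding solver:

# SARSA
python main.py --config configs/run_sarsa.yaml

# Value iteration
python main.py --config configs/run_value_iteration.yaml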

Configuration

Configure algorithms via YAML files or command-line arguments:

# Example configs/run_qlearning.yaml
name: "q_learning_small_gamma"

algorithm: "q_learning"

# Environment params
env:
  log_history: 1
  size: 5
  initial_state: [0, 0]
  forbidden_states:
    - [1, 1]
    - [1, 2]
    - [2, 2]
    - [3, 1]
    - [4, 1]
    - [3, 3]
  target_state: [3, 2]

  reward_target: 0.0
  reward_forbidden: -10.0 
  reward_boundary: -10.0
  reward_other: -1.0

qlearning_config:
  n_episodes: 500
  episode_len: 200
  epsilon_decay: "exponential"
  epsilon: 0.9
  min_epsilon: 0.05
  alpha: 0.1
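
For reference, epsilon_decay: "exponential" corresponds to an annealing schedule along these lines; this is a sketch under assumptions, and the repository's exact decay-rate formula is not documented here:

# Exponential epsilon annealing sketch: decays from epsilon toward
# min_epsilon over n_episodes. The decay-rate formula is an assumption,
# not necessarily the repository's implementation.
def epsilon_schedule(episode, epsilon=0.9, min_epsilon=0.05, n_episodes=500):
    rate = (min_epsilon / epsilon) ** (1.0 / max(n_episodes - 1, 1))
    return max(min_epsilon, epsilon * rate ** episode)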

Features

  • Modular Design: Each algorithm lives in a separate, reusable module
  • Visualization: Real-time grid visualization with Pygame
  • Extensible: Easy to add new algorithms or environments
  • Configurable: YAML-based configuration for experiments (loader sketch below)
  • Benchmarking: Compare different algorithms on the same problems
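
A minimal sketch of how such a YAML run config can be loaded from the command line, assuming PyYAML is installed; this is illustrative, not the repository's actual main.py:

import argparse
import yaml  # PyYAML -- assumed available; not listed under Requirements

# Illustrative config loader, not the repository's actual main.py.
def load_config():
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", required=True,
                        help="path to a YAML run configuration")
    args = parser.parse_args()
    with open(args.config) as f:
        return yaml.safe_load(f)

if __name__ == "__main__":
    cfg = load_config()
    print(cfg["algorithm"], cfg["env"]["size"])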

Description

Each algorithm generates:

  • Convergence plots (value/policy convergence)
  • Episode reward trends (see the plotting sketch below)
  • Final optimal policy visualization
  • Performance metrics (steps per episode, total reward)
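
As an illustration, a reward-trend plot can be drawn with Matplotlib along these lines; the repository's visualizer.py may implement this differently:

import matplotlib.pyplot as plt

# Episode-reward trend with a moving average (sketch);
# the repository's visualizer.py may do this differently.
def plot_rewards(rewards, window=50):
    smoothed = [sum(rewards[max(0, i - window + 1):i + 1])
                / (i - max(0, i - window + 1) + 1)
                for i in range(len(rewards))]
    plt.plot(rewards, alpha=0.3, label="episode reward")
    plt.plot(smoothed, label=f"{window}-episode moving average")
    plt.xlabel("episode")
    plt.ylabel("total reward")
    plt.legend()
    plt.show()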

Example: python main.py --config configs/run_sarsa.yaml, running tabular SARSA (no function approximation) with $\gamma = 0.99$ and 10,000 episodes, saving the log every 100 episodes:

[Figure: SARSA training snapshot at iteration 130]

At the same time, performance is plotted:

[Figure: performance plot, progress through 4,300 episodes]

At the end, the final policy is shown, which for SARSA only highlights an "optimal path" rather than a complete policy:

[Figure: final optimal policy learned]

Requirements

  • Python 3.8+
  • NumPy
  • Matplotlib
  • PyTorch
  • Pygame (for the real-time grid visualization)
