This repository provides a clean and modular implementation of Proximal Policy Optimization (PPO) in PyTorch, designed to help beginners understand and experiment with reinforcement learning algorithms. It supports both continuous and discrete action spaces, demonstrated on OpenAI Gym environments, and its structure is flexible enough to be adapted to other custom environments with minor modifications.
Key Features:
- Modular and easy-to-understand code
- Supports both continuous and discrete action spaces (see the sketch after this list)
- YAML-based configuration for managing hyperparameters
- Out-of-the-box compatibility with OpenAI Gym environments
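Both action-space types boil down to the distribution the policy head induces. The sketch below is illustrative only: the class name, layer widths, and Tanh activations are assumptions for this example, not the exact modules used in this repository. It shows an actor-critic that emits a Categorical distribution for discrete actions and a Gaussian with a learnable log standard deviation for continuous ones.

```python
# Minimal illustrative actor-critic for PPO supporting both action-space types.
# Class name, layer widths, and activations are assumptions for this sketch,
# not the exact modules used in this repository.
import torch
import torch.nn as nn
from torch.distributions import Categorical, Normal


class ActorCritic(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden_dim=64, continuous=False):
        super().__init__()
        self.continuous = continuous
        self.actor = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim), nn.Tanh(),
            nn.Linear(hidden_dim, act_dim),
        )
        self.critic = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim), nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )
        if continuous:
            # Learnable log standard deviation for the Gaussian policy
            self.log_std = nn.Parameter(torch.zeros(act_dim))

    def forward(self, obs):
        value = self.critic(obs)
        out = self.actor(obs)
        if self.continuous:
            dist = Normal(out, self.log_std.exp())   # continuous actions
        else:
            dist = Categorical(logits=out)           # discrete actions
        return dist, value
```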
Clone the repository
git clone https://github.com/saqib1707/RL-PPO-PyTorch.git
cd RL-PPO-PyTorch
To run this code, you need the following dependencies:
- torch
- numpy
- gym
- pygame
- box2d
- box2d-py
Create a virtual environment and install the dependencies:
- Create a virtual environment
python -m venv /path/to/venv/directory
- Activate the virtual environment
source /path/to/venv/directory/bin/activate
- Install required dependencies
pip install -r requirements.txt
Note: For environments like LunarLander-v2 and BipedalWalker, make sure you have swig and box2d installed.
- Install swig on macOS or Linux:
  For macOS: brew install swig
  For Linux: apt-get install swig
- To install box2d, run:
  pip install box2d
  pip install box2d-py
Note: Gymnasium (the successor to OpenAI Gym) supports Python versions up to 3.11. There have been issues reported with installing gym[box2d] on Python 3.8, 3.9, and 3.10.
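As an optional sanity check (not part of the repository), you can confirm that PyTorch and the Box2D-backed environments import and reset correctly once swig and box2d are installed:

```python
# Optional sanity check (not part of the repository): confirm that torch and
# the Box2D-backed environments are installed and working.
import torch
import gym

print("torch:", torch.__version__)
env = gym.make("LunarLander-v2")  # fails here if swig/box2d are missing
env.reset()
print("LunarLander-v2 observation space:", env.observation_space)
env.close()
```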
Supported environments include:
- CartPole
- LunarLander
- Walker2d
- HalfCheetah
- BipedalWalker
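The environments above span both discrete control (CartPole, LunarLander) and continuous control (Walker2d, HalfCheetah, BipedalWalker). An illustrative way to confirm which type an environment uses is to inspect its action space; the version suffixes below may vary with your gym version, and Walker2d/HalfCheetah additionally require MuJoCo.

```python
# Illustrative only: inspect the action space to see whether an environment is
# discrete or continuous.
import gym

for env_id in ["CartPole-v1", "LunarLander-v2", "BipedalWalker-v3"]:
    env = gym.make(env_id)
    print(env_id, "->", env.action_space)  # Discrete(...) or Box(...)
    env.close()
```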
Configuration files for each environment are located in the configs/ directory. These can be customized to adjust hyperparameters for each run.
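As an illustration of the YAML layout, the hypothetical config below is loaded with PyYAML. Only mode, hidden_dim, and gamma are keys that appear elsewhere in this README; the values shown are assumptions, not the repository's defaults.

```python
# Hypothetical config in the spirit of configs/config_cartpole.yaml, loaded
# with PyYAML. Key names and values here are illustrative assumptions.
import yaml

example_yaml = """
mode: train        # train or test
hidden_dim: 64     # hidden layer width of the policy/value networks
gamma: 0.99        # discount factor
"""

config = yaml.safe_load(example_yaml)
print(config)  # {'mode': 'train', 'hidden_dim': 64, 'gamma': 0.99}
```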
Run the training script with the desired environment configuration:
python launcher.py --config_path="../configs/config_cartpole.yaml"
For other environments, simply modify the config path, for example:
python launcher.py --config_path="../configs/config_lunarlander.yaml"
To run experiments with modified hyperparameters, you can override the default settings from the YAML file using the --override flag:
python launcher.py --config_path="../configs/config_cartpole.yaml" --override "mode=test" "hidden_dim=256" "gamma=0.95"
To be updated soon...
Contributions are welcome! If you have suggestions for improving the code or adding new features, feel free to submit a pull request or open an issue.
If you use this repository in your research, please consider citing it:
@misc{ppo_pytorch,
author = {Azim, Saqib},
title = {Proximal Policy Optimization using PyTorch},
year = {2024},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/saqib1707/RL-PPO-PyTorch}},
}
- PPO paper
- PPO-for-Beginners
- PPO-PyTorch
- PPO Stack Overflow Explanation
- An Intuitive Explanation of Policy Gradient
- ICLR PPO Implementation details
- PPO-implementation-details
- OpenAI Spinning Up
Feel free to reach out with any questions or suggestions:
Email: azimsaqib10@gmail.com