- merged discrete and continuous algorithms
- added linear decaying for the continuous action space `action_std`, to make training more stable for complex environments
- added different learning rates for actor and critic
- episodes, timesteps and rewards are now logged in `.csv` files
- utils to plot graphs from log files
- utils to test and make gifs from preTrained networks
- `PPO_colab.ipynb` combining all the files to train / test / plot graphs / make gifs on Google Colab in a convenient jupyter-notebook
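The linear `action_std` decay mentioned above can be sketched as follows. This is a minimal illustration, not the repository's exact code; the function name and default values are assumptions.

```python
def decay_action_std(current_std, decay_rate, min_std):
    """Linearly decay the exploration std, clamped at a floor.

    Rounding each step keeps the schedule free of float drift.
    """
    new_std = round(current_std - decay_rate, 4)
    return max(new_std, min_std)

# e.g. called every N timesteps during training:
std = 0.6
for _ in range(10):
    std = decay_action_std(std, decay_rate=0.05, min_std=0.1)
# std has reached the floor of 0.1
```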
This repository provides a minimal PyTorch implementation of Proximal Policy Optimization (PPO) with a clipped objective for OpenAI Gym environments. It is primarily intended for beginners learning the PPO algorithm. It can still be used for complex environments, but may require some hyperparameter tuning or changes to the code.
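The clipped surrogate objective at the heart of PPO can be sketched as below. This is an illustrative sketch, not the repository's exact code; tensor names and the learning-rate values are assumptions.

```python
import torch
import torch.nn as nn

def ppo_clip_loss(log_probs, old_log_probs, advantages, clip_eps=0.2):
    """PPO clipped surrogate loss (to be minimized)."""
    ratios = torch.exp(log_probs - old_log_probs)  # pi_theta / pi_theta_old
    surr1 = ratios * advantages
    surr2 = torch.clamp(ratios, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # take the pessimistic (clipped) surrogate, negate for gradient descent
    return -torch.min(surr1, surr2).mean()

# Different learning rates for actor and critic can be set via optimizer
# param groups (the networks and lr values here are placeholders):
actor, critic = nn.Linear(4, 2), nn.Linear(4, 1)
optimizer = torch.optim.Adam([
    {"params": actor.parameters(), "lr": 3e-4},
    {"params": critic.parameters(), "lr": 1e-3},
])
```

Clipping the probability ratio keeps each policy update close to the old policy, which is what makes PPO stable without a trust-region constraint.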
To keep the training procedure simple:
- It uses a constant standard deviation for the output action distribution (a multivariate normal with diagonal covariance matrix) in continuous environments, i.e. `action_std` is a hyperparameter and NOT a trainable parameter. However, it is linearly decayed over training. (`action_std` significantly affects performance.)
- It uses a simple Monte Carlo estimate for calculating returns and NOT Generalized Advantage Estimation (check out the OpenAI Spinning Up implementation for that).
- It is a single-threaded implementation, i.e. only one worker collects experience. One of the older forks of this repository has been modified to have parallel workers.
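The Monte Carlo return estimate mentioned above can be sketched as a discounted reward-to-go computed backwards over the collected rollout, resetting at episode boundaries. A minimal sketch (function and variable names are illustrative):

```python
def monte_carlo_returns(rewards, is_terminals, gamma=0.99):
    """Discounted reward-to-go, resetting at episode boundaries.

    Iterates backwards so each return accumulates only future rewards
    within the same episode.
    """
    returns = []
    discounted = 0.0
    for reward, done in zip(reversed(rewards), reversed(is_terminals)):
        if done:
            discounted = 0.0  # new episode starts here (going backwards)
        discounted = reward + gamma * discounted
        returns.insert(0, discounted)
    return returns
```

In practice these returns are usually normalized before being used as advantage targets.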
A concise explanation of the PPO algorithm can be found here
- To train a new network: run `train.py`
- To test a preTrained network: run `test.py`
- To plot graphs using log files: run `plot_graph.py`
- To save images for a gif and make the gif using a preTrained network: run `make_gif.py`
- All parameters and hyperparameters to control training / testing / graphs / gifs are in their respective `.py` files
- `PPO_colab.ipynb` combines all the files in a jupyter-notebook
- All the hyperparameters used for training are listed in the `README.md` in the PPO_preTrained directory
Please use this BibTeX if you want to cite this repository in your publications:
@misc{pytorch_minimal_ppo,
author = {Barhate, Nikhil},
title = {Minimal PyTorch Implementation of Proximal Policy Optimization},
year = {2021},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/nikhilbarhate99/PPO-PyTorch}},
}
Result gifs of trained agents (side-by-side animations in the original tables):

- PPO Continuous RoboschoolWalker2d-v1
- PPO Continuous BipedalWalker-v2
- PPO Discrete CartPole-v1
- PPO Discrete LunarLander-v2
Trained and Tested on:
- Python 3
- PyTorch
- NumPy
- gym
- Pillow

Training Environments:
- Roboschool
- pybullet

Graphs and gifs:
- pandas
- matplotlib
- Pillow