Pytorch-DPPO

Overview

This is the code for this video on Youtube by Siraj Raval on OpenAI Five vs DOTA 2. The author of this code is alexis-jacq. The real code is not yet publically available, but this is a basic version of the algorithm.

Dependencies

PyTorch
OpenAI Gym

Usage

Run 'python main.py (gym_environment_name)' in terminal

Pytorch-DPPO

Pytorch implementation of Distributed Proximal Policy Optimization: https://arxiv.org/abs/1707.02286 Using PPO with clip loss (from https://arxiv.org/pdf/1707.06347.pdf).

I finally fixed what was wrong with the gradient descent step, using previous log-prob from rollout batches. At least ppo.py is fixed, the rest is going to be corrected as well very soon.

In the following example I was not patient enough to wait for million iterations, I just wanted to check if the model is properly learning:

Progress of single PPO:

InvertedPendulum

InvertedDoublePendulum

HalfCheetah

hopper (PyBullet)

halfcheetah (PyBullet)

Progress of DPPO (4 agents) [TODO]

Acknowledgments

The structure of this code is based on https://github.com/ikostrikov/pytorch-a3c.

Hyperparameters and loss computation has been taken from https://github.com/openai/baselines

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
figs		figs
LICENSE		LICENSE
README.md		README.md
chief.py		chief.py
main.py		main.py
model.py		model.py
ppo.py		ppo.py
test.py		test.py
train.py		train.py
utils.py		utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Overview

Dependencies

Usage

Pytorch-DPPO

Progress of single PPO:

Acknowledgments

About

Releases

Packages

Languages

License

guerraepaz/OpenAI_Five_vs_Dota2_Explained

Folders and files

Latest commit

History

Repository files navigation

Overview

Dependencies

Usage

Pytorch-DPPO

Progress of single PPO:

Acknowledgments

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages