contents/12_Proximal_Policy_Optimization: 2 files changed, +4 -4 lines changed
"""
- A simple version of OpenAI's Proximal Policy Optimization (PPO). [http://adsabs.harvard.edu/abs/2017arXiv170706347S]
+ A simple version of OpenAI's Proximal Policy Optimization (PPO). [https://arxiv.org/abs/1707.06347]

Distributing workers in parallel to collect data, then stopping the workers' roll-outs and training PPO on the collected data.
Restart workers once PPO is updated.

The global PPO updating rule is adopted from DeepMind's paper (DPPO):
- Emergence of Locomotion Behaviours in Rich Environments (Google Deepmind): [http://adsabs.harvard.edu/abs/2017arXiv170702286H]
+ Emergence of Locomotion Behaviours in Rich Environments (Google Deepmind): [https://arxiv.org/abs/1707.02286]

View more on my tutorial website: https://morvanzhou.github.io/tutorials
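The stop/restart coordination described in that docstring can be sketched with threading events. This is a minimal illustration with hypothetical names (`ROLLING_EVENT`, `UPDATE_EVENT`, `worker`, `global_ppo_updater`), not the tutorial's actual code: workers roll out until a batch is collected, the global PPO then trains while the workers pause, and the workers restart once the update finishes.

```python
import threading
import queue

# A minimal sketch of the DPPO-style coordination described above
# (hypothetical names, not the tutorial's actual code).
ROLLING_EVENT = threading.Event()  # set -> workers may roll out
UPDATE_EVENT = threading.Event()   # set -> the global PPO may train
DATA_QUEUE = queue.Queue()
BATCH_SIZE = 8
N_WORKERS, STEPS_PER_WORKER, N_UPDATES = 2, 8, 2

def worker(worker_id):
    for step in range(STEPS_PER_WORKER):
        ROLLING_EVENT.wait()               # pause while PPO is updating
        DATA_QUEUE.put((worker_id, step))  # stand-in for a transition
        if DATA_QUEUE.qsize() >= BATCH_SIZE:
            ROLLING_EVENT.clear()          # stop all workers' roll-outs
            UPDATE_EVENT.set()             # wake the global PPO updater

def global_ppo_updater():
    for _ in range(N_UPDATES):
        UPDATE_EVENT.wait()                # wait until a batch is ready
        batch = [DATA_QUEUE.get() for _ in range(BATCH_SIZE)]
        # ... a real implementation would run PPO gradient steps on `batch` ...
        UPDATE_EVENT.clear()
        ROLLING_EVENT.set()                # restart the workers

ROLLING_EVENT.set()                        # workers start in rolling mode
threads = [threading.Thread(target=worker, args=(i,)) for i in range(N_WORKERS)]
threads.append(threading.Thread(target=global_ppo_updater))
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Two events are enough here because the two phases are mutually exclusive: roll-out is enabled exactly when updating is not, and each side hands control to the other.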
A simple version of Proximal Policy Optimization (PPO) using a single thread.

Based on:
- 1. Emergence of Locomotion Behaviours in Rich Environments (Google Deepmind): [http://adsabs.harvard.edu/abs/2017arXiv170702286H]
- 2. Proximal Policy Optimization Algorithms (OpenAI): [http://adsabs.harvard.edu/abs/2017arXiv170706347S]
+ 1. Emergence of Locomotion Behaviours in Rich Environments (Google Deepmind): [https://arxiv.org/abs/1707.02286]
+ 2. Proximal Policy Optimization Algorithms (OpenAI): [https://arxiv.org/abs/1707.06347]

View more on my tutorial website: https://morvanzhou.github.io/tutorials
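The core of the OpenAI paper cited in that docstring is the clipped surrogate objective. A generic NumPy illustration of it (not the tutorial's code; the function name is made up):

```python
import numpy as np

def clipped_surrogate(ratio, advantage, epsilon=0.2):
    """PPO's clipped surrogate objective L^CLIP (eq. 7 of arXiv:1707.06347).

    ratio:     pi_theta(a|s) / pi_theta_old(a|s) for each sampled action
    advantage: the estimated advantage of that action
    epsilon:   clip range (0.2 is the paper's default)
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantage
    # Taking the minimum removes the incentive to move the new policy
    # far from the old one in a single update.
    return np.minimum(unclipped, clipped)

# A ratio of 1.5 with a positive advantage is capped at (1 + epsilon) * A:
print(clipped_surrogate(np.array([1.5]), np.array([1.0])))  # → [1.2]
```

This clipping is what lets PPO reuse the same batch for several gradient steps, which is exactly what the DPPO worker scheme in the other file exploits.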