
Commit b55125a

Morvan Zhou authored and committed
update ppo
1 parent 11a884b commit b55125a

File tree

2 files changed: +4, -4 lines


contents/12_Proximal_Policy_Optimization/DPPO.py

Lines changed: 2 additions & 2 deletions
@@ -1,11 +1,11 @@
 """
-A simple version of OpenAI's Proximal Policy Optimization (PPO). [http://adsabs.harvard.edu/abs/2017arXiv170706347S]
+A simple version of OpenAI's Proximal Policy Optimization (PPO). [https://arxiv.org/abs/1707.06347]
 
 Distributing workers in parallel to collect data, then stop worker's roll-out and train PPO on collected data.
 Restart workers once PPO is updated.
 
 The global PPO updating rule is adopted from DeepMind's paper (DPPO):
-Emergence of Locomotion Behaviours in Rich Environments (Google Deepmind): [http://adsabs.harvard.edu/abs/2017arXiv170702286H]
+Emergence of Locomotion Behaviours in Rich Environments (Google Deepmind): [https://arxiv.org/abs/1707.02286]
 
 View more on my tutorial website: https://morvanzhou.github.io/tutorials
 
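
The DPPO.py docstring above describes the scheme: parallel workers collect data, roll-out stops while the global PPO trains on the collected batch, and the workers are restarted after the update. The sketch below illustrates only that stop/restart coordination pattern with threading events and fake data; the event names, MIN_BATCH, and the dummy transitions are hypothetical and this is not the repository's actual TensorFlow implementation.

import threading
import queue
import random
import time

UPDATE_EVENT = threading.Event()    # set -> global PPO may update
ROLLING_EVENT = threading.Event()   # set -> workers may roll out
DATA_QUEUE = queue.Queue()          # transitions collected by all workers
STOP = threading.Event()
MIN_BATCH = 32                      # hypothetical batch threshold
N_WORKERS = 4

def worker(wid):
    # Collect dummy "transitions" while rolling is allowed (fake data, illustration only).
    while not STOP.is_set():
        if not ROLLING_EVENT.wait(timeout=0.1):   # paused while the global PPO is updating
            continue
        DATA_QUEUE.put((wid, random.random()))
        if DATA_QUEUE.qsize() >= MIN_BATCH:
            ROLLING_EVENT.clear()                 # stop all workers' roll-out
            UPDATE_EVENT.set()                    # hand control to the global PPO
        time.sleep(0.001)

def global_ppo(n_updates=5):
    # Train on the collected batch, then restart the workers.
    for step in range(n_updates):
        UPDATE_EVENT.wait()                       # wait until enough data is collected
        batch = [DATA_QUEUE.get() for _ in range(DATA_QUEUE.qsize())]
        print(f"update {step}: trained on {len(batch)} transitions")  # a real PPO update would go here
        UPDATE_EVENT.clear()
        ROLLING_EVENT.set()                       # restart workers once PPO is updated
    STOP.set()
    ROLLING_EVENT.set()                           # release any waiting workers so they can exit

if __name__ == "__main__":
    ROLLING_EVENT.set()
    threads = [threading.Thread(target=worker, args=(i,)) for i in range(N_WORKERS)]
    for t in threads:
        t.start()
    global_ppo()
    for t in threads:
        t.join()

The two events mirror the alternation the docstring describes: ROLLING_EVENT gates data collection, UPDATE_EVENT gates the centralized update.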

contents/12_Proximal_Policy_Optimization/simply_PPO.py

Lines changed: 2 additions & 2 deletions
@@ -2,8 +2,8 @@
 A simple version of Proximal Policy Optimization (PPO) using single thread.
 
 Based on:
-1. Emergence of Locomotion Behaviours in Rich Environments (Google Deepmind): [http://adsabs.harvard.edu/abs/2017arXiv170702286H]
-2. Proximal Policy Optimization Algorithms (OpenAI): [http://adsabs.harvard.edu/abs/2017arXiv170706347S]
+1. Emergence of Locomotion Behaviours in Rich Environments (Google Deepmind): [https://arxiv.org/abs/1707.02286]
+2. Proximal Policy Optimization Algorithms (OpenAI): [https://arxiv.org/abs/1707.06347]
 
 View more on my tutorial website: https://morvanzhou.github.io/tutorials
 
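
For context on the second reference (arXiv:1707.06347): PPO's clipped surrogate objective maximizes E[min(r_t * A_t, clip(r_t, 1 - eps, 1 + eps) * A_t)], where r_t = pi_new(a_t|s_t) / pi_old(a_t|s_t) is the probability ratio. The NumPy sketch below is a plain rendering of that published formula only; it is not simply_PPO.py's TensorFlow code, and the toy log-probabilities, batch size, and variable names are made up for illustration.

import numpy as np

def clipped_surrogate(new_logprob, old_logprob, advantage, eps=0.2):
    """Return the (to-be-maximized) clipped surrogate objective over a batch."""
    ratio = np.exp(new_logprob - old_logprob)            # pi_new / pi_old
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return np.mean(np.minimum(unclipped, clipped))       # pessimistic (lower) bound

# Toy usage: positive advantages reward higher ratios only up to 1 + eps.
rng = np.random.default_rng(0)
old_lp = rng.normal(-1.0, 0.1, size=64)                  # hypothetical old-policy log-probs
new_lp = old_lp + rng.normal(0.0, 0.3, size=64)          # hypothetical policy shift
adv = rng.normal(0.0, 1.0, size=64)                      # hypothetical advantages
print("clipped surrogate:", clipped_surrogate(new_lp, old_lp, adv))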

0 commit comments