
Commit b55125a

Morvan Zhou authored and committed
update ppo
1 parent 11a884b commit b55125a

File tree

2 files changed: +4, -4 lines


contents/12_Proximal_Policy_Optimization/DPPO.py

Lines changed: 2 additions & 2 deletions
@@ -1,11 +1,11 @@
 """
-A simple version of OpenAI's Proximal Policy Optimization (PPO). [http://adsabs.harvard.edu/abs/2017arXiv170706347S]
+A simple version of OpenAI's Proximal Policy Optimization (PPO). [https://arxiv.org/abs/1707.06347]
 
 Distributing workers in parallel to collect data, then stop worker's roll-out and train PPO on collected data.
 Restart workers once PPO is updated.
 
 The global PPO updating rule is adopted from DeepMind's paper (DPPO):
-Emergence of Locomotion Behaviours in Rich Environments (Google Deepmind): [http://adsabs.harvard.edu/abs/2017arXiv170702286H]
+Emergence of Locomotion Behaviours in Rich Environments (Google Deepmind): [https://arxiv.org/abs/1707.02286]
 
 View more on my tutorial website: https://morvanzhou.github.io/tutorials
 
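
The DPPO.py docstring above describes the scheme: parallel workers collect data, roll-out stops while the global PPO trains on the collected batch, and the workers are restarted after the update. The sketch below illustrates only that stop/restart coordination pattern with threading events and fake data; the event names, MIN_BATCH, and the dummy transitions are hypothetical and this is not the repository's actual TensorFlow implementation.

import threading
import queue
import random
import time

UPDATE_EVENT = threading.Event()    # set -> global PPO may update
ROLLING_EVENT = threading.Event()   # set -> workers may roll out
DATA_QUEUE = queue.Queue()          # transitions collected by all workers
STOP = threading.Event()
MIN_BATCH = 32                      # hypothetical batch threshold
N_WORKERS = 4

def worker(wid):
    # Collect dummy "transitions" while rolling is allowed (fake data, illustration only).
    while not STOP.is_set():
        if not ROLLING_EVENT.wait(timeout=0.1):   # paused while the global PPO is updating
            continue
        DATA_QUEUE.put((wid, random.random()))
        if DATA_QUEUE.qsize() >= MIN_BATCH:
            ROLLING_EVENT.clear()                 # stop all workers' roll-out
            UPDATE_EVENT.set()                    # hand control to the global PPO
        time.sleep(0.001)

def global_ppo(n_updates=5):
    # Train on the collected batch, then restart the workers.
    for step in range(n_updates):
        UPDATE_EVENT.wait()                       # wait until enough data is collected
        batch = [DATA_QUEUE.get() for _ in range(DATA_QUEUE.qsize())]
        print(f"update {step}: trained on {len(batch)} transitions")  # a real PPO update would go here
        UPDATE_EVENT.clear()
        ROLLING_EVENT.set()                       # restart workers once PPO is updated
    STOP.set()
    ROLLING_EVENT.set()                           # release any waiting workers so they can exit

if __name__ == "__main__":
    ROLLING_EVENT.set()
    threads = [threading.Thread(target=worker, args=(i,)) for i in range(N_WORKERS)]
    for t in threads:
        t.start()
    global_ppo()
    for t in threads:
        t.join()

The two events mirror the alternation the docstring describes: ROLLING_EVENT gates data collection, UPDATE_EVENT gates the centralized update.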

contents/12_Proximal_Policy_Optimization/simply_PPO.py

Lines changed: 2 additions & 2 deletions
@@ -2,8 +2,8 @@
 A simple version of Proximal Policy Optimization (PPO) using single thread.
 
 Based on:
-1. Emergence of Locomotion Behaviours in Rich Environments (Google Deepmind): [http://adsabs.harvard.edu/abs/2017arXiv170702286H]
-2. Proximal Policy Optimization Algorithms (OpenAI): [http://adsabs.harvard.edu/abs/2017arXiv170706347S]
+1. Emergence of Locomotion Behaviours in Rich Environments (Google Deepmind): [https://arxiv.org/abs/1707.02286]
+2. Proximal Policy Optimization Algorithms (OpenAI): [https://arxiv.org/abs/1707.06347]
 
 View more on my tutorial website: https://morvanzhou.github.io/tutorials
 
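
For context on the second reference (arXiv:1707.06347): PPO's clipped surrogate objective maximizes E[min(r_t * A_t, clip(r_t, 1 - eps, 1 + eps) * A_t)], where r_t = pi_new(a_t|s_t) / pi_old(a_t|s_t) is the probability ratio. The NumPy sketch below is a plain rendering of that published formula only; it is not simply_PPO.py's TensorFlow code, and the toy log-probabilities, batch size, and variable names are made up for illustration.

import numpy as np

def clipped_surrogate(new_logprob, old_logprob, advantage, eps=0.2):
    """Return the (to-be-maximized) clipped surrogate objective over a batch."""
    ratio = np.exp(new_logprob - old_logprob)            # pi_new / pi_old
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return np.mean(np.minimum(unclipped, clipped))       # pessimistic (lower) bound

# Toy usage: positive advantages reward higher ratios only up to 1 + eps.
rng = np.random.default_rng(0)
old_lp = rng.normal(-1.0, 0.1, size=64)                  # hypothetical old-policy log-probs
new_lp = old_lp + rng.normal(0.0, 0.3, size=64)          # hypothetical policy shift
adv = rng.normal(0.0, 1.0, size=64)                      # hypothetical advantages
print("clipped surrogate:", clipped_surrogate(new_lp, old_lp, adv))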

0 commit comments