[rllib] Fix docs to reference new code locations (ray-project#1092)
* fix rllib docs

* Update example-a3c.rst
ericl authored and pcmoritz committed Oct 10, 2017
1 parent a52a1e8 commit 90013ed
Showing 3 changed files with 8 additions and 15 deletions.
6 changes: 3 additions & 3 deletions doc/source/example-a3c.rst
@@ -25,7 +25,7 @@ You can run the code with

.. code-block:: bash
- python/ray/rllib/a3c/example.py --num-workers=N
+ python/ray/rllib/train.py --env=Pong-ram-v4 --alg=A3C --config='{"num_workers": N}'
Reinforcement Learning
----------------------
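
The new ``train.py`` entry point takes its algorithm settings as a JSON string passed to ``--config``. As a rough, hypothetical illustration of what that flag implies (a sketch only, not rllib's actual implementation), the string can be parsed and merged over a set of defaults:

.. code-block:: python

   import json

   # Hypothetical default A3C settings, for illustration only.
   DEFAULT_CONFIG = {"num_workers": 4, "batch_size": 10}

   def parse_config(config_json):
       """Merge a user-supplied JSON config over the defaults."""
       config = DEFAULT_CONFIG.copy()
       config.update(json.loads(config_json))
       return config

   print(parse_config('{"num_workers": 16}'))
   # -> {'num_workers': 16, 'batch_size': 10}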
@@ -115,7 +115,7 @@ global model parameters. The main training script looks like the following.
import numpy as np
import ray
- def train(num_workers, env_name="PongDeterministic-v3"):
+ def train(num_workers, env_name="PongDeterministic-v4"):
# Setup a copy of the environment
# Instantiate a copy of the policy - mainly used as a placeholder
env = create_env(env_name, None, None)
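
The hunks here show only fragments of ``train``. As a self-contained sketch of the pattern the A3C example is built around (workers pull the latest parameters, compute a gradient, and the driver applies gradients as they arrive), here is a toy version that uses a quadratic objective in place of the real Atari policy; names like ``Worker`` and ``compute_gradient`` are illustrative, not the actual rllib API:

.. code-block:: python

   import numpy as np
   import ray

   ray.init()

   @ray.remote
   class Worker(object):
       """Toy stand-in for an A3C rollout worker."""
       def __init__(self, dim):
           self.dim = dim

       def compute_gradient(self, params):
           # Gradient of 0.5 * ||params||^2 plus noise, standing in for a
           # policy-gradient estimate computed from a rollout.
           return params + 0.01 * np.random.randn(self.dim)

   def train(num_workers, dim=10, num_updates=200, lr=0.1):
       params = np.random.randn(dim)
       workers = [Worker.remote(dim) for _ in range(num_workers)]
       # Launch one gradient task per worker.
       pending = {w.compute_gradient.remote(params): w for w in workers}
       for _ in range(num_updates):
           # Apply each gradient as soon as it arrives, then relaunch that
           # worker against the updated parameters (the asynchronous part).
           [ready_id], _ = ray.wait(list(pending))
           worker = pending.pop(ready_id)
           params -= lr * ray.get(ready_id)
           pending[worker.compute_gradient.remote(params)] = worker
       return params

   if __name__ == "__main__":
       print(np.linalg.norm(train(num_workers=4)))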
@@ -147,7 +147,7 @@ global model parameters. The main training script looks like the following.
Benchmarks and Visualization
----------------------------

- For the :code:`PongDeterministic-v3` and an Amazon EC2 m4.16xlarge instance, we
+ For the :code:`PongDeterministic-v4` and an Amazon EC2 m4.16xlarge instance, we
are able to train the agent with 16 workers in around 15 minutes. With 8
workers, we can train the agent in around 25 minutes.

4 changes: 2 additions & 2 deletions doc/source/example-evolution-strategies.rst
@@ -11,14 +11,14 @@ To run the application, first install some dependencies.
You can view the `code for this example`_.

- .. _`code for this example`: https://github.com/ray-project/ray/tree/master/python/ray/rllib/evolution_strategies
+ .. _`code for this example`: https://github.com/ray-project/ray/tree/master/python/ray/rllib/es

The script can be run as follows. Note that the configuration is tuned to work
on the ``Humanoid-v1`` gym environment.

.. code-block:: bash
- python/ray/rllib/evolution_strategies/example.py
+ python/ray/rllib/train.py --env=Humanoid-v1 --alg=ES
At the heart of this example, we define a ``Worker`` class. These workers have
a method ``do_rollouts``, which will be used to perform simulate randomly
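
The ``Worker``/``do_rollouts`` structure mentioned above can be sketched independently of the rllib code. The following is a minimal toy version in which a dummy reward (the negative squared norm of the perturbed parameters) stands in for an actual gym rollout; the class and method names mirror the docs, but the body is illustrative only:

.. code-block:: python

   import numpy as np
   import ray

   ray.init()

   @ray.remote
   class Worker(object):
       """Toy ES worker: scores randomly perturbed copies of the parameters."""
       def __init__(self, noise_std=0.1):
           self.noise_std = noise_std

       def do_rollouts(self, params, num_rollouts):
           # The real example runs the perturbed policy in a gym environment;
           # here a dummy reward stands in for the episode return.
           noise = np.random.randn(num_rollouts, params.size) * self.noise_std
           rewards = -np.sum((params + noise) ** 2, axis=1)
           return noise, rewards

   params = np.ones(5)
   workers = [Worker.remote() for _ in range(4)]
   results = ray.get([w.do_rollouts.remote(params, 10) for w in workers])
   noise = np.concatenate([n for n, _ in results])
   rewards = np.concatenate([r for _, r in results])
   # Move the parameters toward perturbations that scored above average.
   params += 0.01 * np.dot(rewards - rewards.mean(), noise) / len(rewards)
   print(params)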
13 changes: 3 additions & 10 deletions doc/source/example-policy-gradient.rst
@@ -12,20 +12,13 @@ least version ``1.0.0``) and a few other dependencies.
pip install gym[atari]
pip install tensorflow
- Then install the package as follows.

- .. code-block:: bash
- cd ray/examples/policy_gradient/
- python setup.py install
- Then you can run the example as follows.

.. code-block:: bash
- python/ray/rllib/policy_gradient/example.py --environment=Pong-ram-v3
+ python/ray/rllib/train.py --env=Pong-ram-v4 --alg=PPO
- This will train an agent on the ``Pong-ram-v3`` Atari environment. You can also
+ This will train an agent on the ``Pong-ram-v4`` Atari environment. You can also
try passing in the ``Pong-v0`` environment or the ``CartPole-v0`` environment.
If you wish to use a different environment, you will need to change a few lines
in ``example.py``.
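
If you want to try one of the other environments mentioned above, a quick generic check (plain gym, nothing rllib-specific) that an environment id is valid on your machine is:

.. code-block:: python

   import gym

   # Any of the ids mentioned above should work, e.g. "Pong-ram-v4" or "CartPole-v0".
   env = gym.make("CartPole-v0")
   observation = env.reset()
   observation, reward, done, info = env.step(env.action_space.sample())
   print(observation, reward, done)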
@@ -41,4 +41,4 @@ Many of the TensorBoard metrics are also printed to the console, but you might
find it easier to visualize and compare between runs using the TensorBoard UI.

.. _`TensorFlow with GPU support`: https://www.tensorflow.org/install/
- .. _`code for this example`: https://github.com/ray-project/ray/tree/master/python/ray/rllib/policy_gradient
+ .. _`code for this example`: https://github.com/ray-project/ray/tree/master/python/ray/rllib/ppo
