[rllib] Split docs into user and development guide #1377

Merged
merged 12 commits on Jan 1, 2018
Sun Dec 31 23:43:05 PST 2017
ericl committed Jan 1, 2018
commit fda5a54ff98e8f4b517d9566a0fa8fa420c1140d
2 changes: 1 addition & 1 deletion doc/source/rllib-dev.rst
@@ -3,7 +3,7 @@ RLlib Developer Guide

.. note::

-    This guide will take you through steps for implementing a new algorithm in RLlib. To apply existing algorithms already implemented in RLlib, please see the `user docs <http://ray.readthedocs.io/en/latest/rllib.html>`__.
+    This guide will take you through steps for implementing a new algorithm in RLlib. To apply existing algorithms already implemented in RLlib, please see the `user docs <rllib.html>`__.

Recipe for an RLlib algorithm
-----------------------------
22 changes: 11 additions & 11 deletions doc/source/rllib.rst
@@ -8,7 +8,7 @@ Ray RLlib is a reinforcement learning library that aims to provide both performa
- Pluggable distributed RL execution strategies

- Composability
-- Integration with the `Ray.tune <http://ray.readthedocs.io/en/latest/tune.html>`__ hyperparam tuning tool
+- Integration with the `Ray.tune <tune.html>`__ hyperparam tuning tool
- Support for multiple frameworks (TensorFlow, PyTorch)
- Scalable primitives for developing new algorithms
- Shared models between algorithms
@@ -17,18 +17,18 @@ You can find the code for RLlib `here on GitHub <https://github.com/ray-project/

RLlib currently provides the following algorithms:

-- `Proximal Policy Optimization <https://arxiv.org/abs/1707.06347>`__ which
+- `Proximal Policy Optimization (PPO) <https://arxiv.org/abs/1707.06347>`__ which
is a proximal variant of `TRPO <https://arxiv.org/abs/1502.05477>`__.

-- Evolution Strategies which is decribed in `this
+- `The Asynchronous Advantage Actor-Critic (A3C) <https://arxiv.org/abs/1602.01783>`__.
+
+- `Deep Q Networks (DQN) <https://arxiv.org/abs/1312.5602>`__.
+
+- Evolution Strategies, as described in `this
paper <https://arxiv.org/abs/1703.03864>`__. Our implementation
is adapted from
`here <https://github.com/openai/evolution-strategies-starter>`__.

-- `The Asynchronous Advantage Actor-Critic <https://arxiv.org/abs/1602.01783>`__.
-
-- `Deep Q Network (DQN) <https://arxiv.org/abs/1312.5602>`__.
-
These algorithms can be run on any `OpenAI Gym MDP <https://github.com/openai/gym>`__,
including custom ones written and registered by the user.
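
For orientation, training one of these algorithms on a Gym environment with the Agent-style API of this RLlib release might look roughly like the sketch below; the module path, the ``PPOAgent`` class name, and the config keys are assumptions based on Ray 0.3-era usage, not text from this diff::

    import ray
    import ray.rllib.ppo as ppo

    ray.init()

    # Assumed convention: each algorithm module exposes an Agent class plus a
    # DEFAULT_CONFIG dict of tunable settings.
    config = ppo.DEFAULT_CONFIG.copy()
    config["num_workers"] = 2

    agent = ppo.PPOAgent(config=config, env="CartPole-v0")
    for i in range(5):
        result = agent.train()  # one training iteration; returns a result summary
        print("iteration", i, result)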

@@ -171,7 +171,7 @@ Custom Models and Preprocessors
RLlib includes default neural network models and preprocessors for common gym
environments, but you can also specify your own as follows. The interfaces for
custom model and preprocessor classes are documented in the
-`RLlib Developer Guide <http://ray.readthedocs.io/en/latest/rllib-dev.html>`__.
+`RLlib Developer Guide <rllib-dev.html>`__.

::

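As a concrete illustration of the custom-model side of this hook (this is not the snippet from the docs; the ``ModelCatalog.register_custom_model`` call and the ``_init`` signature are assumptions about this era's interface)::

    import tensorflow as tf
    from ray.rllib.models import Model, ModelCatalog

    class MyModelClass(Model):
        # Assumed hook: build the network from the observation placeholder and
        # return (output tensor, last hidden layer tensor).
        def _init(self, inputs, num_outputs, options):
            hidden = tf.layers.dense(inputs, 64, activation=tf.nn.relu)
            output = tf.layers.dense(hidden, num_outputs, activation=None)
            return output, hidden

    # Assumed registration call; the registered name would then be referenced
    # from an agent's model config.
    ModelCatalog.register_custom_model("my_model", MyModelClass)
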
@@ -192,7 +192,7 @@ Using RLlib with Ray.tune
-------------------------

All Agents implemented in RLlib support the
-`tune Trainable <http://ray.readthedocs.io/en/latest/tune.html#ray.tune.trainable.Trainable>`__ interface.
+`tune Trainable <tune.html#ray.tune.trainable.Trainable>`__ interface.

Here is an example of using the command-line interface with RLlib:

@@ -231,9 +231,9 @@ in the ``config`` section of the experiments.

run_experiments(experiment)
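
The ``run_experiments(experiment)`` call above consumes an experiment specification; a rough, illustrative spec is sketched below, where the key names (``run``, ``env``, ``stop``, ``config``) and the config values are assumptions about the tune format of this era rather than content from this diff::

    import ray
    from ray.tune import run_experiments

    ray.init()

    # Illustrative spec: experiment name mapped to what to run, on which env,
    # when to stop, and with which settings; RLlib algorithm names such as
    # "PPO" are assumed to be pre-registered as tune trainables.
    experiment = {
        "cartpole-ppo": {
            "run": "PPO",
            "env": "CartPole-v0",
            "stop": {"episode_reward_mean": 195},
            "config": {"num_workers": 2},
        },
    }

    run_experiments(experiment)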

-.. _`managing a cluster with parallel ssh`: http://ray.readthedocs.io/en/latest/using-ray-on-a-large-cluster.html
+.. _`managing a cluster with parallel ssh`: using-ray-on-a-large-cluster.html

Contributing to RLlib
---------------------

-See the `RLlib Developer Guide <http://ray.readthedocs.io/en/latest/rllib-dev.html>`__.
+See the `RLlib Developer Guide <rllib-dev.html>`__.
15 changes: 7 additions & 8 deletions python/ray/rllib/README.rst
@@ -5,18 +5,17 @@ This README provides a brief technical overview of RLlib. See also the `user doc

RLlib currently provides the following algorithms:

-- `Proximal Policy Optimization <https://arxiv.org/abs/1707.06347>`__ which
+- `Proximal Policy Optimization (PPO) <https://arxiv.org/abs/1707.06347>`__ which
is a proximal variant of `TRPO <https://arxiv.org/abs/1502.05477>`__.

-- Evolution Strategies which is decribed in `this
-paper <https://arxiv.org/abs/1703.03864>`__. Our implementation
-borrows code from
-`here <https://github.com/openai/evolution-strategies-starter>`__.
+- `The Asynchronous Advantage Actor-Critic (A3C) <https://arxiv.org/abs/1602.01783>`__.

-- `The Asynchronous Advantage Actor-Critic <https://arxiv.org/abs/1602.01783>`__
-based on `the OpenAI starter agent <https://github.com/openai/universe-starter-agent>`__.
+- `Deep Q Networks (DQN) <https://arxiv.org/abs/1312.5602>`__.

-- `Deep Q Network (DQN) <https://arxiv.org/abs/1312.5602>`__.
+- Evolution Strategies, as described in `this
+paper <https://arxiv.org/abs/1703.03864>`__. Our implementation
+is adapted from
+`here <https://github.com/openai/evolution-strategies-starter>`__.

These algorithms can be run on any OpenAI Gym MDP, including custom ones written and registered by the user.
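
For a sense of what "custom ones written and registered by the user" involves on the Gym side, here is a minimal toy environment; the environment id, module path, and reward logic are made up for illustration, while the ``gym.Env`` interface itself is standard::

    import gym
    from gym import spaces
    from gym.envs.registration import register

    class CorridorEnv(gym.Env):
        """Toy MDP: step right along a corridor of 5 cells to reach the goal."""

        def __init__(self):
            self.action_space = spaces.Discrete(2)       # 0 = left, 1 = right
            self.observation_space = spaces.Discrete(5)  # current cell index
            self.pos = 0

        def reset(self):
            self.pos = 0
            return self.pos

        def step(self, action):
            self.pos = min(max(self.pos + (1 if action == 1 else -1), 0), 4)
            done = self.pos == 4
            reward = 1.0 if done else -0.1
            return self.pos, reward, done, {}

    # Hypothetical registration: "my_envs" stands in for the module where
    # CorridorEnv actually lives; once registered, the env can be referred to
    # by its id ("Corridor-v0") like any built-in Gym environment.
    register(id="Corridor-v0", entry_point="my_envs:CorridorEnv")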
