[rllib] [docs] Cleanup RLlib API and make docs consistent with upcoming blog post #1708

Using Policy Optimizers outside RLlib
=====================================

RLlib supports using its distributed policy optimizer implementations from external algorithms.

Here are the steps for using an RLlib policy optimizer with an existing algorithm.

1. Implement the `Policy evaluator interface <rllib-dev.html#policy-evaluators-and-optimizers>`__; a minimal sketch follows the examples below.

- Here is an example of porting a `PyTorch Rainbow implementation <https://github.com/ericl/Rainbow/blob/rllib-example/rainbow_evaluator.py>`__.

  *Review comment:* explicit code examples here in this page would be good too.

- Another example ports a `TensorFlow DQN implementation <https://github.com/ericl/baselines/blob/rllib-example/baselines/deepq/dqn_evaluator.py>`__.
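
The following is a minimal, hypothetical sketch of an evaluator, not RLlib code: the toy weight vector and the random batch returned by ``sample()`` stand in for your algorithm's real model and rollout logic, and the method set (``sample``, ``compute_gradients``, ``apply_gradients``, ``get_weights``, ``set_weights``) is assumed from the developer docs linked above, which remain the authoritative definition of the interface.

.. code-block:: python

    import numpy as np


    class MyEvaluator(object):
        """Hypothetical evaluator wrapping a toy linear 'algorithm'."""

        def __init__(self, config):
            self.config = config
            # Stand-in for your model: a single weight vector.
            self.weights = np.zeros(4)

        def sample(self):
            # Return a batch of experiences gathered with the current policy.
            # Here we just fabricate random observations as a placeholder.
            return np.random.randn(32, 4)

        def compute_gradients(self, samples):
            # Return gradients for the batch without applying them.
            return samples.mean(axis=0)

        def apply_gradients(self, grads):
            # Apply gradients to the local model.
            self.weights -= 0.01 * grads

        def get_weights(self):
            # Return the current model weights.
            return self.weights

        def set_weights(self, weights):
            # Overwrite local weights, e.g., to sync with other evaluators.
            self.weights = weights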

2. Pick a `Policy optimizer class <https://github.com/ray-project/ray/tree/master/python/ray/rllib/optimizers>`__. The `LocalSyncOptimizer <https://github.com/ray-project/ray/blob/master/python/ray/rllib/optimizers/local_sync.py>`__ is a reasonable choice for local testing. You can also implement your own. Policy optimizers can be constructed using their ``make`` method (e.g., ``LocalSyncOptimizer.make(evaluator_cls, evaluator_args, num_workers, conf)``), or by passing in a list of evaluators instantiated as Ray actors; a sketch using ``make`` follows the examples below.

*Review comment:* One thing that would provide clarity is … *Reply:* Done.

- Here is code showing the `simple Policy Gradient agent <https://github.com/ray-project/ray/blob/master/python/ray/rllib/pg/pg.py>`__ using ``make()``.

- A different example shows an `A3C agent <https://github.com/ray-project/ray/blob/master/python/ray/rllib/a3c/a3c.py>`__ passing in Ray actors directly.
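
As a rough sketch of the ``make()`` route, assuming the optimizer classes are importable from ``ray.rllib.optimizers`` as in the agent examples above and using the signature quoted in step 2 (the argument values below are placeholders, and ``MyEvaluator`` is the hypothetical evaluator from step 1):

.. code-block:: python

    import ray
    from ray.rllib.optimizers import LocalSyncOptimizer

    ray.init()

    # Construct the optimizer from the evaluator class sketched in step 1.
    optimizer = LocalSyncOptimizer.make(
        MyEvaluator,        # evaluator_cls
        [{"gamma": 0.99}],  # evaluator_args: constructor args for each evaluator
        4,                  # num_workers: number of remote evaluator actors
        {})                 # conf: optimizer-specific options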

3. Decide how you want to drive the training loop; a sketch of option 1 follows the options below.

- Option 1: call ``optimizer.step()`` from some existing training code. Training statistics can be retrieved by querying the ``optimizer.local_evaluator`` instance, or by mapping over the remote evaluators (e.g., ``ray.get([ev.some_fn.remote() for ev in optimizer.remote_evaluators])``) if you are running with multiple workers.

- Option 2: define a full RLlib `Agent class <https://github.com/ray-project/ray/blob/master/python/ray/rllib/agent.py>`__. This might be preferable if you don't have an existing training harness or want to use the features provided by `Ray Tune <tune.html>`__.
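
A hypothetical sketch of option 1, continuing from the ``optimizer`` constructed above; the toy evaluator's ``get_weights()`` stands in for the ``some_fn`` statistics method your real evaluator would expose:

.. code-block:: python

    # Drive training from your own loop. Each call to step() runs one round
    # of distributed sample collection and optimization.
    for i in range(10):
        optimizer.step()

        # Inspect training state via the local evaluator...
        local_weights = optimizer.local_evaluator.get_weights()

        # ...or by mapping over the remote evaluator actors, if any.
        remote_weights = ray.get(
            [ev.get_weights.remote() for ev in optimizer.remote_evaluators])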

Policy Optimizers
-----------------

+-----------------------------+---------------------+-----------------+------------------------------+
| **Policy optimizer class**  | **Operating range** | **Works with**  | **Description**              |
+=============================+=====================+=================+==============================+
|AsyncOptimizer               |1-10s of CPUs        |(any)            |Asynchronous gradient-based   |
|                             |                     |                 |optimization (e.g., A3C)      |
+-----------------------------+---------------------+-----------------+------------------------------+
|LocalSyncOptimizer           |0-1 GPUs +           |(any)            |Synchronous gradient-based    |
|                             |1-100s of CPUs       |                 |optimization with parallel    |
|                             |                     |                 |sample collection             |
+-----------------------------+---------------------+-----------------+------------------------------+
|LocalSyncReplayOptimizer     |0-1 GPUs +           | Off-policy      |Adds a replay buffer          |
|                             |1-100s of CPUs       | algorithms      |to LocalSyncOptimizer         |
+-----------------------------+---------------------+-----------------+------------------------------+
|LocalMultiGPUOptimizer       |0-10 GPUs +          | Algorithms      |Implements data-parallel      |
|                             |1-100s of CPUs       | written in      |optimization over multiple    |
|                             |                     | TensorFlow      |GPUs, e.g., for PPO           |
+-----------------------------+---------------------+-----------------+------------------------------+
|ApexOptimizer                |1 GPU +              | Off-policy      |Implements the Ape-X          |
|                             |10-100s of CPUs      | algorithms      |distributed prioritization    |
|                             |                     | w/sample        |algorithm                     |
|                             |                     | prioritization  |                              |
+-----------------------------+---------------------+-----------------+------------------------------+

*Review comment:* I just built the docs locally, and this table is quite hard to read, especially with the need to horizontally scroll. Maybe just use sections, then add hyperlinks to relevant examples that actually use each optimizer.

*Review comment:* consider just renaming to Policy Optimizers. *Reply:* Done.