[rllib] [docs] Cleanup RLlib API and make docs consistent with upcoming blog post #1708
ericl merged 23 commits into ray-project:master from
Conversation
Test PASSed.
richardliaw left a comment:
overall looks fine for a first pass
class WarpFrame(gym.ObservationWrapper):
-    def __init__(self, env):
+    def __init__(self, env, dim):
        """Warp frames to 84x84 as done in the Nature paper and later work."""
can you change the docstring here? 84x84 is no longer the case.
def wrap_deepmind(env, random_starts=True, dim=80):
    """Configure environment for DeepMind-style Atari.

    Note that we assume reward clipping is done outside the wrapper.
can you document the params?
    term will be used for sample prioritization."""

def _init(
        self, learning_starts=1000, buffer_size=10000,
Are these documented somewhere? It's hard to know what each parameter does, especially if this is intended for usage outside RLlib.
doc/source/rllib-optimizers.rst (Outdated)
- Another example porting a `TensorFlow DQN implementation <https://github.com/ericl/baselines/blob/rllib-example/baselines/deepq/dqn_evaluator.py>`__.
2. Pick a `Policy optimizer class <https://github.com/ray-project/ray/tree/master/python/ray/rllib/optimizers>`__. The `LocalSyncOptimizer <https://github.com/ray-project/ray/blob/master/python/ray/rllib/optimizers/local_sync.py>`__ is a reasonable choice for local testing. You can also implement your own. Policy optimizers can be constructed using their ``make`` method (e.g., ``LocalSyncOptimizer.make(evaluator_cls, evaluator_args, num_workers, conf)``), or you can construct them by passing in a list of evaluators instantiated as Ray actors.
One thing that would provide clarity: rename ``conf`` to ``optimizer_config``.
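The ``make`` construction pattern quoted above could be sketched as follows. This is a hypothetical, self-contained mock that only mirrors the shape of the ``LocalSyncOptimizer.make(evaluator_cls, evaluator_args, num_workers, conf)`` call from the docs; it is not the real RLlib implementation (which would create the remote evaluators as Ray actors).

```python
class MockEvaluator:
    """Stands in for a user-implemented policy evaluator (hypothetical)."""

    def __init__(self, env_name):
        self.env_name = env_name


class MockLocalSyncOptimizer:
    """Mock of the optimizer construction path; not the real RLlib class."""

    def __init__(self, config, local_evaluator, remote_evaluators):
        self.config = config
        self.local_evaluator = local_evaluator
        self.remote_evaluators = remote_evaluators

    @classmethod
    def make(cls, evaluator_cls, evaluator_args, num_workers, conf):
        # The real make() would instantiate the remote evaluators as Ray
        # actors; here we build plain objects just to show the call shape.
        local = evaluator_cls(*evaluator_args)
        remotes = [evaluator_cls(*evaluator_args) for _ in range(num_workers)]
        return cls(conf, local, remotes)


opt = MockLocalSyncOptimizer.make(MockEvaluator, ["CartPole-v0"], 2, {"lr": 1e-4})
print(len(opt.remote_evaluators))  # 2
```

The alternative mentioned in the docs (passing a list of already-instantiated evaluator actors) would simply call the constructor directly instead of ``make``.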
    "td_error" array in the info return of compute_gradients(). This error
    term will be used for sample prioritization."""

def _init(
One thing that is a little confusing and not very apparent in the docs is where all these parameters are being passed in. Reading this code, one would have to do some digging to jump through the various abstractions (i.e., ApexAgent -> DQNAgent -> ApexOptimizer -> PolicyOptimizer -> ApexOptimizer) to trace the chain of method calls involved.
Providing some note in the documentation page, and also a small comment here, would be good.
After all, this is essentially exposed to the user.
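The call chain the comment describes could be sketched like this. All class names below are stand-ins for illustration only (the real chain runs through ApexAgent, DQNAgent, ApexOptimizer, and PolicyOptimizer), and the ``optimizer_config`` key is an assumption, not a documented RLlib option: the point is just that config keys set at the top become ``_init`` keyword arguments at the bottom.

```python
class MockPolicyOptimizer:
    """Stand-in for PolicyOptimizer; config keys become _init kwargs."""

    def __init__(self, config):
        # The base class forwards the config dict as keyword arguments,
        # which is why _init's defaults act as the user-facing defaults.
        self._init(**config)

    def _init(self, learning_starts=1000, buffer_size=10000):
        self.learning_starts = learning_starts
        self.buffer_size = buffer_size


class MockAgent:
    """Stand-in for the agent; it passes the optimizer config straight down."""

    def __init__(self, config):
        self.optimizer = MockPolicyOptimizer(config.get("optimizer_config", {}))


agent = MockAgent({"optimizer_config": {"buffer_size": 50000}})
print(agent.optimizer.buffer_size)      # 50000 (user override)
print(agent.optimizer.learning_starts)  # 1000 (default from _init)
```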
-----------------

+-----------------------------+---------------------+-----------------+------------------------------+
| **Policy optimizer class**  | **Operating range** | **Works with**  | **Description**              |
I just built the docs locally, and this table is quite hard to read, especially with the need to scroll horizontally. Maybe just use sections, then add hyperlinks to relevant examples that actually use each optimizer.
doc/source/rllib-optimizers.rst (Outdated)

@@ -0,0 +1,51 @@
Using Policy Optimizers outside RLlib
consider just renaming to Policy Optimizers
1. Implement the `Policy evaluator interface <rllib-dev.html#policy-evaluators-and-optimizers>`__.
- Here is an example of porting a `PyTorch Rainbow implementation <https://github.com/ericl/Rainbow/blob/rllib-example/rainbow_evaluator.py>`__.
Explicit code examples on this page would be good too.
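A minimal sketch of such an evaluator might look like the following. The method names (``sample``, ``compute_gradients``, ``apply_gradients``) and the ``"td_error"`` info key are taken from this thread's diff snippets; everything else (the class name, the toy batch format, the gradient arithmetic) is hypothetical and only illustrates the shape of the interface, not the real RLlib API.

```python
import random


class ToyEvaluator:
    """Hypothetical evaluator: samples fake experience, returns fake grads."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.weights = [0.0, 0.0]

    def sample(self):
        # Return a small batch of (obs, action, reward) tuples.
        return [(self.rng.random(), self.rng.randint(0, 1), 1.0)
                for _ in range(4)]

    def compute_gradients(self, batch):
        # Return (gradients, info). A prioritized optimizer would read
        # info["td_error"] for sample prioritization, per the diff above.
        grads = [sum(r for _, _, r in batch), float(len(batch))]
        info = {"td_error": [0.0] * len(batch)}
        return grads, info

    def apply_gradients(self, grads):
        self.weights = [w + 0.01 * g for w, g in zip(self.weights, grads)]


ev = ToyEvaluator()
batch = ev.sample()
grads, info = ev.compute_gradients(batch)
ev.apply_gradients(grads)
print(len(info["td_error"]))  # 4
```

An optimizer built on this interface only needs to loop: sample from (remote) evaluators, compute gradients, and apply them to the local evaluator.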
No description provided.