
[rllib] [docs] Cleanup RLlib API and make docs consistent with upcoming blog post #1708

Merged
ericl merged 23 commits into ray-project:master from ericl:cleanup-rllib-apis
Mar 15, 2018

Conversation

@ericl ericl (Contributor) commented Mar 13, 2018

No description provided.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/4286/

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/4288/

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/4290/

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/4291/

@richardliaw richardliaw (Contributor) left a comment

overall looks fine for a first pass

class WarpFrame(gym.ObservationWrapper):
-    def __init__(self, env):
+    def __init__(self, env, dim):
         """Warp frames to 84x84 as done in the Nature paper and later work."""
Contributor:

can you change the docstring here? 84x84 is no longer the case.

Contributor Author:

Fixed
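For context, the frame warping under discussion can be sketched without gym or cv2. This is an illustrative, dependency-free version: the real wrapper uses `cv2.resize`, while this sketch uses naive nearest-neighbor sampling, and the 210x160 input below is just the native Atari frame size used as an example.

```python
def warp_frame(frame, dim=80):
    """Hedged sketch of DeepMind-style frame warping: grayscale an RGB
    frame (a list of rows of [r, g, b] pixels) and resize it to dim x dim.

    The real WarpFrame wrapper uses cv2.resize; this uses naive
    nearest-neighbor sampling so the sketch stays dependency-free.
    """
    h, w = len(frame), len(frame[0])
    # Luminance-weighted grayscale conversion.
    gray = [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row] for row in frame]
    # Nearest-neighbor resize via index sampling.
    rows = [i * h // dim for i in range(dim)]
    cols = [j * w // dim for j in range(dim)]
    return [[gray[i][j] for j in cols] for i in rows]

# A blank 210x160 RGB frame, the native Atari resolution.
frame = [[[0, 0, 0]] * 160 for _ in range(210)]
warped = warp_frame(frame, dim=80)
print(len(warped), len(warped[0]))  # 80 80
```

The configurable `dim` mirrors the new constructor parameter in the diff above, which is why the hardcoded "84x84" docstring became stale.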

def wrap_deepmind(env, random_starts=True, dim=80):
    """Configure environment for DeepMind-style Atari.

    Note that we assume reward clipping is done outside the wrapper.
Contributor:

can you document the params?

Contributor Author:

Fixed
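A documented version of that signature might read as follows. The parameter descriptions here are plausible readings, not the merged text; in particular, `random_starts` is assumed to control random no-op resets, and the body is elided.

```python
def wrap_deepmind(env, random_starts=True, dim=80):
    """Configure environment for DeepMind-style Atari.

    Note that we assume reward clipping is done outside the wrapper.

    Args:
        env: The Atari environment to wrap.
        random_starts: Whether to begin each episode with a random
            number of no-op actions, for initial-state diversity
            (assumed meaning; check the merged docstring).
        dim: Side length of the square, warped observation frames.
    """
    return env  # wrapping logic elided in this sketch
```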

    term will be used for sample prioritization."""

    def _init(
            self, learning_starts=1000, buffer_size=10000,
@richardliaw richardliaw (Contributor) Mar 13, 2018

are these documented somewhere? hard to know what each thing does, especially if this is for usage outside rllib
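For background, the docstring fragment above says an error term is used for sample prioritization; in prioritized replay, samples with larger TD error are replayed more often. A toy sketch of that mechanism (the proportional scheme and the epsilon floor are illustrative, not RLlib's implementation):

```python
import random

def update_priorities(priorities, indices, td_errors, eps=1e-6):
    # New priority is the absolute TD error, with a small epsilon so no
    # sample's probability ever drops to exactly zero.
    for i, err in zip(indices, td_errors):
        priorities[i] = abs(err) + eps

def sample_batch(priorities, batch_size):
    # Resample indices proportionally to priority.
    total = sum(priorities)
    weights = [p / total for p in priorities]
    return random.choices(range(len(priorities)), weights=weights, k=batch_size)

priorities = [1.0] * 4
update_priorities(priorities, indices=[2], td_errors=[5.0])
# Index 2 is now about 5x more likely to be drawn than the others.
batch = sample_batch(priorities, batch_size=10)
```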


- Another example porting a `TensorFlow DQN implementation <https://github.com/ericl/baselines/blob/rllib-example/baselines/deepq/dqn_evaluator.py>`__.

2. Pick a `Policy optimizer class <https://github.com/ray-project/ray/tree/master/python/ray/rllib/optimizers>`__. The `LocalSyncOptimizer <https://github.com/ray-project/ray/blob/master/python/ray/rllib/optimizers/local_sync.py>`__ is a reasonable choice for local testing. You can also implement your own. Policy optimizers can be constructed using their ``make`` method (e.g., ``LocalSyncOptimizer.make(evaluator_cls, evaluator_args, num_workers, conf)``), or you can construct them by passing in a list of evaluators instantiated as Ray actors.
Contributor:

One thing that would provide clarity is conf -> optimizer_config.

Contributor Author:

Done
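The two construction paths the docs describe (the `make` classmethod versus passing a list of pre-built evaluators) can be sketched in plain Python. The class and argument names mirror the docs, with `conf` renamed to `optimizer_config` per the comment above, but the bodies are toy stand-ins; in RLlib the remote evaluators would be instantiated as Ray actors.

```python
class LocalSyncOptimizer:
    """Toy stand-in for the policy optimizer construction pattern."""

    def __init__(self, evaluators, optimizer_config):
        # Path 2: caller passes pre-built evaluators directly. The first
        # one acts as the local evaluator holding canonical weights.
        self.local_evaluator = evaluators[0]
        self.remote_evaluators = evaluators[1:]
        self.optimizer_config = optimizer_config

    @classmethod
    def make(cls, evaluator_cls, evaluator_args, num_workers, optimizer_config):
        # Path 1: build one local plus num_workers remote evaluators
        # from a class and its constructor args.
        evaluators = [evaluator_cls(*evaluator_args) for _ in range(num_workers + 1)]
        return cls(evaluators, optimizer_config)


class CountingEvaluator:
    """Minimal evaluator stub; a real one implements the full interface."""

    def __init__(self, start):
        self.steps = start


opt = LocalSyncOptimizer.make(CountingEvaluator, (0,), num_workers=2,
                              optimizer_config={})
print(len(opt.remote_evaluators))  # 2
```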

"td_error" array in the info return of compute_gradients(). This error
term will be used for sample prioritization."""

def _init(
Contributor:

One thing that is a little confusing and not very apparent in the docs is where all these parameters are being passed in. Reading this code, one would have to do some digging to jump through the various abstractions (i.e., ApexAgent -> DQNAgent -> ApexOptimizer -> PolicyOptimizer -> ApexOptimizer) to realize the chain of method calls needed to do this.

Providing some note in the documentation page, and also a small comment here, would be good.

Contributor:

After all, this is essentially exposed to the user.

-----------------

+-----------------------------+---------------------+-----------------+------------------------------+
| **Policy optimizer class** | **Operating range** | **Works with** | **Description** |
Contributor:

I just built the docs locally, and this table is quite hard to read, especially with the need to horizontally scroll. Maybe just use sections, then add hyperlinks to relevant examples that actually use each optimizer.

@@ -0,0 +1,51 @@
Using Policy Optimizers outside RLlib
Contributor:

consider just renaming to Policy Optimizers

Contributor Author:

Done


1. Implement the `Policy evaluator interface <rllib-dev.html#policy-evaluators-and-optimizers>`__.

- Here is an example of porting a `PyTorch Rainbow implementation <https://github.com/ericl/Rainbow/blob/rllib-example/rainbow_evaluator.py>`__.
Contributor:

explicit code examples here in this page would be good too
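As a concrete illustration of the evaluator interface the page describes (sample, compute_gradients, apply_gradients, get_weights/set_weights), here is a hedged, dependency-free sketch. The numeric "policy" is a stand-in rather than a real RL algorithm, and the `td_error` entry in the info dict matches the prioritization hook mentioned elsewhere in this diff.

```python
import random

class ToyEvaluator:
    """Minimal sketch of the policy evaluator interface."""

    def __init__(self, lr=0.1):
        self.weight = 0.0  # stand-in for model parameters
        self.lr = lr

    def sample(self):
        # Collect a batch of experiences; here, noisy targets around 1.0.
        return [random.gauss(1.0, 0.1) for _ in range(32)]

    def compute_gradients(self, batch):
        # Mean gradient of squared error, plus an info dict carrying the
        # per-sample "td_error" array used for sample prioritization.
        grad = sum(self.weight - x for x in batch) / len(batch)
        info = {"td_error": [abs(self.weight - x) for x in batch]}
        return grad, info

    def apply_gradients(self, grad):
        self.weight -= self.lr * grad

    def get_weights(self):
        return self.weight

    def set_weights(self, weights):
        self.weight = weights


ev = ToyEvaluator()
for _ in range(100):
    grad, info = ev.compute_gradients(ev.sample())
    ev.apply_gradients(grad)
# ev.weight converges toward the target mean of 1.0
```

A policy optimizer only needs these methods, which is what makes it possible to port external implementations (like the PyTorch Rainbow example above) by wrapping them in this interface.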

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/4339/

@ericl ericl merged commit 882a649 into ray-project:master Mar 15, 2018