
[rllib] [docs] Cleanup RLlib API and make docs consistent with upcoming blog post #1708

Merged: 23 commits into ray-project:master on Mar 15, 2018

Conversation

@ericl (Contributor) commented Mar 13, 2018

No description provided.

@AmplabJenkins
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/4286/

@AmplabJenkins
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/4288/

@AmplabJenkins
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/4290/

@AmplabJenkins
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/4291/

@richardliaw (Contributor) left a comment:

overall looks fine for a first pass

@@ -144,11 +144,11 @@ def reset(self, **kwargs):


 class WarpFrame(gym.ObservationWrapper):
-    def __init__(self, env):
+    def __init__(self, env, dim):
         """Warp frames to 84x84 as done in the Nature paper and later work."""
Contributor:

can you change the docstring here? 84x84 is no longer the case.

Contributor Author (@ericl):

Fixed
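
For reference, a minimal sketch of what the parameterized wrapper might look like after this change; the cv2-based grayscale/resize logic and the observation-space shape are assumptions for illustration, not the committed code:

```python
# Illustrative sketch only, based on the signature change above. The actual
# committed wrapper may differ (e.g. in how grayscale conversion is handled).
import cv2
import gym
import numpy as np
from gym import spaces


class WarpFrame(gym.ObservationWrapper):
    """Warp frames to a square dim x dim image (84x84 in the Nature DQN paper)."""

    def __init__(self, env, dim):
        super(WarpFrame, self).__init__(env)
        self.dim = dim
        self.observation_space = spaces.Box(
            low=0, high=255, shape=(dim, dim, 1), dtype=np.uint8)

    def observation(self, frame):
        # Convert to grayscale, then resize to the configured dimension.
        frame = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
        frame = cv2.resize(frame, (self.dim, self.dim),
                           interpolation=cv2.INTER_AREA)
        return frame[:, :, None]
```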

@@ -185,7 +185,7 @@ def _get_ob(self):
         return np.concatenate(self.frames, axis=2)


-def wrap_deepmind(env, random_starts):
+def wrap_deepmind(env, random_starts=True, dim=80):
     """Configure environment for DeepMind-style Atari.

     Note that we assume reward clipping is done outside the wrapper.
Contributor:

can you document the params?

Contributor Author (@ericl):

Fixed
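
For illustration, one way the new parameters could be documented (suggested wording only, not necessarily the docstring that was committed):

```python
def wrap_deepmind(env, random_starts=True, dim=80):
    """Configure environment for DeepMind-style Atari.

    Note that we assume reward clipping is done outside the wrapper.

    Args:
        env: The Atari environment to wrap.
        random_starts (bool): Whether to begin each episode with a random
            number of no-op actions.
        dim (int): Side length, in pixels, of the square frames produced by
            the WarpFrame wrapper (e.g. 80 or 84).
    """
```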


    This optimizer requires that policy evaluators return an additional
    "td_error" array in the info return of compute_gradients(). This error
    term will be used for sample prioritization."""

    def _init(
            self, learning_starts=1000, buffer_size=10000,
@richardliaw (Contributor), Mar 13, 2018:

Are these documented somewhere? It's hard to know what each parameter does, especially if this is intended for usage outside RLlib.


- Another example porting a `TensorFlow DQN implementation <https://github.com/ericl/baselines/blob/rllib-example/baselines/deepq/dqn_evaluator.py>`__.

2. Pick a `Policy optimizer class <https://github.com/ray-project/ray/tree/master/python/ray/rllib/optimizers>`__. The `LocalSyncOptimizer <https://github.com/ray-project/ray/blob/master/python/ray/rllib/optimizers/local_sync.py>`__ is a reasonable choice for local testing. You can also implement your own. Policy optimizers can be constructed using their ``make`` method (e.g., ``LocalSyncOptimizer.make(evaluator_cls, evaluator_args, num_workers, conf)``), or you can construct them by passing in a list of evaluators instantiated as Ray actors.
Contributor:

One thing that would provide clarity is conf -> optimizer_config.

Contributor Author (@ericl):

Done
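
For context, a schematic sketch of the construction pattern described in the quoted doc text. The import path, the exact form of ``evaluator_args``, and the training loop are assumptions about this version of RLlib rather than a verified example, and ``MyEvaluator`` is a hypothetical evaluator class:

```python
# Schematic sketch of the two construction paths described in the quoted docs.
# Assumptions not verified against this version of RLlib: the import path, the
# exact form of evaluator_args, and that make() creates num_workers remote
# evaluators as Ray actors. MyEvaluator is a hypothetical evaluator class
# implementing the policy evaluator interface (a toy sketch appears later in
# this thread).
import ray
from ray.rllib.optimizers import LocalSyncOptimizer

from my_project import MyEvaluator  # hypothetical

ray.init()

# Path 1: let the optimizer construct the local and remote evaluators itself.
optimizer = LocalSyncOptimizer.make(
    MyEvaluator,       # evaluator_cls
    ["CartPole-v0"],   # evaluator_args, forwarded to MyEvaluator(...)
    2,                 # num_workers
    {})                # optimizer_config (the argument renamed from "conf")

# Path 2: construct the evaluators yourself, wrapping the remote ones as Ray
# actors, and pass them directly to the optimizer's constructor.

# Each step() gathers experience through the evaluators and applies an update.
for _ in range(10):
    optimizer.step()
```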


    This optimizer requires that policy evaluators return an additional
    "td_error" array in the info return of compute_gradients(). This error
    term will be used for sample prioritization."""

    def _init(
Contributor:

One thing that is a little confusing and not very apparent in the docs is where all these parameters are being passed in. Reading this code, one would have to dig through the various abstractions (i.e., ApexAgent -^ DQNAgent -> ApexOptimizer -v PolicyOptimizer -^ ApexOptimizer) to work out the chain of method calls involved.

Providing a note on the documentation page, and also a small comment here, would be good.

Contributor:

After all, this is essentially exposed to the user.
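
To make the chain described above more concrete, here is a hypothetical, self-contained illustration of how config keys such as ``learning_starts`` and ``buffer_size`` can end up as ``_init`` keyword arguments. The class names mirror the discussion, but this is not the actual RLlib call chain:

```python
# Hypothetical illustration of the parameter flow discussed above; the real
# ApexAgent/DQNAgent/ApexOptimizer code is more involved and may differ.

class PolicyOptimizer(object):
    def __init__(self, config, local_evaluator, remote_evaluators):
        self.local_evaluator = local_evaluator
        self.remote_evaluators = remote_evaluators
        # Forward the optimizer config as keyword arguments; this is how keys
        # such as "learning_starts" and "buffer_size" reach the subclass _init().
        self._init(**config)

    def _init(self):
        pass


class ApexOptimizer(PolicyOptimizer):
    def _init(self, learning_starts=1000, buffer_size=10000):
        self.learning_starts = learning_starts
        self.buffer_size = buffer_size


# An agent (e.g. DQNAgent / ApexAgent) would pull the optimizer section out of
# its own config dict and hand it down:
agent_config = {"optimizer": {"learning_starts": 500, "buffer_size": 50000}}
opt = ApexOptimizer(agent_config["optimizer"], None, [])
print(opt.learning_starts, opt.buffer_size)  # -> 500 50000
```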

-----------------

+-----------------------------+---------------------+-----------------+------------------------------+
| **Policy optimizer class** | **Operating range** | **Works with** | **Description** |
Contributor:

I just built the docs locally, and this table is quite hard to read, especially with the need to scroll horizontally. Maybe just use sections, then add hyperlinks to relevant examples that actually use each optimizer.

@@ -0,0 +1,51 @@
Using Policy Optimizers outside RLlib
Contributor:

consider just renaming to Policy Optimizers

Contributor Author (@ericl):

Done


1. Implement the `Policy evaluator interface <rllib-dev.html#policy-evaluators-and-optimizers>`__.

- Here is an example of porting a `PyTorch Rainbow implementation <https://github.com/ericl/Rainbow/blob/rllib-example/rainbow_evaluator.py>`__.
Contributor:

explicit code examples here in this page would be good too
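
Following up on this suggestion, here is the kind of minimal, self-contained example that could go on that page. The method names and the "td_error" info key follow the interface and docstring quoted earlier in this review; the class itself is a toy stand-in:

```python
# Toy evaluator sketch. The method names (sample, compute_gradients,
# apply_gradients, get_weights, set_weights) and the "td_error" info key follow
# the interface and docstring quoted earlier in this review; the class name and
# the batch format are made up for illustration.
import numpy as np


class ToyPolicyEvaluator(object):
    """Linear 'policy' on random data; no real environment or learning."""

    def __init__(self, obs_dim=4, batch_size=32):
        self.weights = np.zeros(obs_dim)
        self.batch_size = batch_size

    def sample(self):
        # Return a batch of experiences (here: random observations and rewards).
        return {
            "obs": np.random.randn(self.batch_size, len(self.weights)),
            "rewards": np.random.randn(self.batch_size),
        }

    def compute_gradients(self, samples):
        # Return (gradients, info). A prioritized optimizer would additionally
        # expect a "td_error" array in info, per the docstring quoted above.
        grads = samples["obs"].mean(axis=0)
        return grads, {"td_error": samples["rewards"]}

    def apply_gradients(self, grads):
        self.weights -= 0.01 * grads

    def get_weights(self):
        return self.weights

    def set_weights(self, weights):
        self.weights = weights
```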

@AmplabJenkins
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/4339/

@ericl merged commit 882a649 into ray-project:master on Mar 15, 2018