
[rllib] [docs] Cleanup RLlib API and make docs consistent with upcoming blog post #1708

Merged: 23 commits into ray-project:master on Mar 15, 2018

Conversation

@ericl (Contributor) commented Mar 13, 2018

No description provided.

@AmplabJenkins
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/4286/

@AmplabJenkins
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/4288/

@AmplabJenkins
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/4290/

@AmplabJenkins
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/4291/

@richardliaw (Contributor) left a comment:

overall looks fine for a first pass

@@ -144,11 +144,11 @@ def reset(self, **kwargs):


 class WarpFrame(gym.ObservationWrapper):
-    def __init__(self, env):
+    def __init__(self, env, dim):
         """Warp frames to 84x84 as done in the Nature paper and later work."""
Contributor:

can you change the docstring here? 84x84 is no longer the case.

Contributor Author (@ericl):

Fixed
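
For reference, a minimal sketch of what the parameterized wrapper might look like after this change; the cv2-based grayscale/resize logic and the observation-space shape are assumptions for illustration, not the committed code:

```python
# Illustrative sketch only, based on the signature change above. The actual
# committed wrapper may differ (e.g. in how grayscale conversion is handled).
import cv2
import gym
import numpy as np
from gym import spaces


class WarpFrame(gym.ObservationWrapper):
    """Warp frames to a square dim x dim image (84x84 in the Nature DQN paper)."""

    def __init__(self, env, dim):
        super(WarpFrame, self).__init__(env)
        self.dim = dim
        self.observation_space = spaces.Box(
            low=0, high=255, shape=(dim, dim, 1), dtype=np.uint8)

    def observation(self, frame):
        # Convert to grayscale, then resize to the configured dimension.
        frame = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
        frame = cv2.resize(frame, (self.dim, self.dim),
                           interpolation=cv2.INTER_AREA)
        return frame[:, :, None]
```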

@@ -185,7 +185,7 @@ def _get_ob(self):
         return np.concatenate(self.frames, axis=2)


-def wrap_deepmind(env, random_starts):
+def wrap_deepmind(env, random_starts=True, dim=80):
     """Configure environment for DeepMind-style Atari.

     Note that we assume reward clipping is done outside the wrapper.
Contributor:

can you document the params?

Contributor Author (@ericl):

Fixed
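
For illustration, one way the new parameters could be documented (suggested wording only, not necessarily the docstring that was committed):

```python
def wrap_deepmind(env, random_starts=True, dim=80):
    """Configure environment for DeepMind-style Atari.

    Note that we assume reward clipping is done outside the wrapper.

    Args:
        env: The Atari environment to wrap.
        random_starts (bool): Whether to begin each episode with a random
            number of no-op actions.
        dim (int): Side length, in pixels, of the square frames produced by
            the WarpFrame wrapper (e.g. 80 or 84).
    """
```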


    This optimizer requires that policy evaluators return an additional
    "td_error" array in the info return of compute_gradients(). This error
    term will be used for sample prioritization."""

    def _init(
            self, learning_starts=1000, buffer_size=10000,
@richardliaw (Contributor), Mar 13, 2018:

Are these documented somewhere? It's hard to know what each parameter does, especially if this is intended for usage outside RLlib.


- Another example porting a `TensorFlow DQN implementation <https://github.com/ericl/baselines/blob/rllib-example/baselines/deepq/dqn_evaluator.py>`__.

2. Pick a `Policy optimizer class <https://github.com/ray-project/ray/tree/master/python/ray/rllib/optimizers>`__. The `LocalSyncOptimizer <https://github.com/ray-project/ray/blob/master/python/ray/rllib/optimizers/local_sync.py>`__ is a reasonable choice for local testing. You can also implement your own. Policy optimizers can be constructed using their ``make`` method (e.g., ``LocalSyncOptimizer.make(evaluator_cls, evaluator_args, num_workers, conf)``), or you can construct them by passing in a list of evaluators instantiated as Ray actors.
Contributor:

One thing that would provide clarity is conf -> optimizer_config.

Contributor Author (@ericl):

Done
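
For context, a schematic sketch of the construction pattern described in the quoted doc text. The import path, the exact form of ``evaluator_args``, and the training loop are assumptions about this version of RLlib rather than a verified example, and ``MyEvaluator`` is a hypothetical evaluator class:

```python
# Schematic sketch of the two construction paths described in the quoted docs.
# Assumptions not verified against this version of RLlib: the import path, the
# exact form of evaluator_args, and that make() creates num_workers remote
# evaluators as Ray actors. MyEvaluator is a hypothetical evaluator class
# implementing the policy evaluator interface (a toy sketch appears later in
# this thread).
import ray
from ray.rllib.optimizers import LocalSyncOptimizer

from my_project import MyEvaluator  # hypothetical

ray.init()

# Path 1: let the optimizer construct the local and remote evaluators itself.
optimizer = LocalSyncOptimizer.make(
    MyEvaluator,       # evaluator_cls
    ["CartPole-v0"],   # evaluator_args, forwarded to MyEvaluator(...)
    2,                 # num_workers
    {})                # optimizer_config (the argument renamed from "conf")

# Path 2: construct the evaluators yourself, wrapping the remote ones as Ray
# actors, and pass them directly to the optimizer's constructor.

# Each step() gathers experience through the evaluators and applies an update.
for _ in range(10):
    optimizer.step()
```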


    This optimizer requires that policy evaluators return an additional
    "td_error" array in the info return of compute_gradients(). This error
    term will be used for sample prioritization."""

    def _init(
Contributor:

One thing that is a little confusing and not very apparent in the docs is where all these parameters are being passed in. Reading this code, one would have to dig through the various abstractions (i.e., ApexAgent -^ DQNAgent -> ApexOptimizer -v PolicyOptimizer -^ ApexOptimizer) to work out the chain of method calls involved.

Providing a note on the documentation page, and also a small comment here, would be good.

Contributor:

After all, this is essentially exposed to the user.
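
To make the chain described above more concrete, here is a hypothetical, self-contained illustration of how config keys such as ``learning_starts`` and ``buffer_size`` can end up as ``_init`` keyword arguments. The class names mirror the discussion, but this is not the actual RLlib call chain:

```python
# Hypothetical illustration of the parameter flow discussed above; the real
# ApexAgent/DQNAgent/ApexOptimizer code is more involved and may differ.

class PolicyOptimizer(object):
    def __init__(self, config, local_evaluator, remote_evaluators):
        self.local_evaluator = local_evaluator
        self.remote_evaluators = remote_evaluators
        # Forward the optimizer config as keyword arguments; this is how keys
        # such as "learning_starts" and "buffer_size" reach the subclass _init().
        self._init(**config)

    def _init(self):
        pass


class ApexOptimizer(PolicyOptimizer):
    def _init(self, learning_starts=1000, buffer_size=10000):
        self.learning_starts = learning_starts
        self.buffer_size = buffer_size


# An agent (e.g. DQNAgent / ApexAgent) would pull the optimizer section out of
# its own config dict and hand it down:
agent_config = {"optimizer": {"learning_starts": 500, "buffer_size": 50000}}
opt = ApexOptimizer(agent_config["optimizer"], None, [])
print(opt.learning_starts, opt.buffer_size)  # -> 500 50000
```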

-----------------

+-----------------------------+---------------------+-----------------+------------------------------+
| **Policy optimizer class** | **Operating range** | **Works with** | **Description** |
Contributor:

I just built the docs locally, and this table is quite hard to read, especially with the need to scroll horizontally. Maybe just use sections, then add hyperlinks to relevant examples that actually use each optimizer.

@@ -0,0 +1,51 @@
Using Policy Optimizers outside RLlib
Contributor:

consider just renaming to Policy Optimizers

Contributor Author (@ericl):

Done


1. Implement the `Policy evaluator interface <rllib-dev.html#policy-evaluators-and-optimizers>`__.

- Here is an example of porting a `PyTorch Rainbow implementation <https://github.com/ericl/Rainbow/blob/rllib-example/rainbow_evaluator.py>`__.
Contributor:

explicit code examples here in this page would be good too
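
Following up on this suggestion, here is the kind of minimal, self-contained example that could go on that page. The method names and the "td_error" info key follow the interface and docstring quoted earlier in this review; the class itself is a toy stand-in:

```python
# Toy evaluator sketch. The method names (sample, compute_gradients,
# apply_gradients, get_weights, set_weights) and the "td_error" info key follow
# the interface and docstring quoted earlier in this review; the class name and
# the batch format are made up for illustration.
import numpy as np


class ToyPolicyEvaluator(object):
    """Linear 'policy' on random data; no real environment or learning."""

    def __init__(self, obs_dim=4, batch_size=32):
        self.weights = np.zeros(obs_dim)
        self.batch_size = batch_size

    def sample(self):
        # Return a batch of experiences (here: random observations and rewards).
        return {
            "obs": np.random.randn(self.batch_size, len(self.weights)),
            "rewards": np.random.randn(self.batch_size),
        }

    def compute_gradients(self, samples):
        # Return (gradients, info). A prioritized optimizer would additionally
        # expect a "td_error" array in info, per the docstring quoted above.
        grads = samples["obs"].mean(axis=0)
        return grads, {"td_error": samples["rewards"]}

    def apply_gradients(self, grads):
        self.weights -= 0.01 * grads

    def get_weights(self):
        return self.weights

    def set_weights(self, weights):
        self.weights = weights
```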

@AmplabJenkins
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/4339/

@ericl merged commit 882a649 into ray-project:master on Mar 15, 2018