[rllib] PPO and A3C unification #1253
Conversation
Merged build finished. Test FAILed.
Merged build finished. Test PASSed.
Nice refactoring. To make sure PPO performance hasn't regressed, can you run the tuned humanoid example?
@@ -105,6 +105,7 @@ def _fetch_metrics_from_workers(self):
        return result

    def _save(self):
        # TODO(rliaw): extend to also support saving worker state?
Yeah that's probably required for advanced hypertune algorithms to work well.
@@ -118,6 +119,8 @@ def _restore(self, checkpoint_path):
        self.rew_filter = objects[2]
        self.policy.set_weights(self.parameters)

    # TODO(rliaw): augment to support LSTM
This could be a general TODO on agents.
python/ray/rllib/a3c/runner.py
Outdated
-class Runner(object):
+class Runner(Evaluator):
A3CEvaluator(Evaluator)
python/ray/rllib/a3c/shared_model.py
Outdated
    def value(self, ob, *args):
        vf = self.sess.run(self.vf, {self.x: [ob]})
        return vf[0]

    def get_initial_features(self):
        # TODO(rliaw): make sure this is right
Anything is fine since this isn't an LSTM, right? So it could return None.
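The distinction the reviewer is drawing can be sketched like this: a feedforward model carries no state between steps, so its initial features are trivial, while an LSTM model must return its zeroed recurrent state. This is a generic sketch (the class names and `cell_size` parameter are illustrative), not rllib's actual API:

```python
import numpy as np


class FeedForwardModel:
    # Non-recurrent model: there is no hidden state to carry across
    # timesteps, so the initial features can be empty (or None).
    def get_initial_features(self):
        return []


class LSTMModel:
    # Recurrent model: the initial features are the zeroed (c, h) states.
    def __init__(self, cell_size=256):
        self.cell_size = cell_size

    def get_initial_features(self):
        c = np.zeros((1, self.cell_size), dtype=np.float32)
        h = np.zeros((1, self.cell_size), dtype=np.float32)
        return [c, h]
```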
python/ray/rllib/ppo/runner.py
Outdated
            dummy],
            full_trace=full_trace)
        use_gae = self.config["use_gae"]
        dummy = np.zeros((trajectories["observations"].shape[0],))
np.zeros_like?
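For context on the suggestion: `np.zeros_like` builds the zero array from a source array's shape and dtype directly, which avoids spelling out the shape by hand. A small illustration (the `observations` array here is made up):

```python
import numpy as np

observations = np.ones((5, 4))  # illustrative trajectory batch

# Explicit-shape version, as in the diff:
dummy_a = np.zeros((observations.shape[0],))

# zeros_like on a 1-D slice keeps length and dtype in sync with the
# source array automatically:
dummy_b = np.zeros_like(observations[:, 0])

assert dummy_a.shape == dummy_b.shape == (5,)
```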
python/ray/rllib/ppo/runner.py
Outdated
            self.config["horizon"], self.config["horizon"])
        if not is_remote:
            # local model needs obs_filter for compute
            self.obs_filter = obs_filter
Should we just use the obs filter in the sampler?
Any preference on keeping the (global/master) observation_filter in the model, or can I move it into ppo.py? (Similar to A3C.)
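For readers following the filter discussion: the observation filter in question normalizes observations with running statistics. A minimal sketch of such a filter, using Welford's online algorithm (this is a generic sketch, not rllib's actual filter class):

```python
import numpy as np


class RunningObsFilter:
    """Normalizes observations with a running mean/std (Welford's algorithm)."""

    def __init__(self, shape):
        self.n = 0
        self.mean = np.zeros(shape)
        self.m2 = np.zeros(shape)  # running sum of squared deviations

    def __call__(self, obs):
        obs = np.asarray(obs, dtype=np.float64)
        # Update running statistics.
        self.n += 1
        delta = obs - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (obs - self.mean)
        # Normalize with the current estimates (epsilon avoids divide-by-zero).
        std = np.sqrt(self.m2 / max(self.n - 1, 1)) + 1e-8
        return (obs - self.mean) / std
```

Whether this object lives on the model or in ppo.py is purely an ownership question; the key constraint is that remote workers and the master must keep their statistics in sync.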
python/ray/rllib/utils/common.py
Outdated
@@ -0,0 +1,40 @@
from __future__ import absolute_import
this file should be called process_rollout or something
python/ray/rllib/utils/sampler.py
Outdated
@@ -38,6 +43,9 @@ def __init__(self, extra_fields=None):

    def add(self, **kwargs):
        for k, v in kwargs.items():
            if (k not in ["observations", "features"]
                    and hasattr(v, "squeeze")):
This is kind of fishy, why is it needed?
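One plausible reading of why the exclusion exists (the shapes below are made up for illustration): `squeeze()` drops every size-1 axis, which is harmless for scalar-per-step fields but destructive for fields with meaningful trailing axes.

```python
import numpy as np

# squeeze() drops *all* size-1 axes. For a per-step scalar field like
# rewards this is the desired (T, 1) -> (T,) cleanup:
rewards = np.zeros((10, 1))
assert rewards.squeeze().shape == (10,)

# But a single-channel image observation silently loses its channel axis:
obs = np.zeros((10, 84, 84, 1))
assert obs.squeeze().shape == (10, 84, 84)

# ...which would explain excluding "observations" and "features".
```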
Merged build finished. Test FAILed.
Merged build finished. Test PASSed.
What do these changes do?
A variety of changes are introduced:
- advantages + vf_preds are used instead of MC returns
- PartialRollouts squeezes everything except for observations and features
TODOS:
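For reference, the "advantages + vf_preds instead of MC returns" item typically means Generalized Advantage Estimation. A generic sketch of that computation (the function name and gamma/lam defaults are illustrative, not taken from this PR):

```python
import numpy as np


def compute_gae(rewards, vf_preds, gamma=0.99, lam=0.95, last_value=0.0):
    """Generalized Advantage Estimation over one trajectory.

    rewards:    (T,) per-step rewards
    vf_preds:   (T,) value-function predictions for each state
    last_value: bootstrap value for the state after the trajectory
    """
    T = len(rewards)
    values = np.append(vf_preds, last_value)
    advantages = np.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        # TD residual at step t, then exponentially weighted accumulation.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    # Value targets are advantages plus the baseline predictions.
    returns = advantages + vf_preds
    return advantages, returns
```

With gamma = lam = 1 and a zero baseline this reduces to plain Monte Carlo returns, which is exactly the behavior being replaced.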