Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[rllib] PPO and A3C unification #1253

Merged
merged 48 commits into from
Dec 14, 2017

Conversation

richardliaw
Copy link
Contributor

@richardliaw richardliaw commented Nov 25, 2017

What do these changes do?

Variety of changes are introduced:

  • This changes value regression to fit to advantages + vf_preds instead of MC returns.
  • PartialRollouts squeezes everything except for observations and features
  • Removes BatchedEnv dependency from PPO.
  • Filter support for samplers (both synchronous and async)

TODOS:

  • Test PPO works

@AmplabJenkins
Copy link

Merged build finished. Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/2553/
Test FAILed.

@AmplabJenkins
Copy link

Merged build finished. Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/2565/
Test FAILed.

@richardliaw richardliaw changed the title [rllib][wip] Removing BatchedEnv from PPO [rllib] PPO and A3C unification Dec 12, 2017
@AmplabJenkins
Copy link

Merged build finished. Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/2739/
Test FAILed.

@richardliaw richardliaw requested a review from ericl December 12, 2017 09:11
@AmplabJenkins
Copy link

Merged build finished. Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/2741/
Test FAILed.

@AmplabJenkins
Copy link

Merged build finished. Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/2745/
Test PASSed.

@AmplabJenkins
Copy link

Merged build finished. Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/2758/
Test PASSed.

Copy link
Contributor

@ericl ericl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice refactoring. To make sure PPO performance hasn't regressed, can you run the tuned humanoid example?

@@ -105,6 +105,7 @@ def _fetch_metrics_from_workers(self):
return result

def _save(self):
# TODO(rliaw): extend to also support saving worker state?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's probably required for advanced hypertune algorithms to work well.

@@ -118,6 +119,8 @@ def _restore(self, checkpoint_path):
self.rew_filter = objects[2]
self.policy.set_weights(self.parameters)

# TODO(rliaw): augment to support LSTM
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be a general TODO on agents.



class Runner(object):
class Runner(Evaluator):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A3CEvaluator(Evaluator)


def value(self, ob, *args):
vf = self.sess.run(self.vf, {self.x: [ob]})
return vf[0]

def get_initial_features(self):
# TODO(rliaw): make sure this is right
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anything is fine since this isn't lstm right? So could return None.

dummy],
full_trace=full_trace)
use_gae = self.config["use_gae"]
dummy = np.zeros((trajectories["observations"].shape[0],))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

np.zeros_like?

self.config["horizon"], self.config["horizon"])
if not is_remote:
# local model needs obs_filter for compute
self.obs_filter = obs_filter
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just use the obs filter in the sampler?

Copy link
Contributor Author

@richardliaw richardliaw Dec 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any preference on keeping the (global/master) observation_filter in the model or can I move it into ppo.py? (similarly to A3C)

@@ -0,0 +1,40 @@
from __future__ import absolute_import
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this file should be called process_rollout or something

@@ -38,6 +43,9 @@ def __init__(self, extra_fields=None):

def add(self, **kwargs):
for k, v in kwargs.items():
if (k not in ["observations", "features"]
and hasattr(v, "squeeze")):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is kind of fishy, why is it needed?

@AmplabJenkins
Copy link

Merged build finished. Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/2761/
Test FAILed.

@AmplabJenkins
Copy link

Merged build finished. Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/2762/
Test FAILed.

@richardliaw
Copy link
Contributor Author

screenshot 2017-12-13 16 04 35

@AmplabJenkins
Copy link

Merged build finished. Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/2776/
Test PASSed.

@ericl ericl self-assigned this Dec 14, 2017
@richardliaw richardliaw merged commit c5c83a4 into ray-project:master Dec 14, 2017
@richardliaw
Copy link
Contributor Author

screenshot 2017-12-14 23 49 20

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants