
[rllib] Pull out multi-gpu optimizer as a generic class #1313

Merged
43 commits merged into ray-project:master on Dec 17, 2017

Conversation

@ericl (Contributor) commented Dec 12, 2017

What do these changes do?

First cut at pulling out the multi-GPU optimizer, by adding it for DQN. New classes:

  • LocalMultiGPUOptimizer
  • SampleBatch
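As a rough illustration of the SampleBatch abstraction introduced here, a hypothetical, simplified sketch (the actual rllib class has more methods and validation; `concat` and `rows` mirror the methods discussed in the review below):

```python
import numpy as np


class SampleBatch(object):
    """Simplified sketch: wrapper around a dict of string keys -> arrays."""

    def __init__(self, data):
        self.data = {k: np.array(v) for k, v in data.items()}

    def concat(self, other):
        """Returns a new SampleBatch with each data column concatenated."""
        return SampleBatch({
            k: np.concatenate([self.data[k], other.data[k]])
            for k in self.data})

    def rows(self):
        """Yields each sample as a dict mapping column name -> value."""
        n = len(next(iter(self.data.values())))
        for i in range(n):
            yield {k: v[i] for k, v in self.data.items()}


b1 = SampleBatch({"a": [1, 2], "b": [0, 0]})
b2 = SampleBatch({"a": [3], "b": [1]})
merged = b1.concat(b2)
print(list(merged.data["a"]))  # [1, 2, 3]
```

The column-oriented layout is what lets the multi-GPU optimizer slice per-device minibatches with simple array indexing.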

Pong results with 42x42 observations and multi_gpu on and off. Performance with multi_gpu is a bit lower, but that's to be expected, since we have to do a separate pass for prioritization.

 - DQN_PongDeterministic-v4_0_multi_gpu=True:   TERMINATED [pid=63073], 9824 s, 1270000 ts, 20 rew
 - DQN_PongDeterministic-v4_2_multi_gpu=True:   TERMINATED [pid=63462], 27332 s, 3741000 ts, 20.1 rew
 - DQN_PongDeterministic-v4_4_multi_gpu=True:   TERMINATED [pid=63475], 15352 s, 2016000 ts, 20 rew
 - DQN_PongDeterministic-v4_6_multi_gpu=True:   TERMINATED [pid=63492], 13921 s, 1815000 ts, 20 rew
 - DQN_PongDeterministic-v4_1_multi_gpu=False:  TERMINATED [pid=63211], 7237 s, 1021000 ts, 20 rew
 - DQN_PongDeterministic-v4_3_multi_gpu=False:  TERMINATED [pid=63464], 13237 s, 1909000 ts, 20 rew
 - DQN_PongDeterministic-v4_5_multi_gpu=False:  TERMINATED [pid=63485], 6648 s, 940000 ts, 20.1 rew
 - DQN_PongDeterministic-v4_7_multi_gpu=False:  TERMINATED [pid=63511], 8436 s, 1204000 ts, 20 rew

Related issue number

#1291

@AmplabJenkins

Merged build finished. Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/2778/

@AmplabJenkins

Merged build finished. Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/2783/

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/2780/

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/2781/

@ericl ericl changed the title [rllib] [WIP] Pull out multi-gpu optimizer as a generic class [rllib] Pull out multi-gpu optimizer as a generic class Dec 15, 2017
@AmplabJenkins

Merged build finished. Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/2798/

@richardliaw (Contributor) left a comment:

Some comments; this introduces some nice abstractions.

    def concat(self, other):
        """Returns a new SampleBatch with each data column concatenated.

        >>> b1 = SampleBatch({"a": [1, 2]})

@richardliaw (Contributor):

Should you label this with Examples:?

@ericl (Author):

Done!

"""Wrapper around a dictionary with string keys and array-like values.

For example, {"obs": [1, 2, 3], "reward": [0, -1, 1]} is a batch of three
samples, each with an "obs" and "reward" attribute.
@richardliaw (Contributor):

Should shuffle also be a method here?

@ericl (Author):

Hmm, possibly. Right now the multi-GPU optimizer does a more efficient shuffle that only permutes indices, which seems just as good.
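The index-permutation shuffle mentioned here can be sketched as follows (a hypothetical illustration with assumed variable names, not the actual optimizer code):

```python
import numpy as np

# Instead of shuffling each data array in place, permute an index
# vector once and gather per-device minibatches through it.
batch = {"obs": np.arange(10), "rewards": np.arange(10) * 0.1}
num_rows = len(batch["obs"])
perm = np.random.permutation(num_rows)  # one O(n) permutation of indices

minibatch_size = 5
minibatches = []
for start in range(0, num_rows, minibatch_size):
    idx = perm[start:start + minibatch_size]
    # Fancy indexing gathers the selected rows; the source arrays
    # themselves are never reordered.
    minibatches.append({k: v[idx] for k, v in batch.items()})
```

Every row is still visited exactly once per epoch, so this is statistically equivalent to shuffling the data itself.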

@@ -55,20 +55,23 @@ def update_target(self):
        self.dqn_graph.update_target(self.sess)

    def sample(self):
        output = []
        obs, actions, rewards, new_obs, dones = [], [], [], [], []
@richardliaw (Contributor):

Does it make sense to replace this with a SyncSampler? A little modification would be needed to support exploration, but this class would end up cleaner. Can be done in another PR, I guess.

@ericl (Author):

Sure, I think it should work.
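For context, the pattern under discussion in the diff above: the evaluator's sample() collects per-column lists as it steps the environment, then wraps them into a column-oriented batch. A simplified, hypothetical sketch (the env and policy here are stand-ins, not rllib APIs):

```python
import numpy as np


def sample(env, policy, horizon):
    """Collect one rollout as column-oriented data (simplified sketch)."""
    obs, actions, rewards, new_obs, dones = [], [], [], [], []
    ob = env.reset()
    for _ in range(horizon):
        action = policy(ob)
        next_ob, reward, done = env.step(action)
        obs.append(ob)
        actions.append(action)
        rewards.append(reward)
        new_obs.append(next_ob)
        dones.append(done)
        ob = env.reset() if done else next_ob
    return {"obs": np.array(obs), "actions": np.array(actions),
            "rewards": np.array(rewards), "new_obs": np.array(new_obs),
            "dones": np.array(dones)}
```

A SyncSampler would factor this stepping loop out of the evaluator, which is the cleanup being proposed.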

import numpy as np


class SampleBatch(object):
@richardliaw (Contributor):

This is a nice abstraction. Perhaps it makes sense to have PartialRollout extend this, which would add support for truncated rollouts. If so, we can do it in a separate PR.

        for row in s.rows():
            self.replay_buffer.add(
                row["obs"], row["actions"], row["rewards"], row["new_obs"],
                row["dones"])

        if no_replay:
            return samples

        # Then return a batch sampled from the buffer
        if self.config["prioritized_replay"]:
@richardliaw (Contributor):

I hope that later on all the ReplayBuffer casework can get pushed into the ReplayBuffer.

@ericl (Author):

It's difficult: the prioritization requires extra computation that has to be part of the loss.
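To illustrate why this coupling is hard to hide inside the buffer: prioritized replay needs per-sample TD errors, which come out of the loss computation, and then feeds them back into the buffer as new priorities. A rough sketch under assumed names (in the real DQN code the TD error is computed inside the TF graph; `compute_td_error` here is a hypothetical callback):

```python
import numpy as np


def train_step_with_priorities(replay_buffer, compute_td_error,
                               batch_size, beta=0.4):
    # Prioritized sampling also returns importance weights and the
    # indices needed to update priorities afterwards.
    (obs, actions, rewards, new_obs, dones, weights, batch_idxes) = \
        replay_buffer.sample(batch_size, beta=beta)

    # The TD errors are a byproduct of the loss computation, which is
    # why prioritization cannot live entirely inside the buffer.
    td_errors = compute_td_error(obs, actions, rewards, new_obs,
                                 dones, weights)

    # Small epsilon keeps every transition sampleable.
    new_priorities = np.abs(td_errors) + 1e-6
    replay_buffer.update_priorities(batch_idxes, new_priorities)
    return td_errors
```

This is the round trip (sample, compute loss, update priorities) that forces the optimizer and buffer to cooperate.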

        self._update_priorities_if_needed()
        self.samples_to_prioritize = (
            obses_t, actions, rewards, obses_tp1, dones, batch_idxes)
        batch = SampleBatch({
@richardliaw (Contributor):

You could just have the replay_buffer return a SampleBatch, right?

@ericl (Author):

Yes, but I'm less concerned about that for now -- I'd rather keep replay_buffer unchanged as long as possible, since it's still basically similar to the baselines code.

            [ph for _, ph in self.loss_inputs],
            self.per_device_batch_size,
            lambda *ph: self.local_evaluator.build_tf_loss(ph),
            self.config.get("logdir", "/tmp/ray"))
@richardliaw (Contributor):

Why this directory? Didn't the logging directory change per #1297?

@ericl (Author):

Changed it to os.getcwd().

@AmplabJenkins

Merged build finished. Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/2808/

@AmplabJenkins

Merged build finished. Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/2809/

@AmplabJenkins

Merged build finished. Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/2814/

@richardliaw (Contributor) commented Dec 17, 2017

Jenkins is failing, but otherwise the changes are fine.

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/2826/

@richardliaw richardliaw merged commit 47b1f02 into ray-project:master Dec 17, 2017