[rllib] Refactor Multi-GPU for PPO #1646
Conversation
Test FAILed.
Looks like this needs to be rebased (also some PAL stuff leaked in).
Test FAILed.
@@ -0,0 +1,149 @@
from __future__ import absolute_import
we should get rid of this
The file? Or the line `from __future__ import absolute_import`? We include that line at the top of every Python file.
ah this comment is for the file
python/ray/rllib/dqn/models.py
Outdated
@@ -80,15 +80,13 @@ def _scope_vars(scope, trainable_only=False):
"""
we should revert all the changes in this file
python/ray/rllib/models/catalog.py
Outdated
@@ -135,7 +135,6 @@ def get_model(registry, inputs, num_outputs, options=dict()):
Returns:
we should revert all the changes in this file
python/ray/rllib/ppo/rollout.py
Outdated
@@ -1,34 +1,3 @@
from __future__ import absolute_import
we should just delete this file
python/ray/rllib/dqn/dqn.py
Outdated
@@ -169,7 +169,6 @@ def _train(self):
self.local_evaluator.set_global_timestep(self.global_timestep)
and revert changes here
@@ -44,6 +44,10 @@ def process_rollout(rollout, reward_filter, gamma, lambda_=1.0, use_gae=True):
rollout.data["rewards"] + [np.array(rollout.last_r)]).squeeze()
traj["advantages"] = discount(rewards_plus_v, gamma)[:-1]

# TODO(rliaw): this is a hack to get multigpu running
@victorsun123 can you fix so that this isn't a hack?
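For context on the hunk above, here is a minimal sketch of the non-GAE advantage computation it touches, assuming the usual rllib-style `discount` helper; the names and exact shapes are assumptions for illustration, not the code under review.

```python
import numpy as np
import scipy.signal


def discount(x, gamma):
    # Discounted cumulative sum: y[t] = x[t] + gamma * y[t + 1].
    return scipy.signal.lfilter([1], [1, -gamma], x[::-1], axis=0)[::-1]


def advantages_without_gae(rewards, last_r, gamma):
    # Bootstrap the tail of the trajectory with the last value estimate,
    # then discount; dropping the final entry leaves one advantage per step.
    rewards_plus_v = np.concatenate([rewards, np.array([last_r])])
    return discount(rewards_plus_v, gamma)[:-1]
```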
python/ray/rllib/utils/__init__.py
Outdated
@@ -1,3 +1,5 @@
from ray.rllib.utils.filter_manager import FilterManager
#from ray.rllib.utils.sampler import collect_samples
remove the comment?
Test FAILed.
@victorsun123 is this still in progress?
I have to clean it up - yeah
Test FAILed.
GPU ids are not detected.
Test FAILed.
@@ -0,0 +1,13 @@
from __future__ import absolute_import
we should support seeding (this utility was incredibly useful during debugging). If so we also need to support environment seeding, which is not covered in this utility.
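A minimal sketch of what a combined seeding utility could look like, covering both the library RNGs and the environment seeding the comment asks for; the function name and the gym-style `env.seed()` call are assumptions, not the utility in this PR.

```python
import random

import numpy as np
import tensorflow as tf


def seed_everything(seed, env=None):
    # Seed the Python, NumPy, and TensorFlow RNGs so repeated debugging runs
    # produce comparable numbers.
    random.seed(seed)
    np.random.seed(seed)
    tf.set_random_seed(seed)
    # Environment seeding is not covered by the utility in the diff; gym-style
    # environments expose a seed() method for their internal RNG.
    if env is not None and hasattr(env, "seed"):
        env.seed(seed)
```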
# [e.sample.remote() for e in self.remote_evaluators]))
from ray.rllib.ppo.rollout import collect_samples
samples = collect_samples(self.remote_evaluators,
                          self.timesteps_per_batch)
there's probably a better treatment for this?
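A rough sketch of what a `collect_samples` helper presumably does in place of the commented-out direct `sample.remote()` calls: keep the remote evaluators busy and accumulate their rollouts until the batch size is reached. This is an assumption about its behavior for illustration, not the file under review.

```python
import ray


def collect_samples(remote_evaluators, timesteps_per_batch):
    # Launch one sample task per evaluator, then keep re-submitting work on
    # whichever evaluator finishes first until enough timesteps are gathered.
    batches = []
    total = 0
    pending = {e.sample.remote(): e for e in remote_evaluators}
    while total < timesteps_per_batch:
        [done], _ = ray.wait(list(pending))
        evaluator = pending.pop(done)
        batch = ray.get(done)
        batches.append(batch)
        total += batch.count  # assumes a SampleBatch-style timestep count
        pending[evaluator.sample.remote()] = evaluator
    # The caller would concatenate these into a single training batch.
    return batches
```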
self.kl_coeff, self.distribution_class, self.config,
self.sess, self.registry)

def init_extra_ops(self, device_losses):
This was odd because of the chain of dependencies:
1. The local evaluator creates a model.
2. MultiGPU creates Variable replicas (which are just refs to the local model).
3. These ops are created, which use nodes from (2).
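A toy illustration of that chain in TF1-style code, assuming variable-scope reuse is what ties the replicas back to the local model; the scope and op names are illustrative, not the actual classes in the PR.

```python
import tensorflow as tf

# (1) The local evaluator builds the model once, creating the variables.
with tf.variable_scope("model"):
    w = tf.get_variable("w", shape=[4, 2])

# (2) The multi-GPU optimizer builds per-device replicas that reuse those
#     variables, so each replica is just a reference to the local model.
tower_outputs = []
for i, device in enumerate(["/cpu:0", "/cpu:0"]):
    with tf.device(device), tf.variable_scope("model", reuse=True):
        x = tf.placeholder(tf.float32, shape=[None, 4], name="x_%d" % i)
        tower_outputs.append(tf.matmul(x, tf.get_variable("w", shape=[4, 2])))

# (3) The extra ops can only be created afterwards, because they depend on
#     nodes from the replica graphs built in (2).
extra_op = tf.add_n(tower_outputs)
```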
@ericl this is ready for readthrough.
Test FAILed.
Looks fine. I'm not too worried about the ugly bits since we'll need to port this to CommonPolicyEvaluator next anyways.
Test FAILed.
Test PASSed.
* 'master' of https://github.com/ray-project/ray: (157 commits)
  Fix build failure while using make -j1. Issue 2257 (ray-project#2279)
  Cast locator with index type (ray-project#2274)
  fixing zero length partitions (ray-project#2237)
  Make actor handles work in Python mode. (ray-project#2283)
  [xray] Add error table and push error messages to driver through node manager. (ray-project#2256)
  addressing comments (ray-project#2210)
  Re-enable some actor tests. (ray-project#2276)
  Experimental: enable automatic GCS flushing with configurable policy. (ray-project#2266)
  [xray] Sets good object manager defaults. (ray-project#2255)
  [tune] Update Trainable doc to expose interface (ray-project#2272)
  [rllib] Add a simple REST policy server and client example (ray-project#2232)
  [asv] Pushing to s3 (ray-project#2246)
  [rllib] Remove need to pass around registry (ray-project#2250)
  Support multiple availability zones in AWS (fix ray-project#2177) (ray-project#2254)
  [rllib] Add squash_to_range model option (ray-project#2239)
  Mitigate randomly building failure: adding gen_local_scheduler_fbs to raylet lib. (ray-project#2271)
  [rllib] Refactor Multi-GPU for PPO (ray-project#1646)
  [rllib] Envs for vectorized execution, async execution, and policy serving (ray-project#2170)
  [Dataframe] Change pandas and ray.dataframe imports (ray-project#1942)
  [Java] Replace binary rewrite with Remote Lambda Cache (SerdeLambda) (ray-project#2245)
  ...
What do these changes do?
Refactor Multi-GPU support for PPO algorithm
Main notes:
Each evaluator, including the local evaluator, has a copy of the graph.
The MultiGPU Optimizer also has a copy of the graph, with N gradient op copies, where N is the number of devices exposed. Before and after each optimizer step, the copies of the graph are synchronized.
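A minimal sketch of that pattern (per-device gradient op copies over a shared set of variables, averaged before one optimizer step); this is illustrative TF1-style code, not the optimizer class in this PR, and the toy loss and scope names are assumptions.

```python
import tensorflow as tf


def tower_loss():
    # Toy per-device loss; the real code would build the PPO loss here.
    x = tf.placeholder(tf.float32, [None, 4])
    w = tf.get_variable("w", shape=[4, 1])
    return tf.reduce_mean(tf.matmul(x, w) ** 2)


devices = ["/cpu:0", "/cpu:0"]  # stand-ins for N GPUs
tower_grads = []
for i, device in enumerate(devices):
    # Each device gets its own copy of the gradient ops, but reuse ties every
    # copy back to the same underlying variables.
    with tf.device(device), tf.variable_scope("tower", reuse=(i > 0)):
        loss = tower_loss()
        params = tf.get_collection(
            tf.GraphKeys.TRAINABLE_VARIABLES, scope="tower")
        tower_grads.append(tf.gradients(loss, params))

# Average the per-device gradients and apply them in a single optimizer step;
# the evaluators' graph copies would then be re-synced with the new weights.
avg_grads = [tf.reduce_mean(tf.stack(g), axis=0) for g in zip(*tower_grads)]
train_op = tf.train.AdamOptimizer(1e-4).apply_gradients(zip(avg_grads, params))
```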
TODOS:
The following is failing:
python /ray/python/ray/rllib/train.py --env CartPole-v1 --run PPO --stop '{"training_iteration": 2}' --config '{"kl_coeff": 1.0, "num_sgd_iter": 10, "sgd_stepsize": 1e-4, "sgd_batchsize": 64, "timesteps_per_batch": 2000, "num_workers": 1, "use_gae": false}'
Comments/cleanup
YAPF + Lint
With matching weights and dataset, the graph outputs the same loss, KL, and entropy. However, the resulting weights differ after one SGD step (and still differ when replacing the Adam optimizer with plain SGD).
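One way to pin down where that divergence creeps in is to compare the two graphs under identical inputs, one step at a time. A hedged sketch of such a check; the `set_weights`/`sgd_step`/`get_weights` methods are placeholders for whichever graph wrappers are being compared.

```python
import numpy as np


def compare_after_one_step(graph_a, graph_b, weights, batch, atol=1e-6):
    # Start both graphs from identical weights, run a single SGD step on the
    # same batch, then diff the resulting weights tensor by tensor.
    graph_a.set_weights(weights)
    graph_b.set_weights(weights)
    graph_a.sgd_step(batch)
    graph_b.sgd_step(batch)
    new_a, new_b = graph_a.get_weights(), graph_b.get_weights()
    for name, wa in new_a.items():
        wb = new_b[name]
        if not np.allclose(wa, wb, atol=atol):
            print("mismatch in", name, "max diff", np.abs(wa - wb).max())
```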