[rllib] Support torch device and distributions. #4553
Conversation
Can one of the admins verify this patch?
Test FAILed.
The test environment is using pytorch-cpu. Should I change it to pytorch?
Hm, does that make a GPU available? AFAIK, none of our tests are currently run with GPUs. What is the limitation of pytorch-cpu?
Never mind. The CPU case should be handled correctly now.
logits, _, values, _ = policy_model(
    {SampleBatch.CUR_OBS: observations}, [])
logits = logits
values = values
These two lines seem redundant?
log_probs = log_probs.sum(-1)
self.entropy = dist.entropy().mean().cpu()
self.pi_err = -advantages.dot(log_probs.reshape(-1)).cpu()
self.value_err = F.mse_loss(values.reshape(-1), value_targets).cpu()
Is it necessary to move the loss to cpu?
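A minimal sketch of one alternative, assuming the stats above are only needed as scalars for logging: `.item()` pulls a 0-dim tensor to the host as a plain Python float, so an explicit `.cpu()` is not needed for those values (tensors that still feed `backward()` must of course stay as tensors).

```python
import torch
import torch.nn.functional as F

# Hedged sketch: scalar loss stats can be fetched with .item(), which
# returns a host-side Python float regardless of the tensor's device.
values = torch.randn(4)
value_targets = torch.zeros(4)

value_err = F.mse_loss(values, value_targets)   # 0-dim tensor (mean reduction)
value_err_scalar = value_err.item()             # plain float, device-agnostic
```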
action_distribution_cls=dist_class)

@override(PolicyGraph)
def compute_gradients(self, postprocessed_batch):
Could we keep this method impl in TorchPolicyGraph and have options to clip grads / return extra stats as generic functionality?
I can add an abstract method for getting extra grad info.
For grad clipping, I can either make the config a property of TorchPolicyGraph, so its compute_gradients() knows whether to clip gradients, or add an abstract method extra_grad_processing(self, grad) to TorchPolicyGraph and let subclasses process the gradients. What's your preference?
TFPolicyGraph offers the extra grad processing method, so it's probably better to do that for consistency.
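A minimal sketch of the hook pattern being discussed, with hypothetical class and method names (`TorchPolicyGraphSketch`, `extra_grad_process`) standing in for the real TorchPolicyGraph API: the base class owns `compute_gradients()`, and a subclass overrides the hook to clip gradients and report stats.

```python
import torch
import torch.nn as nn

class TorchPolicyGraphSketch:
    """Hypothetical base: owns the generic gradient computation."""

    def __init__(self, model):
        self.model = model

    def extra_grad_process(self):
        # Default hook: no extra processing, no extra stats.
        return {}

    def compute_gradients(self, loss):
        self.model.zero_grad()
        loss.backward()
        info = self.extra_grad_process()  # subclass hook runs post-backward
        grads = [p.grad for p in self.model.parameters()]
        return grads, info

class ClippedPolicyGraph(TorchPolicyGraphSketch):
    def extra_grad_process(self):
        # Subclass hook: clip gradients in place, report pre-clip norm.
        norm = nn.utils.clip_grad_norm_(self.model.parameters(), 0.5)
        return {"grad_gnorm": float(norm)}

model = nn.Linear(3, 1)
policy = ClippedPolicyGraph(model)
loss = model(torch.ones(2, 3)).sum()
grads, info = policy.compute_gradients(loss)
```

This mirrors the TF-side design mentioned above: generic machinery in the base class, per-algorithm behavior behind one overridable method.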
dist = self.dist_class(logits)
log_probs = dist.logp(actions)
if len(log_probs.shape) > 1:
    log_probs = log_probs.sum(-1)
In which cases does log_probs have a nontrivial second dimension? Wondering if the reshape() is sufficient?
Same question for A3CLoss.
I haven't tried the others, but Normal's log_prob returns a vector of shape (n,), where n is the number of Gaussians. I can absorb this sum into TorchDiagGaussian.
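The shape behavior being described can be demonstrated with a small sketch: for a diagonal Gaussian over an n-dimensional action, `torch.distributions.Normal.log_prob` returns one value per action dimension, so the per-action log-probability is the sum over the last axis.

```python
import torch
from torch.distributions import Normal

# Batch of 4 samples from a 3-dimensional diagonal Gaussian.
dist = Normal(torch.zeros(4, 3), torch.ones(4, 3))
actions = dist.sample()

per_dim = dist.log_prob(actions)  # shape (4, 3): one entry per dimension
per_action = per_dim.sum(-1)      # shape (4,): joint log-prob per action
```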
""" | ||
self.observation_space = observation_space | ||
self.action_space = action_space | ||
self.lock = Lock() | ||
self._model = model | ||
cuda_devices = os.environ['CUDA_VISIBLE_DEVICES'].split(',') |
Could simply check `bool(os.environ.get("CUDA_VISIBLE_DEVICES"))` here.
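A sketch of the suggested check, with a hypothetical `pick_device` helper: `os.environ.get` avoids the KeyError when the variable is unset, and an empty or missing value is falsy, so no GPU is selected.

```python
import os
import torch

def pick_device():
    # Hypothetical helper: unset or empty CUDA_VISIBLE_DEVICES means
    # no GPU should be used, without risking a KeyError.
    if bool(os.environ.get("CUDA_VISIBLE_DEVICES")) and torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")

os.environ["CUDA_VISIBLE_DEVICES"] = ""
device = pick_device()
```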
@@ -285,3 +286,40 @@ def kl(self, other):
    @override(ActionDistribution)
    def _build_sample_op(self):
        return self.dist.sample()


class TorchDistributionWrapper(ActionDistribution):
Could the torch classes for action dist be placed in a separate file?
@@ -5,6 +5,7 @@
from collections import namedtuple
import distutils.version
import tensorflow as tf
import torch
Let's make sure to not import torch unless we hit a torch=true code path, to avoid acquiring a hard dependency on torch.
@@ -120,7 +121,8 @@ def get_action_dist(action_space, config, dist_type=None):
    elif dist_type == "deterministic":
        return Deterministic, action_space.shape[0]
    elif isinstance(action_space, gym.spaces.Discrete):
        return Categorical, action_space.n
        dist = TorchCategorical if torch else Categorical
Could we add `if torch: raise NotImplementedError` for the other dist types?
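A hypothetical sketch of that branching, using stand-in space classes rather than the real gym.spaces and string names rather than the real distribution classes: torch-unsupported dist types fail loudly instead of silently falling back to the TF classes.

```python
class Discrete:
    """Stand-in for gym.spaces.Discrete."""
    def __init__(self, n):
        self.n = n

class Box:
    """Stand-in for gym.spaces.Box (no torch dist wired up yet)."""

def get_action_dist_sketch(action_space, use_torch=False):
    if isinstance(action_space, Discrete):
        dist = "TorchCategorical" if use_torch else "Categorical"
        return dist, action_space.n
    if use_torch:
        # Fail loudly: no torch distribution implemented for this space.
        raise NotImplementedError(
            "torch does not yet support this action space")
    return "DiagGaussian", None
```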
Thanks for opening this! Overall looks solid; I have some comments.
model_out = self._model({"obs": ob}, state_batches)
logits, _, vf, state = model_out
actions = F.softmax(logits, dim=1).multinomial(1).squeeze(0)
return (actions.numpy(), [h.numpy() for h in state],
action_dist = self._action_dist_cls(logits)
Now that A2C/PG presumably work with continuous action spaces, you can add two entries to run_rllib_tests.sh to check that they work on Pendulum-v0, similar to the CartPole-v0 entries:
https://github.com/ray-project/ray/blob/master/ci/jenkins_tests/run_rllib_tests.sh#L407
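A hypothetical sketch of what such an entry might look like, modeled loosely on the existing CartPole-v0 lines; the docker flags, paths, and config keys shown here are assumptions, not copied from the actual file:

```shell
# Hypothetical Jenkins entry sketch: train A2C on Pendulum-v0 with the
# torch backend for a couple of iterations as a smoke test. Flags and
# paths are illustrative only -- mirror the real CartPole-v0 entries.
docker run --rm --shm-size=10G --memory=10G $DOCKER_SHA \
    python /ray/python/ray/rllib/train.py \
    --env Pendulum-v0 \
    --run A2C \
    --stop '{"training_iteration": 2}' \
    --config '{"use_pytorch": true}'
```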
LGTM. One thing I'm wondering is if it's possible to test that GPU mode works properly, without a real GPU. It seems easy to forget a cpu().
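One cheap partial guard for this, sketched with a hypothetical `fetch_stats` stand-in for the policy's stats interface: assert at the tensor-to-numpy boundary that every stat is host-side. On a CPU-only CI run the assertion is trivially satisfied, but any run that does touch a GPU (e.g. a nightly) would fail fast on a forgotten `.cpu()` instead of erroring deep inside `.numpy()`.

```python
import torch

def fetch_stats(loss, entropy):
    # Hypothetical boundary: everything returned here is consumed as
    # numpy, so enforce that the tensors were moved off the device.
    stats = {"loss": loss, "entropy": entropy}
    for name, t in stats.items():
        assert t.device.type == "cpu", "stat %s left on %s" % (name, t.device)
    return {name: t.numpy() for name, t in stats.items()}

stats = fetch_stats(torch.tensor(0.5), torch.tensor(1.2))
```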
Is it possible to spin up a GPU instance every night and run a nightly test build on it to catch some errors?
Hm, potentially. I'm not sure if Travis supports GPU instances, though. @FlyClover tests look good, but you have a couple of lint changes: https://travis-ci.com/ray-project/ray/jobs/192067752
Merged, thanks!
What do these changes do?

Related issue number
Closes #4333

Linter
- [ ] I've run scripts/format.sh to lint the changes in this PR.