
Multiagent model using concatenated observations #1416

Merged
merged 19 commits into ray-project:master on Jan 19, 2018

Conversation

eugenevinitsky
Contributor

What do these changes do?

Added multiagent capability to rllib. Multiagent support works by flattening and concatenating a tuple observation space.
Adds a multiagent model that splits the input and output vectors appropriately, based on shapes provided via the config. This works for both shared and non-shared models, but only for shared rewards.
Compatible with GAE.
Example run scripts are provided in rllib/examples.
Unit tests still pass.
Currently this only works for PPO, but a small patch (adding handling of list observation spaces) would make it compatible with the other algorithms.
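
A minimal numpy sketch of the flatten-and-concatenate scheme described above (the array names are illustrative, not rllib API):

import numpy as np

# One observation per agent, e.g. drawn from a tuple observation space.
agent_obs = [np.array([0.1, 0.2]), np.array([0.3, 0.4, 0.5])]

# Flatten each agent's observation and concatenate into a single vector
# that a standard single-agent model can consume.
flat_obs = np.concatenate([o.ravel() for o in agent_obs])  # shape (5,)

# Per-agent slices can be recovered later from the known shapes.
split_points = np.cumsum([o.size for o in agent_obs])[:-1]
recovered = np.split(flat_obs, split_points)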

Related issue number

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/3203/

@ericl ericl self-assigned this Jan 12, 2018
@richardliaw
Contributor

awesome! @eugenevinitsky is this ready for review?

@eugenevinitsky
Contributor Author

eugenevinitsky commented Jan 13, 2018

@richardliaw I think so!

@ericl ericl left a comment (Contributor)

Thanks Eugene, this is looking pretty good! A few comments, others inline:

Example run scripts are provided in rllib/examples.

I don't see these, are they checked in? Btw, you should also add them to the CI test script, next to entries like

--config '{"kl_coeff": 1.0, "num_sgd_iter": 10, "sgd_stepsize": 1e-4, "sgd_batchsize": 64, "timesteps_per_batch": 2000, "num_workers": 1, "model": {"dim": 40, "conv_filters": [[16, [8, 8], 4], [32, [4, 4], 2], [512, [5, 5], 1]]}, "extra_frameskip": 4}'

to make sure future changes don't break multiagent.

Currently this only works for PPO, but a small patch (adding handling of list observation spaces) would make it compatible with the other algorithms.

Do you have an idea of what changes would be needed? If it's small we could fix it here (or it's bigger than another PR might be better).

# return tf.concat([s.sample() for s in self.child_distributions], axis=1)


#TODO(ev) why does moving this to utils cause an error?
ericl (Contributor):

What's the error you're seeing?

if isinstance(distribution, Categorical):
    split_list[i] = tf.squeeze(split_list[i], axis=-1)
log_list = np.asarray([distribution.logp(split_x) for
                       distribution, split_x in
                       zip(self.child_distributions, split_list)])
ericl (Contributor):

This is probably a linting error -- you can check the Travis output for the pyflakes command to run and the lint errors.
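
As an aside, what the quoted snippet computes: each child distribution scores its own slice of the action, and the joint log-prob of independent per-agent actions is the sum of the children's log-probs. A numpy stand-in (not the rllib classes):

import numpy as np

# Hypothetical per-agent log-probs for a batch of two samples.
child_logps = [np.array([-0.5, -1.2]), np.array([-0.3, -0.7])]
total_logp = np.sum(child_logps, axis=0)  # joint log-prob, shape (2,)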

    return np.asarray(diffed_list).astype(int)

def get_flat_box(self):
ericl (Contributor):

Hm a bunch of these methods look unused, could you clean this up?


def split_tensor(self, tensor, axis=-1):
    # FIXME(ev) This won't work for mixed action distributions,
    # e.g. one agent Gaussian, one agent discrete.
    slice_rescale = int(tensor.shape.as_list()[axis] /
                        int(np.sum(self.get_slice_lengths())))
ericl (Contributor):

I think you can use tf.reshape() as follows instead of tf.split: https://github.com/ray-project/ray/blob/master/examples/carla/models.py#L46


@eugenevinitsky eugenevinitsky Jan 14, 2018


What advantage does this provide? Just curious.

ericl (Contributor):

Oh, this was to address the FIXME comment above. Reshape can handle mixed shapes.
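
A rough illustration of the splitting being discussed, with numpy stand-ins (the per-agent sizes here are hypothetical):

import numpy as np

# Mixed per-agent action sizes, the case the FIXME above worries about.
agent_action_sizes = [3, 5, 2]
split_points = np.cumsum(agent_action_sizes)[:-1]  # [3, 8]

batch = np.random.randn(4, sum(agent_action_sizes))  # (batch, 10)
per_agent = np.split(batch, split_points, axis=-1)
# shapes: (4, 3), (4, 5), (4, 2) -- explicit split points handle
# mixed sizes, unlike a uniform rescale.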

    dist, action_size = ModelCatalog.get_action_dist(action)
    child_dist.append(dist)
    size += action_size
return partial(MultiActionDistribution, child_distributions=child_dist,
ericl (Contributor):

nice
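
The pattern in the quoted snippet, as a minimal sketch: functools.partial binds the per-agent child distributions up front, so the result can be called like an ordinary single-argument distribution class (the stand-in names below are not the real rllib classes):

from functools import partial

def multi_action_distribution(inputs, child_distributions):
    # Stand-in for the real MultiActionDistribution constructor.
    return {"inputs": inputs, "children": child_distributions}

dist_cls = partial(multi_action_distribution,
                   child_distributions=["Categorical", "DiagGaussian"])
dist = dist_cls("flat_logits")  # called like any distribution class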

with tf.name_scope("fc_net"):
    i = 1
    last_layer = inputs
    for size in hiddens:
        label = "fc{}".format(i) if singular else "fc{}_{}".format(
ericl (Contributor):

Also wondering if the name scope manipulation can be done in the multiagent class, to keep multiagent code out of the individual models.

eugenevinitsky (author):

Yeah, good call.

num_actions = output_reshaper.split_number(num_outputs)
# convert the input spaces to shapes that we can use to divide the shapes

hiddens = options.get("fcnet_hiddens", [[256, 256]]*1)
ericl (Contributor):

Rather than reuse the config name, could we prefix it with multiagent, e.g. multiagent_fcnet_hiddens, to avoid confusion?

eugenevinitsky (author):

Yes! This is a good call.

@@ -238,6 +237,8 @@ def _fetch_metrics_from_remote_evaluators(self):
        np.mean(episode_lengths) if episode_lengths else float('nan'))
    timesteps = np.sum(episode_lengths) if episode_lengths else 0

    print("total reward is ", avg_reward)
    print("trajectory length mean is ", avg_length)
ericl (Contributor):

Remove these

@@ -19,7 +20,7 @@ def __init__(
        prev_logits, prev_vf_preds, logit_dim,
        kl_coeff, distribution_class, config, sess, registry):
    assert (isinstance(action_space, gym.spaces.Discrete) or
-           isinstance(action_space, gym.spaces.Box))
+           isinstance(action_space, gym.spaces.Box) or
+           isinstance(action_space, list))
ericl (Contributor):

Wondering if we should just remove this assert.

if isinstance(action_space[0], gym.spaces.Discrete):
    self.actions = tf.placeholder(
        tf.int64, shape=(None, len(action_space)))
elif isinstance(action_space[0], gym.spaces.Box):
    self.actions = tf.placeholder(tf.float32, shape=(None, size))
ericl (Contributor):

There's a TODO above to pull these if blocks out into a util function, seems like now is the right time.
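
A rough sketch of such a util, assuming the TF1-style placeholders used in this diff (the exact handling is illustrative, not the final merged code):

import gym
import numpy as np
import tensorflow as tf

def get_action_placeholder(action_space):
    # Single discrete space: one integer action id per step.
    if isinstance(action_space, gym.spaces.Discrete):
        return tf.placeholder(tf.int64, shape=(None,))
    # Single continuous space: one float per action dimension.
    elif isinstance(action_space, gym.spaces.Box):
        return tf.placeholder(
            tf.float32, shape=(None,) + action_space.shape)
    # List of spaces (multiagent): mirror the if blocks quoted above.
    elif isinstance(action_space, list):
        if isinstance(action_space[0], gym.spaces.Discrete):
            return tf.placeholder(
                tf.int64, shape=(None, len(action_space)))
        elif isinstance(action_space[0], gym.spaces.Box):
            size = int(sum(np.prod(s.shape) for s in action_space))
            return tf.placeholder(tf.float32, shape=(None, size))
    raise NotImplementedError(
        "action space {} not supported".format(action_space))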

@eugenevinitsky
Contributor Author

eugenevinitsky commented Jan 13, 2018

@ericl Thanks for the super thorough review!
I'll make these edits and add the changes to the other algorithms. It's just a matter of handling the case of an action space being a list, and as you mentioned, we might as well pull that into a util now.
As for moving the Reshaper to a util, the error has mysteriously disappeared.

@richardliaw
Contributor

Btw, you can run the lint checks locally by running flake8 . in your rllib directory.

@eugenevinitsky
Contributor Author

eugenevinitsky commented Jan 14, 2018

@ericl @richardliaw after re-setting up ray using python setup.py develop I'm getting a segfault. Is there something obvious I could be doing wrong?

@ericl
Contributor

ericl commented Jan 15, 2018

One thing to try is git clean -ffdx (warning, removes all uncommitted changes). Occasionally I seem to need to do that to fix the build.

@eugenevinitsky
Contributor Author

eugenevinitsky commented Jan 15, 2018

@ericl I added all the review comments except:

  1. Using tf.reshape instead of split. I can add that in a small follow-up with a mixed discrete/continuous example.
  2. Getting the other algorithms working.

@AmplabJenkins

Merged build finished. Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/3229/

@AmplabJenkins

Merged build finished. Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/3232/

@eugenevinitsky
Contributor Author

Is there a good way to run the Travis build locally so that I don't keep pushing commits that fail the build?

@robertnishihara
Collaborator

You can enable Travis for your fork. Then it will test commits that you push to your fork (if you don't create a PR then it won't test it on the Ray Travis account).

You can also run the tests by hand that are in https://github.com/ray-project/ray/blob/master/.travis.yml.

@ericl ericl left a comment (Contributor)

LGTM. Just a few style comments before merge.

@@ -0,0 +1,56 @@
''' Multiagent mountain car. Each agent outputs an action which
ericl (Contributor):

nit: we prefer double quotes (though this has yet to be added as a linter rule)

"""Action distribution that operates for list of actions.

Args:
inputs (Tensor list): A list of tensors from which to compute samples.
ericl (Contributor):

nit: 4 spaces indent here

def get_action_placeholder(action_space):
    """Returns an action placeholder that is consistent with the action space

    Args: action_space (Space): Action space of the target gym env.
ericl (Contributor):

nit:

        Args:
            action_space (Space): Action space of the target gym env.
        Returns:
            action_placeholder (Tensor): A placeholder for the actions

        return tf.placeholder(tf.float32, shape=(None, size))
    else:
        raise NotImplemented(
            "action space" + str(type(action_space)) +
ericl (Contributor):

raise NotImplementedError("action space {} not supported".format(action_space))

    diffed_list.insert(0, self.slice_positions[0])
    return np.asarray(diffed_list).astype(int)

def get_flat_box(self):
ericl (Contributor):

seems unused

-        raise NotImplemented(
-            "action space" + str(type(action_space)) +
-            "currently not supported")
+        self.actions = ModelCatalog.get_action_placeholder(action_space)
ericl (Contributor):

thanks!

@@ -162,3 +162,7 @@ docker run --rm --shm-size=10G --memory=10G $DOCKER_SHA \
docker run --rm --shm-size=10G --memory=10G $DOCKER_SHA \
python /ray/python/ray/tune/examples/tune_mnist_ray.py \
--fast

python /ray/python/ray/rllib/examples/multiagent_mountaincar.py
ericl (Contributor):

Should this be

docker run --rm --shm-size=10G --memory=10G $DOCKER_SHA \
   python /ray/python/ray/rllib/examples/multiagent_mountaincar.py

and same below?

config["model"].update({"fcnet_hiddens": [256, 256]})
options = {"obs_shapes": [2, 2],
"act_shapes": [3, 3],
"shared_model": False,
ericl (Contributor):

Might be good to prefix all of these with multiagent_ for consistency
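
For reference, the prefixed version being suggested would look something like this (the renamed keys are an assumption about the rename, not the final merged config):

config["model"].update({"fcnet_hiddens": [256, 256]})
options = {"multiagent_obs_shapes": [2, 2],
           "multiagent_act_shapes": [3, 3],
           "multiagent_shared_model": False}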

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/3236/

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/3240/

@eugenevinitsky
Contributor Author

Thanks for the thorough reviews! @ericl

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/3271/

@ericl
Contributor

ericl commented Jan 19, 2018

The valgrind failure looks unrelated, merging.

@ericl ericl merged commit 37076a9 into ray-project:master Jan 19, 2018
royf added a commit to royf/ray that referenced this pull request Jan 20, 2018
* master:
  Add some development tips to documentation. (ray-project#1426)
  Add link to github from documentation. (ray-project#1425)
  [rllib] Update docs with api and components overview figures (ray-project#1443)
  Multiagent model using concatenated observations (ray-project#1416)
  Load evaluation configuration from checkpoint (ray-project#1392)
  [autoscaling] increase connect timeout, boto retries, and check subnet conf (ray-project#1422)
  Update wheel in autoscaler example. (ray-project#1408)
  [autoscaler] Fix ValueError: Missing required config keyavailability_zoneof type str
  [tune][minor] Fixes (ray-project#1383)
  [rllib] Expose PPO evaluator resource requirements (ray-project#1391)
  fix autoscaler test (ray-project#1411)
  [rllib] Fix incorrect documentation on how to use custom models ray-project#1405
  Added option for availability zone (ray-project#1393)
  Adding all DataFrame methods with NotImplementedErrors (ray-project#1403)
  Remove pyarrow version check. (ray-project#1394)

# Conflicts:
#	python/ray/rllib/eval.py