
Multiagent model using concatenated observations #1416

Merged
merged 19 commits into ray-project:master on Jan 19, 2018

Conversation

eugenevinitsky
Contributor

What do these changes do?

Added multiagent capability to rllib. Multiagent support works by flattening and concatenating a tuple observation space.
Adds a multiagent model that splits the input and output vectors appropriately, based on shapes provided via the config. This works for both shared and non-shared models, but only for shared rewards.
Compatible with GAE.
Example run scripts are provided in rllib/examples.
Unit tests still pass.
Currently this only works for PPO, but a small patch (adding handling of list observation spaces) would make it compatible with the other algorithms.
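
A minimal numpy sketch of the flatten-and-concatenate scheme described above (the array names are illustrative, not rllib API):

import numpy as np

# One observation per agent, e.g. drawn from a tuple observation space.
agent_obs = [np.array([0.1, 0.2]), np.array([0.3, 0.4, 0.5])]

# Flatten each agent's observation and concatenate into a single vector
# that a standard single-agent model can consume.
flat_obs = np.concatenate([o.ravel() for o in agent_obs])  # shape (5,)

# Per-agent slices can be recovered later from the known shapes.
split_points = np.cumsum([o.size for o in agent_obs])[:-1]
recovered = np.split(flat_obs, split_points)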

Related issue number

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/3203/

@ericl ericl self-assigned this Jan 12, 2018
@richardliaw
Contributor

awesome! @eugenevinitsky is this ready for review?

@eugenevinitsky
Contributor Author

eugenevinitsky commented Jan 13, 2018

@richardliaw I think so!

@ericl ericl left a comment (Contributor)

Thanks Eugene, this is looking pretty good! A few comments, others inline:

Example run scripts are provided in rllib/examples.

I don't see these, are they checked in? Btw, you should also add them to the CI test script, next to entries like

--config '{"kl_coeff": 1.0, "num_sgd_iter": 10, "sgd_stepsize": 1e-4, "sgd_batchsize": 64, "timesteps_per_batch": 2000, "num_workers": 1, "model": {"dim": 40, "conv_filters": [[16, [8, 8], 4], [32, [4, 4], 2], [512, [5, 5], 1]]}, "extra_frameskip": 4}'

to make sure future changes don't break multiagent.

Currently this only works for PPO, but a small patch (adding handling of list observation spaces) would make it compatible with the other algorithms.

Do you have an idea of what changes would be needed? If it's small we could fix it here (or it's bigger than another PR might be better).

# return tf.concat([s.sample() for s in self.child_distributions], axis=1)


#TODO(ev) why does moving this to utils cause an error?
ericl (Contributor):

What's the error you're seeing?

if isinstance(distribution, Categorical):
    split_list[i] = tf.squeeze(split_list[i], axis=-1)
log_list = np.asarray([distribution.logp(split_x) for
                       distribution, split_x in
                       zip(self.child_distributions, split_list)])
ericl (Contributor):

This is probably a linting error -- you can check the Travis output for the pyflakes command to run and the lint errors.
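
As an aside, what the quoted snippet computes: each child distribution scores its own slice of the action, and the joint log-prob of independent per-agent actions is the sum of the children's log-probs. A numpy stand-in (not the rllib classes):

import numpy as np

# Hypothetical per-agent log-probs for a batch of two samples.
child_logps = [np.array([-0.5, -1.2]), np.array([-0.3, -0.7])]
total_logp = np.sum(child_logps, axis=0)  # joint log-prob, shape (2,)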

    return np.asarray(diffed_list).astype(int)

def get_flat_box(self):
ericl (Contributor):

Hm a bunch of these methods look unused, could you clean this up?


def split_tensor(self, tensor, axis=-1):
    # FIXME(ev) This won't work for mixed action distributions,
    # e.g. one agent Gaussian, one agent discrete.
    slice_rescale = int(tensor.shape.as_list()[axis] /
                        int(np.sum(self.get_slice_lengths())))
ericl (Contributor):

I think you can use tf.reshape() as follows instead of tf.split: https://github.com/ray-project/ray/blob/master/examples/carla/models.py#L46


@eugenevinitsky eugenevinitsky Jan 14, 2018


What advantage does this provide? Just curious.

ericl (Contributor):

Oh, this was to address the FIXME comment above. Reshape can handle mixed shapes.
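
A rough illustration of the splitting being discussed, with numpy stand-ins (the per-agent sizes here are hypothetical):

import numpy as np

# Mixed per-agent action sizes, the case the FIXME above worries about.
agent_action_sizes = [3, 5, 2]
split_points = np.cumsum(agent_action_sizes)[:-1]  # [3, 8]

batch = np.random.randn(4, sum(agent_action_sizes))  # (batch, 10)
per_agent = np.split(batch, split_points, axis=-1)
# shapes: (4, 3), (4, 5), (4, 2) -- explicit split points handle
# mixed sizes, unlike a uniform rescale.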

    dist, action_size = ModelCatalog.get_action_dist(action)
    child_dist.append(dist)
    size += action_size
return partial(MultiActionDistribution, child_distributions=child_dist,
ericl (Contributor):

nice
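
The pattern in the quoted snippet, as a minimal sketch: functools.partial binds the per-agent child distributions up front, so the result can be called like an ordinary single-argument distribution class (the stand-in names below are not the real rllib classes):

from functools import partial

def multi_action_distribution(inputs, child_distributions):
    # Stand-in for the real MultiActionDistribution constructor.
    return {"inputs": inputs, "children": child_distributions}

dist_cls = partial(multi_action_distribution,
                   child_distributions=["Categorical", "DiagGaussian"])
dist = dist_cls("flat_logits")  # called like any distribution class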

with tf.name_scope("fc_net"):
    i = 1
    last_layer = inputs
    for size in hiddens:
        label = "fc{}".format(i) if singular else "fc{}_{}".format(
ericl (Contributor):

Also wondering if the name scope manipulation can be done in the multiagent class, to keep multiagent code out of the individual models.

eugenevinitsky (author):

Yeah, good call.

num_actions = output_reshaper.split_number(num_outputs)
# convert the input spaces to shapes that we can use to divide the shapes

hiddens = options.get("fcnet_hiddens", [[256, 256]]*1)
ericl (Contributor):

Rather than reuse the config name, could we prefix it with multiagent, e.g. multiagent_fcnet_hiddens, to avoid confusion?

eugenevinitsky (author):

Yes! This is a good call.

@@ -238,6 +237,8 @@ def _fetch_metrics_from_remote_evaluators(self):
        np.mean(episode_lengths) if episode_lengths else float('nan'))
    timesteps = np.sum(episode_lengths) if episode_lengths else 0

    print("total reward is ", avg_reward)
    print("trajectory length mean is ", avg_length)
ericl (Contributor):

Remove these

@@ -19,7 +20,7 @@ def __init__(
        prev_logits, prev_vf_preds, logit_dim,
        kl_coeff, distribution_class, config, sess, registry):
    assert (isinstance(action_space, gym.spaces.Discrete) or
-           isinstance(action_space, gym.spaces.Box))
+           isinstance(action_space, gym.spaces.Box) or
+           isinstance(action_space, list))
ericl (Contributor):

Wondering if we should just remove this assert.

if isinstance(action_space[0], gym.spaces.Discrete):
    self.actions = tf.placeholder(
        tf.int64, shape=(None, len(action_space)))
elif isinstance(action_space[0], gym.spaces.Box):
    self.actions = tf.placeholder(tf.float32, shape=(None, size))
ericl (Contributor):

There's a TODO above to pull these if blocks out into a util function, seems like now is the right time.
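
A rough sketch of such a util, assuming the TF1-style placeholders used in this diff (the exact handling is illustrative, not the final merged code):

import gym
import numpy as np
import tensorflow as tf

def get_action_placeholder(action_space):
    # Single discrete space: one integer action id per step.
    if isinstance(action_space, gym.spaces.Discrete):
        return tf.placeholder(tf.int64, shape=(None,))
    # Single continuous space: one float per action dimension.
    elif isinstance(action_space, gym.spaces.Box):
        return tf.placeholder(
            tf.float32, shape=(None,) + action_space.shape)
    # List of spaces (multiagent): mirror the if blocks quoted above.
    elif isinstance(action_space, list):
        if isinstance(action_space[0], gym.spaces.Discrete):
            return tf.placeholder(
                tf.int64, shape=(None, len(action_space)))
        elif isinstance(action_space[0], gym.spaces.Box):
            size = int(sum(np.prod(s.shape) for s in action_space))
            return tf.placeholder(tf.float32, shape=(None, size))
    raise NotImplementedError(
        "action space {} not supported".format(action_space))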

@eugenevinitsky
Contributor Author

eugenevinitsky commented Jan 13, 2018

@ericl Thanks for the super thorough review!
I'll make these edits and add the changes to the other algorithms. It's just a matter of handling the case of an action space being a list, and as you mentioned, we might as well pull that into a util now.
As for moving the Reshaper to a util, the error has mysteriously disappeared.

@richardliaw
Contributor

Btw, you can run the lint checks locally by running flake8 . in your rllib directory.

@eugenevinitsky
Contributor Author

eugenevinitsky commented Jan 14, 2018

@ericl @richardliaw after re-setting up ray using python setup.py develop I'm getting a segfault. Is there something obvious I could be doing wrong?

@ericl
Contributor

ericl commented Jan 15, 2018

One thing to try is git clean -ffdx (warning, removes all uncommitted changes). Occasionally I seem to need to do that to fix the build.

@eugenevinitsky
Contributor Author

eugenevinitsky commented Jan 15, 2018

@ericl I added all the review comments except:

  1. Using tf.reshape instead of split. I can add that in a small follow-up with a mixed discrete/continuous example.
  2. Getting the other algorithms working.

@AmplabJenkins

Merged build finished. Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/3229/

@AmplabJenkins

Merged build finished. Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/3232/

@eugenevinitsky
Contributor Author

Is there a good way to run the Travis build locally so that I don't keep pushing commits that fail the build?

@robertnishihara
Collaborator

You can enable Travis for your fork. Then it will test commits that you push to your fork (if you don't create a PR then it won't test it on the Ray Travis account).

You can also run the tests by hand that are in https://github.com/ray-project/ray/blob/master/.travis.yml.

@ericl ericl left a comment (Contributor)

LGTM. Just a few style comments before merge.

@@ -0,0 +1,56 @@
''' Multiagent mountain car. Each agent outputs an action which
ericl (Contributor):

nit: we prefer double quotes (though this has yet to be added as a linter rule)

"""Action distribution that operates for list of actions.

Args:
inputs (Tensor list): A list of tensors from which to compute samples.
ericl (Contributor):

nit: 4 spaces indent here

def get_action_placeholder(action_space):
    """Returns an action placeholder that is consistent with the action space

    Args: action_space (Space): Action space of the target gym env.
ericl (Contributor):

nit:

        Args:
            action_space (Space): Action space of the target gym env.
        Returns:
            action_placeholder (Tensor): A placeholder for the actions

        return tf.placeholder(tf.float32, shape=(None, size))
    else:
        raise NotImplemented(
            "action space" + str(type(action_space)) +
ericl (Contributor):

raise NotImplementedError("action space {} not supported".format(action_space))

    diffed_list.insert(0, self.slice_positions[0])
    return np.asarray(diffed_list).astype(int)

def get_flat_box(self):
ericl (Contributor):

seems unused

-        raise NotImplemented(
-            "action space" + str(type(action_space)) +
-            "currently not supported")
+        self.actions = ModelCatalog.get_action_placeholder(action_space)
ericl (Contributor):

thanks!

@@ -162,3 +162,7 @@ docker run --rm --shm-size=10G --memory=10G $DOCKER_SHA \
docker run --rm --shm-size=10G --memory=10G $DOCKER_SHA \
python /ray/python/ray/tune/examples/tune_mnist_ray.py \
--fast

python /ray/python/ray/rllib/examples/multiagent_mountaincar.py
ericl (Contributor):

Should this be

docker run --rm --shm-size=10G --memory=10G $DOCKER_SHA \
   python /ray/python/ray/rllib/examples/multiagent_mountaincar.py

and same below?

config["model"].update({"fcnet_hiddens": [256, 256]})
options = {"obs_shapes": [2, 2],
"act_shapes": [3, 3],
"shared_model": False,
ericl (Contributor):

Might be good to prefix all of these with multiagent_ for consistency
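
For reference, the prefixed version being suggested would look something like this (the renamed keys are an assumption about the rename, not the final merged config):

config["model"].update({"fcnet_hiddens": [256, 256]})
options = {"multiagent_obs_shapes": [2, 2],
           "multiagent_act_shapes": [3, 3],
           "multiagent_shared_model": False}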

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/3236/

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/3240/

@eugenevinitsky
Contributor Author

Thanks for the thorough reviews! @ericl

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/3271/

@ericl
Contributor

ericl commented Jan 19, 2018

The valgrind failure looks unrelated, merging.

@ericl ericl merged commit 37076a9 into ray-project:master Jan 19, 2018
royf added a commit to royf/ray that referenced this pull request Jan 20, 2018
* master:
  Add some development tips to documentation. (ray-project#1426)
  Add link to github from documentation. (ray-project#1425)
  [rllib] Update docs with api and components overview figures (ray-project#1443)
  Multiagent model using concatenated observations (ray-project#1416)
  Load evaluation configuration from checkpoint (ray-project#1392)
  [autoscaling] increase connect timeout, boto retries, and check subnet conf (ray-project#1422)
  Update wheel in autoscaler example. (ray-project#1408)
  [autoscaler] Fix ValueError: Missing required config keyavailability_zoneof type str
  [tune][minor] Fixes (ray-project#1383)
  [rllib] Expose PPO evaluator resource requirements (ray-project#1391)
  fix autoscaler test (ray-project#1411)
  [rllib] Fix incorrect documentation on how to use custom models ray-project#1405
  Added option for availability zone (ray-project#1393)
  Adding all DataFrame methods with NotImplementedErrors (ray-project#1403)
  Remove pyarrow version check. (ray-project#1394)

# Conflicts:
#	python/ray/rllib/eval.py