Multiagent model using concatenated observations #1416
Conversation
Merged build finished. Test PASSed.
Test PASSed.
awesome! @eugenevinitsky is this ready for review?
@richardliaw I think so!
Thanks Eugene, this is looking pretty good! A few comments, others inline:
Example run scripts are provided in rllib/examples.
I don't see these, are they checked in? Btw, you should add them also to
--config '{"kl_coeff": 1.0, "num_sgd_iter": 10, "sgd_stepsize": 1e-4, "sgd_batchsize": 64, "timesteps_per_batch": 2000, "num_workers": 1, "model": {"dim": 40, "conv_filters": [[16, [8, 8], 4], [32, [4, 4], 2], [512, [5, 5], 1]]}, "extra_frameskip": 4}'
Currently only works for PPO but a small patch makes it compatible for the other algorithms (adding handling of list observation spaces)
Do you have an idea of what changes would be needed? If it's small we could fix it here (or it's bigger than another PR might be better).
# return tf.concat([s.sample() for s in self.child_distributions], axis=1)
# TODO(ev) why does moving this to utils cause an error?
What's the error you're seeing?
if isinstance(distribution, Categorical):
    split_list[i] = tf.squeeze(split_list[i], axis=-1)
log_list = np.asarray([distribution.logp(split_x) for
                       distribution, split_x in zip(self.child_distributions, split_list)])
This probably is a linting error -- you can check the travis output for the pyflake command to run / lint errors.
return np.asarray(diffed_list).astype(int)

def get_flat_box(self):
Hm a bunch of these methods look unused, could you clean this up?
def split_tensor(self, tensor, axis=-1):
    # FIXME (ev) This won't work for mixed action distributions like one agent Gaussian one agent discrete
    slice_rescale = int(tensor.shape.as_list()[axis] /
                        int(np.sum(self.get_slice_lengths())))
I think you can use tf.reshape() as follows instead of tf.split: https://github.com/ray-project/ray/blob/master/examples/carla/models.py#L46
What advantage does this provide? Just curious.
Oh, this was to address the fixme comment above. Reshape can handle mixed shapes
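The split-versus-reshape question above boils down to recovering per-agent slices from one concatenated tensor. A NumPy sketch of the split approach, with hypothetical shapes and names (this is not the actual rllib implementation):

```python
import numpy as np

# Hypothetical per-agent widths of a concatenated (batch, features) tensor.
slice_lengths = [3, 3, 2]

def split_tensor(tensor, lengths, axis=-1):
    """Split a concatenated array back into one piece per agent."""
    positions = np.cumsum(lengths)[:-1]  # boundaries between agents
    return np.split(tensor, positions, axis=axis)

batch = np.arange(16).reshape(2, 8)  # batch of 2; 8 = 3 + 3 + 2 features
parts = split_tensor(batch, slice_lengths)
# parts[0] has shape (2, 3), parts[2] has shape (2, 2)
```

Splitting by explicit slice lengths handles agents with different widths; a single reshape to (batch, num_agents, size) only applies when every agent's slice has the same width.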
python/ray/rllib/models/catalog.py
Outdated
dist, action_size = ModelCatalog.get_action_dist(action)
child_dist.append(dist)
size += action_size
return partial(MultiActionDistribution, child_distributions=child_dist,
nice
python/ray/rllib/models/fcnet.py
Outdated
with tf.name_scope("fc_net"):
    i = 1
    last_layer = inputs
    for size in hiddens:
        label = "fc{}".format(i) if singular else "fc{}_{}".format(
Also wondering if the name scope manipulation can be done in the multi agent class to keep multiagent code out of the individual models
Yeah, good call.
num_actions = output_reshaper.split_number(num_outputs)
# convert the input spaces to shapes that we can use to divide the shapes

hiddens = options.get("fcnet_hiddens", [[256, 256]] * 1)
Rather than reuse the config name, could we prefix it with multiagent, e.g. multiagent_fcnet_hiddens, to avoid confusion?
Yes! This is a good call.
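The suggestion above amounts to namespacing the multiagent options. A hypothetical sketch of what the prefixed config could look like (key names illustrative only, not the final rllib keys):

```python
# Hypothetical multiagent model options using a "multiagent_" prefix so they
# cannot be confused with single-agent keys like "fcnet_hiddens".
options = {
    "multiagent_obs_shapes": [2, 2],
    "multiagent_act_shapes": [3, 3],
    "multiagent_shared_model": False,
    "multiagent_fcnet_hiddens": [[256, 256], [256, 256]],
}

# A model can then pick out only its namespaced keys.
multiagent_keys = sorted(k for k in options if k.startswith("multiagent_"))
```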
python/ray/rllib/ppo/ppo.py
Outdated
@@ -238,6 +237,8 @@ def _fetch_metrics_from_remote_evaluators(self):
    np.mean(episode_lengths) if episode_lengths else float('nan'))
timesteps = np.sum(episode_lengths) if episode_lengths else 0

print("total reward is ", avg_reward)
print("trajectory length mean is ", avg_length)
Remove these
python/ray/rllib/ppo/loss.py
Outdated
@@ -19,7 +20,7 @@ def __init__(
    prev_logits, prev_vf_preds, logit_dim,
    kl_coeff, distribution_class, config, sess, registry):
assert (isinstance(action_space, gym.spaces.Discrete) or
        isinstance(action_space, gym.spaces.Box) or
        isinstance(action_space, list))
Wondering if we should just remove this assert.
if isinstance(action_space[0], gym.spaces.Discrete):
    self.actions = tf.placeholder(tf.int64, shape=(None, len(action_space)))
elif isinstance(action_space[0], gym.spaces.Box):
    self.actions = tf.placeholder(tf.float32, shape=(None, size))
There's a TODO above to pull these if blocks out into a util function, seems like now is the right time.
@ericl Thanks for the super thorough review!
btw, you can check lint tests locally by running
@ericl @richardliaw after re-setting up ray using python setup.py develop I'm getting a segfault. Is there something obvious I could be doing wrong?
…og, all multiagent code moved out of fcnet
One thing to try is
@ericl I addressed all the review comments except:
Merged build finished. Test FAILed.
Test FAILed.
Merged build finished. Test FAILed.
Test FAILed.
Is there a good way to run the Travis build locally so that I don't keep pushing commits that fail the build?
You can enable Travis for your fork. Then it will test commits that you push to your fork (if you don't create a PR then it won't test it on the Ray Travis account). You can also run the tests by hand that are in https://github.com/ray-project/ray/blob/master/.travis.yml.
LGTM. Just a few style comments before merge.
@@ -0,0 +1,56 @@
''' Multiagent mountain car. Each agent outputs an action which
nit: we prefer double quotes (though this has yet to be added as a linter rule)
"""Action distribution that operates for list of actions.

Args:
    inputs (Tensor list): A list of tensors from which to compute samples.
nit: 4 spaces indent here
python/ray/rllib/models/catalog.py
Outdated
def get_action_placeholder(action_space):
    """Returns an action placeholder that is consistent with the action space

    Args: action_space (Space): Action space of the target gym env.
nit:
Args:
    action_space (Space): Action space of the target gym env.
Returns:
    action_placeholder (Tensor): A placeholder for the actions
python/ray/rllib/models/catalog.py
Outdated
return tf.placeholder(tf.float32, shape=(None, size))
else:
    raise NotImplemented(
        "action space" + str(type(action_space)) +
raise NotImplementedError("action space {} not supported".format(action_space))
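The original `raise NotImplemented(...)` is a real bug, not just style: `NotImplemented` is a sentinel value used by binary operators, not an exception class, so "calling" it raises `TypeError` before the intended message ever appears. A small demonstration:

```python
def raised_type(exc_factory):
    """Return the type name of whatever actually gets raised."""
    try:
        raise exc_factory("action space not supported")
    except BaseException as exc:
        return type(exc).__name__

# The buggy spelling blows up with TypeError
# ('NotImplementedType' object is not callable).
buggy = raised_type(NotImplemented)

# The correct spelling raises the intended exception.
fixed = raised_type(NotImplementedError)
```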
python/ray/rllib/utils/reshaper.py
Outdated
diffed_list.insert(0, self.slice_positions[0])
return np.asarray(diffed_list).astype(int)

def get_flat_box(self):
seems unused
raise NotImplemented(
    "action space" + str(type(action_space)) +
    "currently not supported")
self.actions = ModelCatalog.get_action_placeholder(action_space)
thanks!
@@ -162,3 +162,7 @@ docker run --rm --shm-size=10G --memory=10G $DOCKER_SHA \
docker run --rm --shm-size=10G --memory=10G $DOCKER_SHA \
    python /ray/python/ray/tune/examples/tune_mnist_ray.py \
    --fast

python /ray/python/ray/rllib/examples/multiagent_mountaincar.py
Should this be
docker run --rm --shm-size=10G --memory=10G $DOCKER_SHA \
python /ray/python/ray/rllib/examples/multiagent_mountaincar.py
and same below?
config["model"].update({"fcnet_hiddens": [256, 256]})
options = {"obs_shapes": [2, 2],
           "act_shapes": [3, 3],
           "shared_model": False,
Might be good to prefix all of these with multiagent_ for consistency
Merged build finished. Test PASSed.
Test PASSed.
Merged build finished. Test PASSed.
Test PASSed.
Thanks for the thorough reviews! @ericl
Merged build finished. Test PASSed.
Test PASSed.
The valgrind failure looks unrelated, merging. |
* master:
  Add some development tips to documentation. (ray-project#1426)
  Add link to github from documentation. (ray-project#1425)
  [rllib] Update docs with api and components overview figures (ray-project#1443)
  Multiagent model using concatenated observations (ray-project#1416)
  Load evaluation configuration from checkpoint (ray-project#1392)
  [autoscaling] increase connect timeout, boto retries, and check subnet conf (ray-project#1422)
  Update wheel in autoscaler example. (ray-project#1408)
  [autoscaler] Fix ValueError: Missing required config key availability_zone of type str
  [tune][minor] Fixes (ray-project#1383)
  [rllib] Expose PPO evaluator resource requirements (ray-project#1391)
  fix autoscaler test (ray-project#1411)
  [rllib] Fix incorrect documentation on how to use custom models ray-project#1405
  Added option for availability zone (ray-project#1393)
  Adding all DataFrame methods with NotImplementedErrors (ray-project#1403)
  Remove pyarrow version check. (ray-project#1394)

# Conflicts:
#   python/ray/rllib/eval.py
What do these changes do?
Added multiagent capability to rllib. Multiagent capability is added by using a tuple observation space that is flattened and concatenated.
Adds a multiagent model that splits the input and output vectors appropriately based on inputs provided via config. This works for both shared and non-shared models but only works for shared rewards.
Compatible with GAE.
Example run scripts are provided in rllib/examples.
Unit-tests still pass.
Currently only works for PPO but a small patch makes it compatible for the other algorithms (adding handling of list observation spaces)
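The flatten-and-concatenate scheme described above can be sketched in a few lines of NumPy (dimensions taken from the mountain car example; names hypothetical, not the actual rllib code):

```python
import numpy as np

# Two agents, each observing a 2-vector (as in multiagent mountain car).
obs_per_agent = [np.array([0.1, 0.2]), np.array([0.3, 0.4])]

# The tuple observation space is flattened and concatenated into one
# vector that a standard fully connected model can consume.
flat_obs = np.concatenate([o.ravel() for o in obs_per_agent])

# On the output side, the concatenated action logits are split back into
# per-agent chunks, e.g. 3 discrete actions per agent.
act_sizes = [3, 3]
logits = np.zeros(sum(act_sizes))
per_agent_logits = np.split(logits, np.cumsum(act_sizes)[:-1])
```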
Related issue number