Support for more complex action distributions #253

Open
theOGognf opened this issue Dec 26, 2022 · 3 comments

Comments

@theOGognf

The current action distribution model has some restrictions that inhibit richer families of action distributions for complex environments.
As far as I can tell, only single-space distributions and tuple-space distributions are supported. Many custom environments rely on action masking and autoregressive distributions to handle complex action spaces. It'd be nice if there were an interface for registering custom action distributions, much like registering other components.

@alex-petrenko
Owner

alex-petrenko commented Dec 27, 2022

This is a very reasonable request, and it would be a great feature to have in a future release. BTW, contributions are welcome, and I'd be happy to review code and provide suggestions if you decide to take it on!

@theOGognf what are the specific environments that you have in mind? Having concrete examples might help!

For now, I would recommend forking the code and implementing the action distribution in a manner similar to how Tuple and the other action distributions are implemented.

@theOGognf
Author

theOGognf commented Dec 27, 2022

Thanks for the quick response, Alex. I'd be happy to take a stab at it.

I can't share my environments, but there are a couple of examples from RLlib that get the point across. For action masking, an action mask is part of the observation and is used to mask the logits the model produces before they parameterize the action distribution. Autoregressive distributions are usually specific to an environment, but the whole point is to build a model that can condition action heads on one another.
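
To make the masking idea concrete, here is a minimal, framework-agnostic sketch. It uses plain Python lists rather than tensors, and the function name `masked_softmax` and the observation keys `features` and `action_mask` are assumptions made up for illustration; they are not taken from RLlib or Sample Factory.

```python
import math


def masked_softmax(logits, action_mask):
    """Return action probabilities with invalid actions forced to zero.

    logits: one float per discrete action (stand-in for the model output).
    action_mask: 0/1 flags taken from the observation; 1 means valid.
    """
    if not any(action_mask):
        raise ValueError("at least one action must be valid")
    # Invalid actions get -inf, so they contribute exp(-inf) == 0 below.
    masked = [l if m else -math.inf for l, m in zip(logits, action_mask)]
    # Subtract the largest valid logit for numerical stability.
    peak = max(masked)
    exps = [math.exp(l - peak) for l in masked]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical dictionary observation carrying the mask next to the features.
observation = {"features": [0.2, -1.3], "action_mask": [1, 0, 1, 1]}
logits = [2.0, 5.0, 0.5, 2.0]  # stand-in for the policy head's output
probs = masked_softmax(logits, observation["action_mask"])
```

Action 1 has the largest raw logit here, but because it is masked out it ends up with zero probability, and the remaining mass is renormalized over the valid actions.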

For reference, here's RLlib's corresponding thread on supporting autoregressive distributions.

I think it'd be easy to support if TensorDicts were passed between components rather than flattened observation vectors, but I imagine that'd be a bit of a breaking change. Would a change like that be okay?

@alex-petrenko
Owner

SF actually supports dictionaries of observations out of the box, so passing action masks along with observations should not be a problem. Just define an env with a dictionary observation space, and SF should correctly handle any number of key-value observations.

We're also already using TensorDict to pass these observations around, so this should not be an issue.

There is one design limitation motivated by performance considerations.
All tensors (observations, sampled actions, etc.) should have a fixed, predetermined size. For masked actions this shouldn't be a problem, but autoregressive actions might vary in size (I think?). In that case I recommend allocating tensors for the maximum action length.
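
As a rough illustration of allocating for the maximum action length, the sketch below pads a variable-length action sequence to a fixed size and returns a validity mask alongside it. The helper `pad_actions` and the constant `MAX_ACTION_LEN` are hypothetical names, and plain lists stand in for tensors.

```python
def pad_actions(actions, max_len, pad_value=0):
    """Pad a variable-length action sequence to max_len.

    Returns the padded sequence and a 0/1 mask marking the real entries.
    """
    if len(actions) > max_len:
        raise ValueError(
            f"sequence of length {len(actions)} exceeds max_len={max_len}"
        )
    n_pad = max_len - len(actions)
    padded = list(actions) + [pad_value] * n_pad
    valid = [1] * len(actions) + [0] * n_pad
    return padded, valid


MAX_ACTION_LEN = 4  # hypothetical upper bound chosen for the environment
padded, valid = pad_actions([3, 1], MAX_ACTION_LEN)
```

The validity mask can then be used to ignore the padded slots, for example when summing per-step log-probabilities.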

There are currently two abstractions related to action distributions: ActionParameterizations (this is the part of the policy that outputs the parameters of the action distribution) and action distributions themselves.

I think to implement this properly we need facilities to define both custom parameterizations and custom ActionDistribution classes, which should have a well-defined interface. E.g., action distributions should support sampling, entropy calculation, KL-divergence calculation (or at least some proxy of it), and calculating the log-probability of a sampled action.
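
One possible shape for such an interface is sketched below. This is only an illustration under stated assumptions: the method names (`sample`, `log_prob`, `entropy`, `kl_divergence`) and the toy `Categorical` class are made up for this example and are not Sample Factory's actual API, and plain Python stands in for tensor code.

```python
import math
import random
from abc import ABC, abstractmethod


class ActionDistribution(ABC):
    """Hypothetical interface; names are illustrative only."""

    @abstractmethod
    def sample(self, rng): ...

    @abstractmethod
    def log_prob(self, action): ...

    @abstractmethod
    def entropy(self): ...

    @abstractmethod
    def kl_divergence(self, other): ...


class Categorical(ActionDistribution):
    def __init__(self, probs):
        total = sum(probs)
        self.probs = [p / total for p in probs]  # normalize to sum to 1

    def sample(self, rng):
        return rng.choices(range(len(self.probs)), weights=self.probs, k=1)[0]

    def log_prob(self, action):
        p = self.probs[action]
        return math.log(p) if p > 0.0 else -math.inf

    def entropy(self):
        # Zero-probability actions contribute nothing to the entropy.
        return -sum(p * math.log(p) for p in self.probs if p > 0.0)

    def kl_divergence(self, other):
        # KL(self || other); infinite if other excludes a reachable action.
        kl = 0.0
        for p, q in zip(self.probs, other.probs):
            if p > 0.0:
                if q == 0.0:
                    return math.inf
                kl += p * math.log(p / q)
        return kl


rng = random.Random(0)
uniform = Categorical([1, 1, 1, 1])
masked = Categorical([1, 0, 1, 0])  # e.g. two of four actions masked out
action = masked.sample(rng)
```

A custom distribution would then only need to subclass the interface and implement the four methods.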

In the case of masked actions, this action distribution object will be stateful (i.e., it holds the valid-action mask).
In the case of autoregressive distributions, we need some custom logic in the sample() function.
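
A sketch of what that custom sample() logic might look like for two action heads is shown below. Everything here is an assumption for illustration: the class `AutoregressiveDistribution`, the callable `second_logits_fn`, and the toy conditioning rule are hypothetical, and both heads keep a fixed size of three actions.

```python
import math
import random


def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_categorical(probs, rng):
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


class AutoregressiveDistribution:
    """Two-head sketch: the second head's logits depend on the first action."""

    def __init__(self, first_logits, second_logits_fn):
        self.first_logits = first_logits
        # Hypothetical callable: maps the first action to second-head logits.
        self.second_logits_fn = second_logits_fn

    def sample(self, rng):
        a1 = sample_categorical(softmax(self.first_logits), rng)
        # The second head is only evaluated once the first action is known.
        a2 = sample_categorical(softmax(self.second_logits_fn(a1)), rng)
        return a1, a2

    def log_prob(self, actions):
        a1, a2 = actions
        p1 = softmax(self.first_logits)[a1]
        p2 = softmax(self.second_logits_fn(a1))[a2]
        # Chain rule: log p(a1, a2) = log p(a1) + log p(a2 | a1)
        return math.log(p1) + math.log(p2)


# Toy conditioning: the second head strongly prefers repeating the first action.
def second_logits_fn(a1):
    return [5.0 if i == a1 else 0.0 for i in range(3)]


dist = AutoregressiveDistribution([0.0, 0.0, 0.0], second_logits_fn)
a1, a2 = dist.sample(random.Random(0))
```

In a real policy the second head would be a network conditioned on the first action rather than a fixed rule, but the sampling order and the chain-rule log-probability would follow the same pattern.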

Overall, this seems doable! I'm excited to see this feature, and I'd be happy to help.
