WIP: Add MPO in zoo #604

Merged
merged 212 commits into from
Dec 22, 2022

Conversation

HenriDeh
Member

@HenriDeh HenriDeh commented Mar 17, 2022

I'm opening this as a draft so that discussion can start early.
This implements the MPO algorithm from this paper and its improved version.
PR Checklist

  • Add docstrings
  • Handle the case of a discrete actor. For this, I was wondering whether a DiscreteNetwork, akin to GaussianNetwork, would be a better approach than assuming that any actor that does not use a GaussianNetwork must be discrete.
  • Add some tests
  • Does this handle distributed environments?
  • Handle legal action masks
  • Decide default HPs
  • Make experiments with each network
  • Remove normalizer from networks?
  • Make a dedicated doc page
  • Fix GPU (changes in RLTrajectories needed)
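For background on what the checklist items above are building toward: the heart of MPO's policy improvement is a non-parametric E-step that reweights sampled actions by `exp(Q/η)`, with the temperature `η` chosen by minimizing a dual function under a KL bound `ε`. The PR's actual implementation is in Julia inside RLZoo; the following is only an illustrative NumPy sketch, and all names and shapes here are assumptions:

```python
import numpy as np

def mpo_e_step_weights(q_values, eta):
    """Non-parametric E-step of MPO: weight each sampled action by
    exp(Q(s, a) / eta), normalized per state (a per-state softmax).

    q_values: array of shape (n_states, n_sampled_actions)."""
    logits = q_values / eta
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)

def temperature_dual(eta, q_values, kl_eps):
    """Dual objective g(eta) = eta*eps + eta * mean_s log mean_a exp(Q/eta).
    Minimizing this over eta > 0 enforces a KL constraint of size kl_eps."""
    max_q = q_values.max(axis=1, keepdims=True)
    log_mean_exp = (np.log(np.mean(np.exp((q_values - max_q) / eta), axis=1))
                    + max_q[:, 0] / eta)
    return eta * kl_eps + eta * np.mean(log_mean_exp)
```

In the full algorithm, the resulting weights serve as targets for a weighted maximum-likelihood M-step on the actor, with additional KL regularization toward the previous policy.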

@HenriDeh
Member Author

There we go, it's finally done. This PR adds MPO; you can find details on the dedicated doc page. It supports Categorical, Gaussian, and Full Covariance Gaussian policies. Compared to the MPO algorithm described in the related paper (see above), it differs in two main ways:

  • It uses 1-step TD learning to update the critic network, whereas the paper uses retrace. Implementing retrace is a WIP.
  • It does not support distributed learners with gradient pooling. This is for later.
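For clarity, the 1-step TD target mentioned in the first point is the standard bootstrapped target. A minimal NumPy sketch (the PR's actual code is Julia and uses the library's trajectory types; the names here are illustrative):

```python
import numpy as np

def td1_target(rewards, next_q, dones, gamma=0.99):
    """1-step TD target for the critic: y = r + gamma * (1 - done) * Q'(s', a').

    Retrace would instead mix multi-step off-policy returns with truncated
    importance weights; here we simply bootstrap after a single step."""
    return rewards + gamma * (1.0 - dones) * next_q
```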

I implemented three experiments in the test suite, one for each type of policy. They all learn a perfect CartPole policy in less than a minute using only a CPU, at least on my computer.

@HenriDeh HenriDeh marked this pull request as ready for review December 22, 2022 11:00
@HenriDeh HenriDeh merged commit b9d0ee0 into JuliaReinforcementLearning:master Dec 22, 2022
@HenriDeh HenriDeh deleted the mpo branch December 22, 2022 11:02
Labels
enhancement New feature or request RLZoo WIP
Successfully merging this pull request may close these issues.

tanh normalization destabilizes learning with GaussianNetwork