WIP: Add MPO in zoo #604
Conversation
There we go, it's finally done. This PR adds MPO; details are on the dedicated doc page. It supports Categorical, Gaussian, and Full Covariance Gaussian policies. Compared to the MPO algorithm described in the related paper (see below), it does not support two of the paper's main features.
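For concreteness, here is a minimal sketch of those three policy parameterizations, assuming Flux and Distributions; every name and shape below is illustrative, not the PR's actual API:

```julia
using Flux, Distributions, LinearAlgebra

state_dim, action_dim = 4, 2

# Categorical: a network producing unnormalized logits over discrete actions.
logits_net = Chain(Dense(state_dim, 64, relu), Dense(64, action_dim))
categorical_policy(s) = Categorical(softmax(logits_net(s)))

# Diagonal Gaussian: separate heads for the mean and per-dimension log std.
trunk = Chain(Dense(state_dim, 64, relu))
μ_head = Dense(64, action_dim)
logσ_head = Dense(64, action_dim)
function gaussian_policy(s)
    h = trunk(s)
    MvNormal(μ_head(h), Diagonal(exp.(2 .* logσ_head(h))))  # variance = σ²
end

# Full-covariance Gaussian: predict a lower-triangular Cholesky factor L,
# so Σ = L * L' is positive definite by construction. (The index-based
# assembly is for readability; it is not differentiation-friendly as written.)
n_tril = action_dim * (action_dim + 1) ÷ 2
L_head = Dense(64, n_tril)
function full_cov_policy(s)
    h = trunk(s)
    v = L_head(h)
    L = zeros(eltype(v), action_dim, action_dim)
    k = 1
    for i in 1:action_dim, j in 1:i
        L[i, j] = i == j ? softplus(v[k]) : v[k]  # keep the diagonal positive
        k += 1
    end
    MvNormal(μ_head(h), Symmetric(L * L'))
end
```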
I implemented three experiments in the test suite, one for each type of policy. They all learn a perfect CartPole policy in under a minute using only a CPU, at least on my machine.
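Assuming the experiments follow the zoo's usual `JuliaRL_<Algorithm>_<Environment>` naming, running one should look roughly like this; the exact experiment name here is my guess, not necessarily the PR's:

```julia
using ReinforcementLearning

# Hypothetical experiment name; the three PR experiments may be named differently.
run(E`JuliaRL_MPO_CartPole`)
```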
I'm opening this as a draft so that discussion can start early.
This implements the MPO algorithm from this paper and its improved version.
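For reference, the paper's update is a two-step, EM-style procedure (this summary paraphrases Abdolmaleki et al., not code in this PR). The E-step reweights the old policy toward high-value actions,

$$q(a \mid s) \propto \pi_{\theta_{\text{old}}}(a \mid s)\, \exp\!\left(\frac{Q(s,a)}{\eta}\right),$$

where the temperature $\eta$ is found by minimizing the dual

$$g(\eta) = \eta \varepsilon + \eta\, \mathbb{E}_{s}\!\left[\log \mathbb{E}_{a \sim \pi_{\theta_{\text{old}}}}\!\left[\exp\!\left(\frac{Q(s,a)}{\eta}\right)\right]\right],$$

and the M-step fits the parametric policy to that target under a KL trust region:

$$\theta \leftarrow \arg\max_{\theta}\; \mathbb{E}_{s,\, a \sim q}\!\left[\log \pi_{\theta}(a \mid s)\right] \quad \text{s.t.}\quad \mathbb{E}_{s}\!\left[\mathrm{KL}\!\left(\pi_{\theta_{\text{old}}}(\cdot \mid s) \,\big\|\, \pi_{\theta}(\cdot \mid s)\right)\right] \le \varepsilon_{\mu}.$$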
A `DiscreteNetwork` akin to `GaussianNetwork` may be a better approach than assuming that, if a `GaussianNetwork` is not used, the actor must be discrete.
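A rough sketch of what such a `DiscreteNetwork` could look like, mirroring `GaussianNetwork`'s callable interface; the field names, keyword, and sampling scheme are assumptions, not code from this PR:

```julia
using Flux
using Random

# Illustrative `DiscreteNetwork`: wraps a model that outputs unnormalized
# logits, so downstream code can dispatch on the network type instead of
# assuming "not Gaussian => discrete".
Base.@kwdef struct DiscreteNetwork{M}
    model::M  # maps a (state_dim, batch) matrix to (n_actions, batch) logits
end

Flux.@functor DiscreteNetwork

# Normalized log-probabilities over actions, one column per state.
(net::DiscreteNetwork)(state) = logsoftmax(net.model(state))

# Sampling variant, akin to `GaussianNetwork`'s `is_sampling` keyword:
# Gumbel-max gives exact categorical samples from the logits.
function (net::DiscreteNetwork)(rng::AbstractRNG, state; is_sampling = false)
    logp = net(state)
    is_sampling || return logp
    gumbel = -log.(-log.(rand(rng, Float32, size(logp))))
    actions = [argmax(col) for col in eachcol(logp .+ gumbel)]
    return actions, logp
end
```

Dispatching on the network type this way would keep the MPO learner agnostic to the action space, which seems to be the intent of the suggestion.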