prototype jax with ddpg #187
@dosssman and @ikostrikov could you help review this, please? I am unfamiliar with JAX, so I might be coding things up wrong or using really bad formatting...
Looks good to me! The only thing I would add is TrainState:
target_params are initialized with the same RNG key
I'm not very familiar with JAX, so I can't really suggest any quality improvements.
Besides that, it's relatively easy to understand, and the algorithm logic looks good to me.
Great work as always.
@dosssman, @yooceii, and @joaogui1 this is ready for review with docs (https://cleanrl-git-jax-ddpg-vwxyzjn.vercel.app/rl-algorithms/ddpg/#ddpg_continuous_action_jaxpy, note some of the links don't work until this PR is merged).
    x = nn.relu(x)
    x = nn.Dense(self.action_dim)(x)
    x = nn.tanh(x)
    x * self.action_scale + self.action_bias
This should be x = x * self.action_scale + self.action_bias, no?
Thanks for this great catch! I am fixing this and will merge after CI passes.
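To make the issue in this thread concrete, here is a minimal sketch in plain Python, using `math.tanh` as a stand-in for the Flax layers so it runs without JAX installed. The function names, the `[0, 4]` action range, and the assumption that the method goes on to return `x` are all hypothetical and not taken from the PR. It shows that a bare expression statement computes the rescaled action and then discards it, so the caller still gets the raw tanh output in `[-1, 1]`.

```python
import math


def scale_action_buggy(pre_activation, action_scale, action_bias):
    """Mirrors the reviewed line: the rescaled value is never stored."""
    x = math.tanh(pre_activation)
    x * action_scale + action_bias  # computed, then thrown away
    return x  # assumption: the original method returns x afterwards


def scale_action_fixed(pre_activation, action_scale, action_bias):
    """Mirrors the reviewer's suggestion: assign the result back to x."""
    x = math.tanh(pre_activation)
    x = x * action_scale + action_bias
    return x


# Hypothetical action space with bounds [0, 4].
low, high = 0.0, 4.0
action_scale = (high - low) / 2.0  # half-width of the range
action_bias = (high + low) / 2.0   # midpoint of the range

buggy = scale_action_buggy(10.0, action_scale, action_bias)
fixed = scale_action_fixed(10.0, action_scale, action_bias)
print(buggy, fixed)
```

With a large positive pre-activation, the buggy version stays close to 1 (the tanh ceiling), while the fixed version approaches the upper bound of the hypothetical action range. Python raises no error for the discarded expression, which is likely why this kind of mistake only surfaces in review or in poor training results.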
Description
Types of changes
Checklist:
pre-commit run --all-files passes (required).
[...] mkdocs serve.
If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments. See #137 as an example PR.
[...] --capture-video flag toggled on (required).
[...] mkdocs serve.
[...] width=500 and height=300).