prototype jax with ddpg #187

vwxyzjn · 2022-05-29T18:26:07Z

Description

Types of changes

New algorithm

Checklist:

I've read the CONTRIBUTION guide (required).
I have ensured pre-commit run --all-files passes (required).
I have updated the documentation and previewed the changes via mkdocs serve.
I have updated the tests accordingly (if applicable).

If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments. See #137 as an example PR.

I have contacted @vwxyzjn to obtain access to the openrlbenchmark W&B team (required).
I have tracked applicable experiments in openrlbenchmark/cleanrl with --capture-video flag toggled on (required).
I have added additional documentation and previewed the changes via mkdocs serve.
- I have created a table comparing my results against those from reputable sources (i.e., the original paper or other reference implementation).
- I have added the learning curves (in PNG format with width=500 and height=300).
- I have added links to the tracked experiments.
I have updated the tests accordingly (if applicable).

vwxyzjn · 2022-06-24T22:37:13Z

@dosssman @huxiao09 I seem to have gotten CleanRL's DDPG + Jax working: about 5x speed up for free.

vwxyzjn · 2022-06-26T16:16:28Z

@dosssman and @ikostrikov could you help review this, please? I am unfamiliar with JAX, so I might be coding things up wrong or formatting the code badly...

ikostrikov · 2022-06-26T16:44:32Z

Looks good to me! The only thing I would add is TrainState:
https://flax.readthedocs.io/en/latest/flax.training.html#flax.training.train_state.TrainState

target_params are initialized with the same RNG key

vwxyzjn · 2022-06-29T00:40:33Z

@dosssman @yooceii could you give a review, please? The changes have been finalized.

dosssman

Not very familiar with JAX, so I can't really suggest any quality improvements.
Besides that, it's relatively easy to understand, and the algorithm logic looks good to me.
Great work as always.

cleanrl/ddpg_continuous_action_jax.py

vwxyzjn · 2022-06-30T14:44:05Z

@dosssman, @yooceii, and @joaogui1 this is ready for review with docs (https://cleanrl-git-jax-ddpg-vwxyzjn.vercel.app/rl-algorithms/ddpg/#ddpg_continuous_action_jaxpy, note some of the links don't work until this PR is merged).

joaogui1 · 2022-07-12T13:29:44Z

cleanrl/ddpg_continuous_action_jax.py

+        x = nn.relu(x)
+        x = nn.Dense(self.action_dim)(x)
+        x = nn.tanh(x)
+        x * self.action_scale + self.action_bias


This should be x = x * self.action_scale + self.action_bias, no?

vwxyzjn · 2022-07-12

Thanks for this great catch! I am fixing this and will merge after CI passes.

prototype jax with ddpg

f127aa3

Quick fix

cbc5d88

vercel bot deployed to Preview June 22, 2022 02:21 View deployment

quick fix

b4662c2

vercel bot deployed to Preview June 22, 2022 23:15 View deployment

Commit changes - successful prototype

754a0b1

vercel bot deployed to Preview June 24, 2022 22:31 View deployment

Remove scripts

223a8ff

vercel bot deployed to Preview June 25, 2022 01:37 View deployment

Simplify the implementation: careful with shape

85fbfe2

vercel bot deployed to Preview June 25, 2022 02:06 View deployment

Format

8ffbd26

vercel bot deployed to Preview June 25, 2022 02:50 View deployment

vwxyzjn requested a review from dosssman June 25, 2022 02:50

Remove code

c72cfb7

vercel bot deployed to Preview June 25, 2022 02:53 View deployment

formatting changes

bfece78

vercel bot deployed to Preview June 25, 2022 02:55 View deployment

vwxyzjn added 2 commits June 24, 2022 23:01

formatting change

0710728

bug fix

92d9d13

correctly implementing keys

ee80f6b

vercel bot deployed to Preview June 26, 2022 16:19 View deployment

This was referenced Jun 26, 2022

Prototype TD3 with JAX #216

Closed

JAX Integration with CleanRL #218

Closed

these two lines are not necessary

0b30c57

target_params are initialized with the same RNG key

vwxyzjn added 5 commits June 28, 2022 20:42

Merge branch 'master' into jax-ddpg

52243ec

update docs

9ec4ac5

Add jax benchmark experiments

acb3293

remove old files

0e9d8f4

update benchmark scripts

8226824

vercel bot deployed to Preview June 29, 2022 01:00 View deployment

dosssman approved these changes Jun 29, 2022

cleanrl/ddpg_continuous_action_jax.py

update lock files

57230c3

vwxyzjn mentioned this pull request Jun 29, 2022

prototype jax with dqn #222

Merged

18 tasks

vercel bot deployed to Preview June 30, 2022 00:46 View deployment

Handle action space bounds

29a0aef

vercel bot deployed to Preview June 30, 2022 02:26 View deployment

Merge branch 'master' into jax-ddpg

5f0ed84

vercel bot deployed to Preview June 30, 2022 14:18 View deployment

Add docs

024b8c5

vercel bot deployed to Preview June 30, 2022 14:27 View deployment

vwxyzjn added 2 commits June 30, 2022 10:28

Typo

34c2825

update CI

e12c283

vercel bot deployed to Preview June 30, 2022 14:43 View deployment

joaogui1 reviewed Jul 12, 2022

bug fix and add docs link

7b5febd

vercel bot deployed to Preview July 12, 2022 13:49 View deployment

Add a note explaining the speed

eb85ae6

vercel bot deployed to Preview July 12, 2022 17:02 View deployment

Update ddpg docs

003a770

vercel bot deployed to Preview July 12, 2022 21:17 View deployment

vwxyzjn merged commit 7eeb583 into master Jul 12, 2022
