141 add me pbt #143

Merged: 83 commits merged on Mar 10, 2023

ranzenTom (Collaborator) commented Feb 27, 2023

Related issues: #141

This PR adds the population-based training (PBT) algorithm and the MAP-Elites PBT (ME-PBT) algorithm, recently accepted at ICLR. Both methods work with either TD3 or SAC as the underlying RL agent.
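If you haven't met PBT before, its core step is usually described like this. The population is ranked by fitness. The weights and hyperparameters of the best members are copied into the worst ones ("exploit"), and the copied hyperparameters are then perturbed ("explore"). The sketch below is a generic, stdlib-only illustration of that step and rests on some assumptions. The `Member` structure, the 25% truncation fraction and the 0.8/1.2 perturbation factors are hypothetical choices. They are not the API or settings of this PR's `qdax/baselines/pbt.py`.

```python
import random
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Member:
    """One population member (hypothetical structure for illustration)."""
    params: Dict[str, float]       # stand-in for network weights
    hyperparams: Dict[str, float]  # e.g. learning rate, discount factor
    fitness: float                 # most recent evaluation return


def pbt_exploit_explore(
    population: List[Member],
    frac: float = 0.25,
    perturb_factors: Tuple[float, ...] = (0.8, 1.2),
    rng: Optional[random.Random] = None,
) -> List[Member]:
    """Overwrite the worst `frac` of members with perturbed copies of the best `frac`."""
    if rng is None:
        rng = random.Random(0)
    n = len(population)
    k = max(1, int(n * frac))
    if 2 * k > n:
        raise ValueError("top and bottom groups would overlap")
    # Member indices sorted from best to worst fitness.
    order = sorted(range(n), key=lambda i: population[i].fitness, reverse=True)
    top, bottom = order[:k], order[-k:]
    new_population = list(population)
    for loser in bottom:
        winner = population[rng.choice(top)]
        # Exploit: copy the winner's weights (its fitness is kept until re-evaluation).
        # Explore: scale each copied hyperparameter by a randomly chosen factor.
        new_hyperparams = {
            name: value * rng.choice(perturb_factors)
            for name, value in winner.hyperparams.items()
        }
        new_population[loser] = replace(
            winner, params=dict(winner.params), hyperparams=new_hyperparams
        )
    return new_population
```

Take eight members with fitness 0 to 7. The function replaces members 0 and 1 with perturbed copies of member 6 or 7 and leaves members 2 to 7 untouched.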

This PR introduces:

  • the humanoid trap environment
  • PBT TD3
  • PBT SAC
  • ME-PBT TD3
  • ME-PBT SAC
  • a refactoring of the SAC and TD3 agents and their losses (with minimal impact on DIAYN and DADS) that makes the implementation more flexible and lets PBT agents be implemented through simple inheritance.
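Here is one way to read "PBT agents as simple inheritance". Suppose the base agent reads its hyperparameters through an overridable hook. Then a PBT subclass only has to move those hyperparameters into the per-member training state, so each population member carries its own values. The sketch below uses hypothetical `Agent`, `PBTAgent`, `TrainingState` and `PBTTrainingState` classes with a toy gradient step. It is not the class hierarchy this PR actually adds.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TrainingState:
    """Hypothetical minimal training state: a step counter and one scalar weight."""
    steps: int
    weight: float


@dataclass(frozen=True)
class PBTTrainingState(TrainingState):
    """Adds a per-member hyperparameter so it can differ across the population."""
    learning_rate: float


class Agent:
    """Base agent: reads its learning rate from a fixed config value."""

    def __init__(self, learning_rate: float):
        self._learning_rate = learning_rate

    def learning_rate(self, state: TrainingState) -> float:
        return self._learning_rate

    def update(self, state: TrainingState, grad: float) -> TrainingState:
        # One plain gradient-descent step; subclasses only change where the lr comes from.
        lr = self.learning_rate(state)
        return replace(state, steps=state.steps + 1, weight=state.weight - lr * grad)


class PBTAgent(Agent):
    """PBT variant: overrides a single hook to read the learning rate from the state."""

    def __init__(self):
        # The config value is unused: each member's state carries its own learning rate.
        super().__init__(learning_rate=0.0)

    def learning_rate(self, state: PBTTrainingState) -> float:
        return state.learning_rate
```

The update logic lives once in the base class. Because `dataclasses.replace` preserves the concrete state type, the inherited `update` returns a `PBTTrainingState` with the member's own learning rate intact.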

Checks

  • a clear description of the PR has been added
  • sufficient tests have been written
  • relevant section added to the documentation
  • example notebook added to the repo
  • clean docstrings and comments have been written
  • if any issue/observation has been discovered, a new issue has been opened

felixchalumeau changed the base branch from main to develop on March 1, 2023 13:41
codecov-commenter commented Mar 2, 2023

Codecov Report

Merging #143 (c8d0c2f) into develop (7cfd5bc) will decrease coverage by 0.14%.
The diff coverage is 91.43%.


@@             Coverage Diff             @@
##           develop     #143      +/-   ##
===========================================
- Coverage    92.41%   92.28%   -0.14%     
===========================================
  Files          105      116      +11     
  Lines         5910     6763     +853     
===========================================
+ Hits          5462     6241     +779     
- Misses         448      522      +74     
Impacted Files Coverage Δ
qdax/core/containers/mapelites_repertoire.py 85.71% <ø> (ø)
qdax/core/containers/mome_repertoire.py 98.14% <ø> (ø)
qdax/core/distributed_map_elites.py 100.00% <ø> (ø)
qdax/core/neuroevolution/sac_td3_utils.py 100.00% <ø> (ø)
qdax/environments/exploration_wrappers.py 32.97% <0.00%> (ø)
qdax/environments/locomotion_wrappers.py 85.49% <ø> (ø)
qdax/environments/humanoidtrap.py 18.30% <18.30%> (ø)
qdax/environments/__init__.py 86.79% <60.00%> (-3.01%) ⬇️
qdax/baselines/pbt.py 76.47% <76.47%> (ø)
qdax/baselines/sac_pbt.py 96.46% <96.46%> (ø)
... and 24 more

... and 2 files with indirect coverage changes
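You can re-derive the headline numbers from the hit and line counts in the diff above. Line coverage is hits divided by tracked lines, and misses are lines minus hits. The check below does this in Python. Flooring to two decimal places reproduces all three reported figures (92.41%, 92.28%, -0.14%). Ordinary rounding would instead give 92.42% for develop. So the floor rule is only inferred from these numbers, not taken from Codecov's documentation.

```python
import math


def coverage_pct(hits: int, lines: int) -> float:
    """Line coverage in percent: covered lines / tracked lines."""
    return 100.0 * hits / lines


def floor_2dp(x: float) -> float:
    """Round toward negative infinity at two decimal places."""
    return math.floor(x * 100) / 100


before = coverage_pct(5462, 5910)  # develop
after = coverage_pct(6241, 6763)   # #143
print(floor_2dp(before), floor_2dp(after), floor_2dp(after - before))
# prints: 92.41 92.28 -0.14
```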


limbryan (Collaborator) commented Mar 6, 2023

It seems this structure finally solves our long-standing issue of TD3 and SAC having different code structures: it reduces code duplication and gives a more uniform, modular design!
Should we remove the now-unused do_iterations and warmstart buffer in mdp_utils? Or get rid of mdp_utils completely and move its contents into sac_td3_utils?
do_iteration and the warmstart buffer for both TD3 and SAC are now taken from sac_td3_utils.
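Sharing `do_iteration` works because the loop only needs each agent to expose an `update(state, batch)` method. The sketch below illustrates that with a minimal, stdlib-only example. The `do_iteration` signature, the `Learner` protocol and the `ToyTD3`/`ToySAC` stand-ins (with made-up update rules) are all hypothetical. The real signatures in `sac_td3_utils` are not shown in this thread.

```python
import random
from dataclasses import dataclass
from typing import List, Protocol


@dataclass(frozen=True)
class ToyState:
    """Hypothetical training state: update count and an accumulated scalar."""
    updates: int
    value: float


class Learner(Protocol):
    """Any object with this method can be driven by the shared loop."""

    def update(self, state: ToyState, batch: List[float]) -> ToyState: ...


def do_iteration(agent: Learner, state: ToyState, buffer: List[float],
                 batch_size: int, num_updates: int, rng: random.Random) -> ToyState:
    """Sample `num_updates` batches from the buffer and apply agent.update to each."""
    for _ in range(num_updates):
        batch = rng.sample(buffer, batch_size)  # sampling without replacement
        state = agent.update(state, batch)
    return state


class ToyTD3:
    """Stand-in for a TD3 agent; the update rule is made up."""

    def update(self, state: ToyState, batch: List[float]) -> ToyState:
        return ToyState(state.updates + 1, state.value + sum(batch) / len(batch))


class ToySAC:
    """Stand-in for a SAC agent; a different made-up update rule."""

    def update(self, state: ToyState, batch: List[float]) -> ToyState:
        return ToyState(state.updates + 1, state.value + max(batch))
```

For example, `do_iteration(ToyTD3(), ToyState(0, 0.0), buffer, 4, 3, random.Random(0))` and the same call with `ToySAC()` both return a state with `updates == 3`. Only the per-agent update differs.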

Development

Successfully merging this pull request may close these issues.

Add ME-PBT
5 participants