This issue will be updated over time, as the list of changes is not exhaustive.
Dear all,
Stable-Baselines3 beta is now out 🎉 ! This issue is meant to reference what is implemented and what is missing before a first major version.
As mentioned in the README, before v1.0, breaking changes may occur. I would like to encourage contributors (especially the maintainers) to make comments on how to improve the library before v1.0 (and maybe make some internal changes).
I will try to review the features mentioned in hill-a/stable-baselines#576 (and hill-a/stable-baselines#733)
and I will create issues soon to reference what is missing.
What is implemented?
- basic features (training/saving/loading/predict); a minimal usage sketch is shown after this list
- basic set of algorithms (A2C/PPO/SAC/TD3)
- basic pre-processing (Box and Discrete observation/action spaces are handled)
- callback support
- complete benchmark for the continuous action case
- basic RL zoo for training/evaluation/plotting (https://github.com/DLR-RM/rl-baselines3-zoo)
- consistent API
- basic tests and most type hints
- continuous integration (I'm in discussion with the organization admins for that)
- handle more observation/action spaces Add support for MultiDiscrete/MultiBinary observation spaces #4 and Add support for MultiDiscrete/MultiBinary action spaces #5 (thanks @rolandgvc)
- tensorboard integration Tensorboard integration #9 (thanks @rolandgvc)
- basic documentation and notebooks
- automatic build of the documentation
- Vanilla DQN Implement Vanilla DQN #6 (thanks @Artemis-Skade)
- Refactor off-policy critics to reduce code duplication Implement DDPG #3 (see Refactored ContinuousCritic for SAC/TD3 #78 )
- DDPG Implement DDPG #3
- do a complete benchmark for the discrete case Performance Check (Discrete actions) #49 (thanks @Miffyli !)
- performance check for continuous actions Performance check (Continuous Actions) #48 (even better than in the gSDE paper)
- get/set parameters for the base class (Get/set parameters and review of saving and loading #138 )
- clean up type-hints in docs Custom parser for type hints #10 (cumbersome to read)
- documenting the migration between SB and SB3 Migration guide #11
- finish typing some methods Improve typing coverage #175
- HER Implement HER #8 (thanks @megan-klaiber)
- finish updating and cleaning the doc Missing Documentation #166 (help is wanted)
- finish updating the notebooks and the tutorial Update colab notebooks #7 (I will do that, only the HER notebook is missing)
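For reference, here is a minimal sketch of the basic train/save/load/predict workflow listed above (assuming the current beta API; the environment, timesteps, and paths are just placeholders):

```python
import gym

from stable_baselines3 import PPO

# Train a PPO agent on a simple Gym environment
# (tensorboard_log enables the TensorBoard integration mentioned above)
env = gym.make("CartPole-v1")
model = PPO("MlpPolicy", env, verbose=1, tensorboard_log="./ppo_tensorboard/")
model.learn(total_timesteps=10_000)

# Save the model, then reload it from disk
model.save("ppo_cartpole")
loaded_model = PPO.load("ppo_cartpole")

# Use the loaded model for prediction
obs = env.reset()
action, _states = loaded_model.predict(obs, deterministic=True)
```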
What are the new features?
- much cleaner base code (and no more warnings =D )
- independent saving/loading/predict for policies
- State-Dependent Exploration (SDE) for using RL directly on real robots (this is a unique feature; it was the starting point of SB3, and I published a paper on it: https://arxiv.org/abs/2005.05719)
- proper evaluation (using a separate env) is included in the base class (using `EvalCallback`; see the sketch after this list)
- all environments are `VecEnv`
- better saving/loading (now can include the replay buffer and the optimizers)
- any number of critics are allowed for SAC/TD3
- custom actor/critic net arch for off-policy algos ([Feature request] Allow different network architectures for off-policy actor/critic #113 )
- QR-DQN in SB3-Contrib
- Truncated Quantile Critics (TQC) (see Implement Truncated Quantile Critics (TQC) #83 ) in SB3-Contrib
- @Miffyli suggested a "contrib" repo for experimental features (it is here)
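To illustrate a few of the items above (evaluation with `EvalCallback`, custom actor/critic architectures for off-policy algorithms, and saving the replay buffer), here is a minimal sketch assuming the current beta API; the environment, paths, frequencies, and layer sizes are arbitrary placeholders:

```python
import gym

from stable_baselines3 import SAC
from stable_baselines3.common.callbacks import EvalCallback

env = gym.make("Pendulum-v0")
# Evaluation uses a separate environment, run periodically during training
eval_env = gym.make("Pendulum-v0")
eval_callback = EvalCallback(eval_env, best_model_save_path="./logs/",
                             eval_freq=5000, n_eval_episodes=5)

# Independent network architectures for the actor ("pi") and the critics ("qf")
model = SAC("MlpPolicy", env,
            policy_kwargs=dict(net_arch=dict(pi=[64, 64], qf=[400, 300])))
model.learn(total_timesteps=50_000, callback=eval_callback)

# The replay buffer can now be saved/loaded separately from the model
model.save_replay_buffer("sac_replay_buffer")
```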
What is missing?
- syncing some files with Stable-Baselines to remain consistent (we may be good now, but this needs to be checked)
- finish the code review of existing code Review of Existing Code #17
Checklist for v1.0 release
- Update Readme
- Prepare blog post
- Update doc: add links to the stable-baselines3 contrib
- Update Docker image to use a newer Ubuntu version
- Populate RL zoo
What is next? (for V1.1+)
- basic dict/tuple support for observations (Dictionary Observations #243 )
- simple recurrent policies? (recurrent policy implementation in ppo [feature-request] #18)
- DQN extensions (double, PER, IQN) ([Feature Request] RAINBOW #622)
- Implement TRPO (Add TRPO Stable-Baselines-Team/stable-baselines3-contrib#40)
- multi-worker training for all algorithms ([Feature request] Adding multiprocessing support for off policy algorithms #179 )
- n-step returns for off-policy algorithms [feature-request] N-step returns for TD methods #47 (@partiallytyped )
- SAC discrete [Feature request] Implement SAC-Discrete #157 (needs to be discussed: benefit vs DQN+extensions?)
- Energy Based Prioritisation? (@RyanRizzo96)
- implement `action_proba` in the base class?
- test the doc snippets Sphinx doc tests support #14 (help is welcome)
- noisy networks (https://arxiv.org/abs/1706.10295) @partiallytyped ? exploration in parameter space? ([Feature Request] RAINBOW #622)
- Munchausen Reinforcement Learning (MDQN) (probably in the contrib first, e.g. [WIP] MDQN pfnet/pfrl#74)
Side note: should we change the default `start_method` to `fork`? (now that we don't have TF anymore)
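For context, a sketch of how the start method can already be chosen explicitly when creating a `SubprocVecEnv` (a minimal example, assuming the current constructor signature; the environment and number of workers are placeholders):

```python
import gym

from stable_baselines3.common.vec_env import SubprocVecEnv

def make_env():
    return gym.make("CartPole-v1")

if __name__ == "__main__":
    # Request "fork" explicitly instead of the current default
    # ("forkserver" when available, otherwise "spawn"); "fork" is only
    # available on platforms that support forking (e.g. Linux)
    vec_env = SubprocVecEnv([make_env for _ in range(4)], start_method="fork")
```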