Release v1.6.0 (#958)
* Release v1.6.0 + update doc + add copy button

* Update read the doc conda env

* Update year

* Fix bug in kl divergence check

* Rephrase requirement for envpool and isaac gym
araffin authored Jul 12, 2022
1 parent ef10189 commit c1f1c3d
Showing 10 changed files with 52 additions and 9 deletions.
6 changes: 3 additions & 3 deletions docs/conda_env.yml
@@ -6,15 +6,15 @@ dependencies:
- cpuonly=1.0=0
- pip=21.1
- python=3.7
- pytorch=1.8.1=py3.7_cpu_0
- pytorch=1.11=py3.7_cpu_0
- pip:
- gym>=0.17.2
- gym==0.21
- cloudpickle
- opencv-python-headless
- pandas
- numpy
- matplotlib
- sphinx_autodoc_typehints
- sphinx>=4.2
# See https://github.com/readthedocs/sphinx_rtd_theme/issues/1115
- sphinx_rtd_theme>=1.0
- sphinx_copybutton
13 changes: 12 additions & 1 deletion docs/conf.py
@@ -24,6 +24,14 @@
 except ImportError:
     enable_spell_check = False
 
+# Try to enable copy button
+try:
+    import sphinx_copybutton  # noqa: F401
+
+    enable_copy_button = True
+except ImportError:
+    enable_copy_button = False
+
 # source code directory, relative to this file, for sphinx-autobuild
 sys.path.insert(0, os.path.abspath(".."))
 
@@ -51,7 +59,7 @@ def __getattr__(cls, name):
 # -- Project information -----------------------------------------------------
 
 project = "Stable Baselines3"
-copyright = "2020, Stable Baselines3"
+copyright = "2022, Stable Baselines3"
 author = "Stable Baselines3 Contributors"
 
 # The short X.Y version
@@ -83,6 +91,9 @@ def __getattr__(cls, name):
 if enable_spell_check:
     extensions.append("sphinxcontrib.spelling")
 
+if enable_copy_button:
+    extensions.append("sphinx_copybutton")
+
 # Add any paths that contain templates here, relative to this directory.
 templates_path = ["_templates"]
 
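The conf.py change follows a common pattern for optional Sphinx extensions: probe for the package and append it to extensions only when it is available, so the docs still build when sphinx_copybutton is not installed. The sketch below generalizes that pattern to a list of candidates. optional_extensions is a hypothetical helper, not part of SB3, and unlike the diff it checks availability with importlib.util.find_spec rather than importing each module.

```python
import importlib.util
from typing import Iterable, List


def optional_extensions(candidates: Iterable[str]) -> List[str]:
    """Return the candidate module names that are installed (hypothetical helper)."""
    enabled: List[str] = []
    for name in candidates:
        try:
            # find_spec locates the module; for a dotted name it imports the parent package first
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # Raised for a dotted name whose parent package is not installed
            found = False
        if found:
            enabled.append(name)
    return enabled


# Usage in a conf.py-style module: always-on extensions first, optional ones appended
extensions = ["sphinx.ext.autodoc"]
extensions += optional_extensions(["sphinx_copybutton", "sphinxcontrib.spelling"])
```

Using find_spec avoids running each extension's import-time code. The try/import approach in the diff has the advantage of also catching packages that are installed but fail to import.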
10 changes: 10 additions & 0 deletions docs/guide/examples.rst
@@ -729,6 +729,16 @@ to keep track of the agent progress.
model.learn(10_000)
SB3 with EnvPool or Isaac Gym
-----------------------------

Just like Procgen (see above), `EnvPool <https://github.com/sail-sg/envpool>`_ and `Isaac Gym <https://github.com/NVIDIA-Omniverse/IsaacGymEnvs>`_ accelerate the environment by
already providing a vectorized implementation.

To use SB3 with those tools, you must wrap the env with tool's specific ``VecEnvWrapper`` that will pre-process the data for SB3,
you can find links to those wrappers in `issue #772 <https://github.com/DLR-RM/stable-baselines3/issues/772#issuecomment-1048657002>`_.


Record a Video
--------------

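One concrete piece of "pre-processing the data for SB3": SB3 vectorized environments return infos as a list with one dict per sub-environment, whereas a batched simulator may report them as a single dict of per-environment arrays. The sketch below shows only that conversion, assuming the dict-of-arrays layout. split_batched_info is a hypothetical helper and is not one of the wrappers linked in issue #772.

```python
from typing import Any, Dict, List

import numpy as np


def split_batched_info(info: Dict[str, Any], num_envs: int) -> List[Dict[str, Any]]:
    """Turn a dict of per-env arrays into a list of per-env dicts (hypothetical helper)."""
    infos: List[Dict[str, Any]] = [{} for _ in range(num_envs)]
    for key, value in info.items():
        # Assumption: a per-env entry is an array whose first axis has length num_envs
        if isinstance(value, np.ndarray) and value.ndim > 0 and value.shape[0] == num_envs:
            for env_idx in range(num_envs):
                infos[env_idx][key] = value[env_idx]
        # Entries that are not per-env arrays are dropped in this sketch
    return infos


batched = {"env_id": np.arange(3), "lives": np.array([3, 2, 1]), "backend": "gpu"}
per_env = split_batched_info(batched, num_envs=3)
```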
11 changes: 11 additions & 0 deletions docs/guide/install.rst
@@ -54,6 +54,17 @@ Bleeding-edge version
pip install git+https://github.com/DLR-RM/stable-baselines3
.. note::

If you want to use latest gym version (0.24+), you have to use

.. code-block:: bash
pip install git+https://github.com/carlosluis/stable-baselines3/tree/fix_tests
See `PR #780 <https://github.com/DLR-RM/stable-baselines3/pull/780>`_ for more information.


Development version
-------------------

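When installing from a branch of a fork, pip's VCS syntax selects the branch with an @ suffix on the repository URL, as in git+https://host/owner/repo@branch. A GitHub web path such as /tree/<branch> points to a web page rather than a cloneable repository, so it is probably not what pip expects. The hypothetical helper below builds such a requirement string and rejects /tree/ URLs; it is an illustration, not part of SB3 or pip.

```python
def pip_vcs_requirement(repo_url: str, ref: str) -> str:
    """Build a pip requirement that installs a Git repository at a branch, tag or commit.

    Hypothetical helper for illustration; pip itself only needs the resulting string.
    """
    repo_url = repo_url.rstrip("/")
    if "/tree/" in repo_url:
        # GitHub's /tree/<branch> URLs are web pages, not Git remotes
        raise ValueError("use the repository URL and pass the branch as `ref`")
    return f"git+{repo_url}@{ref}"


requirement = pip_vcs_requirement("https://github.com/carlosluis/stable-baselines3", "fix_tests")
# Then, in a shell: pip install "<requirement>"
```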
7 changes: 6 additions & 1 deletion docs/misc/changelog.rst
@@ -4,9 +4,11 @@ Changelog
 ==========
 
 
-Release 1.5.1a9 (WIP)
+Release 1.6.0 (2022-07-11)
 ---------------------------
 
+**Recurrent PPO (PPO LSTM), better defaults for learning from pixels with SAC/TD3**
+
 Breaking Changes:
 ^^^^^^^^^^^^^^^^^
 - Changed the way policy "aliases" are handled ("MlpPolicy", "CnnPolicy", ...), removing the former
@@ -34,6 +36,7 @@ Bug Fixes:
 - Fixed issues due to newer version of protobuf (tensorboard) and sphinx
 - Fix exception causes all over the codebase (@cool-RR)
 - Prohibit simultaneous use of optimize_memory_usage and handle_timeout_termination due to a bug (@MWeltevrede)
+- Fixed a bug in ``kl_divergence`` check that would fail when using numpy arrays with MultiCategorical distribution
 
 Deprecations:
 ^^^^^^^^^^^^^
@@ -51,6 +54,8 @@ Documentation:
 - Added remark about breaking Markov assumption and timeout handling
 - Added doc about MLFlow integration via custom logger (@git-thor)
 - Updated Huggingface integration doc
+- Added copy button for code snippets
+- Added doc about EnvPool and Isaac Gym support
 
 
 Release 1.5.0 (2022-03-25)
2 changes: 2 additions & 0 deletions setup.py
@@ -111,6 +111,8 @@
             "sphinxcontrib.spelling",
             # Type hints support
             "sphinx-autodoc-typehints",
+            # Copy button for code snippets
+            "sphinx_copybutton",
         ],
         "extra": [
             # For render
3 changes: 2 additions & 1 deletion stable_baselines3/common/buffers.py
@@ -193,7 +193,8 @@ def __init__(
         # see https://github.com/DLR-RM/stable-baselines3/issues/934
         if optimize_memory_usage and handle_timeout_termination:
             raise ValueError(
-                "ReplayBuffer does not support optimize_memory_usage = True and handle_timeout_termination = True simultaneously."
+                "ReplayBuffer does not support optimize_memory_usage = True "
+                "and handle_timeout_termination = True simultaneously."
             )
         self.optimize_memory_usage = optimize_memory_usage
 
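The buffers.py hunk only reflows the error message. Python joins adjacent string literals at compile time, so the two-line version produces exactly the same text as the original single line. The sketch below checks that and reproduces the guard. check_buffer_options is a hypothetical stand-in for the check inside ReplayBuffer.__init__, not SB3's API.

```python
OLD_MESSAGE = (
    "ReplayBuffer does not support optimize_memory_usage = True and handle_timeout_termination = True simultaneously."
)
# Adjacent literals are joined into one string; note the trailing space in the first part
NEW_MESSAGE = (
    "ReplayBuffer does not support optimize_memory_usage = True "
    "and handle_timeout_termination = True simultaneously."
)


def check_buffer_options(optimize_memory_usage: bool, handle_timeout_termination: bool) -> None:
    """Reject the option combination that the ReplayBuffer guard rejects (hypothetical stand-in)."""
    if optimize_memory_usage and handle_timeout_termination:
        raise ValueError(NEW_MESSAGE)


# Only one of the two options enabled: no error is raised
check_buffer_options(optimize_memory_usage=True, handle_timeout_termination=False)
```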
3 changes: 2 additions & 1 deletion stable_baselines3/common/distributions.py
@@ -4,6 +4,7 @@
 from typing import Any, Dict, List, Optional, Tuple, Union
 
 import gym
+import numpy as np
 import torch as th
 from gym import spaces
 from torch import nn
@@ -688,7 +689,7 @@ def kl_divergence(dist_true: Distribution, dist_pred: Distribution) -> th.Tensor
     # MultiCategoricalDistribution is not a PyTorch Distribution subclass
     # so we need to implement it ourselves!
     if isinstance(dist_pred, MultiCategoricalDistribution):
-        assert dist_pred.action_dims == dist_true.action_dims, "Error: distributions must have the same input space"
+        assert np.allclose(dist_pred.action_dims, dist_true.action_dims), "Error: distributions must have the same input space"
         return th.stack(
             [th.distributions.kl_divergence(p, q) for p, q in zip(dist_true.distribution, dist_pred.distribution)],
             dim=1,
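Why the kl_divergence check needed np.allclose: when action_dims is a Python list, == returns a single bool, but when it is a NumPy array (as in the updated test), == compares element by element and returns an array. Using that array in an assert asks Python for its truth value, which NumPy refuses for arrays with more than one element, raising ValueError. The plain NumPy sketch below reproduces this with hypothetical dims values; it does not import SB3.

```python
import numpy as np

# Hypothetical action_dims for two MultiCategorical distributions, stored as arrays
dims_true = np.array([4, 4])
dims_pred = np.array([4, 4])

# With lists the comparison is a single bool...
assert [4, 4] == [4, 4]

# ...but with arrays it is element-wise and returns an array of bools
elementwise = dims_pred == dims_true
print(type(elementwise).__name__, elementwise.shape)

# The pre-1.6.0 style check: truth-testing a 2-element array raises ValueError
try:
    assert dims_pred == dims_true, "Error: distributions must have the same input space"
    old_check_raised = False
except ValueError as err:
    old_check_raised = True
    print("old check raised ValueError:", err)

# The 1.6.0 style check collapses the comparison into one bool
assert np.allclose(dims_pred, dims_true), "Error: distributions must have the same input space"
print("new check passed")
```

One design note: np.allclose broadcasts its inputs, so it raises when the shapes cannot be broadcast and can report shapes such as (2,) and (1,) as close; np.array_equal instead returns False whenever the shapes differ.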
2 changes: 1 addition & 1 deletion stable_baselines3/version.txt
@@ -1 +1 @@
-1.5.1a9
+1.6.0
4 changes: 3 additions & 1 deletion tests/test_distributions.py
@@ -163,7 +163,9 @@ def test_categorical(dist, CAT_ACTIONS):
         BernoulliDistribution(N_ACTIONS).proba_distribution(th.rand(N_ACTIONS)),
         CategoricalDistribution(N_ACTIONS).proba_distribution(th.rand(N_ACTIONS)),
         DiagGaussianDistribution(N_ACTIONS).proba_distribution(th.rand(N_ACTIONS), th.rand(N_ACTIONS)),
-        MultiCategoricalDistribution([N_ACTIONS, N_ACTIONS]).proba_distribution(th.rand(1, sum([N_ACTIONS, N_ACTIONS]))),
+        MultiCategoricalDistribution(np.array([N_ACTIONS, N_ACTIONS])).proba_distribution(
+            th.rand(1, sum([N_ACTIONS, N_ACTIONS]))
+        ),
         SquashedDiagGaussianDistribution(N_ACTIONS).proba_distribution(th.rand(N_ACTIONS), th.rand(N_ACTIONS)),
         StateDependentNoiseDistribution(N_ACTIONS).proba_distribution(
             th.rand(N_ACTIONS), th.rand([N_ACTIONS, N_ACTIONS]), th.rand([N_ACTIONS, N_ACTIONS])
