Generic hidden state for RecurrentPPO #4

rhaps0dy · 2023-09-19T22:08:39Z

(This PR comes after #7 and #8, which I factored out to make review easier.)

Add the remaining parts for generic-state recurrent PPO.

- The policy in common/recurrent/policies.py is still based on LSTMs, but now uses the recurrent_initial_state(...) interface to indicate what its hidden state is.
- RecurrentPPO is fully generic over hidden state types.
- Fixed all the type errors.
  - Also re-enabled mypy for the whole codebase.
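
To illustrate what "generic over hidden state types" can look like, here is a minimal sketch in plain Python. It is not the code from this PR: only the name recurrent_initial_state comes from the description above, and its signature here is an assumption. The `step` method, the two toy policies, and the `rollout` helper are hypothetical, and lists of floats stand in for tensors.

```python
from typing import List, Protocol, Tuple, TypeVar

State = TypeVar("State")
Vec = List[float]  # one value per environment (stand-in for a tensor)


class RecurrentPolicy(Protocol[State]):
    """Hypothetical interface: only the policy knows the shape of its state."""

    def recurrent_initial_state(self, n_envs: int) -> State: ...

    def step(self, obs: Vec, state: State) -> Tuple[Vec, State]: ...


class ToyLSTMPolicy:
    """State is an (h, c) pair, loosely mimicking an LSTM."""

    def recurrent_initial_state(self, n_envs: int) -> Tuple[Vec, Vec]:
        return [0.0] * n_envs, [0.0] * n_envs

    def step(self, obs: Vec, state: Tuple[Vec, Vec]) -> Tuple[Vec, Tuple[Vec, Vec]]:
        _, c = state
        new_c = [ci + oi for ci, oi in zip(c, obs)]
        new_h = [0.5 * ci for ci in new_c]
        return new_h, (new_h, new_c)


class ToyGRUPolicy:
    """State is a single vector."""

    def recurrent_initial_state(self, n_envs: int) -> Vec:
        return [0.0] * n_envs

    def step(self, obs: Vec, state: Vec) -> Tuple[Vec, Vec]:
        new_h = [hi + oi for hi, oi in zip(state, obs)]
        return new_h, new_h


def rollout(
    policy: RecurrentPolicy[State], observations: List[Vec], n_envs: int
) -> Tuple[List[Vec], State]:
    """Never inspects the state; it only threads it from one step to the next."""
    state = policy.recurrent_initial_state(n_envs)
    outputs = []
    for obs in observations:
        out, state = policy.step(obs, state)
        outputs.append(out)
    return outputs, state
```

The point of the design is that `rollout` works unchanged for both toy policies, because it treats the state as an opaque value.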

rhaps0dy · 2023-10-09T00:59:16Z

stable_baselines3/ppo_recurrent/ppo_recurrent.py

@@ -375,23 +371,21 @@ def train(self) -> None:
                    # Convert discrete action from float to long
                    actions = rollout_data.actions.long().flatten()

-                # Convert mask from float to bool
-                mask = rollout_data.mask > 1e-8
-


The rollout_data.mask is now already a bool

rhaps0dy · 2023-10-09T00:59:44Z

stable_baselines3/ppo_recurrent/ppo_recurrent.py

@@ -260,7 +257,7 @@ def collect_rollouts(  # type: ignore[override]

        callback.on_rollout_start()

-        lstm_states = deepcopy(self._last_lstm_states)
+        lstm_states = non_null(self._last_lstm_states)


It's not actually necessary to copy each tensor. They don't get overwritten.
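
For readers unfamiliar with the helper, here is a hedged sketch of what a function like non_null might do. The real definition is not shown in this thread, so the body and the exception type below are assumptions. The idea is to narrow an Optional value to its non-optional type for the type checker, and to return the very same object instead of a copy.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def non_null(value: Optional[T]) -> T:
    """Narrow Optional[T] to T, failing loudly if the value is missing.

    Hypothetical implementation; the codebase's version may differ.
    """
    if value is None:
        raise ValueError("expected a value, got None")
    return value
```

Unlike deepcopy, this returns the original object, which is safe as long as later steps rebind the variable to new states and never mutate the old ones in place.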

rhaps0dy · 2023-10-09T01:01:10Z

.circleci/config.yml

@@ -9,7 +9,7 @@ parameters:
  docker_img_version:
    # Docker image version for running tests.
    type: string
-    default: "a0d53ea"
+    default: "03a594c"


This is a more recent image -- a few dependencies were added, though they don't impact this codebase (only learned-planners).

rhaps0dy · 2023-10-09T01:01:17Z

.circleci/config.yml

@@ -51,7 +51,7 @@ jobs:
          command: ruff .
      - run:
          name: Typecheck (mypy)
-          command: mypy --exclude '^stable_baselines3/common/recurrent/policies\.py$' stable_baselines3/common tests
+          command: mypy .


Start typechecking all the things again!

dan-pandori

Sorry for the delayed review, I am still only firing on like half of my cylinders (so to speak).

dan-pandori · 2023-10-12T20:18:28Z

stable_baselines3/common/recurrent/policies.py

+        "Get only the vf features, not advancing the hidden state"
+        if self.lstm_critic is None:
+            if self.shared_lstm:
+                with th.no_grad():


I would have thought that we need this with th.no_grad(): at the top level (i.e., at line 257, applying to all parts of this function).

In particular, I'm wondering whether we might otherwise accidentally alter gradients on line 266.

dan-pandori · 2023-10-12T20:21:24Z

stable_baselines3/ppo_recurrent/ppo_recurrent.py

+            buffer_size = self.env.num_envs * self.n_steps
+            assert buffer_size > 1 or (
+                not normalize_advantage
+            ), f"`n_steps * n_envs` must be greater than 1. Currently n_steps={self.n_steps} and n_envs={self.env.num_envs}"


f"`n_steps * n_envs` must be greater than 1 when `normalize_advantage` is true.

etc
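
A sketch of the check with the suggested wording folded in. The standalone function `check_buffer_size` is hypothetical (in the diff the assertion is written inline), and the tail of the message is copied from the existing assertion.

```python
def check_buffer_size(n_steps: int, n_envs: int, normalize_advantage: bool) -> None:
    """Reject a one-element rollout buffer only when advantages are normalized."""
    buffer_size = n_envs * n_steps
    # Same condition as in the diff: the check only bites when
    # normalize_advantage is true and the buffer holds a single sample.
    assert buffer_size > 1 or not normalize_advantage, (
        "`n_steps * n_envs` must be greater than 1 when `normalize_advantage` is true. "
        f"Currently n_steps={n_steps} and n_envs={n_envs}"
    )
```

With this wording, a user who hits the assertion can see that setting normalize_advantage to false is one way out, in addition to increasing n_steps or n_envs.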

rhaps0dy mentioned this pull request Sep 21, 2023

Port in torchified PPO from sb3_contrib #5

Merged

rhaps0dy changed the title ~~Port in torchified recurrent PPO from sb3_contrib~~ Make LSTM hidden state generic Sep 21, 2023

rhaps0dy changed the base branch from master to start-from-numpy September 22, 2023 00:06

rhaps0dy force-pushed the contrib-recurrent branch from 00ddbbd to 77508da Compare September 27, 2023 01:17

rhaps0dy changed the base branch from start-from-numpy to main October 7, 2023 00:28

rhaps0dy changed the base branch from main to generic-buffers October 9, 2023 00:50

rhaps0dy commented Oct 9, 2023

rhaps0dy changed the title ~~Make LSTM hidden state generic~~ Generic hidden state for RecurrentPPO and RecurrentActorCriticPolicy Oct 9, 2023

rhaps0dy requested a review from dan-pandori October 9, 2023 01:05

rhaps0dy changed the title ~~Generic hidden state for RecurrentPPO and RecurrentActorCriticPolicy~~ Generic hidden state for RecurrentPPO Oct 9, 2023

rhaps0dy marked this pull request as ready for review October 9, 2023 01:05

rhaps0dy mentioned this pull request Oct 10, 2023

Add Pytree-Dataclass utilities #7

Merged

dan-pandori approved these changes Oct 12, 2023

rhaps0dy added a commit that referenced this pull request Oct 13, 2023

Merge branch 'contrib-recurrent' #4

2ce1723

rhaps0dy added 3 commits October 13, 2023 10:13

DeMorgan's law to make ifs clearer

f10abdd

Remove conditionality from mcs.currently_registering

2280768

Apply things from pytree-dataclass PR to here

188b417

rhaps0dy changed the base branch from generic-buffers to main October 13, 2023 17:18

rhaps0dy merged commit c0ac130 into main Oct 13, 2023
0 of 3 checks passed

rhaps0dy deleted the contrib-recurrent branch October 13, 2023 17:19
