[RLlib] Hot fix for PPOTorchRLModule._compute_values with non-shared stateful encoder and batch slicing with non-empty infos.
#44082
Why are these changes needed?
Running PPO with use_lstm=True and vf_share_layers=False results in an error in the PPOTorchRLModule._compute_values method, as the specs checker expects a different spec for the state_in. Extracting the state_in for the critic solves this problem.
Another problem solved here relates to non-empty infos in batch slicing (mainly occurring in MinibatchIterators). The reason is that slicing via tree.map_structure also tries to slice the entries of the infos, which are usually singular values.
Related issue number
Checks
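I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.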
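Illustration of the batch-slicing problem described under "Why are these changes needed?". This is a minimal sketch, not the code changed in this PR: map_leaves is a hypothetical pure-Python stand-in for tree.map_structure, the "infos" key and the toy batch layout are assumptions, and slice_skipping_infos only shows one plausible way to avoid slicing the info entries.

```python
from array import array


def map_leaves(fn, struct):
    """Hypothetical stand-in for tree.map_structure: recurse into dicts and
    lists, apply fn to everything else (the leaves)."""
    if isinstance(struct, dict):
        return {k: map_leaves(fn, v) for k, v in struct.items()}
    if isinstance(struct, list):
        return [map_leaves(fn, v) for v in struct]
    return fn(struct)


# Toy batch (assumed layout): array columns plus one info dict per timestep.
batch = {
    "obs": array("d", [0.1, 0.2, 0.3, 0.4]),
    "rewards": array("d", [1.0, 0.0, 1.0, 0.0]),
    "infos": [{"step": 0}, {"step": 1}, {"step": 2}, {"step": 3}],
}


def slice_naive(batch, start, stop):
    # Slices every leaf, including the scalar values inside each info dict.
    return map_leaves(lambda leaf: leaf[start:stop], batch)


def slice_skipping_infos(batch, start, stop, infos_key="infos"):
    # Slice the regular columns leaf by leaf ...
    columns = {k: v for k, v in batch.items() if k != infos_key}
    sliced = map_leaves(lambda leaf: leaf[start:stop], columns)
    # ... and slice the infos at the top level only, one dict per timestep.
    if infos_key in batch:
        sliced[infos_key] = batch[infos_key][start:stop]
    return sliced


try:
    slice_naive(batch, 0, 2)
    naive_failed = False
except TypeError:
    # Scalar info values such as the int under "step" cannot be sliced.
    naive_failed = True

minibatch = slice_skipping_infos(batch, 0, 2)
```

In this sketch the naive variant fails because the mapping descends into each info dict and attempts to slice a plain integer, while the second variant keeps the info dicts intact and only selects which timesteps to keep.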