[RLlib] Fix Atari learning test regressions (2 bugs) and 1 minor attention net bug. #18306
if to_ == 0:
    to_ = None
input_dict[view_col] = np.array([
    np.concatenate(
`data` has to come last in the concat; otherwise an attention net, for example, will not necessarily see the most recent observations. This explains the learning improvements on the RepeatAfterMe experiments compared with older versions.
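To illustrate the ordering point, here is a minimal NumPy sketch. It is not the RLlib code: `history`, `window_slice`, and the integer stand-in observations are assumptions made up for the example. It shows why the newest `data` must be concatenated last when a trailing window is sliced out, and why a `to_` of 0 has to become `None`.

```python
import numpy as np

# Stand-in values: integers play the role of observations, older ones first.
history = np.array([0, 1, 2])  # hypothetical buffer of earlier timesteps
data = np.array([3, 4])        # the most recent timesteps


def window_slice(history, data, from_, to_):
    """Slice [from_:to_] out of history followed by data (hypothetical helper)."""
    # A slice ending at 0 would be empty, so "up to the end" is spelled None.
    if to_ == 0:
        to_ = None
    # `data` goes last so that negative indices address the newest timesteps.
    return np.concatenate([history, data])[from_:to_]


newest_three = window_slice(history, data, -3, 0)

# With the order flipped, the same trailing slice returns stale observations.
stale_three = np.concatenate([data, history])[-3:]

# Without the `to_ == 0` guard, the slice would be empty.
unguarded = np.concatenate([history, data])[-3:0]
```

With these stand-in values, `newest_three` holds the last three timesteps, `stale_three` holds only old ones, and `unguarded` is empty.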
Makes sense.
rllib/evaluation/rollout_worker.py
@@ -486,15 +487,14 @@ def wrap(env):
 clip_rewards = True

 # Deprecated way of framestacking is used.
-framestack = model_config.get("framestack") is True
+use_old_framestack = model_config.get("framestack") is True
Is there any way to flag this as deprecated in the logs?
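One possible way to surface this in the logs, sketched with the standard `logging` module only. The helper name `check_old_framestack` and the warning text are assumptions for illustration, not what the PR ended up doing:

```python
import logging

logger = logging.getLogger(__name__)


def check_old_framestack(model_config):
    """Return True if the old framestacking flag is set, logging a warning.

    Hypothetical helper; only the `framestack` key and the `is True`
    check come from the diff under review.
    """
    use_old_framestack = model_config.get("framestack") is True
    if use_old_framestack:
        # Warn where the flag is read, so the message appears at env creation.
        logger.warning(
            "Model config `framestack=True` is soft-deprecated: the old "
            "Atari framestacking logic is being used."
        )
    return use_old_framestack
```

Because the check uses `is True`, truthy values such as `1` do not trigger the old code path or the warning; only the literal boolean does.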
Why are these changes needed?
This PR fixes:
- config.num_framestacks = "auto" would still use the old Atari framestacking logic. The old framestack=True is now soft-deprecated.
- A bug in the batch_repeat_value>1 view requirements (attention nets!). A test on examples/attention_net.py confirmed that the fix makes learning considerably faster.
Related issue number
Checks
- I've run scripts/format.sh to lint the changes in this PR.