[Bug] Allow noop_max to be 0 and make ClipReward actually clip rewards #271
Hello,
What prevents you from re-defining the Atari wrapper so that it does not include the NoopWrapper?
I would agree we could simply avoid wrapping it if `noop_max` is 0.
Same here, what prevents you from defining a custom wrapper? We usually keep those Atari wrappers as-is to have consistent comparisons between algorithms.
I concur with @araffin. I'd rather keep the current wrappers as-is for consistency (despite the unclear naming "ClipReward"), but support for `noop_max=0` sounds reasonable.
Hey, I would call these 2 logical bugs, because my expectation is that `noop_max=0` means no no-op actions are executed, and that a wrapper named "ClipReward" actually clips rewards.
I presume you mean that instead of line 238 there, I put in an `if` check that skips the wrapper when `noop_max == 0`?
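i.e., roughly this minimal sketch (illustrative only, not the actual patch; the surrounding `AtariWrapper.__init__` code is paraphrased):

```python
# Sketch: only apply the no-op reset wrapper when noop_max > 0,
# so that noop_max=0 yields a fully deterministic reset.
if noop_max > 0:
    env = NoopResetEnv(env, noop_max=noop_max)
```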
Here, I suppose you mean that I should define a new class, separate from `ClipRewardEnv`, for my own use?
Do you mean consistency with OpenAI Baselines' results? If so, they have also already allowed `noop_max=0`.
sounds good ;)
I meant defining a new class for your own purposes (not part of the PR).
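For example, something like the sketch below (the class name `TrueClipRewardEnv` is illustrative, not part of the library):

```python
import gym
import numpy as np


class TrueClipRewardEnv(gym.RewardWrapper):
    """Illustrative user-side wrapper that actually clips rewards to
    [-1, 1], instead of binning them to {-1, 0, 1} with np.sign."""

    def reward(self, reward: float) -> float:
        return float(np.clip(reward, -1.0, 1.0))
```

Using such a class in place of `ClipRewardEnv` in your own wrapper chain would preserve small reward differences without touching the library's defaults.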
Consistency with many codebases and results out there, and also avoiding breaking changes.
that's what the docstring is meant for ;) (and looking at https://stable-baselines3.readthedocs.io/en/master/common/atari_wrappers.html I find it quite explicit)
btw, why would you do that?
Done here.
You mean this link, right: https://stable-baselines3.readthedocs.io/en/master/common/atari_wrappers.html
I had read the docstring, and I'd argue these things tend to get forgotten (or at the very least require extra bits in one's brain to remember ;)) when we're in the middle of debugging code 40 layers deep in the stack. And that's exactly what happened to me with another library that 'clipped'. One could also make the case that it's fine for docstrings to be semantically unclear, since one can always read the code to know 'exactly' what's happening. And that's kind of how RL is at the moment in many instances, isn't it? ;)
We're developing a library where we try to inject different difficulties into environments to analyse / benchmark an agent's robustness to those.
In atari_wrappers.py, allow `noop_max` to be `0` to make it possible to have a completely deterministic Atari environment. Also in the same file, having `ClipRewardEnv` bin, rather than clip, rewards causes problems when there's a small amount of reward noise in the environments: e.g., an actual reward of `0.0` changes to `0.01` on injecting noise, and this gets "clipped" to `1.0`. I know that many libraries implement it like it's done here (i.e. binning), but is there a reason we can't change it here? I have a patched fork with the fix here. Shall I create a pull request?
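To make the difference concrete, here is a small sketch (the binning line mirrors the `np.sign` call in `ClipRewardEnv.reward`; the noise value is the example from above):

```python
import numpy as np

reward = 0.01  # true reward of 0.0 plus a small amount of injected noise

binned = float(np.sign(reward))          # 1.0  -- what ClipRewardEnv currently returns
clipped = float(np.clip(reward, -1, 1))  # 0.01 -- what the name "clip" suggests
print(binned, clipped)                   # 1.0 0.01
```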