[Bug] Allow noop_max to be 0 and make ClipReward actually clip rewards #271
Hello,
What prevents you from re-defining the Atari wrapper so that it does not include the NoopWrapper?
I would agree we could simply avoid wrapping it if `noop_max` is 0.
Same here, what prevents you from defining a custom wrapper? We usually keep those Atari wrappers as-is to have consistent comparisons between algorithms.
I concur with @araffin. I'd rather keep the current wrappers as-is for consistency (despite the unclear naming "ClipReward"), but support for `noop_max=0` sounds reasonable.
Hey, I would call these 2 logical bugs, because my expectation is that `noop_max=0` means no no-op actions are executed, and that a wrapper named "ClipReward" actually clips rewards.
I presume you mean that instead of line 238 there, I put in an `if` check that skips the wrapper when `noop_max == 0`?
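i.e., roughly this minimal sketch (illustrative only, not the actual patch; the surrounding `AtariWrapper.__init__` code is paraphrased):

```python
# Sketch: only apply the no-op reset wrapper when noop_max > 0,
# so that noop_max=0 yields a fully deterministic reset.
if noop_max > 0:
    env = NoopResetEnv(env, noop_max=noop_max)
```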
Here, I suppose you mean that I should define a new class, separate from `ClipRewardEnv`, for my own use?
Do you mean consistency with OpenAI Baselines' results? If so, they have also already allowed `noop_max=0`.
sounds good ;)
I meant defining a new class for your own purposes (not part of the PR).
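For example, something like the sketch below (the class name `TrueClipRewardEnv` is illustrative, not part of the library):

```python
import gym
import numpy as np


class TrueClipRewardEnv(gym.RewardWrapper):
    """Illustrative user-side wrapper that actually clips rewards to
    [-1, 1], instead of binning them to {-1, 0, 1} with np.sign."""

    def reward(self, reward: float) -> float:
        return float(np.clip(reward, -1.0, 1.0))
```

Using such a class in place of `ClipRewardEnv` in your own wrapper chain would preserve small reward differences without touching the library's defaults.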
Consistency with many codebases and results out there, and also avoiding breaking changes.
that's what the docstring is meant for ;) (and looking at https://stable-baselines3.readthedocs.io/en/master/common/atari_wrappers.html I find it quite explicit)
btw, why would you do that?
Done here.
You mean this link, right: https://stable-baselines3.readthedocs.io/en/master/common/atari_wrappers.html
I had read the docstring, and I'd argue these things tend to get forgotten (or at the very least require extra bits in one's brain to remember ;)) when we're in the middle of debugging code 40 layers deep in the stack. And that's exactly what happened to me with another library that 'clipped'. One could also make the case that it's fine for docstrings to be semantically unclear, since one can always read the code to know 'exactly' what's happening. And that's kind of how RL is at the moment in many instances, isn't it? ;)
We're developing a library where we try to inject different difficulties into environments to analyse / benchmark an agent's robustness to those.
In atari_wrappers.py, allow `noop_max` to be `0` to make it possible to have a completely deterministic Atari environment. Also in the same file, having `ClipRewardEnv` bin, rather than clip, rewards causes problems when there's a small amount of reward noise in the environments: e.g., an actual reward of `0.0` changes to `0.01` on injecting noise, and this gets "clipped" to `1.0`. I know that many libraries implement it like it's done here (i.e. binning), but is there a reason we can't change it here? I have a patched fork with the fix here. Shall I create a pull request?
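To make the difference concrete, here is a small sketch (the binning line mirrors the `np.sign` call in `ClipRewardEnv.reward`; the noise value is the example from above):

```python
import numpy as np

reward = 0.01  # true reward of 0.0 plus a small amount of injected noise

binned = float(np.sign(reward))          # 1.0  -- what ClipRewardEnv currently returns
clipped = float(np.clip(reward, -1, 1))  # 0.01 -- what the name "clip" suggests
print(binned, clipped)                   # 1.0 0.01
```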