Description
What happened + What you expected to happen
'merge_states' is used to aggregate the statistics collected over all the parallel environments, then broadcast the result back to all remote workers.
This design is problematic for 'MeanStdFilter': after the first iteration, each worker stores not only the statistics collected during the current iteration, but also all those that were aggregated during previous iterations. Only the statistics collected during the current iteration should be considered when merging, which is not the case at the moment.
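To make the expected behavior concrete, here is a minimal sketch of delta-only merging for scalar observations. It is not RLlib's implementation: 'RunningStat', 'DeltaWorker' and 'sync' are hypothetical names introduced for illustration, and the merge rule is the standard pairwise combination of count, mean and sum of squared deviations.

```python
from dataclasses import dataclass


@dataclass
class RunningStat:
    """Count, mean, and sum of squared deviations (M2) of a scalar stream."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def push(self, x: float) -> None:
        # Welford's online update for one new sample.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "RunningStat") -> None:
        # Pairwise combination of two statistics built from disjoint samples.
        if other.n == 0:
            return
        total = self.n + other.n
        delta = other.mean - self.mean
        self.m2 += other.m2 + delta * delta * self.n * other.n / total
        self.mean += delta * other.n / total
        self.n = total


class DeltaWorker:
    """Hypothetical worker that keeps new samples apart from the synced state."""

    def __init__(self) -> None:
        self.synced = RunningStat()  # last state broadcast by the driver
        self.delta = RunningStat()   # samples collected since the last sync

    def observe(self, x: float) -> None:
        self.delta.push(x)

    def pop_delta(self) -> RunningStat:
        # Hand over the new samples and start a fresh buffer.
        out, self.delta = self.delta, RunningStat()
        return out


def sync(global_stat: RunningStat, workers: list) -> None:
    # Merge only what each worker collected since the previous sync...
    for worker in workers:
        global_stat.merge(worker.pop_delta())
    # ...then broadcast a copy of the merged state back to every worker.
    for worker in workers:
        worker.synced = RunningStat(global_stat.n, global_stat.mean, global_stat.m2)


global_stat = RunningStat()
workers = [DeltaWorker(), DeltaWorker()]
for _ in range(4):
    for x in (0.0, 1.0, 2.0):
        workers[0].observe(x)
    for x in (3.0, 4.0, 5.0):
        workers[1].observe(x)
    sync(global_stat, workers)
print(global_stat.n, global_stat.mean)
```

Because the broadcast state and the not-yet-merged samples live in separate fields, the merged count equals the number of samples actually observed, no matter how many times 'sync' runs.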
This issue is blocking rather than a nice-to-have, because it causes the number of "pushes" to grow exponentially, which overflows int64 after a few hundred iterations (more precisely, 304 iterations with 80000 samples each). From that point on, 'nan' appears everywhere, which is not surprising.
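The mechanism can be illustrated with a toy model. The sketch below assumes that the driver sums the counts reported by the workers and broadcasts the total back, and that each worker then reports that total again along with its new samples. This is an assumption made for illustration, not RLlib's actual code, and it is not meant to reproduce the 304-iteration figure above; the real growth rate depends on how the states are actually combined. The function name and parameters are hypothetical.

```python
INT64_MAX = 2**63 - 1


def first_overflow_iteration(num_workers, samples_per_worker,
                             merge_deltas_only, max_iters=1000):
    """Return the first iteration whose merged count exceeds int64, else None.

    Toy model (an assumption, not RLlib's code): the driver sums the counts
    reported by the workers, then broadcasts the total back to every worker.
    """
    global_count = 0
    for iteration in range(1, max_iters + 1):
        if merge_deltas_only:
            # Each worker reports only the samples collected this iteration.
            global_count += num_workers * samples_per_worker
        else:
            # Each worker reports the broadcast total plus its new samples,
            # so the previous total is counted once per worker.
            reported = global_count + samples_per_worker
            global_count = num_workers * reported
        if global_count > INT64_MAX:
            return iteration
    return None


buggy = first_overflow_iteration(2, 80_000, merge_deltas_only=False)
fixed = first_overflow_iteration(2, 80_000, merge_deltas_only=True)
print("overflow with full-state merge:", buggy)
print("overflow with delta-only merge:", fixed)
```

In this model the full-state merge multiplies the count by the number of workers at every iteration, while the delta-only merge grows linearly and stays far below the int64 limit within the simulated horizon.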
Versions / Dependencies
I checked that the issue is present for "ray[rllib]>=2.38,<=2.40".
Reproduction script
Any script specifying 'algo_config.env_runners(env_to_module_connector=lambda env: MeanStdFilter())' would face this issue.
Issue Severity
High: It blocks me from completing my task.