Fix: Auto-increment seed across batch_run iterations #2841
base: main
Conversation
I agree that this needs to be fixed. However, using subsequent integers with Mersenne Twister, Python's default RNG, is a bad idea. From Wikipedia: "A consequence of poor diffusion is that two instances of the generator, started with initial states that are almost the same, will usually output nearly the same sequence for many iterations". Also, using a seed with many zeros (like 42) is actually bad as well. One option is to just use […].

As an aside, numpy's rng is much better, and I believe we should move all mesa code over to using it while deprecating the use of Python's stdlib random library.
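A minimal sketch of the hierarchical-seeding idea mentioned above, assuming a switch to numpy; this is illustrative, not code from this PR:

```python
import numpy as np

# A single user-facing base seed. SeedSequence.spawn() derives child
# sequences whose internal states are well separated, unlike handing
# consecutive integers (42, 43, 44, ...) to the Mersenne Twister.
base = np.random.SeedSequence(42)
children = base.spawn(3)  # e.g., one child per batch_run iteration

# Each iteration gets its own, statistically independent generator.
rngs = [np.random.default_rng(child) for child in children]
print([rng.integers(0, 100, size=3) for rng in rngs])
```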
Considering how important this is, maybe we should just go all in and do the switch to numpy and its rng and then have seed options like system time and hierarchical seeding? Does it have to be a breaking change? Could we keep the old behavior and just add a warning?
Moving the internals over should be possible as a non-breaking change.
mesa/batchrunner.py
Outdated
```python
seed_value = kwargs["seed"]
if isinstance(seed_value, (int, float)) and not isinstance(seed_value, bool):
    kwargs = kwargs.copy()
    kwargs["seed"] = int(seed_value) + iteration
```
Suggested change:

```diff
-    kwargs["seed"] = int(seed_value) + iteration
+    kwargs["seed"] = seed_value + time.time()
```
This is all that is needed to ensure a much better spread of seeding values and thus better randomness.
Considering this, do we want to move forward with this PR?
Where/how should we do this (without breaking API)?
When using batch_run() with a single seed value and multiple iterations, all iterations were using the same seed, producing identical results instead of independent replications. This defeats the purpose of running multiple iterations.

This commit modifies _model_run_func to automatically increment the seed for each iteration (seed, seed+1, seed+2, ...) when a numeric seed is provided. This ensures:

- Each iteration produces different random outcomes
- Results remain reproducible (same base seed → same sequence)
- Backward compatibility with seed arrays (no modification if the seed is already an iterable passed via parameters)
- Unchanged behavior when no seed is specified (each iteration gets a random seed from the OS)

The fix only applies when:

1. A 'seed' parameter exists in kwargs
2. The seed value is not None
3. The iteration number is > 0
4. The seed is a single numeric value (int/float, not bool)
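A minimal sketch of the rule this commit message describes (the helper name `_adjust_seed` is illustrative; the real change lives inside `_model_run_func` in `mesa/batchrunner.py` and takes more arguments):

```python
def _adjust_seed(kwargs: dict, iteration: int) -> dict:
    """Return kwargs with the seed shifted by the iteration number.

    Only applies when a single numeric seed was provided; None, bools,
    iterables of seeds, and iteration 0 are left untouched.
    """
    if "seed" not in kwargs or kwargs["seed"] is None or iteration == 0:
        return kwargs
    seed_value = kwargs["seed"]
    if isinstance(seed_value, (int, float)) and not isinstance(seed_value, bool):
        kwargs = kwargs.copy()  # don't mutate the caller's dict
        kwargs["seed"] = int(seed_value) + iteration  # 42, 43, 44, ...
    return kwargs


# Illustrative use: seed=42 with three iterations yields 42, 43, 44.
print([_adjust_seed({"seed": 42}, i)["seed"] for i in range(3)])  # [42, 43, 44]
```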
Shifting to the numpy rng requires changes in, e.g., […].
The return is […]. Alternatively, you can keep things as they are and just document the behavior. Or, perhaps even better: raise a ValueError if […]. In my view, we might even consider deprecating […].
@EwoutH, I have updated this PR as discussed yesterday. I added a new keyword argument […]. While reviewing the code, I noticed the current use of […].
EwoutH left a comment
Thanks!
Since this is now a new, breaking feature, we should do it the proper way.
Could you add a section in the migration guide and link to it in the DeprecationWarning? See #2872 for a recent example.
We also have to be careful about the order of our arguments. Some people use positional arguments (stupid, I know), which we're changing once we remove “iterations”.
Also we should properly explain this in the tutorial (can be a separate PR).
Good stuff. I was thinking (not for this PR): wouldn’t it be useful if the results weren’t just a dataframe, but an object?
I am not sure. With the workbench, I have never seen the need for anything other than a dataframe with the experiments and a dictionary with the outcome names as keys and a numpy array as values. It might be different, however, if you want to store agent-level data over time. Still, for those use cases, you don't want to keep everything in memory.
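For context on the dataframe point: `batch_run` returns a list of dictionaries, and the usual pattern is to wrap it yourself. A minimal sketch, assuming a user-defined model class `MyModel` (a placeholder, not part of Mesa) with a hypothetical `n_agents` parameter:

```python
import pandas as pd

from mesa.batchrunner import batch_run
from my_project.model import MyModel  # placeholder: any Mesa model class

# batch_run returns a list of dicts, one record per run (or per agent,
# depending on what the model reports); most users wrap it in a DataFrame.
results = batch_run(
    MyModel,
    parameters={"n_agents": [10, 50]},  # hypothetical model parameter
    iterations=3,
    max_steps=100,
)
df = pd.DataFrame(results)
print(df.head())
```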
@EwoutH, this is ready for review. I added the docs, migration guide, and fixed the tests.
EwoutH left a comment
Looks good and complete, thanks a lot. I have a few minor comments/suggestions.
| "RunId": 0, | ||
| "iteration": 0, | ||
| "Step": 1000, | ||
| "reported_model_param": 42, | ||
| "AgentID": 1, | ||
| "agent_id": 1, | ||
| "agent_local": 250.0, | ||
| "seed": 42, |
There seems to be a lot of duplicate information in this output format. Maybe we should clean this up (at some point)
I agree, but it's beyond the scope of this PR.
```python
# establish whether to use seed or rng as the name for the parameter
model_parameters = inspect.signature(Model).parameters
rng_kwarg_name = "rng"
if "seed" in model_parameters:
    rng_kwarg_name = "seed"
```
This behavior should be concisely explained in the batch_run docstring and tutorial
why?
Using both seed and rng on a model already gives an error. This just ensures that the seed is set to the keyword argument specified by the user in their model class, whether it is seed or rng.
Fair, that's true. Give me a moment to get up to speed with it myself.
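A hedged sketch of how this seed/rng detection could be used downstream (simplified and not the exact code in this PR; `model_cls`, `run_parameters`, and `run_seed` are illustrative names):

```python
import inspect


def resolve_rng_kwarg(model_cls: type) -> str:
    """Pick the keyword the model class actually accepts for its RNG."""
    parameters = inspect.signature(model_cls).parameters
    return "seed" if "seed" in parameters else "rng"


# Illustrative use inside a batch runner: pass the per-run value under
# whichever name the user's model class declares.
# kwargs = {**run_parameters, resolve_rng_kwarg(model_cls): run_seed}
# model = model_cls(**kwargs)
```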
Co-authored-by: Ewout ter Hoeven <15776622+EwoutH@users.noreply.github.com>
Thanks, I think I can resolve the last open comments and merge. Can you make sure the PR description represents the final state of this PR?
Problem

When using batch_run() with a single seed value and multiple iterations, all iterations use the same seed, producing identical results instead of independent replications. See #2835.

Solution

Modify _model_run_func to automatically increment the seed for each iteration: 42, 43, 44, etc.

Behavior changes

- seed=42, iterations=3: currently all iterations use 42; now uses 42, 43, 44
- seed=[42, 43, 44], iterations=1: unchanged

Code that passes a single seed with multiple iterations will get different results. The current behavior seems like a bug (why run multiple identical iterations?), but this technically breaks existing code.
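As a usage sketch of the described change (assuming the seed is still supplied as a model parameter; `MyModel` and `width` are placeholders, not part of Mesa):

```python
from mesa.batchrunner import batch_run
from my_project.model import MyModel  # placeholder: any Mesa model that accepts a seed

# Before this change, all three iterations ran with seed=42 and produced
# identical results; with the behavior described above they run with
# 42, 43, 44 while staying reproducible from the single base seed.
results = batch_run(
    MyModel,
    parameters={"seed": 42, "width": 10},  # width is a hypothetical parameter
    iterations=3,
    max_steps=100,
)
```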
Review
I'm in doubt about this. What if users have other random elements in their model? Is it a good idea for us to obscure this?
Secondly, is this a bugfix or a breaking change? Should we treat it as a fix and merge, or wait for a major version?
Might close #2835. @dylan-munson curious what you think.