Clean up random seed code #3645
Conversation
          docker_target_name: Optional[str],
          no_graphics: bool,
    -     seed: Optional[int],
    +     seed: int,
This is always non-None.
          worker_id: int, side_channels: List[SideChannel]
      ) -> UnityEnvironment:
          env_seed = seed
          if not env_seed:
This would only happen if you set --seed=0, or 1/10000 times when this block

ml-agents/ml-agents/mlagents/trainers/learn.py, lines 492 to 494 in 811825b:

    if options.seed == -1:
        run_seed = np.random.randint(0, 10000)
    run_training(run_seed, options)

picked 0 for the seed.

np.random's seed isn't set at this point, so if you set seed=0 you wouldn't get reproducible results.
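The falsy-zero pitfall being discussed can be shown with a small standalone sketch (the function names here are illustrative, not the actual ml-agents helpers):

```python
from typing import List, Optional

def pick_env_seed_buggy(seed: Optional[int], seed_pool: List[int]) -> int:
    # Old-style logic: `if not env_seed` treats 0 the same as None,
    # so an explicit --seed=0 silently falls through to the pool.
    env_seed = seed
    if not env_seed:
        env_seed = seed_pool[0]
    return env_seed

def pick_env_seed_fixed(seed: Optional[int], fallback: int) -> int:
    # Checking `is None` keeps 0 as a valid, reproducible seed.
    return fallback if seed is None else seed

pool = [1234, 5678]
print(pick_env_seed_buggy(0, pool))  # 1234 -- the user's seed 0 is discarded
print(pick_env_seed_fixed(0, 9999))  # 0    -- seed 0 is honoured
```

The fix is simply to test for `None` explicitly rather than truthiness.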
    -     if not env_seed:
    -         env_seed = seed_pool[worker_id % len(seed_pool)]
    +     # Make sure that each environment gets a different seed
    +     env_seed = seed + worker_id
See also https://www.johndcook.com/blog/2016/01/29/random-number-generator-seed-mistakes/
I don't know how to prove which would be better or worse here - the environments should diverge after a few steps, but seeding them the same feels strange for multiple environments.
I think @harperj did some experiments with Walker and distributed training here - don't remember if there was a huge improvement but I remember the same-seeding being an issue in terms of diversity of experiences
Yeah, actually this code has been problematic for a while but I just never got around to cleaning it up. In practice for most environments it is not an issue. I can't think of any reason we'd practically want all of the environments to be the same seed, so far we have just been fortunate that there are other sources of randomness which make it fine either way.
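The per-worker offset in the new code can be sketched as follows (a minimal illustration with assumed names `base_seed` and `num_workers`, not the actual env-manager code; note the Cook blog post linked above argues that sequential seeds are distinct but not guaranteed to yield statistically independent streams, for which something like NumPy's `SeedSequence.spawn` is one principled alternative):

```python
from typing import List

def worker_seeds(base_seed: int, num_workers: int) -> List[int]:
    # Offset the run seed by the worker id so that each environment
    # gets a different, reproducible seed.
    return [base_seed + worker_id for worker_id in range(num_workers)]

print(worker_seeds(42, 4))  # [42, 43, 44, 45]
```

Every worker's seed is distinct and deterministic given the run seed, which addresses the same-seeding concern raised above.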
    - `worker_id` indicates which port to use for communication with the
      environment. For use in parallel training regimes such as A3C.
    - `seed` indicates the seed to use when generating random numbers during the
      training process. In environments which do not involve physics calculations,
"no physics" does not imply "deterministic"
Agree. We called out physics before just to let people know that the physics is non-deterministic. So just because you don't use a Random call in your code, it doesn't mean you are guaranteed determinism.
The inverse of the statement is also not true: physics simulations are not inherently non-deterministic (although it sounds like Unity's implementation is).
Yes 😄
harperj left a comment
LGTM
Proposed change(s)
Useful links (Github issues, JIRA tickets, ML-Agents forum threads etc.)
https://jira.unity3d.com/browse/MLA-766
https://www.johndcook.com/blog/2016/01/29/random-number-generator-seed-mistakes/