Skip to content

Conversation

@chriselion
Copy link
Contributor

@chriselion chriselion commented Mar 16, 2020

Proposed change(s)

  • Remove some "seed pool" code that was (almost) unreachable
  • Change how seeds as determined for multiple environments.

Useful links (Github issues, JIRA tickets, ML-Agents forum threads etc.)

https://jira.unity3d.com/browse/MLA-766
https://www.johndcook.com/blog/2016/01/29/random-number-generator-seed-mistakes/

Types of change(s)

  • Bug fix
  • New feature
  • Code refactor
  • Breaking change
  • Documentation update
  • Other (please describe)

Checklist

  • Added tests that prove my fix is effective or that my feature works
  • Updated the changelog (if applicable)
  • Updated the documentation (if applicable)
  • Updated the migration guide (if applicable)

Other comments

docker_target_name: Optional[str],
no_graphics: bool,
seed: Optional[int],
seed: int,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is always non-None.

worker_id: int, side_channels: List[SideChannel]
) -> UnityEnvironment:
env_seed = seed
if not env_seed:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would only happen if you set --seed=0 or 1/10000 times if this block

if options.seed == -1:
run_seed = np.random.randint(0, 10000)
run_training(run_seed, options)

picked 0 for the seed.

np.random's seed isn't set at this point, so if you set seed=0 you wouldn't get reproducible results.

if not env_seed:
env_seed = seed_pool[worker_id % len(seed_pool)]
# Make sure that each environment gets a different seed
env_seed = seed + worker_id
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See also https://www.johndcook.com/blog/2016/01/29/random-number-generator-seed-mistakes/

I don't know how to prove which would be better or worse here - the environments should diverge after a few steps, but seeding them the same feels strange for multiple environments.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @harperj did some experiments with Walker and distributed training here - don't remember if there was a huge improvement but I remember the same-seeding being an issue in terms of diversity of experiences

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, actually this code has been problematic for a while but I just never got around to cleaning it up. In practice for most environments it is not an issue. I can't think of any reason we'd practically want all of the environments to be the same seed, so far we have just been fortunate that there are other sources of randomness which make it fine either way.

- `worker_id` indicates which port to use for communication with the
environment. For use in parallel training regimes such as A3C.
- `seed` indicates the seed to use when generating random numbers during the
training process. In environments which do not involve physics calculations,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"no physics" does not imply "deterministic"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. We called out physics before though just to let people know that the physics is non-determinstic. So just because you don't use a Random call in your code, it doesn't mean you are guaranteed deterministic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inverse of the statement is also not true: physics simulations are not inherently non-deterministic (although it sounds like Unity's implementation is).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes 😄

@chriselion chriselion requested a review from harperj March 17, 2020 00:16
Copy link
Contributor

@harperj harperj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

if not env_seed:
env_seed = seed_pool[worker_id % len(seed_pool)]
# Make sure that each environment gets a different seed
env_seed = seed + worker_id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, actually this code has been problematic for a while but I just never got around to cleaning it up. In practice for most environments it is not an issue. I can't think of any reason we'd practically want all of the environments to be the same seed, so far we have just been fortunate that there are other sources of randomness which make it fine either way.

@chriselion chriselion merged commit 2625ba5 into master Mar 17, 2020
@delete-merged-branch delete-merged-branch bot deleted the develop-MLA-766-random-seeds branch March 17, 2020 22:41
@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 15, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants