docs/Python-API.md (2 changes: 1 addition & 1 deletion)
@@ -64,7 +64,7 @@ env = UnityEnvironment(file_name="3DBall", base_port=5005, seed=1, side_channels
- `worker_id` indicates which port to use for communication with the
environment. For use in parallel training regimes such as A3C.
- `seed` indicates the seed to use when generating random numbers during the
training process. In environments which do not involve physics calculations,
Contributor Author: "no physics" does not imply "deterministic"

Contributor: Agree. We called out physics before, though, just to let people know that the physics is non-deterministic. So just because you don't use a Random call in your code, it doesn't mean you are guaranteed deterministic behaviour.

Contributor Author: The inverse of the statement is also not true: physics simulations are not inherently non-deterministic (although it sounds like Unity's implementation is).

Contributor: Yes 😄

training process. In environments which are deterministic,
setting the seed enables reproducible experimentation by ensuring that the
environment and trainers utilize the same random seed.
- `side_channels` provides a way to exchange data with the Unity simulation that
ml-agents/mlagents/trainers/learn.py (9 changes: 3 additions & 6 deletions)
@@ -407,7 +407,7 @@ def create_environment_factory(
env_path: Optional[str],
docker_target_name: Optional[str],
no_graphics: bool,
seed: Optional[int],
seed: int,
Contributor Author: This is always non-None.

start_port: int,
env_args: Optional[List[str]],
) -> Callable[[int, List[SideChannel]], BaseEnv]:
@@ -426,15 +426,12 @@ def create_unity_environment(
# container.
# Navigate in docker path and find env_path and copy it.
env_path = prepare_for_docker_run(docker_target_name, env_path)
seed_count = 10000
seed_pool = [np.random.randint(0, seed_count) for _ in range(seed_count)]

def create_unity_environment(
worker_id: int, side_channels: List[SideChannel]
) -> UnityEnvironment:
env_seed = seed
if not env_seed:
Contributor Author: This would only happen if you set --seed=0, or 1/10000 times if this block picked 0 for the seed:

```python
if options.seed == -1:
    run_seed = np.random.randint(0, 10000)
run_training(run_seed, options)
```

np.random's seed isn't set at this point, so if you set seed=0 you wouldn't get reproducible results.
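A minimal sketch (not part of this PR) of the falsy-zero pitfall described above; `old_env_seed` is a hypothetical helper that mirrors the deleted logic, and the pool size of 10000 comes from the removed `seed_count`:

```python
import numpy as np

def old_env_seed(seed, worker_id, seed_pool):
    # Mirrors the removed code path: 0 is falsy, so an explicit --seed=0
    # falls through to a pool that was drawn from an unseeded np.random.
    env_seed = seed
    if not env_seed:
        env_seed = seed_pool[worker_id % len(seed_pool)]
    return env_seed

# Two separate "runs" build their pools without seeding np.random first,
# so --seed=0 gives a different environment seed each time.
pool_run_a = [np.random.randint(0, 10000) for _ in range(10000)]
pool_run_b = [np.random.randint(0, 10000) for _ in range(10000)]
print(old_env_seed(0, 0, pool_run_a), old_env_seed(0, 0, pool_run_b))  # almost certainly differ
```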

env_seed = seed_pool[worker_id % len(seed_pool)]
# Make sure that each environment gets a different seed
env_seed = seed + worker_id
Contributor Author: See also https://www.johndcook.com/blog/2016/01/29/random-number-generator-seed-mistakes/

I don't know how to prove which would be better or worse here - the environments should diverge after a few steps, but seeding them the same feels strange for multiple environments.

Contributor: I think @harperj did some experiments with Walker and distributed training here - I don't remember if there was a huge improvement, but I remember the same-seeding being an issue in terms of diversity of experiences.

Contributor: Yeah, this code has actually been problematic for a while, but I just never got around to cleaning it up. In practice it is not an issue for most environments. I can't think of any reason we'd practically want all of the environments to have the same seed; so far we have just been fortunate that there are other sources of randomness which make it fine either way.
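For reference, a small sketch (not from the diff itself) of what the new scheme does; the base seed of 1 and the worker count of 4 are arbitrary illustrative values:

```python
# With the change in this diff, each worker's environment seed is the run
# seed plus its worker_id, so seeds are distinct across workers and fully
# determined by --seed.
base_seed = 1
worker_seeds = [base_seed + worker_id for worker_id in range(4)]
print(worker_seeds)  # [1, 2, 3, 4]
```

If fully decorrelated per-worker streams were ever a concern, numpy's `SeedSequence.spawn` is one standard way to derive them, but that is outside the scope of this change.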

return UnityEnvironment(
file_name=env_path,
worker_id=worker_id,