This repository was archived by the owner on Mar 15, 2022. It is now read-only.

Seeding behaviour issue #408

@obriente


So I just saw the seeding behaviour in _study.py, e.g. at:

save_x_vals, seeds[i] if seeds is not None else

the offending statement being the line:

seeds[i] if seeds is not None else numpy.random.randint(2**16)

The issue with this line is that if a user runs multiple copies of their code in parallel on a cluster, the internal random number generator in each copy often gets seeded from the system time when numpy is imported, so copies started at the same moment can end up with identical internal generators. This propagates to every seed later drawn from those generators and produces correlated data that the user doesn't expect.
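To make the failure mode concrete, here is a minimal sketch, assuming two cluster jobs whose global numpy state happens to start identically. The hypothetical helper `fallback_seeds` mimics the quoted fallback `numpy.random.randint(2**16)`, and the shared value `1234` merely stands in for a shared system-time seed:

```python
import numpy


def fallback_seeds(n_runs, initial_seed):
    """Hypothetical helper mimicking the fallback in _study.py.

    The global numpy.random state is forced to initial_seed to stand in for
    two jobs that were seeded identically (e.g. from the same clock value).
    """
    numpy.random.seed(initial_seed)
    return [numpy.random.randint(2**16) for _ in range(n_runs)]


# Two "jobs" on a cluster whose global RNGs happened to start in the same state.
job_a = fallback_seeds(5, initial_seed=1234)
job_b = fallback_seeds(5, initial_seed=1234)
print(job_a == job_b)  # True: every derived seed is identical across the jobs
```

Both lists match element for element, so each run in job A is seeded exactly like the matching run in job B.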

Since we give users the option to specify their own seeds, they can circumvent the issue themselves. But if they don't know about the problem, it becomes a notoriously hard error to find and debug: it usually only presents as seemingly random correlations, or as signal noise larger than expected, and it doesn't replicate easily.
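For users who do pass seeds, one way to keep parallel copies apart is to derive each job's seeds from a base seed plus the job's index. This sketch uses `numpy.random.SeedSequence` (NumPy 1.17 and later), which is a swapped-in technique rather than anything the library does today; `seeds_for_job`, `base_seed` and `job_index` are hypothetical names, and it assumes the library accepts any 32-bit integer as a seed:

```python
import numpy


def seeds_for_job(base_seed, job_index, n_runs):
    """Hypothetical helper: derive n_runs seeds for one parallel job.

    Mixing job_index into the SeedSequence entropy gives each job its own
    stream of seeds, even if all jobs start at the same instant.
    """
    seq = numpy.random.SeedSequence([job_index, base_seed])
    # generate_state returns uint32 words; convert them to plain Python ints.
    return [int(word) for word in seq.generate_state(n_runs)]


seeds_job0 = seeds_for_job(base_seed=2020, job_index=0, n_runs=4)
seeds_job1 = seeds_for_job(base_seed=2020, job_index=1, n_runs=4)
```

On a cluster, `job_index` would typically come from the scheduler (for example an array-task index), and each list could then be passed as `seeds`, which the quoted code indexes per run as `seeds[i]`.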

Also, I'm slightly worried that passing seeds around and reseeding numpy.random with them can lead to some really funky behaviour if numpy is used separately in two files (at least, I've observed this in the past): namely, there can be multiple internal RNGs hiding behind the scenes.
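A minimal illustration of the "multiple internal RNGs" problem, assuming a second module that created its own `RandomState` at import time (the name `module_rng` is hypothetical): reseeding `numpy.random` resets the global stream but has no effect on the module-level generator.

```python
import numpy

# Hypothetical second module: it built its own generator when it was imported.
module_rng = numpy.random.RandomState(7)


def draw_from_both():
    """Return one draw from the global numpy.random state and one from module_rng."""
    return numpy.random.rand(), module_rng.rand()


numpy.random.seed(42)
first = draw_from_both()
numpy.random.seed(42)  # reseeding only resets the global stream ...
second = draw_from_both()

print(first[0] == second[0])  # True: the global draw repeats after seed(42)
print(first[1] == second[1])  # False: module_rng kept advancing untouched
```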

I don't know if there's a 'standard' way to fix this, but I have two suggestions. Firstly, we could emit a warning whenever we need to seed an RNG and the user hasn't provided a seed. Secondly, we could pass explicit numpy.random.RandomState objects around instead of seeds for numpy.random, as this makes it easier to keep track of which RNGs actually exist.
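Both suggestions can be combined in one small sketch. The hypothetical helper `resolve_rng` accepts `None`, an integer seed, or a `numpy.random.RandomState`; it warns when nothing was supplied and then falls back to a fresh `RandomState()`, which NumPy seeds from OS entropy or, failing that, the clock. `run_study` is a hypothetical stand-in for the loop in `_study.py` and draws every per-run value from the single explicit generator rather than from the global state:

```python
import warnings

import numpy


def resolve_rng(rng=None):
    """Hypothetical helper: normalise None / an int seed / a RandomState."""
    if rng is None:
        # Suggestion 1: tell the user that the results are not reproducible.
        # stacklevel=2 attributes the warning to the caller of resolve_rng.
        warnings.warn(
            "No seed or RandomState was given; results will not be reproducible. "
            "Pass a seed (distinct per parallel job) to control them.",
            RuntimeWarning,
            stacklevel=2,
        )
        # A fresh RandomState() is seeded from OS entropy (or the clock as a fallback).
        return numpy.random.RandomState()
    if isinstance(rng, numpy.random.RandomState):
        # Suggestion 2: an explicit generator is used as-is, never reseeded.
        return rng
    # Anything else is treated as an integer seed.
    return numpy.random.RandomState(rng)


def run_study(n_runs, rng=None):
    """Hypothetical study loop: every run draws from one explicit generator."""
    rng = resolve_rng(rng)
    return [rng.randint(2**16) for _ in range(n_runs)]


# Passing a RandomState or the equivalent integer seed gives the same draws.
results_a = run_study(3, rng=numpy.random.RandomState(11))
results_b = run_study(3, rng=11)

# Omitting the generator triggers the warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    unseeded = run_study(2)
```

Returning a fresh `RandomState()` instead of the global one is a design choice here: it avoids silently sharing state with other code, at the cost of results that are only reproducible if the user passes a seed, which is exactly what the warning tells them.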

Metadata

Assignees: No one assigned
Labels: No labels
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
