
Local Random Streams

esseff edited this page Jan 26, 2023 · 80 revisions


The local_random_streams option implements distinct random number generator streams for individual entities. This can help maintain run coherence for models which simulate multiple entities together, but requires additional memory and a unique entity key.

Related topics

Topic contents

Background

Model code can draw random values from selected statistical distributions using built-in random number generator (RNG) functions, for example:

    double x = RandUniform(1);
    double y = RandNormal(2);
    double z = RandPoisson(3);

These functions return pseudo-random streams of numbers. The streams appear random but are actually produced by a deterministic algorithm which generates a fixed sequence of values. That algorithm knows which value to return next by maintaining an internal state which changes from one function call to the next.
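The idea of a deterministic algorithm with internal state can be illustrated with a toy generator. This is a minimal sketch, not the algorithm used by OpenM++: the name ToyRng and the choice of a linear congruential generator are assumptions made purely for illustration.

```cpp
#include <cstdint>

// Hypothetical toy generator (NOT the OpenM++ algorithm): a linear
// congruential generator whose entire internal state is one 64-bit integer.
struct ToyRng {
    std::uint64_t state;

    explicit ToyRng(std::uint64_t seed) : state(seed) {}

    // Advance the internal state, then map its top 53 bits to a double in [0, 1).
    double next_uniform() {
        state = state * 6364136223846793005ULL + 1442695040888963407ULL;
        return static_cast<double>(state >> 11) * (1.0 / 9007199254740992.0); // 2^-53
    }
};
```

Two ToyRng objects constructed with the same seed return the same sequence of values, because each value is fully determined by the state left behind by the previous call.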

The sequence of numbers returned depends on the SimulationSeed for the run, on the run member (aka sub, replicate), and on the case for case-based models.

The small integer argument to an RNG function specifies a distinct underlying random number stream which produces values independent of those produced by other random number streams. This avoids spurious interactions among unrelated random processes in the model. For example, values returned by calling RandUniform(4) in a Fertility module will not affect values returned by calling RandUniform(6) in a Migration module.

Independent random number streams can reduce statistical noise in the difference of two model runs, reducing the run size needed to obtain reliable results for run differences. They also make microdata comparisons of two runs correspond better with model logic. For example, if there is no logical dependence between Fertility and Migration in the model, changing a Fertility parameter should not, logically, affect Migration. Had the same random stream, e.g. RandUniform(4), been used in both Fertility and Migration, a call to RandUniform(4) in Fertility would affect the value returned by a subsequent call to RandUniform(4) in Migration. That would produce a spurious (but statistically neutral) interaction between Fertility and Migration. This is avoided by using a different random stream in Migration, e.g. by calling RandUniform(6) to specify stream 6 rather than stream 4. Spurious correlation of random number streams can be avoided by using a distinct random number stream in each call to an RNG function throughout model code.
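The effect of numbered streams can be sketched in standalone C++. The class GlobalStreams below is a hypothetical stand-in, not OpenM++ code: it assumes one std::mt19937_64 engine per stream number, seeded from an assumed run seed and the stream number.

```cpp
#include <cstdint>
#include <map>
#include <random>

// Hypothetical stand-in for a set of global random streams (NOT the OpenM++
// implementation): one engine per stream number, created on first use.
class GlobalStreams {
public:
    explicit GlobalStreams(std::uint32_t run_seed) : run_seed_(run_seed) {}

    // Draw a uniform value in [0, 1) from the stream with the given number.
    double rand_uniform(int stream) {
        auto it = engines_.find(stream);
        if (it == engines_.end()) {
            // Seed each stream from the run seed and its own stream number.
            std::seed_seq seq{run_seed_, static_cast<std::uint32_t>(stream)};
            it = engines_.emplace(stream, std::mt19937_64(seq)).first;
        }
        // Use the top 53 bits of the engine output to build the double.
        return static_cast<double>(it->second() >> 11) * (1.0 / 9007199254740992.0);
    }

private:
    std::uint32_t run_seed_;
    std::map<int, std::mt19937_64> engines_;
};
```

With this sketch, an extra call to rand_uniform(4) in one run shifts later values from stream 4 but leaves the values from stream 6 unchanged, which mirrors the Fertility and Migration example.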

However, a model which simulates multiple instances of an entity kind together, e.g. multiple Person entities, could have spurious interactions of random streams among those entities. For example, a call to RandUniform(4) in Fertility in Person A will affect the result from a subsequent call in Fertility to RandUniform(4) in Person B, because the same random stream 4 is used in both. In a time-based model with many entities, a spurious interaction could extend from one entity to the entire population. Such spurious interactions do not affect the statistical validity of aggregate model results, but they can create additional statistical noise in run comparisons, and produce differences at the microdata level which are not explained by model logic.

This issue can be resolved by maintaining independent local random streams in each entity, rather than using global random streams shared among the entities which are simulated together. For example, using local random streams, a call to RandUniform(4) in Person A uses a different random stream from a call to RandUniform(4) in Person B.

Local random streams require additional memory in each entity to maintain the state of the pseudo-random number generator for each stream. This additional memory can be significant for time-based models with many entities and many random streams. Local random streams also require distinct initialization in each entity, so that different entities produce different random streams. That requirement is met by providing a function get_entity_key() which returns a unique key for each entity. The entity key is used to initialize local random streams independently in each entity before it enters the simulation. The entity key needs to be stable from one run to another so that the local random streams are the same for the same entity in two different runs. The implementation of get_entity_key() is, in general, model dependent.
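The following standalone sketch illustrates the idea of a local stream. It is not OpenM++ generated code: the Person struct, the person_id member, the 32-bit key type and the use of std::mt19937_64 are all assumptions made for illustration.

```cpp
#include <cstdint>
#include <random>

// Hypothetical entity which owns the state of its own random stream 4.
struct Person {
    std::uint32_t person_id;   // assumed unique and stable from run to run
    std::mt19937_64 stream4;   // generator state stored inside the entity

    explicit Person(std::uint32_t id) : person_id(id) {
        // Seed the local stream from the entity key and the stream number.
        std::seed_seq seq{get_entity_key(), std::uint32_t{4}};
        stream4.seed(seq);
    }

    // Illustrative key: here simply the identifier of the entity.
    std::uint32_t get_entity_key() const { return person_id; }

    // Draw a uniform value in [0, 1) from this entity's own stream 4.
    double rand_uniform4() {
        return static_cast<double>(stream4() >> 11) * (1.0 / 9007199254740992.0);
    }
};
```

In this sketch an extra draw in Person A has no effect on the values drawn in Person B, because each Person advances only its own generator state. The sketch also hints at the memory cost: every entity carries a full generator state for every local stream.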

Given these trade-offs, local random streams are not implemented by default in OpenM++. Instead, a statement like

options local_random_streams = Person;

causes OpenM++ to implement local random streams for the specified entity.

[back to topic contents]

Syntax and Use

under construction

options local_random_streams = Host;
options local_random_streams = Ticker;

Multiple statements are allowed, one for each entity for which local random streams are desired.

During model build, a message like

Entity 'Host' has 11 local random streams, of which 1 are Normal

will be issued for each entity with local random streams.

If an entity with local RNG streams calls RandUniform, RandNormal, or RandLogistic to initialize attributes before it enters the simulation, e.g. in a Start function, the built-in function initialize_local_random_streams() must be called first. The function initialize_local_random_streams() calls get_entity_key(), so be sure that any attributes used by get_entity_key() have been assigned first.

Otherwise, a run-time error like the following will occur:

Simulation error: RandUniform called with uninitialized local random streams.

If there are no RNG calls before the entity enters the simulation, it is not necessary to call initialize_local_random_streams() when initializing the entity.

Model code can call initialize_local_random_streams even if the entity has no local RNG streams (no effect).
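The required ordering can be imitated with a standalone mock. This is a sketch only and not OpenM++ model code: MockHost, host_id, initial_risk and the streams_ready flag are hypothetical, and the mock functions merely borrow the names of the built-in functions to show the sequence of steps.

```cpp
#include <cstdint>
#include <random>
#include <stdexcept>

// Mock entity (NOT OpenM++ generated code) which imitates the ordering rule.
struct MockHost {
    std::uint32_t host_id = 0;     // hypothetical attribute used by the key
    double initial_risk = 0.0;     // hypothetical attribute set by an RNG call
    bool streams_ready = false;
    std::mt19937_64 stream;

    std::uint32_t get_entity_key() const { return host_id; }

    // Seeds the local stream from the entity key.
    void initialize_local_random_streams() {
        std::seed_seq seq{get_entity_key()};
        stream.seed(seq);
        streams_ready = true;
    }

    // Refuses to draw until the local stream has been initialized.
    double RandUniform() {
        if (!streams_ready) {
            throw std::runtime_error(
                "RandUniform called with uninitialized local random streams.");
        }
        return static_cast<double>(stream() >> 11) * (1.0 / 9007199254740992.0);
    }

    void Start(std::uint32_t id) {
        host_id = id;                       // 1. assign attributes used by the key
        initialize_local_random_streams();  // 2. initialize the local streams
        initial_risk = RandUniform();       // 3. RNG calls are now permitted
    }
};
```

Swapping steps 1 and 2 in Start would seed every MockHost from the default key value, which is why attributes used by get_entity_key() need to be assigned first.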

Random streams behave normally in PreSimulation and in Simulation (to create a starting population, for example, as in IDMM). Random streams also behave normally in other entities which were not named using the local_random_streams option.

For entities named in local_random_streams, the streams used in the entity are maintained at the entity level.

Streams are seeded using the value returned by get_entity_key(), combined with the run member (aka sub, replicate) and either the run seed for a time-based model or the case seed for a case-based model.
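One way to picture the combination of these values is the sketch below. It is an assumption made for illustration, not the seeding scheme used by OpenM++: the function name make_local_stream, the 32-bit argument types, the inclusion of the stream number and the use of std::seed_seq are all hypothetical.

```cpp
#include <cstdint>
#include <random>

// Hypothetical helper: build the generator for one local stream of one entity.
// 'seed' stands for the run seed (time-based) or the case seed (case-based).
std::mt19937_64 make_local_stream(std::uint32_t seed,
                                  std::uint32_t member,
                                  std::uint32_t entity_key,
                                  std::uint32_t stream) {
    // std::seed_seq mixes all inputs into the initial generator state.
    std::seed_seq seq{seed, member, entity_key, stream};
    return std::mt19937_64(seq);
}
```

In this sketch the same seed, member, entity key and stream number always give the same sequence, so an entity with a stable key sees the same local stream in two different runs, while a different member gives a different sequence.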

[back to topic contents]

Illustrative Example

under construction

This example is divided into the following sections:

Illustrative example sections

[back to topic contents]

Summary

This example illustrates the effect of local random streams vs. global random streams on simulation decoherence. It uses the time-based IDMM model with minor modifications. Microdata is output at 100 time points during the simulation, and later merged and compared between Base and Variant runs to measure how decoherence evolves as the simulation progresses.

Four runs are used in this example:

  1. Base run with global random streams
  2. Variant run with global random streams
  3. Base run with local random streams
  4. Variant run with local random streams

The 4 runs are very similar. All 4 runs have the same number of hosts and an identical contact network. A single parameter differs between Variant and Base. The change in that parameter causes two entities to differ at the start of the Variant simulation.


[back to illustrative example]
[back to topic contents]

IDMM overview

IDMM simulates an interacting dynamic contact network of Host entities, together with a disease which can be transmitted over that contact network. The contact network is initialized randomly at the start of the simulation. During the simulation, each Host interacts with other Hosts during a contact event. Each Host can change its connected Hosts in a contact change event. Optionally, a Host can change a connected Host in a contact event, if that Host is infectious.

During a contact event, the disease can propagate between the two Hosts, depending on the disease status of each. An infected Host progresses through 4 disease phases of fixed duration: susceptible, latent, infectious, immune. On infection, the Host enters the latent phase, during which it is both asymptomatic and non-infectious. After the latent phase, the Host enters an infectious phase during which it can infect another Host during a contact event. After the infectious phase, the Host enters an immune phase. After the immune phase, the Host returns to the susceptible state.
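The cycle of disease phases described above can be written as a small state machine. This is a sketch in standalone C++, not the IDMM source: the names DiseasePhase, next_phase and can_infect are hypothetical, and phase durations are omitted.

```cpp
// Hypothetical enumeration of the four disease phases described in the text.
enum class DiseasePhase { Susceptible, Latent, Infectious, Immune };

// Phase which follows the given one. The step out of Susceptible happens on
// infection; the other steps happen when the fixed phase duration elapses.
inline DiseasePhase next_phase(DiseasePhase phase) {
    switch (phase) {
        case DiseasePhase::Susceptible: return DiseasePhase::Latent;
        case DiseasePhase::Latent:      return DiseasePhase::Infectious;
        case DiseasePhase::Infectious:  return DiseasePhase::Immune;
        case DiseasePhase::Immune:      return DiseasePhase::Susceptible;
    }
    return DiseasePhase::Susceptible; // not reached for valid input
}

// Only a Host in the infectious phase can infect another Host.
inline bool can_infect(DiseasePhase phase) {
    return phase == DiseasePhase::Infectious;
}
```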

Before the simulation starts, all Host entities are in the susceptible state. At the beginning of the simulation, a portion of the Host population is randomly infected.

For this example, some mechanical changes were made to the version of IDMM in the OpenM++ distribution.

[back to illustrative example]
[back to topic contents]

Base run

The Base run consists of 5,000 Hosts simulated for 100 time units, with the initial probability of infection set to 0.1000. The ini file for the Base run looks like this:

[OpenM]
SubValues = 1
Threads = 1
RunName = Base

[Parameter]
NumberOfHosts = 5000
SimulationEnd = 100
InitialDiseasePrevalence = 0.1000

[Microdata]
ToDb = yes
Host = report_time, disease_phase, age_infected

501 Hosts are infected at the beginning of the simulation in the Base run.

The time evolution of the susceptible population in Run 1 (Base with global random streams) looks like this:

Susceptible Hosts by time, Base (global)

The same chart for Run 3 (Base with local random streams) looks like this:

Susceptible Hosts by time, Base (local RNG)

Variant run

The Variant run is the same as the Base run, except for a very slightly higher probability of initial infection of 0.1001 compared to 0.1000 in Base.

The ini file for the Variant run looks like this:

[OpenM]
SubValues = 1
Threads = 1
RunName = Variant

[Parameter]
NumberOfHosts = 5000
SimulationEnd = 100
InitialDiseasePrevalence = 0.1001

[Microdata]
ToDb = yes
Host = report_time, disease_phase, age_infected

503 Hosts are infected at the beginning of the simulation in the Variant run. That's 2 more than in the Base run.

The time evolution of the susceptible population in Run 2 (Variant with global random streams) looks like this:

Susceptible Hosts by time, Variant (global)

The time evolution of the susceptible population in Run 4 (Variant with local random streams) looks like this:

Susceptible Hosts by time, Variant (local)

[back to illustrative example]
[back to topic contents]

Base-Variant coherence

The time evolution of coherence between Base and Variant runs with global random streams (runs 1 and 2) looks like this:

Base-Variant Coherence by time (global RNG)

The plateau in coherence count at the beginning of the chart is actually 4998, which is too close to 5000 to see in the chart. As described above, only 2 Hosts differ between Base and Variant at the beginning of the runs.
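A coherence count of this kind can be sketched as follows. The comparison rule is an assumption, since the exact criterion is not spelled out here: a Host is counted as coherent at a time point if its disease_phase and age_infected are identical in both runs. HostRecord and coherence_count are hypothetical names, and the records are assumed to be keyed by a Host identifier.

```cpp
#include <cstddef>
#include <map>

// Hypothetical microdata record for one Host at one report time.
struct HostRecord {
    int disease_phase;
    double age_infected;
};

// Count the Hosts whose records are identical in the Base and Variant runs.
// Both maps are keyed by a Host identifier and hold a single time point.
std::size_t coherence_count(const std::map<int, HostRecord>& base,
                            const std::map<int, HostRecord>& variant) {
    std::size_t matches = 0;
    for (const auto& entry : base) {
        auto it = variant.find(entry.first);
        if (it != variant.end()
            && it->second.disease_phase == entry.second.disease_phase
            && it->second.age_infected == entry.second.age_infected) {
            ++matches;
        }
    }
    return matches;
}
```

Under this rule, 5000 Hosts of which 2 differ at the start of the runs give a count of 4998, consistent with the plateau described above.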

The time evolution of coherence between Base and Variant runs with local random streams (runs 3 and 4) looks like this:

Base-Variant Coherence by time (local RNG)

[back to illustrative example]
[back to topic contents]

IDMM differences

under construction

Refer to the coherence example using IDMM in Microdata Output.

age_infected is the age of the Host at the most recent infection, and is initialized to -1. age_infected was added to IDMM to measure decoherence between runs. It turned out that disease_phase did not work well to measure decoherence between runs, because

A custom version of get_microdata_key() was added to produce a unique microdata key. A microdata record is output at each time unit to measure the evolution of coherence between two runs during the simulation.

  • Similar example here, but using two runs Run1 and Run2 of IDMM, with InitialDiseasePrevalence very slightly higher to generate at least one (but very few) additional infected Hosts at the beginning of the simulation.
  • Use Microdata Output to show decoherence in disease phase at end of runs
  • Perhaps, measure coherence between two runs over time, by output microdata at time steps.
  • Turn on local rng for Host entities, repeat Run1 and Run2
  • Use Microdata Output to show coherence in disease phase at end of runs (or evolution over time).

[back to illustrative example]
[back to topic contents]
