Weighted Tabulation

For case-based models, the weighted_tabulation option creates, for each entity, the built-in attribute entity_weight, which scales the entity's contribution to tables.

Topic contents

Introduction and Background
Syntax and Use: how to activate and use
Limitations: limitations of the current implementation
Modgen issues: weighted tabulation in a Modgen-compatible model

Introduction and Background

Some case-based microsimulation models use micro-data directly from a survey, census, or administrative source. Micro-data from such sources often has a weight associated with each observation, which reflects the sampling design and possibly post-stratification or under-count adjustment. Case weights can also be useful in microsimulation models which are not based on micro-data and instead generate cases synthetically from multivariate distributions. Such models may deliberately oversample portions of the synthetic population of particular interest, and then adjust for that oversampling by assigning a case weight equal to the reciprocal of the oversampling factor.
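
As a minimal sketch of the oversampling adjustment described above, the C++ below assigns each synthetic case a weight equal to the reciprocal of its oversampling factor. It then shows that weighted counts recover the intended proportions. The SyntheticCase type and the helper functions are hypothetical illustrations, not part of OpenM++.

```cpp
#include <cassert>
#include <vector>

// Hypothetical synthetic case (not an OpenM++ type): the stratum it belongs to
// and the oversampling factor applied when cases in that stratum were generated.
struct SyntheticCase {
    int stratum;
    double oversampling_factor;  // 4.0 means 4 times the natural frequency
};

// Case weight is the reciprocal of the oversampling factor.
double case_weight(const SyntheticCase& c)
{
    assert(c.oversampling_factor > 0.0);
    return 1.0 / c.oversampling_factor;
}

// Weighted count of the cases in one stratum: each case contributes its weight, not 1.
double weighted_count(const std::vector<SyntheticCase>& cases, int stratum)
{
    double total = 0.0;
    for (const auto& c : cases) {
        if (c.stratum == stratum) {
            total += case_weight(c);
        }
    }
    return total;
}
```

Suppose stratum 1 is generated at 4 times its natural frequency. Then 40 stratum-1 cases with weight 0.25 and 10 stratum-0 cases with weight 1.0 both give a weighted count of 10, which restores equal shares.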

OpenM++ contains optional functionality to associate a weight with each entity. That weight scales the contribution of the entity to table counts and sums. The functionality facilitates assigning the same case weight to all of the entities in a case for table coherence. This is important for models which have multiple entities in each case, e.g. ancillary family members of a core entity which may be created later in the simulation of the case. The design integrates with population scaling by computing and using the sum of case weights.

A time-based microsimulation model simulates interacting entities. It is unclear how one might validly represent an interaction of entities which have non-equal weights. Instead, for time-based models based on weighted micro-data, a micro-data record is typically cloned or sampled based on its weight to produce a starting population of entities whose weights are all equal. Such an equal-weighted population can represent a real-world population of a different size by using population scaling, rather than by assigning the same weight to every entity. The end result is the same, but population scaling uses less memory and computation than storing an identical weight in every entity. Also, it is not clear how to implement population scaling with entity weights in a time-based model that contains entities of different types, e.g. a single special Ticker entity, or multiple Dwelling, Person, and Family entities, or a fixed number of Region entities. For these reasons, entity weights are forbidden by default in time-based models in OpenM++. Use population scaling to make a time-based model represent a real population of a different size. See Population Size and Scaling for more information.
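
The cloning approach mentioned above can be sketched as follows. The sketch assumes one common integerization technique: take the floor of the weight ratio, then make a random draw on its fractional part (stochastic rounding). OpenM++ does not prescribe how a model expands weighted records. Record, clone_count, expand and scaling_factor are hypothetical names. Each resulting entity is unweighted, and a single population scaling factor maps table results to the real population size.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical weighted micro-data record (not an OpenM++ type).
struct Record {
    int id;
    double weight;  // number of real-world units the record represents
};

// Number of clones for one record. The expected value equals
// weight / units_per_clone: floor, plus one with probability = fractional part.
int clone_count(double weight, double units_per_clone, std::mt19937& rng)
{
    assert(weight >= 0.0 && units_per_clone > 0.0);
    const double target = weight / units_per_clone;
    const double whole = std::floor(target);
    std::uniform_real_distribution<double> unit(0.0, 1.0);  // draws in [0, 1)
    int n = static_cast<int>(whole);
    if (unit(rng) < target - whole) {
        ++n;  // round up with probability equal to the fractional part
    }
    return n;
}

// Expand records into an equal-weighted starting population (one record id per entity).
std::vector<int> expand(const std::vector<Record>& records, double units_per_clone, std::mt19937& rng)
{
    std::vector<int> population;
    for (const auto& r : records) {
        const int n = clone_count(r.weight, units_per_clone, rng);
        for (int i = 0; i < n; ++i) {
            population.push_back(r.id);
        }
    }
    return population;
}

// Population scaling factor applied to table counts and sums.
double scaling_factor(double real_population_size, std::size_t simulated_entities)
{
    assert(simulated_entities > 0);
    return real_population_size / static_cast<double>(simulated_entities);
}
```

When the ratios are integers, such as a weight of 300 with 100 units per clone, the clone count is exact. When the ratios are fractional, the expected clone count is preserved, but the realized starting population size varies between runs.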

Syntax and Use

By default, entities are unweighted. To activate entity weights, include the statement

options weighted_tabulation = on;

in the source code of a case-based model. A natural place to insert this statement is the module ompp_framework.ompp. If weighting is turned on in a time-based model, an error message like the following is emitted:

error : weighted tabulation is not allowed with a time-based model, use population scaling instead.

There may be situations where it makes sense for a time-based model to have entity weights. For example, a model with a non-interacting population of weighted entities might be time-based to implement dynamic calibration to exogenous control totals. To override the default behaviour and allow weighted tabulation for a time-based model, add the following statement (as well as the above statement):

options weighted_tabulation_allow_time_based = on;

When weighting is turned on, each entity has a new built-in attribute named entity_weight, of type double. Usually model code does not assign a value directly to entity_weight. Instead, before entities are created for a case, model code sets the initial value of entity_weight for all entities in the case by calling the function set_initial_weight, as in the following contrived example:

void CaseSimulation(case_info &ci)
{
    extern void SimulateEvents(); // defined in a simulation framework module

    // Provide the weight used to initialize the entity_weight attribute for new entities
    set_initial_weight(2.0);

    // For Modgen-compatible models, use the following instead
    //SetCaseWeight(2.0);

    // Initialize the person entity
    auto prPerson = new Person();
    prPerson->Start();

    // Simulate events until there are no more.
    SimulateEvents();
}

Calling set_initial_weight before creating any entities in the case ensures that the built-in attribute entity_weight will have that same value for all entities in the case. The call to set_initial_weight also enables the calculation of the sum of case weights. That sum of weights is used to correctly scale the population to a specified size if the model uses both weights and population scaling. For that to work correctly, set_initial_weight must be called once and only once in the logic of the case, before any entities in the case are created.

If weighted tabulation is not enabled, entities have no attribute named entity_weight, and calls to set_initial_weight have no effect (but are benign).

If weighted tabulation is enabled, but set_initial_weight is not called before creating entities in the case, the entity_weight attribute will be 1.0. However, the total sum of weights used for population scaling will be incorrect because the calculation depends internally on the call to set_initial_weight. Ensure that model code calls set_initial_weight once and only once before creating entities in the case.
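
The rules in the preceding paragraphs can be illustrated with a small mock. The sketch below is a hypothetical model of the bookkeeping, not the OpenM++ framework code. It assumes that the sum of case weights is accumulated only inside set_initial_weight, and that the population scaling factor is the target population size divided by that sum.

```cpp
#include <cassert>

// Hypothetical mock of the documented rules; NOT the OpenM++ implementation.
class CaseWeightMock {
public:
    // Called by the (mock) framework at the start of each case.
    void begin_case()
    {
        initial_weight_ = 1.0;  // default when set_initial_weight is not called
        weight_set_ = false;
    }

    // Mirrors the contract of set_initial_weight:
    // call once per case, before any entity in the case is created.
    void set_initial_weight(double w)
    {
        assert(!weight_set_ && "set_initial_weight called twice in one case");
        initial_weight_ = w;
        weight_set_ = true;
        sum_of_case_weights_ += w;  // only this call feeds the sum
    }

    // Creating an entity copies the current initial weight into its entity_weight.
    double new_entity_weight() const { return initial_weight_; }

    // Factor mapping weighted table values onto the target population size.
    double scaling_factor(double target_population) const
    {
        assert(sum_of_case_weights_ > 0.0);
        return target_population / sum_of_case_weights_;
    }

    double sum_of_case_weights() const { return sum_of_case_weights_; }

private:
    double initial_weight_ = 1.0;
    bool weight_set_ = false;
    double sum_of_case_weights_ = 0.0;
};
```

In the mock, suppose a first case calls set_initial_weight(2.0) and a second case forgets the call. The second case still produces entities with entity_weight 1.0, and those entities still contribute to tables. However, the case is missing from the sum of case weights, so the scaling factor is computed from a total of 2.0 instead of 3.0.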

Limitations

Weighted tabulation works for table statistics based on counts and sums. It does not yet work for ordinal statistics such as the median or the Gini coefficient. Such statistics will be computed ignoring weights, i.e. as though all weights are 1.0. If a table uses an ordinal statistic and weighted_tabulation is on, the OpenM++ compiler will issue a warning. For example, the table

table Person DurationOfLife //EN Duration of Life
{
    {
        value_in(alive),                //EN Population size
        min_value_out(duration()),      //EN Minimum duration of life decimals=4
        max_value_out(duration()),      //EN Maximum duration of life decimals=4
        duration() / value_in(alive),   //EN Life expectancy decimals=4
        P50(value_out(duration()))      //EN Median duration of life decimals=4
    }    //EN Demographic characteristics
};

would emit a warning like

warning : weighting is not supported for statistic 'P50' in table 'DurationOfLife', consider using untransformed ...
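
To see why ignoring weights matters for an ordinal statistic, the hypothetical C++ sketch below compares an unweighted median with a weighted median on the same values. It uses the lower-median convention for simplicity. It is an illustration only and does not reproduce how OpenM++ computes P50.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Unweighted (lower) median: every observation counts as weight 1.0.
double unweighted_median(std::vector<double> values)
{
    assert(!values.empty());
    std::sort(values.begin(), values.end());
    return values[(values.size() - 1) / 2];
}

// Weighted median: smallest value whose cumulative weight reaches half the total.
// Each pair is (value, weight).
double weighted_median(std::vector<std::pair<double, double>> value_weight)
{
    assert(!value_weight.empty());
    std::sort(value_weight.begin(), value_weight.end());  // sorts by value first
    double total = 0.0;
    for (const auto& vw : value_weight) {
        total += vw.second;
    }
    double cumulative = 0.0;
    for (const auto& vw : value_weight) {
        cumulative += vw.second;
        if (cumulative >= 0.5 * total) {
            return vw.first;
        }
    }
    return value_weight.back().first;
}
```

Consider durations 60, 70 and 80 with weights 1, 1 and 10. The unweighted median is 70. The weighted median is 80, because the heavily weighted observation holds more than half of the total weight.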

Modgen issues

case-based models (Modgen)

Modgen implements case weighting and weight-based population scaling similar to that of OpenM++, using a function named SetCaseWeight. Modgen-compatible models can call SetCaseWeight instead of set_initial_weight, as in the commented statement in the CaseSimulation example above. The OpenM++ framework supplies versions of SetCaseWeight which call set_initial_weight internally.

OpenM++ operates intrinsically at the sub-sample/replicate/member level, so Modgen's distinction between a total case weight and a sub-sample case weight does not apply in OpenM++.

time-based models (Modgen)

Modgen does not implement population scaling for time-based models. To work around this limitation, model developers have called the Modgen function Set_actor_weight in actor Start functions to scale results to represent a larger population. Consider a time-based model which includes two exogenous parameters, StartingPopulationRealSize for the size of the true real-world population which is represented by the model, and StartingPopulationSize for the size (number of entities) of the synthetic starting population in the model. The Modgen approach might look like this:

void Person::Start()
{
    // Initialize all attributes (OpenM++).
    initialize_attributes();

    // The following function calls implement population scaling for Modgen,
    // using identical weights for each Person entity in the simulation.
    // These calls do nothing in OpenM++.
    // OpenM++ can implement population scaling directly for time-based models.
    
    double dWeight = (double) StartingPopulationRealSize / (double) StartingPopulationSize;
    Set_actor_weight( dWeight );
    Set_actor_subsample_weight( dWeight );
...

The OpenM++ framework includes do-nothing versions of the Modgen functions Set_actor_weight and Set_actor_subsample_weight so this same code will build without error in OpenM++.

To perform the identical population scaling directly in the OpenM++ version of the model (without weights), include the following statement in ompp_framework.ompp:

use "time_based/time_based_scaling_exogenous.ompp";

That use module integrates with the OpenM++ framework to scale table counts and sums by the factor

(double) StartingPopulationRealSize / (double) StartingPopulationSize

using the exogenous parameters StartingPopulationRealSize and StartingPopulationSize.
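
As a rough check that this factor gives the same results as Modgen's identical actor weights, the sketch below compares the two approaches for counts. It uses the parameter values from the PopulationSize.dat example later in this topic. The function names are hypothetical; only the formula comes from the text above.

```cpp
#include <cassert>

// Scaling factor from the two exogenous parameters.
double population_scaling_factor(double starting_population_real_size,
                                 int starting_population_size)
{
    assert(starting_population_size > 0);
    return starting_population_real_size / static_cast<double>(starting_population_size);
}

// Modgen-style workaround: every Person carries the same weight,
// and a count is the sum of those weights.
double count_with_identical_weights(int entities, double weight)
{
    double total = 0.0;
    for (int i = 0; i < entities; ++i) {
        total += weight;
    }
    return total;
}

// OpenM++-style: count entities unweighted, then scale the result once.
double count_with_scaling(int entities, double factor)
{
    return static_cast<double>(entities) * factor;
}
```

With StartingPopulationSize = 25000 and StartingPopulationRealSize = 10000000, the factor is 400. Both approaches give a count of 10,000,000 for the full starting population. The scaled version simply avoids storing a weight in every entity.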

These two parameters are already declared in the OpenM++ use module time_based_scaling_exogenous.ompp. In the Modgen version, declare them in a Modgen-only source code file, for example modgen_PopulationSize.mpp, with content

parameters
{
    //EN Simulation population size
    int StartingPopulationSize;

    //EN True population size
    double StartingPopulationRealSize;
};

and then make the values of these two parameters available to both Modgen and OpenM++ by placing them in a file processed by both, for example PopulationSize.dat with contents like

parameters
{
    //EN Simulation population size
    int StartingPopulationSize = 25000;

    //EN True population size
    double StartingPopulationRealSize = 10000000;
};

For more about the visibility of model source code and parameter value files in OpenM++ and Modgen, see Model Code. For more about population scaling in OpenM++, see Population Size and Scaling.
