Weighted Tabulation
For case-based models, the weighted_tabulation option creates, for each entity, the built-in attribute entity_weight, which scales the entity's contribution to tables.
- Introduction and Background
- Syntax and Use: how to activate and use
- Limitations: limitations of the current implementation
- Modgen-specific: Modgen issues, weighted tabulation in an x-compatible model
Introduction and Background
Some case-based microsimulation models use micro-data directly from a survey, census, or administrative source. Micro-data from such sources often has a weight associated with each observation, which reflects the sampling design and possibly post-stratification or under-count adjustment. Case weights can also be useful in microsimulation models which are not based on micro-data. Such models instead generate cases synthetically from multivariate distributions. They may deliberately oversample portions of the synthetic population of particular interest, and then adjust for that oversampling by assigning a case weight equal to the reciprocal of the oversampling factor.
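The reciprocal rule can be checked with a few lines of plain C++. This is a minimal sketch, not OpenM++ code; the helpers case_weight_from_oversampling and weighted_group_count are hypothetical. It assumes each synthetic case belongs to exactly one group and that a group oversampled by a factor f gets case weight 1/f.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: case weight for a synthetic case drawn from a group
// that was deliberately oversampled by oversampling_factor.
double case_weight_from_oversampling(double oversampling_factor)
{
    assert(oversampling_factor > 0.0);
    return 1.0 / oversampling_factor;
}

// Hypothetical helper: weighted number of cases in group `which`.
// With correct weights this reflects the group's size in the target
// population rather than its inflated size in the synthetic sample.
double weighted_group_count(const std::vector<int>& group,
                            const std::vector<double>& weight,
                            int which)
{
    assert(group.size() == weight.size());
    double total = 0.0;
    for (std::size_t i = 0; i < group.size(); ++i) {
        if (group[i] == which) {
            total += weight[i];
        }
    }
    return total;
}
```

If group 1 is oversampled four-fold, each of its synthetic cases gets weight 0.25. With two group 0 cases and four group 1 cases, the weighted counts are 2 and 1, restoring the 2:1 ratio that oversampling had distorted to 2:4.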
OpenM++ contains optional functionality to associate a weight with each entity. That weight scales the contribution of the entity to table counts and sums. The functionality facilitates assigning the same case weight to all of the entities in a case for table coherence. This is important for models which have multiple entities in each case, e.g. ancillary family members of a core entity which may be created later in the simulation of the case. The design integrates with population scaling by computing and using the sum of case weights.
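The arithmetic of weighted tabulation can be sketched outside the model language. In this illustrative C++ fragment (the Contribution type and both functions are hypothetical, not part of OpenM++), each entity adds its weight, rather than 1, to a count, and adds its weight times its value to a sum. Entities from the same case carry the same case weight.

```cpp
#include <cassert>
#include <vector>

// One entity's contribution to a table cell (illustrative only).
struct Contribution {
    double weight;  // the entity's weight, i.e. the case weight of its case
    double value;   // the quantity being summed, e.g. a duration
};

// Weighted count: each entity adds its weight instead of 1.
double weighted_count(const std::vector<Contribution>& cs)
{
    double n = 0.0;
    for (const auto& c : cs) {
        n += c.weight;
    }
    return n;
}

// Weighted sum: each entity's value is multiplied by its weight.
double weighted_sum(const std::vector<Contribution>& cs)
{
    double s = 0.0;
    for (const auto& c : cs) {
        s += c.weight * c.value;
    }
    return s;
}
```

For a case of weight 2.0 with two entities (values 10 and 20) and a case of weight 0.5 with one entity (value 40), the weighted count is 4.5 and the weighted sum is 80. Because both entities of the first case share its weight, the case contributes coherently to every table.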
A time-based microsimulation model simulates interacting entities. It is unclear how one might validly represent an interaction of entities which have non-equal weights. Instead, for time-based models based on weighted micro-data, a micro-data record is typically cloned or sampled based on its weight to produce a starting population of entities whose weights are all equal. Such an equal-weighted population can represent a real-world population of a different size by using population scaling, rather than by assigning the same weight to every entity. The end result is the same, but population scaling is more efficient in memory and computation than identical entity weights. Also, it is not clear how to implement population scaling in a time-based model with entity weights if the model contains entities of different types, e.g. a single special Ticker entity, or multiple Dwelling, Person, and Family entities, or a fixed number of Region entities. For these reasons, entity weights are forbidden by default in time-based models in OpenM++. Use population scaling to make a time-based model represent a real population of a different size. See Population Size and Scaling for more information.
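The text does not prescribe how records are cloned or sampled. One common approach, shown here as a hypothetical C++ sketch rather than anything built into OpenM++, is stochastic rounding: for a chosen unit u, each record of weight w is cloned floor(w/u) times, plus one extra clone with probability equal to the fractional part, so the expected number of clones is w/u.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

struct Record {
    int id;         // identifier of the micro-data record
    double weight;  // survey weight: how many real persons it represents
};

// Expand weighted micro-data into an equal-weighted starting population
// (generic stochastic rounding, assumed for illustration). Returns the ids
// of the cloned records, one element per entity to create.
std::vector<int> clone_by_weight(const std::vector<Record>& records,
                                 double unit, std::mt19937& rng)
{
    assert(unit > 0.0);
    std::uniform_real_distribution<double> u(0.0, 1.0);
    std::vector<int> population;
    for (const auto& r : records) {
        double expected = r.weight / unit;
        int copies = static_cast<int>(std::floor(expected));
        // One extra clone with probability equal to the fractional remainder.
        if (u(rng) < expected - copies) {
            ++copies;
        }
        for (int k = 0; k < copies; ++k) {
            population.push_back(r.id);
        }
    }
    return population;
}
```

With weights 300 and 100 and a unit of 100, the result is four equally weighted entities standing for 400 people; population scaling by a factor of 400 / 4 = 100 then restores the real-world size without any entity weights.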
Syntax and Use
By default, entities are unweighted. To activate entity weights, include the statement
options weighted_tabulation = on;
in the source code of a case-based model. A natural place to insert this statement is the module ompp_framework.ompp.
If weighting is turned on in a time-based model, an error message like the following is emitted:
error : weighted tabulation is not allowed with a time-based model, use population scaling instead.
There may be situations where it makes sense for a time-based model to have entity weights. For example, a model with a non-interacting population of weighted entities might be time-based to implement dynamic calibration to exogenous control totals. To override the default behaviour and allow weighted tabulation for a time-based model, add the following statement (as well as the above statement):
options weighted_tabulation_allow_time_based = on;
When weighting is turned on, each entity has a new built-in attribute named entity_weight, of type double. Usually model code does not assign a value directly to entity_weight. Instead, before entities are created for a case, model code sets the initial value of entity_weight for all entities in the case by calling the function set_initial_weight, as in the following contrived example:
```cpp
void CaseSimulation(case_info &ci)
{
    extern void SimulateEvents(); // defined in a simulation framework module

    // Provide the weight used to initialize the entity_weight attribute for new entities
    set_initial_weight(2.0);
    // For Modgen-compatible models, use the following instead
    //SetCaseWeight(2.0);

    // Initialize the person entity
    auto prPerson = new Person();
    prPerson->Start();

    // Simulate events until there are no more.
    SimulateEvents();
}
```
Calling set_initial_weight before creating any entities in the case ensures that the built-in attribute entity_weight will have the same value for all entities in the case. The call to set_initial_weight also enables the calculation of the sum of case weights. That sum is used to scale the population correctly to a specified size if the model uses both weights and population scaling. For that to work, set_initial_weight must be called once and only once in the logic of the case, before any entities in the case are created.
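The rules above can be made concrete with a small C++ model of the described behaviour. Everything here is hypothetical (the mock_ functions and the WeightState type are not OpenM++ source), and it assumes, for illustration only, that the sum of case weights grows by the weight passed to each set_initial_weight call and that the scaling factor is the target population size divided by that sum.

```cpp
#include <cassert>

// Hypothetical model of the framework state behind set_initial_weight.
struct WeightState {
    double initial_weight = 1.0;       // given to entity_weight of new entities
    double sum_of_case_weights = 0.0;  // accumulated once per case
};

struct Entity {
    double entity_weight;
};

// Start of a case: new entities default to a weight of 1.0.
void mock_start_case(WeightState& st)
{
    st.initial_weight = 1.0;
}

// Mimics set_initial_weight: sets the weight for every entity created later
// in this case and adds it to the sum of case weights. A second call in the
// same case would add the case's weight twice.
void mock_set_initial_weight(WeightState& st, double w)
{
    st.initial_weight = w;
    st.sum_of_case_weights += w;
}

// Mimics entity creation: entity_weight starts at the case's initial weight.
Entity mock_new_entity(const WeightState& st)
{
    return Entity{st.initial_weight};
}

// Assumed scaling rule: target population size / sum of case weights.
double mock_scaling_factor(const WeightState& st, double target_population)
{
    assert(st.sum_of_case_weights > 0.0);
    return target_population / st.sum_of_case_weights;
}
```

An entity created later in a case (for example a spouse created after the core Person) still receives that case's weight. Two cases of weight 2.0 and 3.0 give a sum of 5.0, so a target of 1000 yields a factor of 200.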
If weighted tabulation is not enabled, entities have no attribute named entity_weight, and calls to set_initial_weight have no effect (but are benign).
If weighted tabulation is enabled, but set_initial_weight is not called before creating entities in the case, the entity_weight attribute will be 1.0. However, the total sum of weights used for population scaling will be incorrect, because the calculation depends internally on the call to set_initial_weight. Ensure that model code calls set_initial_weight once and only once before creating entities in the case.
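The consequence of skipping the call can be shown numerically. This C++ sketch assumes, for illustration only, that the scaling factor is the target population divided by the sum of case weights, and that a case which skips set_initial_weight gets entity_weight 1.0 but adds nothing to that sum; the SimCase type and the function are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Illustrative case record: the weight the case should have, and whether
// model code called set_initial_weight before creating its entity.
struct SimCase {
    double weight;
    bool called_set_initial_weight;
};

// Scaled count of entities, one entity per case, under the assumptions above.
double scaled_entity_count(const std::vector<SimCase>& cases, double target)
{
    double sum_of_case_weights = 0.0;
    double weighted_entities = 0.0;
    for (const auto& c : cases) {
        if (c.called_set_initial_weight) {
            sum_of_case_weights += c.weight;
            weighted_entities += c.weight;
        } else {
            weighted_entities += 1.0;  // default entity_weight, missing from the sum
        }
    }
    assert(sum_of_case_weights > 0.0);
    return weighted_entities * (target / sum_of_case_weights);
}
```

Three cases of weight 2.0 scaled to a target of 600 give exactly 600. If the third case skips the call, the sum of weights drops to 4.0, the factor rises to 150, and the scaled total becomes 750 instead of 600.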
Limitations
Weighted tabulation works for table statistics based on counts and sums. It does not yet work for ordinal statistics such as the median or the Gini coefficient. Such statistics are computed ignoring weights, i.e. as though all weights are 1.0. If a table uses an ordinal statistic and weighted_tabulation is on, the OpenM++ compiler issues a warning. For example, the table
```
table Person DurationOfLife //EN Duration of Life
{
    {
        value_in(alive),              //EN Population size
        min_value_out(duration()),    //EN Minimum duration of life decimals=4
        max_value_out(duration()),    //EN Maximum duration of life decimals=4
        duration() / value_in(alive), //EN Life expectancy decimals=4
        P50(value_out(duration()))    //EN Median duration of life decimals=4
    } //EN Demographic characteristics
};
```
would emit a warning like
warning : weighting is not supported for statistic 'P50' in table 'DurationOfLife', consider using untransformed ...
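Why ignoring weights matters for ordinal statistics can be seen with a weighted median. The C++ sketch below uses one common definition (the smallest value at which the cumulative weight reaches half the total weight); it is illustrative only and is not how OpenM++ computes P50, which may use a different convention such as interpolation.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Weighted median of (value, weight) observations: the smallest value at
// which cumulative weight reaches half of the total weight. With all weights
// equal to 1.0 this returns the lower median of the values.
double weighted_median(std::vector<std::pair<double, double>> obs)
{
    assert(!obs.empty());
    std::sort(obs.begin(), obs.end());  // order by value
    double total = 0.0;
    for (const auto& o : obs) {
        total += o.second;
    }
    double cumulative = 0.0;
    for (const auto& o : obs) {
        cumulative += o.second;
        if (cumulative >= total / 2.0) {
            return o.first;
        }
    }
    return obs.back().first;
}
```

For values 10, 20, 30 with weights 1, 1, 5 the weighted median is 30, while with all weights set to 1.0 it is 20, so a statistic computed ignoring weights can differ substantially from its weighted counterpart.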
Modgen-specific
Modgen implements case weighting functionality and weight-based population scaling similar to those of OpenM++, using a function named SetCaseWeight. X-compatible models can call SetCaseWeight instead of set_initial_weight, as in the commented statement in the previous example. The OpenM++ framework supplies versions of SetCaseWeight which call set_initial_weight internally.
OpenM++ functions intrinsically at the sub-sample/replicate/member level, so the notion of a distinct total weight and sub-sample weight does not apply in OpenM++.
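The forwarding from SetCaseWeight to set_initial_weight can be pictured with the following C++ sketch, placed in a namespace named sketch to keep it apart from the real framework. Everything in it is hypothetical: the stub set_initial_weight only records its argument, and the two-argument overload assumes a Modgen-style signature with a separate sub-sample weight, which this page does not spell out.

```cpp
#include <cassert>

namespace sketch {

// Hypothetical stand-in for OpenM++'s set_initial_weight; it only records
// the weight so that the forwarding below can be observed.
double last_initial_weight = 1.0;

void set_initial_weight(double weight)
{
    last_initial_weight = weight;
}

// Modgen-style call with only a case weight: forward it unchanged.
void SetCaseWeight(double case_weight)
{
    set_initial_weight(case_weight);
}

// Modgen-style call that also passes a sub-sample weight (assumed signature).
// A distinct sub-sample weight has no meaning in OpenM++, so this sketch
// simply forwards the first argument and ignores the second.
void SetCaseWeight(double case_weight, double /* subsample_weight */)
{
    set_initial_weight(case_weight);
}

}  // namespace sketch
```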
Modgen does not implement population scaling for time-based models. To work around this limitation, model developers have called the Modgen function Set_actor_weight in actor Start functions to scale results to represent a larger population. Consider a time-based model which includes two exogenous parameters: StartingPopulationRealSize, the size of the real-world population represented by the model, and StartingPopulationSize, the size (number of entities) of the synthetic starting population in the model. The Modgen approach might look like this:
```cpp
void Person::Start()
{
    // Initialize all attributes (OpenM++).
    initialize_attributes();

    // The following function calls implement population scaling for Modgen,
    // using identical weights for each Person entity in the simulation.
    // These calls do nothing in OpenM++.
    // OpenM++ can implement population scaling directly for time-based models.
    double dWeight = (double) StartingPopulationRealSize / (double) StartingPopulationSize;
    Set_actor_weight( dWeight );
    Set_actor_subsample_weight( dWeight );
    ...
}
```
The OpenM++ framework includes do-nothing versions of the Modgen functions Set_actor_weight and Set_actor_subsample_weight, so this same code will build without error in OpenM++.
To perform the identical population scaling directly in the OpenM++ version of the model (without weights), include the following statement in ompp_framework.ompp:
use "time_based/time_based_scaling_exogenous.ompp";
That use module integrates with the OpenM++ framework to scale table counts and sums by the factor
(double) StartingPopulationRealSize / (double) StartingPopulationSize
using the exogenous parameters StartingPopulationRealSize and StartingPopulationSize.
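With the example values given later in this section (StartingPopulationSize = 25000 and StartingPopulationRealSize = 10000000), the factor is 400. The C++ sketch below only evaluates the formula above; the function names are hypothetical and are not part of the time_based_scaling_exogenous.ompp module.

```cpp
#include <cassert>

// Scaling factor from the formula
//   (double) StartingPopulationRealSize / (double) StartingPopulationSize
double time_based_scaling_factor(double real_size, int simulated_size)
{
    assert(simulated_size > 0);
    return real_size / static_cast<double>(simulated_size);
}

// A raw (unscaled) table count or sum expressed for the real population.
double scale(double raw, double factor)
{
    return raw * factor;
}
```

A raw count of 1250 simulated Person entities would therefore be reported as 500000.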
These two parameters are already declared in the use module time_based_scaling_exogenous.ompp in OpenM++. Declare them in the Modgen version in a Modgen-only source code file, for example modgen_PopulationSize.mpp, with content
```
parameters
{
    //EN Simulation population size
    int StartingPopulationSize;
    //EN True population size
    double StartingPopulationRealSize;
};
```
and then make the values of these two parameters available to both Modgen and OpenM++ by placing them in a file processed by both, for example PopulationSize.dat, with contents like
```
parameters
{
    //EN Simulation population size
    int StartingPopulationSize = 25000;
    //EN True population size
    double StartingPopulationRealSize = 10000000;
};
```
For more about the visibility of model source code and parameter value files in OpenM++ and Modgen, see Model Code. For more about population scaling in OpenM++, see Population Size and Scaling.