2012 December Design Model Architecture
This roadmap and architecture document is presented from the "model developer" point of view, which implies a C++ development process; user aspects of OpenM++ are deliberately excluded. Please refer to the OpenM++ user guide pages for additional details.
OpenM++ is by design a portable and scalable environment which allows researchers to run the same model on a single Windows PC and on a Linux (or Windows) HPC cluster by simply re-compiling the model C++ code for the target platform. For example, a model developer can use Visual Studio on his own Windows PC to write, test and debug the model, and later send the model .cpp code to another researcher, who can build and run that model on a Linux HPC cluster with hundreds of CPUs.
There are four main groups of openM++ model users:
- developer: uses a C++ IDE with openM++ installed to develop and run models, mostly on their local PC
- researcher: uses openM++ model executables created by a developer to run simulations on a local workstation and/or on an HPC cluster
- institutional user: member of a research organization with advanced IT infrastructure who mostly runs openM++ models in a resource-shared environment (i.e. over the web)
- public user: member of the general public using a simplified interface over the web.
Those user groups have distinctive hardware / software environments and different requirements for the model architecture:
- developer:
- mostly a local Windows or Linux PC with GUI
- runs the model hundreds of times to debug it
- has full admin privileges on his local machine
- eventually needs to pack model executable and data files and send them to a researcher
- researcher:
- HPC cluster (large or small) or local Windows or Linux machine without GUI
- runs the model multiple times and collects the results
- runs the model 100's or 1000's of times for probabilistic sensitivity analysis or for model estimation
- does not have admin privileges, especially on the cluster
- often needs to pack model data files to publish them, move them from a local PC to an HPC cluster or share them with other researchers
- institutional user:
- uses a web UI to run the model in the cloud, on an HPC cluster or in another powerful server environment
- has no direct access to the actual server environment
- can at any time use the IT department to deploy openM++ models in the cloud, create modeling web-sites, manage the model database on a SQL server, etc.
- public user:
- runs a model written and compiled in openM++ via the web, with a limited set of parameters and a limited set of output screens, possibly in parallel with hundreds of other general public users
- has very limited capacity, if any at all, to save results between sessions
It is typical for openM++ users not to have advanced IT management skills, as they are highly regarded professionals in their own areas of interest. It may also not always be possible for an openM++ user to install additional software in their environment (i.e. in a public HPC cluster). From that point of view, the easiest way of model deployment and model data export-import is through simple file operations (file copy). This is obviously not suitable for institutional users; however, they can (a) rely on dedicated IT department resources if necessary and (b) have installed and supported web servers, SQL database servers and other resources where openM++ cloud components can be deployed.
Based on those use cases, the openM++ model architecture assumes the following preferences:
- the model, input parameters and output results are available as a set of files
- users may not want to (or cannot) install database client-server software to store model data
Note: to simplify the description below, the model architecture is described from a developer or researcher point of view, and web cloud aspects are deliberately excluded.
Because openM++ models can scale from a single PC to an HPC cluster, model execution (the model run-cycle) depends on the environment.
Simple (single PC) case (italic indicates optional):
- start of model executable (model.exe)
- read model settings from database (read execution scenario)
- read model input data from database
- run modeling loop:
- execute user model code
- report on model progress if required
- do model results aggregation if required
- write results into database output tables
- finally report execution statistics and exit
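The single-PC run-cycle above can be sketched in C++. This is a minimal illustration only; every function name below is a hypothetical placeholder, not the actual openM++ API, and the database calls are stubbed out:

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// Hypothetical stand-in for a data library call: read input from database.
static std::vector<double> readInputParameters() { return {1.0, 2.0, 3.0}; }

// Modeling loop: "execute user model code" for each input value.
static std::vector<double> runModelingLoop(const std::vector<double>& input) {
    std::vector<double> results;
    for (double v : input) {
        results.push_back(v * 2.0);   // user model code (stubbed)
        // report on model progress if required (omitted)
    }
    return results;
}

// "Do model results aggregation if required" (here: a simple sum).
static double aggregateResults(const std::vector<double>& results) {
    double sum = 0.0;
    for (double r : results) sum += r;
    return sum;
}

// Skeleton of the run-cycle: read input, run loop, aggregate, write output.
static double runModel() {
    std::vector<double> input = readInputParameters();
    std::vector<double> results = runModelingLoop(input);
    double total = aggregateResults(results);
    std::printf("total = %g\n", total);   // "write results into database"
    return total;
}
```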
If the model runs in a cluster environment, then openM++ can transparently create multiple copies of the model executable process and distribute them across cluster nodes.
Model run-cycle on cluster (italic indicates optional):
- start of master model executable (model.exe)
- read model settings from database (read execution scenario)
- detect run-time environment
- spawn model.exe processes on computational nodes
- read model input data from database
- distribute input data between all computational nodes
- run modeling loop:
- execute user model code
- report on model progress if required
- collect model tracking information to debug the model
- wait until all modeling completed on all computational nodes
- collect model results from each node
- do results aggregation if required
- write results into database output tables
- finally report execution statistics and exit
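The master-side steps above (distribute input, wait for all nodes, collect and aggregate results) can be sketched in C++ using threads to emulate computational nodes. Real openM++ would use separate MPI processes; all names here are hypothetical:

```cpp
#include <cassert>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical master run-cycle sketch: each thread stands in for one
// computational node running its share of the model.
static double runOnCluster(const std::vector<double>& input, int nodeCount) {
    std::vector<double> partial(nodeCount, 0.0);
    std::vector<std::thread> nodes;

    // Distribute input data between all computational nodes (block split).
    int chunk = (int)input.size() / nodeCount;
    for (int n = 0; n < nodeCount; n++) {
        nodes.emplace_back([&, n]() {
            int lo = n * chunk;
            int hi = (n == nodeCount - 1) ? (int)input.size() : lo + chunk;
            for (int i = lo; i < hi; i++)
                partial[n] += input[i] * 2.0;   // execute user model code
        });
    }

    // Wait until all modeling is completed on all computational nodes.
    for (auto& t : nodes) t.join();

    // Collect model results from each node and aggregate.
    return std::accumulate(partial.begin(), partial.end(), 0.0);
}
```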
Note: It is important to understand that the diagrams on this page represent a schematic picture; real openM++ code may be significantly more complex. For example, the report-model-progress call exchangeProgress() may not actually do anything except place data in a buffer, while a separate thread does the actual master-slave communication and progress reporting.
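The buffering technique mentioned in the note can be illustrated with a sketch (a hypothetical class, not openM++ code): exchangeProgress() only appends to a buffer, so the modeling loop never blocks, and a background thread drains the buffer where real code would talk to the master:

```cpp
#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical buffered progress reporter: cheap calls from the modeling
// loop, actual communication deferred to a background thread.
class ProgressReporter {
public:
    // Called from the modeling loop: only appends to the buffer.
    void exchangeProgress(int casesDone) {
        std::lock_guard<std::mutex> lk(mtx);
        buffer.push_back(casesDone);
    }
    void start() {
        worker = std::thread([this] {
            while (!stop.load()) {
                drain();
                std::this_thread::sleep_for(std::chrono::milliseconds(1));
            }
            drain();   // final flush after stop is requested
        });
    }
    void finish() { stop.store(true); worker.join(); }
    int reportedCount() const { return reported; }
private:
    void drain() {
        std::lock_guard<std::mutex> lk(mtx);
        reported += (int)buffer.size();   // real code: send to master node
        buffer.clear();
    }
    std::vector<int> buffer;
    std::mutex mtx;
    std::thread worker;
    std::atomic<bool> stop{false};
    int reported = 0;
};
```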
The modeling library provides core functionality for the model run-cycle as described above. It contains the main() entry point and does agent creation / destruction, event queue management, on-the-fly cross-tabulation, and pre- and post-simulation processing.
It uses the OpenM++ data and execute libraries to organize model execution (especially in a cluster environment), read model input parameters, save model tracks and aggregate cross-tabulation results:
- for each input parameter, the model library uses the known data type, shape and other necessary information (memory address if required) to instantiate a class object and populate it with values by calling the data library
- for each output table result, the model library calls the data library to save results in model data storage (the model database)
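The first bullet can be illustrated with a minimal C++ sketch, assuming hypothetical Parameter and readParameter names: the parameter object is sized from its known type and shape, then populated by a data library call:

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical parameter object: allocated from a known type and shape,
// then filled with values by the data library.
template <typename T>
class Parameter {
public:
    explicit Parameter(std::vector<std::size_t> shape) : dims(std::move(shape)) {
        std::size_t n = 1;
        for (std::size_t d : dims) n *= d;
        values.resize(n);   // allocate a flat buffer of matching size
    }
    std::size_t size() const { return values.size(); }
    std::vector<T> values;
    std::vector<std::size_t> dims;
};

// Stand-in for a data library read call: populate the buffer from storage.
template <typename T>
void readParameter(Parameter<T>& p) {
    std::iota(p.values.begin(), p.values.end(), T(0));  // fake stored values
}
```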
OpenM++ data storage should provide the ability to store model parameters and output results. It consists of model data storage (the model database) and the data library and, optionally, can use the execute library to organize communication between computational nodes.
It can be implemented in the following ways:
- option 0. flat files: directly read-write into flat text (XML, CSV, etc.) files
- option a. flat files + buffering (or MPI-IO): use memory buffering (or MPI-IO) to organize large size chunks and reduce data exchange in cluster environment
- option b. client-server database: use MySQL or another open source SQL server database
- option c. file-based (embedded) SQL database: use a file-based database (i.e. SQLite) inside the master process and write custom code to emulate client-server behavior for computational nodes
Evaluating those options from the point of view of the openM++ use cases described above:
Option 0: direct write to flat files may not be a realistic approach in a cluster environment because:
- computational nodes most likely don't have a local file system
- a globally shared file system may have a very high or prohibitive cost for small write operations. For example, if 100 model executables on 100 computational nodes each want to write 100 bytes, it may be, in the worst case, 100 times slower than the master node writing 100*100 bytes. Of course, MPI-IO can solve that problem.
Option a: flat files + buffering (or MPI-IO)
- pros:
- most human readable format
- no additional tools are required to create or modify model data; it can be done in any text editor
- minimal development efforts
- cons:
- real model data input is typically bigger than a user can type in and maintain without additional tools
- to analyze the data in other software (i.e. Excel, R, SAS), custom data converter(s) must be developed
Option b: client-server
- pros:
- relatively easy to implement
- good performance is almost guaranteed
- hundreds of tools to read, compare and manipulate the data
- cons:
- requires installing and administering a SQL server database, which many openM++ users, such as model developers and independent researchers, may have no rights to do or may not want to do
Option c: file-based database (i.e. SQLite)
- pros:
- hundreds of tools to read and manipulate the data (i.e. the Firefox SQLite Manager add-on)
- relatively easy to transfer to any database or to exchange the data between researchers
- cons:
- development time to create client-server code for a cluster environment is much higher than for any other option
- it is less convenient than flat text files
OpenM++ data storage roadmap:
OpenM++ data storage can be implemented in the following order of priorities:
- (pri1) inside a single embedded (file-based) SQL database
- (pri2) as above, plus an extra database for model tracking
- (pri3) model parameters and metadata inside a file-based SQL database and output results as .csv files
- (pri3) inside a SQL server database chosen by the model developer (i.e. MSSQL, Oracle, etc.)
The data library is a C++ library to support model data read/write operations and hide low-level implementation details, simplifying model code and the modeling library. It is important to understand that there is no "one size fits all" solution, and openM++ must provide multiple versions of the data library for different target model storage. For example, for a model developer the SQLite data library may be most convenient; however, when openM++ is installed as part of a web solution, the MySQL data library is a better fit.
The following priority order of data library implementations is planned:
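The "multiple data libraries behind one interface" idea could look like this minimal C++ sketch; the interface, class and factory names are hypothetical, and the backends are stubs rather than real database clients:

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical common interface: the model links against IDataLibrary and a
// concrete backend (SQLite, MySQL, ...) is chosen per target environment.
class IDataLibrary {
public:
    virtual ~IDataLibrary() = default;
    virtual std::string storageName() const = 0;
    virtual double readValue(const std::string& paramName) = 0;
};

class SqliteDataLibrary : public IDataLibrary {
public:
    std::string storageName() const override { return "sqlite"; }
    double readValue(const std::string&) override { return 42.0; }  // stub
};

class MySqlDataLibrary : public IDataLibrary {
public:
    std::string storageName() const override { return "mysql"; }
    double readValue(const std::string&) override { return 42.0; }  // stub
};

// Factory: pick the backend for the target deployment.
std::unique_ptr<IDataLibrary> makeDataLibrary(const std::string& kind) {
    if (kind == "mysql") return std::make_unique<MySqlDataLibrary>();
    return std::make_unique<SqliteDataLibrary>();   // developer default
}
```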
- (pri1) SQLite as embedded (file-based) database
- (pri2) generic ODBC tested with MySQL (MariaDB), PostgreSQL, MS SQL, Oracle and IBM DB2
- (pri3) flat text files version of data library (using MPI-IO)
- (pri3) MySQL (MariaDB) native client (non-ODBC)
- (pri3) PostgreSQL native client (non-ODBC)
The list above is not final and can change at any time. Many other options are also being considered for specialized data library versions. For example, libmysqld, Firebird and MS Access were reviewed as potential candidates for the embedded (file-based) database. MPI-IO, HDF5 and NetCDF are considered as foundations for the flat text files version of the data library. And in future releases it is quite possible to have native client (non-ODBC) versions of the data library for MS SQL, Oracle and IBM DB2.
Keep in mind that the data library is part of the model run-time and may not be an ideal choice for other purposes. The easiest way to integrate openM++ with existing products is to use SQL loaders or output converters, which allow importing or exporting data from openM++ data storage into other well-known SQL servers, i.e. from SQLite into MS SQL, or dumping it into flat text files (i.e. CSV, XML).