
Documentation

Shane A. McQuarrie edited this page Feb 8, 2021 · 7 revisions

This page details the process of creating and analyzing reduced-order models (ROMs) for the GEMS combustion data. See Problem Statement for an overview of the setting and the data; see Installation and Setup for initial instructions on downloading the code and the data.

In the code examples, $ indicates the command line and >>> indicates Python. The code itself is internally documented and can be accessed on the fly with dynamic object introspection, e.g.,

>>> import utils
>>> help(utils.load_gems_data)


1. Unpack

The script step1_unpack.py reads the GEMS output directly from the .tar archives downloaded from Globus, gathers the data into a single data set, and saves it in HDF5 format. The process runs in parallel and takes several minutes. After the process completes successfully, the .tar archives from Globus may be deleted.

Usage

python3 step1_unpack.py --help
python3 step1_unpack.py DATAFOLDER [--overwrite] [--serial]

positional arguments:
  DATAFOLDER   folder containing the raw GEMS .tar data files

optional arguments:
  -h, --help   show this help message and exit
  --overwrite  overwrite the existing HDF5 data file
  --serial     process the archives serially instead of in parallel

Examples

# Process the raw .tar data files that are placed in /storage/combustion/.
$ python3 step1_unpack.py /storage/combustion

# Process the raw .tar data files that are placed in the current directory, overwriting the resulting HDF5 file if it already exists.
$ python3 step1_unpack.py . --overwrite

# Process the raw .tar data files in /storage/combustion/ serially (not in parallel).
$ python3 step1_unpack.py /storage/combustion --serial

Loading Results: utils.load_gems_data().

>>> import utils
>>> gems_data, t = utils.load_gems_data()

Each column of gems_data is a single snapshot, i.e., gems_data[:,j] is the full GEMS solution for all 8 native variables at time t[j]. The first DOF = 38523 rows of data represent the first variable, and so on. The variables are, in order,

  1. Pressure [Pa]
  2. x-velocity [m/s]
  3. y-velocity [m/s]
  4. Temperature [K]
  5. CH4 (methane) Mass Fraction
  6. O2 (oxygen) Mass Fraction
  7. H2O (water) Mass Fraction
  8. CO2 (carbon dioxide) Mass Fraction
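
Given this stacked-variable layout, an individual variable is recovered by row slicing. The following standalone NumPy sketch uses toy data in place of the real file; `get_variable` is illustrative, not a function from the repository.

```python
import numpy as np

DOF = 38523          # spatial degrees of freedom per variable
NUM_VARIABLES = 8    # the 8 native GEMS variables listed above

# Toy stand-in for gems_data: each column is one snapshot.
num_snapshots = 4
gems_data = np.random.random((DOF * NUM_VARIABLES, num_snapshots))

def get_variable(data, i):
    """Return the rows of `data` belonging to native variable i (0-based)."""
    return data[i * DOF : (i + 1) * DOF]

temperature = get_variable(gems_data, 3)   # variable 4 in the 1-based list above
```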

See Problem Statement for more details.

2. Preprocess

The GEMS snapshot data must be preprocessed to be suitable for Operator Inference. The script step2_preprocess.py generates training data for reduced-order model learning in three steps:

  1. Transform the GEMS variables to the learning variables, then scale each learning variable to the interval [-1,1].
  2. Compute the POD basis (the dominant left singular vectors) of the lifted, scaled snapshot training data and save the basis and the corresponding singular values.
  3. Project the lifted, scaled snapshot training data to the low-dimensional subspace defined by the POD basis, compute time derivative information for the projected snapshots, and save the projected data.

These three steps can also be performed separately by step2a_transform.py, step2b_basis.py, and step2c_project.py, respectively.

Usage

python3 step2_preprocess.py --help
python3 step2_preprocess.py TRAINSIZE MODES

positional arguments:
  TRAINSIZE    number of snapshots in the training data
  MODES        number of POD modes for projecting data

Examples

# Get training data from 10,000 snapshots and with a maximum of 50 POD modes.
$ python3 step2_preprocess.py 10000 50

# Equivalently, do the three steps separately.
$ python3 step2a_transform.py 10000     # Transform (lift and scale) 10,000 GEMS snapshots.
$ python3 step2b_basis.py 10000 50      # Compute a rank-50 POD basis from the transformed snapshots.
$ python3 step2c_project.py 10000       # Project the transformed snapshots and estimate time derivatives.

# Get training data from 15,000 snapshots and with a maximum of 100 POD modes.
$ python3 step2_preprocess.py 15000 100

Loading Results:

>>> import utils
>>> trainsize = 10000       # Number of snapshots used as training data.
>>> num_modes = 44          # Number of POD modes.
>>> Q, t, scales = utils.load_scaled_data(trainsize)
>>> V, scales = utils.load_basis(trainsize, num_modes)
>>> Q_, Qdot_, t = utils.load_projected_data(trainsize, num_modes)

Here,

  • Q[:,j] is a lifted, scaled snapshot corresponding to time t[j];
  • V[:,j] is the _j_th basis vector, i.e., the _j_th dominant left singular vector of Q;
  • Q_[:,j] is a projected snapshot with approximate time derivative Qdot_[:,j], both corresponding to time t[j];
  • scales[i,:] contains the shifting and dilation factors used to scale learning variable i (see data_processing.scale() and data_processing.unscale()).

The scales returned by utils.load_scaled_data() and utils.load_basis() are identical, as are the t returned by utils.load_scaled_data() and utils.load_projected_data().

3. Train

The script step3_train.py uses data prepared in step 2 to learn reduced-order models (ROMs) with Tikhonov-regularized Operator Inference with hyperparameter selection. The regularization is determined by the non-negative scalar hyperparameters λ1 and λ2: λ1 is the penalization for non-quadratic terms in the ROM, and λ2 is the penalization for quadratic terms only (see this paper for more details). The learned ROM operators are saved in HDF5 format for later use.
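
Conceptually, this training step solves a Tikhonov-regularized least-squares problem in which each column of the data matrix is penalized by λ1 or λ2 according to whether it feeds a non-quadratic or quadratic ROM term. The following standalone NumPy sketch illustrates that penalty structure on toy data; it is not the repo's implementation (the actual solve is handled by the rom_operator_inference package).

```python
import numpy as np

rng = np.random.default_rng(0)
k, r = 100, 4                               # snapshots, ROM dimension
Q_ = rng.random((r, k))                     # projected snapshots
Qdot_ = rng.random((r, k))                  # their time derivatives
lambda1, lambda2 = 400.0, 21000.0           # regularization hyperparameters

# Data matrix: [linear terms | compressed quadratic terms | constant term].
Q2 = np.array([np.outer(q, q)[np.triu_indices(r)] for q in Q_.T]).T
D = np.vstack([Q_, Q2, np.ones((1, k))]).T  # k x d

# Penalty weights: lambda1 on non-quadratic columns, lambda2 on quadratic ones.
gamma = np.concatenate([np.full(r, lambda1),
                        np.full(Q2.shape[0], lambda2),
                        [lambda1]])

# Solve min ||D O - Qdot_.T||^2 + ||diag(gamma) O||^2 via a stacked system.
D_aug = np.vstack([D, np.diag(gamma)])
R_aug = np.vstack([Qdot_.T, np.zeros((D.shape[1], r))])
O, *_ = np.linalg.lstsq(D_aug, R_aug, rcond=None)   # d x r operator matrix
```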

This script has three modes for designating or determining appropriate regularization hyperparameters λ1 and λ2, indicated with the following command line flags.

  • --single: train and save a ROM for a given choice of λ1 and λ2, passed as REG1 and REG2.
  • --gridsearch: train one ROM for each (λ1, λ2) pair in the two-dimensional REG3xREG6 hyperparameter grid [REG1,REG2]x[REG4,REG5]; save the stable ROM with the least training error.
  • --minimize: specify initial guesses for λ1 and λ2 as REG1 and REG2, then use Nelder-Mead search to find a locally optimal hyperparameter pair (λ1, λ2).

Usage

python3 step3_train.py --help
python3 step3_train.py --single TRAINSIZE MODES REG1 REG2
python3 step3_train.py --gridsearch TRAINSIZE MODES REG1 ... REG6 [--testsize TESTSIZE] [--margin MARGIN]
python3 step3_train.py --minimize TRAINSIZE MODES REG1 REG2 [--testsize TESTSIZE] [--margin MARGIN]

subcommands:
  --single              train and save a single ROM with regularization hyperparameters REG1 (non-quadratic penalizer) and REG2 (quadratic penalizer)
  --gridsearch          train over the REG3xREG6 grid [REG1,REG2]x[REG4,REG5] of regularization hyperparameter candidates, saving only the stable ROM with the least training error
  --minimize            given initial guesses REG1 (non-quadratic penalizer) and REG2 (quadratic penalizer), use Nelder-Mead search to train and save a ROM that is locally optimal in the regularization hyperparameter space

positional arguments:
  TRAINSIZE             number of snapshots in the training data
  MODES                 number of POD modes used to project the data (dimension of ROM to be learned)
  REG1 REG2 [...REG6]   regularization parameters for ROM training, interpreted differently by --single, --gridsearch, and --minimize

optional arguments:
  -h, --help            show this help message and exit
  --testsize TESTSIZE   number of time steps for which the trained ROM must satisfy the POD bound (remain stable)
  --margin MARGIN       factor by which the POD coefficients of the ROM simulation are allowed to deviate in magnitude from the training data
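
The --testsize / --margin stability criterion amounts to a bound check on the simulated POD coefficients, sketched here with toy data. The function name is illustrative, not part of the repository.

```python
import numpy as np

def satisfies_pod_bound(Q_rom, Q_train, margin=1.5):
    """Check whether each ROM-predicted POD coefficient stays within
    `margin` times the largest training-data magnitude for that mode."""
    bound = margin * np.abs(Q_train).max(axis=1, keepdims=True)
    return bool(np.all(np.abs(Q_rom) <= bound))

Q_train = np.array([[1.0, -2.0], [0.5, 0.25]])
print(satisfies_pod_bound(np.array([[2.9], [0.7]]), Q_train))  # within 1.5x
print(satisfies_pod_bound(np.array([[3.1], [0.7]]), Q_train))  # first mode exceeds the bound
```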

Examples

## --single: train and save a single ROM for a given λ1, λ2.

# Use 10,000 projected snapshots to learn a ROM of dimension r = 24
# with regularization parameters λ1 = 400, λ2 = 21000.
$ python3 step3_train.py --single 10000 24 400 21000

## --gridsearch: train over a grid of candidates for λ1 and λ2, saving only the stable ROM with least training error.

# Use 10,000 projected snapshots to learn ROMs of dimension r = 40, testing each regularization pair in the 4x5 logarithmically spaced grid [500,9000]x[8000,10000]. Save the ROM with the least training error for which the integrated POD modes stay within 150% of the training data in magnitude for 60,000 time steps.
$ python3 step3_train.py --gridsearch 10000 40 5e2 9e3 4 8e3 1e4 5 --testsize 60000 --margin 1.5

## --minimize: given initial guesses for λ1 and λ2, use Nelder-Mead search to train and save a ROM that is locally optimal in the regularization hyperparameter space.

# Use 10,000 projected snapshots to learn a ROM of dimension r = 30, starting a Nelder-Mead search from λ1 = 300, λ2 = 7000. The selected ROM must keep its integrated POD modes within 150% of the training data in magnitude for 60,000 time steps.
$ python3 step3_train.py --minimize 10000 30 300 7000 --testsize 60000 --margin 1.5

Loading Results: utils.load_rom().

>>> import utils
>>> trainsize = 10000       # Number of snapshots used as training data.
>>> num_modes = 44          # Number of POD modes.
>>> regs = (1e4, 1e5)       # Regularization hyperparameters for Operator Inference.
>>> rom = utils.load_rom(trainsize, num_modes, regs)

Here rom is an object of type rom_operator_inference.InferredContinuousROM. See the rom_operator_inference API for documentation.

4. Plot

The script step4_plot.py loads and simulates ROMs trained in step 3, then plots results in time against the corresponding GEMS data. While predictions at a single point are not representative of overall accuracy for this problem, these plots are a good first step for evaluating a ROM.

There are three available plot types, indicated with the following command line flags.

  • --point-traces: plot learning variables in time at fixed points of the computational domain. See Problem Statement for the default locations.
  • --relative-errors: plot relative projection and prediction errors as a function of time. This routine is memory intensive.
  • --spatial-statistics: plot spatial averages of pressure, velocities, and temperature, as well as spatial integrals (sums) of species molar concentrations, both as functions of time.
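
Both error and statistics plots reduce over the spatial axis of the snapshot matrix. A standalone NumPy sketch with toy data in place of the real snapshots:

```python
import numpy as np

rng = np.random.default_rng(0)
DOF, k = 38523, 100
gems_pressure = rng.random((DOF, k)) + 1.0    # toy full-order pressure snapshots
rom_pressure = gems_pressure + 0.01 * rng.standard_normal((DOF, k))

# Relative prediction error at each time step, aggregated over the domain.
rel_error = (np.linalg.norm(gems_pressure - rom_pressure, axis=0)
             / np.linalg.norm(gems_pressure, axis=0))

# Spatial statistics as functions of time.
spatial_average = gems_pressure.mean(axis=0)  # e.g., for pressure, temperature
spatial_integral = gems_pressure.sum(axis=0)  # e.g., for species concentrations
```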

Usage

python3 step4_plot.py --help
python3 step4_plot.py --point-traces TRAINSIZE MODES REG1 REG2 [--location L [...]]
python3 step4_plot.py --relative-errors TRAINSIZE MODES REG1 REG2
python3 step4_plot.py --spatial-statistics TRAINSIZE MODES REG1 REG2

subcommands:
  --point-traces        plot point traces in time at the specified monitoring locations
  --relative-errors     plot relative errors in time, averaged over the spatial domain
  --spatial-statistics  plot spatial averages and species integrals

positional arguments:
  TRAINSIZE             number of snapshots in the training data
  MODES                 number of POD modes used to project the data (dimension of the learned ROM)
  REG1                  regularization hyperparameter for non-quadratic ROM terms
  REG2                  regularization hyperparameter for quadratic ROM terms

optional arguments:
  -h, --help            show this help message and exit
  --location L [...]    monitor locations for time trace plots

Examples

## --point-traces: plot results in time at fixed spatial locations.

# Plot time traces of each variable at the monitor locations for the ROM trained from 10,000 snapshots with 22 POD modes and regularization hyperparameters λ1 = 300, λ2 = 21000.
$ python3 step4_plot.py --point-traces 10000 22 300 21000

## --spatial-statistics: plot results in time averaged over the spatial domain.

# Plot spatial averages and species integrals for the ROM trained from 20,000 snapshots with 40 POD modes and regularization hyperparameters λ1 = 9e3, λ2 = 1e4.
$ python3 step4_plot.py --spatial-statistics 20000 40 9e3 1e4

## --relative-errors: plot relative projection and prediction errors in time, averaged over the spatial domain.

# Plot errors for the ROM trained from 20,000 snapshots with 43 POD modes and regularization parameters λ1 = 350, λ2 = 18500.
$ python3 step4_plot.py --relative-errors 20000 43 350 18500

Loading Results: figures are saved as PDFs in the folder specified by config.figures_path().

>>> import config
>>> print("figures are saved to", config.figures_path())

5. Export

The script step5_export.py writes Tecplot-readable ASCII (text) files from simulation data. The resulting files can be used with Tecplot to visualize snapshots over the computational domain.

There are three types of output files, indicated with the following positional command line arguments:

  • gems: write full-order GEMS data in the ROM learning variables.
  • rom: write reconstructed ROM outputs. The specific ROM is selected via command line arguments --trainsize k, --modes r, and --regularization λ1 and λ2.
  • error: write the absolute error between the GEMS data and the ROM outputs.
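
The error files compare GEMS snapshots with their ROM reconstructions lifted back to the full space. A standalone NumPy sketch with toy data (the unscaling step is omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, k = 300, 10, 20
V = np.linalg.qr(rng.random((n, r)))[0]   # toy orthonormal POD basis
Q = rng.random((n, k))                    # toy "GEMS" snapshots (already scaled)
Q_ = V.T @ Q                              # reduced states

reconstruction = V @ Q_                   # lift back to the full space
abs_error = np.abs(Q - reconstruction)    # what the `error` files contain
```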

Usage

python3 step5_export.py -h
python3 step5_export.py (gems | rom | error) --timeindex T [...] --variables V [...] [--trainsize TRAINSIZE] [--modes MODES] [--regularization REG1 REG2]

positional arguments:
  SNAPTYPE              which snapshot types to save (gems, rom, error)

optional arguments:
  -h, --help            show this help message and exit
  --timeindex T [...]   indices of snapshots to save (default every 100th snapshot)
  --variables V [...]   variables to save, a subset of config.ROM_VARIABLES (default all)
  --trainsize TRAINSIZE number of snapshots in the ROM training data
  --modes MODES         ROM dimension (number of retained POD modes)
  --regularization REG1 REG2
                        regularization hyperparameters in the ROM training

Examples

# Export every 100th snapshot (default) of GEMS data (all variables).
$ python3 step5_export.py gems

# Export only snapshot 5000 of GEMS data (all variables).
$ python3 step5_export.py gems --timeindex 5000

# Export only snapshot 4000 of GEMS pressure and temperature data.
$ python3 step5_export.py gems --timeindex 4000 --variables p T

# Export snapshot 4000 of reconstructed pressure, temperature, and methane data from the ROM trained from 10,000 snapshots, 22 POD modes, and regularization hyperparameters 200 and 30000.
$ python3 step5_export.py rom --timeindex 4000 --variables p T CH4 --trainsize 10000 --modes 22 --regularization 2e2 3e4

# Export every 100th snapshot of reconstructed ROM data (all variables) and the absolute errors, derived from the ROM trained from 20,000 snapshots, 44 POD modes, and regularization hyperparameters 100 and 40000.
$ python3 step5_export.py rom error --trainsize 20000 --modes 44 --regularization 1e2 4e4

Loading Results: data files are saved in the folder specified by config.tecplot_path().

>>> import config
>>> print("Tecplot-friendly files are exported to", config.tecplot_path())

The files can be visualized with Tecplot (File >> Load Data, then check the Contours box).

Complete Example

For this walkthrough, we assume the (small) code files exist in a folder ~/Desktop/combustion, the (large) data files exist in a folder /storage/combustion, and the BASE_FOLDER variable in [config.py](../blob/master/config.py) is set to /storage/combustion.

Suppose we want to create a ROM from 20,000 snapshots with 44 POD modes and create some visualizations to analyze its performance. We do not yet have appropriate values for the regularization hyperparameters λ1 and λ2.

# Navigate to the code directory.
$ cd ~/Desktop/combustion

# Unpack the raw data in the data directory.
$ python3 step1_unpack.py /storage/combustion

# Prepare a set of training data with 20,000 snapshots and 50 POD modes.
$ python3 step2_preprocess.py 20000 50      # This suffices since 50 > 44.

# Do a grid search over [100,500]x[15000,25000] with 10 logarithmically spaced values for λ1 and 15 for λ2.
$ python3 step3_train.py --gridsearch 20000 44 1e2 5e2 10 1.5e4 2.5e4 15

The grid search selects λ1 = 245 and λ2 = 19365, so we do a more targeted hyperparameter search in that vicinity.

# Train a ROM with locally optimal regularization hyperparameters near λ1=245, λ2=19365.
$ python3 step3_train.py --minimize 20000 44 245 19365

The minimization selects λ1 = 322, λ2 = 18199. Now we plot point-wise results and export data for visualization with Tecplot.

# Plot learning variable point traces and spatial statistics against the corresponding GEMS data.
$ python3 step4_plot.py --point-traces 20000 44 322 18199
$ python3 step4_plot.py --spatial-statistics 20000 44 322 18199

# Export every 100th snapshot to Tecplot for visualization.
$ python3 step5_export.py gems rom --trainsize 20000 --modes 44 --regularization 322 18199

The figures will be in ~/Desktop/combustion/figures/ and the Tecplot-friendly files will be in /storage/combustion/tecdata/.

Problem Statement: computational domain, state variables, and description of the data.

Installation and Setup: how to download the source code and the data files.

File Summary: short descriptions of each file in the repository.

Documentation: how to use the repository for reduced-order model learning.

Results: plots and figures, including many additional results that are not in the publications.

References: short list of primary references.
