This document explains the data format used for the benchmark collection.
Providing a standardized way for specifying parameter estimation problems in systems biology, especially for the case of Ordinary Differential Equation (ODE) models.
This data format specifies a parameter estimation problems using a number of text-based files (Systems Biology Markup Language (SBML) and Tab-Separated Values (TSV)) , i.e.
-
An SBML model [SBML]
-
A measurement file to fit the model to [TSV]
-
A "condition" file specifying model inputs and condition-specific parameters [TSV]
The following sections will describe the minimum requirements of those components in the core standard, which should provide all information for defining the parameter estimation problem.
Extensions of this format (e.g. additional columns in the measurement table) are possible and intended. However, those columns should provide extra information for example for plotting, or for more efficient parameter estimation, but they should not change the optimum of the optimization problem. Some optional extensions are described in the last section, "Extensions", of this document.
The model must be specified as valid SBML. Since parameter estimation is beyond the scope of SBML, there exists no standard way to specify observables (model outputs) and respective noise models. Therefore, we use the following convention.
In the SBML model, observables are specified as AssignmentRules
assigning to parameters with id
s starting with observable_
followed by the observableId
as in the corresponding column of the measurement sheet (see below).
E.g.
observable_pErk = observableParameter1 + observableParameter2*pErk
where observableParameter1
would be an offset, and observableParameter2
a scaling parameter.
Measurement noise can be specified as a numerical value in the noiseParameters
column of the measurement sheet (see below), which will default to a Gaussian noise model with standard deviation as provided in noiseParameters
.
Alternatively, more complex noise models can be specified for each observable, using additional AssignmentRules
. Those noise model rules assign to sigma_${observableId}
parameters.
A noise model which accounts for relative and absolute contributions could, e.g., be defined as
sigma_pErk = noiseParameter1 + noiseParameter2*pErk
with noiseParameter1
denoting the absolute and noiseParameter2
the relative contribution.
Any parameters named noiseParameter${1..n}
must be overwritten in the noiseParameters
column of the measurement file (see below).
The condition table species parameters for specific simulation condition (generally corresponding to different experimental conditions).
This is specified as tab-separated value file with condition-specific parameters in the following way:
conditionId | conditionName | parameterId1 | ... | parameterId${n} |
---|---|---|---|---|
conditionId1 | conditionName1 | NUMERIC|parameterId | ... | ... |
conditionId2 | ... | ... | ... | ... |
... | ... | ... | ... | ... |
Row names are condition names as referenced in the measurement table below. Column names are parameter names as given in the SBML model. These parameters will override any parameter values specified in the model.
Row- and column-ordering are(?) arbitrary. The conditionName
column is optional. Additional columns are|are not allowed.
TODO This will no longer be the input table, what else is allowed here?
TODO Do we allow specifying initial conditions anywhere?
TODO Are additional columns allowed here? With special prefix?
TODO Related; Should Column-order matter?
TODO Do we allow non-sbml-model parameters here, as introduced in noiseParameters or observable parameters?
TODO: do we allow state names here, which will overwrite their dynamics?
A tab-separated values files containing all measurements to be used for model training or validation.
Expected to have the following named columns in any (but preferably this) order:
observableId | preequilibrationConditionId | simulationConditionId | measurement | ... |
---|---|---|---|---|
observableId | [conditionId] | conditionId | NUMERIC | |
... | ... | ... | ... | ... |
(wrapped for readability)
... | time | observableParameters | noiseParameters | observableTransformation |
---|---|---|---|---|
... | NUMERIC|'inf' | [parameterId|NUMERIC[;parameterId|NUMERIC][...]] | [parameterId|NUMERIC[;parameterId|NUMERIC][...]] | ['lin'(default)|'log'|'log10'] |
... | ... | ... | ... | ... |
Fields in "[]" in the second row are optional and may be left empty.
Additional (non-standard) columns may be added.
TODO All entries case-sensitive? column names, string values, sbml ids ?
-
observableId
[STRING, NOT NULL, REFERENCES(sbml.observableID)]Observable name matching the ID of the respective parameter in the SBML model (
observable_...
)TODO What is the observable ID: observable_bla or only the latter part?
-
preequilibrationConditionId
[STRING OR NULL, REFERENCES(conditionsTable.conditionID)]The
conditionId
to be used for preequilibration. E.g. for drug treatments the model would be preequilibrated with the no-drug condition. Empty for no preequlibration.TODO Can preequilibrationConditionId column be omitted if all empty?
-
simulationConditionId
[STRING, NOT NULL, REFERENCES(conditionsTable.conditionID)]conditionId
as provided in the condition table, specifying the condition-specific parameters used for simulation. -
measurement
[NUMERIC, NOT NULL]The measured value in the same units/scale as the model output.
-
time
[NUMERIC OR STRING, NOT NULL]Time point of the measurement in the time unit specified in the SBML model, numeric value or
inf
(lower-case) for steady-state measurements. -
observableParameters
[STRING OR NULL]This field allows overriding or introducing condition-specific versions of the parameters defined in the model. The model can define observables (see above) containing place-holder parameters which can be replaced by condition-specific dynamic or constant parameters.
Placeholder parameters must be named ... TODO (with
n
ranging from 0?1? to n, without gaps). If the observable specified underobservableId
contains no placeholder, this field must be empty. If it containsn > 0
placeholders, this field must holdn
semicolon-separated numeric values or parameter names. No trailing semicolon should be added.Different lines for the same
observableId
may specify different parameters. This may be used to account for condition-specific or batch-specific parameters. This will translate into an extended optimization parameter vector.TODO Optional if there are not observables with placeholders?
-
noiseParameters
[STRING]The measurement standard deviation or
NaN
if the corresponding sigma is a model parameter.TODO first finalize observableParameters
-
observableTransformation
[STRING]Transformation of the observable.
lin
,log
orlog10
.TODO Can observableTransformation column be omitted if all lin?
TODO What does this imply?
TODO Do we allow for custom cost functions or likelihood for non-normal case?
A tab-separated value text file containing information on the parameters. One row per parameter with arbitrary order of rows and columns:
parameterId | parameterName | parameterScale | lowerBound | upperBound | nominalValue | estimate | priorType | priorParameters |
---|---|---|---|---|---|---|---|---|
STRING | STRING | log10|lin| TODO | NUMERIC | NUMERIC | NUMERIC | 0|1 | TODO | TODO |
... | ||||||||
... |
Additional columns may be added.
-
parameterId
[STRING, NOT NULL, REFERENCES(sbml.parameterId)]The
parameterId
of the parameter described in this row. This has be identical to the parameter IDs specified in the SBML model or in theobservableParameters
ornoiseParameters
column of the measurement table (see above).There must exist one line for each parameterId specified in the SBML model (except for placeholder parameter, see above) or the
observableParameters
ornoiseParameters
column of the measurement table. -
parameterName
[STRING, OPTIONAL]Parameter name to be used e.g. for plotting etc. Can be chosen freely. May or may not coincide with the SBML parameter name.
-
parameterScale
[lin|log|log10]Scale of the parameter. The parameters and boundaries and the nominal parameter value in the following fields are expected to be given in this scale.
TODO optional, NULL==lin?
-
lowerBound
[NUMERIC]Lower bound of the parameter used for optimization.
TODO allow inf or empty?
-
upperBound
[NUMERIC]Upper bound of the parameter used for optimization.
-
nominalValue
Best values found by optimization or the value used for simulation for artificial data or fixed parameters (see
estimate
).TODO "Best values found by optimization": so they will be updated every time we find something better?
-
estimate
[BOOL 0|1]1 or 0, depending on, if the parameter is estimated (1) or set to a fixed value(0) (->
nominalValue
). -
priorType
Type of prior. Leave empty, if no priors. Normal/ Laplace etc.
TODO optional?
TODO what will be allowed?
-
priorParameters
Parameters for prior.
TODO Numeric or also parameter names?
TODO:
Extra columns
-
hierarchicalOptimization
(optional)hierarchicalOptimization: 1 if parameter is optimized using hierarchical optimization approach. 0 otherwise.