Declaration Language
The declaration language refers to the language employed by the WRES to declare the contents of an evaluation.
The simplest possible evaluation contains the paths to each of two datasets whose values will be compared or evaluated:
observed: observations.csv
predicted: predictions.csv
In this example, the two datasets contain time-series values in CSV format and are located in the user’s home directory; datasets in other locations must be declared with absolute paths. The WRES will automatically detect the format of the supplied datasets.
In this example, the WRES will make some reasonable choices about other aspects of the evaluation, such as the metrics to compute (depending on the data it sees) and the statistics formats to write.
The language of “observed” and “predicted” is simply intended to clarify the majority use case of the WRES, which is to compare predictions and observations. When computing error values, the observed values are subtracted from the predicted values. Thus, a negative error means that the predictions are too low and a positive error means they are too high. Beyond this, the WRES is agnostic about the content or origin of these datasets and simply views them as two time-series datasets. For example, observed or measured values could be used in both the observed and predicted slots, if desired.
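To make the sign convention concrete, a small worked example (the values are illustrative, not taken from any dataset above):

```latex
% error = predicted - observed
\text{error} = \text{predicted} - \text{observed} = 8.0 - 10.0 = -2.0
% a negative error: the prediction is too low
```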
An evaluation is declared to WRES in a prescribed format and with a prescribed grammar. The format or “serialization format” is YAML, which is a recursive acronym, “YAML Ain’t Markup Language”. The evaluation language itself builds on this serialization format and contains the grammar understood by the WRES software. For example, datasets can be declared, together with any optional filters, metrics or statistics formats to create.
It may be interesting to know that YAML is a superset of JSON, which means that any evaluation declared to WRES using YAML has an equivalent declaration in JSON, which the WRES will accept. For example, the equivalent minimal evaluation in JSON is:
{
  "observed": "observations.csv",
  "predicted": "predictions.csv"
}
As you can see, YAML tends to be cleaner and more human readable than JSON, but JSON is perfectly acceptable if you are familiar with it and prefer to use it.
If you are curious, the following resources provide some more information about YAML:
- https://en.wikipedia.org/wiki/YAML [comprehensive description and examples]
- https://www.yamllint.com/ [this will tell you whether your declaration is valid YAML]
As indicated above, the basic datasets to compare are observed and predicted:
observed: observations.csv
predicted: predictions.csv
Additionally, a baseline dataset may be declared as a benchmark for the predicted dataset:
observed: observations.csv
predicted: predictions.csv
baseline: baseline_predictions.csv
For example, when computing a mean square error skill score, the mean square error is first computed by comparing the predicted and observed datasets, then, separately, by comparing the baseline and observed datasets, and, finally, the two mean square error scores are compared in a skill score.
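The sequence described above matches the conventional form of a skill score; as a sketch, with MSE denoting the mean square error:

```latex
% Conventional skill score form; the WRES text above describes this comparison
\mathrm{MSESS} = 1 - \frac{\mathrm{MSE}(\text{predicted},\,\text{observed})}{\mathrm{MSE}(\text{baseline},\,\text{observed})}
```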
As of v6.22, covariate datasets can be used to filter evaluation pairs. For example, precipitation forecasts may be evaluated conditionally upon observed temperatures (a covariate) being at or below freezing. Further information about covariates is available here: Using covariates as filters.
In the simplest case, involving a single covariate dataset without additional parameters, the covariate may be declared in the same way as other datasets (note the plural form, covariates, because there may be one or more):
observed: observations.csv
predicted: predictions.csv
baseline: baseline_predictions.csv
covariates: covariate_observations.csv
In this example, the evaluation pairs will include only those valid times when the covariate is also defined.
Unlike the observed, predicted and baseline datasets, more than one covariate may be declared using a list. For example:
observed: observations.csv
predicted: predictions.csv
baseline: baseline_predictions.csv
covariates:
  - sources: precipitation.csv
    variable: precipitation
  - sources: temperature.csv
    variable: temperature
In this case, the list includes two covariates, one that contains precipitation observations and one that contains temperature observations.
Covariates may be declared with a minimum and/or maximum value. This will additionally filter evaluation pairs to only those valid times when the covariate meets the filter condition(s). For example:
observed: observations.csv
predicted: predictions.csv
baseline: baseline_predictions.csv
covariates:
  - sources: precipitation.csv
    variable: precipitation
    minimum: 0.25
  - sources: temperature.csv
    variable: temperature
    maximum: 0
In this case, the evaluation pairs will include only those valid times when the temperature is at or below freezing, 0°C, and the precipitation equals or exceeds 0.25 mm. The measurement units correspond to the units in which the covariate data are defined. Currently, it is not possible to transform the measurement unit of a covariate prior to filtering. In addition, the values must be declared at the evaluation time_scale, whether or not this is declared explicitly. For example, if the evaluation is concerned with daily average streamflow, then each covariate filter should be declared as a daily value. However, the time scale function can be declared separately for each covariate using the rescale_function. For example:
observed:
  sources: observations.csv
  variable: streamflow
predicted:
  sources: predictions.csv
  variable: streamflow
covariates:
  - sources: precipitation.csv
    variable: precipitation
    minimum: 0.25
    rescale_function: total
  - sources: temperature.csv
    variable: temperature
    maximum: 0
    rescale_function: minimum
time_scale:
  period: 24
  unit: hours
  function: mean
In this case, the subject of the evaluation is daily mean streamflow and the streamflow pairs will include only those valid times when the daily total precipitation equals or exceeds 0.25 mm and the minimum daily temperature is at or below freezing.
Otherwise, all of the parameters that can be used to clarify an observed or predicted dataset can be used to clarify a covariate dataset (see 5.4. How do I clarify the datasets to evaluate, such as the variable to use?).
You can declare multiple datasets by listing them. In this regard, YAML has two styles for collections, such as arrays, lists and maps. The ordinary or “block” style includes one item on each line. For example, if the predicted dataset contains several URIs, they may be declared as follows:
observed: observed.csv
predicted:
  - predictions.csv
  - more_predictions.csv
  - yet_more_predictions.csv
In this context, the dashes and indentations are important to preserve. You should use two spaces for each new level of indentation, as in the example above.
Alternatively, you may use the “flow” style, which places all items in a continuous list or array and uses square brackets to begin and end the list:
observed: observed.csv
predicted: [predictions.csv, more_predictions.csv, yet_more_predictions.csv]
In some cases, it may be necessary to clarify the datasets to evaluate. For example, if a URI references a dataset that contains multiple variables, it may be necessary to clarify the variable to evaluate. In other cases, it may be necessary to clarify the time zone offset associated with the time-series or to apply additional parameters that filter data from a web service request.
When clarifying a property of a dataset, it is necessary to distinguish it from the other properties. For example, if a URI refers to a dataset that contains some missing values and the missing value identifier is not clarified by the source format itself, then it may be necessary to clarify this within the declaration:
observed:
  - uri: some_observations.csv
    missing_value: -999.0
  - more_predictions.csv
predicted: some_predictions.csv
Here, some_observations.csv has now acquired a uri property, in order to distinguish it from the missing_value.
Likewise, it may be necessary to clarify some attribute of a dataset as a whole, such as the variable to evaluate (which applies to all sources of data within the dataset). In that case, it would be further necessary to distinguish the data sources from the variable:
observed:
  sources:
    - uri: some_observations.csv
      missing_value: -999.0
    - more_predictions.csv
  variable: streamflow
predicted: some_predictions.csv
The following table contains the options that may be used to clarify either an observed or predicted dataset as of v6.14. You can also examine the schema (see: Does the declaration language use a schema?), which defines the superset of all possible evaluations supported by WRES.
Option | Purpose
---|---
sources | To clarify the list of sources to evaluate when other options are present for the dataset as a whole.
uri | To clarify the URI associated with a dataset when other options are present for the dataset associated with that URI.
variable | To clarify the variable to evaluate when a data source contains multiple variables. Optionally, one or more variable aliases may be included, which will be treated as equivalent to the named variable.
feature_authority | To clarify the feature authority used to name features. This may be required when correlating feature names across datasets. For example, to correlate a USGS Site Code of 06893000 with a National Weather Service "Handbook 5" feature name of KCDM7, it is either necessary to explicitly correlate these two names in the declaration or to use one of the names and resolve the correlated feature with a feature service request. For this request to succeed, the feature service will need to know that 06893000 is a usgs site code or, equivalently, that KCDM7 is an nws lid. The supported values for the feature_authority are: nws lid, usgs site code, nwm feature id and custom, which is the default.
type | In rare cases, it may be necessary to clarify the type of dataset. For example, when requesting time-series datasets from web services that support multiple types of data, it may be necessary to clarify the type of data required. The supported values for the type are: ensemble forecasts, single valued forecasts, observations, simulations and analyses.
label | A user-friendly label for the dataset, which will appear in the statistics formats, where appropriate.
ensemble_filter | A filter that selects a subset of the ensemble forecasts to include in the evaluation or exclude from the evaluation. Only applies to datasets that contain ensemble forecasts. By default, the named members are included.
time_shift | A time shift that is applied to the valid times associated with all time-series values. This may be used to help pair values whose times are not exactly coincident.
time_scale | The timescale associated with the time-series values. This may be necessary when the timescale is not explicitly included in the source format. In general, a timescale is only required when the time-series values must be rescaled in order to form pairs. For example, if the observed dataset contains instantaneous values and the predicted dataset contains values that represent a 6-hour average, then the observed time-series values must be "upscaled" to 6-hourly averages before they can be paired with their corresponding predicted values. Upscaling to a desired timescale is only possible if the existing timescale is known/declared.
time_zone_offset | The time zone offset associated with the dataset. This is only necessary when the source format does not explicitly identify the time zone in which the timestamps are recorded. Accepts either a quantitative time zone offset or, less precisely, a time zone shorthand, such as CST (Central Standard Time). When using a numeric offset, the value must be enclosed within single or double quotes to clarify that it should be treated as a time zone offset and not a number.
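To illustrate several of these options in combination, the following sketch declares a variable, a time zone offset and a timescale for the observed dataset. The file names and values are hypothetical, and the exact nesting of the dataset-level time_scale keys is assumed to follow the top-level time_scale shown elsewhere in this document:

```yaml
# Sketch only: option names from the table above; values are illustrative
observed:
  sources:
    - uri: some_observations.csv
  variable: streamflow
  time_zone_offset: '-06:00'
  time_scale:
    period: 1
    unit: hours
    function: mean
predicted: some_predictions.csv
```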
The following table contains the additional options that may be used to clarify a baseline dataset as of v6.14. For the avoidance of doubt, these options extend the options available for an observed or predicted dataset.
Option | Purpose
---|---
persistence | Allows for the declaration of a persistence baseline from a prescribed data source. The persistence time-series will be generated using the specified order or "lag", which corresponds to the value before the current time that will be persisted forwards into the future. For example, "1" means that the value from the persistence source that occurs one timestep prior to the current time will be persisted forwards. In this context, "current time" means the valid time of a non-forecast source or the reference time of a forecast source. The default value for the order is 1.
climatology | Allows for the declaration of a climatology baseline from a prescribed data source. For a given valid time, the climatology will contain the value from the prescribed data source at the corresponding valid time in each historical year of record, other than the year associated with the valid time (which is typically the "verifying observation"). The period associated with the climatology may be further constrained by a minimum_date and/or a maximum_date. Optionally, the climatology may be converted to a single-valued dataset by prescribing an average. The supported values for the average are mean and median. The default value for the average is mean.
separate_metrics | A flag (true or false) that indicates whether the same metrics computed for the predicted dataset should also be computed for the baseline dataset. When true, all metrics will be computed for the baseline dataset; otherwise, the baseline will only appear in skill calculations for the predicted dataset.
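As a sketch of how a persistence baseline might be declared, the following example uses hypothetical file names; the nesting of the order key under persistence is an assumption based on the description in the table above, not a confirmed schema:

```yaml
observed: observations.csv
predicted: predictions.csv
baseline:
  sources: baseline_predictions.csv
  # Assumed nesting: persist the value one timestep before the current time
  persistence:
    order: 1
  separate_metrics: true
```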
The following table contains the additional options that may be used to clarify covariates as of v6.23. For the avoidance of doubt, these options extend the options available for an observed or predicted dataset.
Option | Purpose
---|---
minimum | Allows for the declaration of a minimum value that the covariate should take. Evaluation pairs will only be considered when the covariate value is at or above the minimum value at the same valid time. The measurement unit is the unit in which the covariate dataset is supplied. The time scale is the evaluation time scale.
maximum | Allows for the declaration of a maximum value that the covariate should take. Evaluation pairs will only be considered when the covariate value is at or below the maximum value at the same valid time. The measurement unit is the unit in which the covariate dataset is supplied. The time scale is the evaluation time scale.
rescale_function | A function to use when rescaling the covariate dataset to the evaluation time scale.
In general, evaluation input data can be either
- requested from a web-service, or
- available on the local file system, with the WRES having read permissions.
The following web services may be accessible options for obtaining input data:
- USGS Observations: The WRES can pull observations directly from the USGS National Water Information System (NWIS) where such observations are available. Instructions for configuration are provided via Example 5 within Complete Examples of Evaluation Declarations. The WRES is responsible neither for the accuracy of the data nor for the availability of the data services or the data within those services. One may subscribe to NWIS service announcements at https://listserv.usgs.gov/mailman/listinfo/nwisweb-notification
- Recent NWM data via WRDS Services: If you have access to WRDS services (see note, below), the WRES can read recent (generally, 90 days) NWM data directly from services developed by the OWP WRDS team.
- AHPS Forecasts via WRDS API: If you have access to WRDS services (see note, below), a decades long archive of AHPS forecast data is available to support a WRES evaluation.
NOTE: WRDS services are only accessible from within the NWC network. If you are on that network OR you are using the Central OWP WRES, then you should have access to those services; see the NOAA VLab project WRES User Support wiki for information on how to declare use of WRDS services.
Files provided to the WRES must be in one of the following formats:
- WRES-Compliant CSV Files: The files must follow a specific format, described at Format Requirements for CSV Files.
- CHPS/FEWS PI-timeseries XML files: The files can be gzipped (i.e., *.xml.gz) or tarred and gzipped (see compressed archives, below). However, they may not be gzipped XML files that are then tarred (i.e., a .tar containing .xml.gz is not allowed).
- Fast-Infoset encoded CHPS/FEWS PI-timeseries XML files: The files can be gzipped (i.e., *.fi.gz) or tarred and gzipped (see compressed archives, below). However, they may not be gzipped files that are then tarred (i.e., a .tar containing .fi.gz is not allowed).
- NWS datacard format files: This format is allowed for observed or simulation data only. It is highly recommended that this format be avoided if possible. It insufficiently describes the data contained, therefore requiring declaration that other formats do not.
- WRDS-JSON Format Files: WRDS services will use a JSON format for data interchange. The WRES can also read data formatted following WRDS-JSON from flat files.
- NWM v1.1 - v3.0 compliant netCDF files: To be considered NWM-compliant, the NetCDF must include the expected metadata. For more information, see Format Requirements for NetCDF Files.
- NWM data available online: In general, the WRES can read NWM netCDF files from any online location so long as (1) the WRES has access and (2) the files are organized identically to what is found in the NOMADS; again, see Configuring the Raw NWM Data Source. For example, it can access data provided through the para.nomads.ncep.noaa.gov website: https://para.nomads.ncep.noaa.gov/pub/data/nccf/com/nwm/para/. NOTE: If you have access to NWC resources, then you can obtain NWM data from the NWC D-Store; see the NOAA VLab WRES User Support project wiki for more information.
- USGS JSON (WaterML): The WRES can read files in USGS-style JSON WaterML format. WaterML is described here: https://www.ogc.org/standards/waterml
The following compressed archives of files can be read:
- Tarred/Gzipped Archives: The WRES can read archives of tarred/gzipped (e.g., .tgz or .tar.gz) files following any of the formats mentioned above with the exception of raw NWM data.
- Gzipped Data: The WRES can read gzipped (e.g., .gz) files following any of the formats mentioned above with the exception of raw NWM data.
Geographic features may be declared explicitly by listing each feature to evaluate (How do I declare a list of geographic features to evaluate?). Alternatively, they may be declared implicitly, either by declaring a named region to evaluate (How do I declare a region to evaluate without listing all of the features within it?) or by declaring a geospatial mask (How do I declare a spatial mask to evaluate only a subset of features?).
There are three scenarios in which you should declare the geographic features to evaluate, namely:
- When the declared datasets contain more features than you would like to evaluate, i.e., when you would like to evaluate a subset of the features for which data is available;
- When you are reading data from a web service, since otherwise the evaluation would request a potentially unlimited amount of data; or
- When there are multiple geographic features present within the declared datasets and two or more of the datasets use a different feature naming authority. In these circumstances, it is necessary to declare how the features are correlated with each other.
If you fail to declare the geographic features in any of these scenarios, you can expect an error message.
Conversely, it is unnecessary to declare the geographic features to evaluate when:
- There is a single geographic feature in each dataset; or
- There are multiple geographic features and:
- All of the datasets use a consistent feature naming authority; and
- The evaluation should include all of the features discovered.
Different datasets may name geographic features differently. Formally speaking, they may use different “feature authorities”. For example, time-series data from the USGS National Water Information System (NWIS) uses a USGS Site Code, whereas time-series data from the National Water Model uses a National Water Model feature ID.
As such, the software allows for as many feature names as sides of data, i.e., three (observed, predicted and baseline). This is referred to as a “feature tuple”.
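For example, a fully qualified feature tuple naming all three sides of data might be declared as follows (the feature names are illustrative, reusing identifiers that appear elsewhere in this document):

```yaml
features:
  - {observed: '07140900', predicted: '21215289', baseline: '21215289'}
```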
When all sides of data have the same feature authority, and this can be established from the other declaration present, it is sufficient to declare the name for only one side of data. Otherwise, the fully qualified feature tuple must be declared or a feature service used to establish the missing names.
In the simplest case, where each side of data has the same feature authority and the aim is to pair corresponding feature names, the features may be declared as follows:
features:
  - DRRC2
  - DOLC2
Where DRRC2 and DOLC2 are the names of two geographic features in the National Weather Service “Handbook 5” feature authority. In this example, the evaluation will produce statistics separately for each of DRRC2 and DOLC2.
In the more complex case, where each side of data has a separate feature authority or the feature authority cannot be determined from the other information present, then the features must be declared separately for each side of data, as follows:
features:
  - {observed: '07140900', predicted: '21215289'}
  - {observed: '07141900', predicted: '941030274'}
In this example, the feature authority for the observed data is a USGS Site Code and the feature authority for the predicted data is a National Water Model feature ID. The quotes around the names indicate that the values should be treated as characters, rather than numbers.
Yes. Often, this is unnecessary because the software can determine the feature authority from the other information present. For example, consider the following declaration:
observed:
  sources:
    - uri: https://nwis.waterservices.usgs.gov/nwis/iv
      interface: usgs nwis
  variable:
    name: '00060'
predicted:
  sources:
    - uri: data/nwmVector/
      interface: nwm short range channel rt conus
  variable: streamflow
In this case, it is unambiguous that the observed data uses a USGS Site Code because the source interface is usgs nwis. Likewise, the predicted data uses a National Water Model feature ID because the source interface is a National Water Model type, nwm short range channel rt conus. In short, if the source interface is declared, it should be unnecessary to define the geographic feature authority.
In other cases, time-series data may be obtained from a file source whose metadata is unclear about the feature authority. In fact, none of the time-series data formats currently supported by WRES include information about the feature authority. In this case, the feature authority may be declared explicitly:
observed:
  sources: data/DRRC2QINE.xml
  feature_authority: nws lid
predicted:
  sources: data/drrc2ForecastsOneMonth/
  feature_authority: nws lid
The above unlocks the following as valid declaration:
observed:
  sources: data/DRRC2QINE.xml
  feature_authority: nws lid
predicted:
  sources: data/drrc2ForecastsOneMonth/
  feature_authority: nws lid
features:
  - DRRC2
Conversely, in the absence of the declared feature_authority for each side of data, this would be required:
observed:
  sources: data/DRRC2QINE.xml
predicted:
  sources: data/drrc2ForecastsOneMonth/
features:
  - {observed: DRRC2, predicted: DRRC2}
If you are using datasets with different feature authorities and are either unaware of how features relate to each other across the different feature authorities or prefer not to declare them manually, then you can use the Water Resources Data Service (WRDS) feature service to establish feature correlations. The WRDS is available to those with access to Office of Water Prediction web services.
A feature service may be declared as follows:
feature_service: https://[WRDS]/api/location/v3.0/metadata
Where [WRDS] is the host name of the WRDS feature service.
The WRES can ask the WRDS feature service to resolve feature correlations, providing it knows how to pose the question correctly. To pose the question correctly, it must know the feature authority associated with each of the feature names that need to be correlated.
For example, consider the following declaration:
observed:
  sources:
    - uri: https://nwis.waterservices.usgs.gov/nwis/iv
      interface: usgs nwis
  variable:
    name: '00060'
predicted:
  sources:
    - uri: data/nwmVector/
      interface: nwm short range channel rt conus
  variable: streamflow
feature_service: https://[WRDS]/api/location/v3.0/metadata
features:
  - observed: '07140900'
  - observed: '07141900'
In this case, the feature authority of the observed data is a USGS Site Code (the interface is usgs nwis) and the feature authority of the predicted data is a National Water Model feature ID. This allows the WRES to pose a valid question to the WRDS feature service, namely “what are the National Water Model feature IDs that correspond to USGS Site Codes ‘07140900’ and ‘07141900’?”. It is important to note that each feature must be qualified as observed because the feature names are expressed as USGS Site Codes and the observed data uses this feature authority.
You may use the WRDS Feature Service to acquire a list of features for a named geographic region, such as a River Forecast Center (RFC). The WRDS is available to those with access to Office of Water Prediction web services.
For example, consider the following declaration, which requests all named features within the Arkansas-Red Basin RFC:
feature_service:
  uri: https://[WRDS]/api/location/v3.0/metadata
  group: RFC
  value: ABRFC
Where [WRDS] is the host name of the WRDS feature service. Here, the name of the geographic group understood by WRDS is RFC and the chosen value is ABRFC.
In this example, each of the geographic features contained within ABRFC, as understood by WRDS, would be included in the evaluation. To include features from multiple regions, simply list the individual regions. For example, to additionally include features from the California Nevada RFC:
feature_service:
  uri: https://[WRDS]/api/location/v3.0/metadata
  - group: RFC
    value: ABRFC
  - group: RFC
    value: CNRFC
By default, each geographic feature is evaluated separately. However, to pool all of the geographic features together and produce a single set of statistics for the overall group, the pool attribute may be declared:
feature_service:
  uri: https://[WRDS]/api/location/v3.0/metadata
  - group: RFC
    value: ABRFC
    pool: true
Yes, you can declare a spatial mask that defines the geospatial boundaries for an evaluation. This requires a Well Known Text (WKT) string. For example:
spatial_mask: 'POLYGON ((-76.825 39.225, -76.825 39.275, -76.775 39.275, -76.775 39.225, -76.825 39.225))'
In this case, the evaluation will only include (e.g., gridded) locations that fall within the boundaries of the supplied polygon.
Optionally, you may name the region and include a Spatial Reference System Identifier (SRID), which unambiguously describes the coordinate reference system for the supplied WKT: https://en.wikipedia.org/wiki/Spatial_reference_system:
spatial_mask:
  name: Region south of Ellicott City, MD
  wkt: 'POLYGON ((-76.825 39.225, -76.825 39.275, -76.775 39.275, -76.775 39.225, -76.825 39.225))'
  srid: 4326
You should filter the time-series data in either of these scenarios:
- When the goal is to evaluate only a subset of the available time-series data; or
- When reading data from a web service, since otherwise the evaluation would request a potentially unlimited amount of data.
An evaluation may be composed of up to three timelines, depending on the type of data to evaluate:
- Valid times. These are the ordinary datetimes at which values are recorded. For example, if streamflow is observed at 2023-03-25T12:00:00Z, then its “valid time” is 2023-03-25T12:00:00Z.
- Reference times. These are the times to which forecasts are referenced. In practice, there are different flavors of forecast reference times, such as forecast “issued times”, which may correspond to the times at which forecast products are released to the public, or “T0s”, which may correspond to the times at which a forecast model begins forward integration. However, as of v6.14, all reference times are treated equivalently.
- Lead times. These are durations rather than datetimes and refer to the period elapsed between a forecast reference time and a forecast valid time. For example, if a forecast is issued at 2023-03-25T12:00:00Z and valid at 2023-03-25T13:00:00Z, then its lead time is “1 hour”.
The last two timelines only apply to forecast datasets.
Datetimes are always declared using an ISO 8601 datetime string in Coordinated Universal Time (UTC), aka Zulu (Z) time. Further information about ISO 8601 can be found here: https://en.wikipedia.org/wiki/ISO_8601
Each of these timelines can be constrained or filtered so that the evaluation only considers data between the prescribed datetimes. These bounds always form a closed interval, meaning that times that fall exactly on either boundary are included.
Consider the following declaration of a valid time interval:
valid_dates:
  minimum: 2017-08-07T23:00:00Z
  maximum: 2017-08-09T17:00:00Z
In this case, the evaluation will consider all time-series values whose valid times are between 2017-08-07T23:00:00Z and 2017-08-09T17:00:00Z, inclusive.
The following is also accepted:
valid_dates:
  minimum: 2017-08-07T23:00:00Z
In this case, there is a lower bound or minimum date, but no upper bound, so the evaluation will consider time-series values whose valid times occur on or after 2017-08-07T23:00:00Z.
A reference time interval may be declared in a similar way:
reference_dates:
  minimum: 2017-08-07T23:00:00Z
  maximum: 2017-08-08T23:00:00Z
Finally, lead times may be constrained like this:
lead_times:
  minimum: 0
  maximum: 18
  unit: hours
In this example, the evaluation will only consider forecast values whose lead times are between 0 hours and 18 hours, inclusive.
When using model analyses in an evaluation, these analyses are sometimes referenced to the model initialization time, which is a particular flavor of reference time. For example, the National Water Model can cycle for hourly periods prior to the forecast initialization time and produce an “analysis” for each period. These analysis durations may be constrained in WRES.
For example, consider the following declaration:
analysis_times:
  minimum: -2
  maximum: 0
  unit: hours
In this case, the evaluation will consider analysis cycles from 2 hours before the model initialization time up to the initialization time itself (0 hours).
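The time filters described above can be combined in a single declaration. For example, the following sketch, reusing the illustrative dates and durations from the examples above, constrains reference times, valid times and lead times together:

```yaml
reference_dates:
  minimum: 2017-08-07T23:00:00Z
  maximum: 2017-08-08T23:00:00Z
valid_dates:
  minimum: 2017-08-07T23:00:00Z
  maximum: 2017-08-09T17:00:00Z
lead_times:
  minimum: 0
  maximum: 18
  unit: hours
```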
The WRES allows for a seasonal evaluation to be declared through a season filter. The season filter will apply to the valid times associated with the pairs when both sides of the pairing contain non-forecast sources (i.e., there are no reference times present); otherwise it will apply to the reference times (i.e., when one or both sides of the pairing contain forecasts).
A seasonal evaluation is declared with a minimum day and month and a maximum day and month. For example:
season:
  minimum_day: 1
  minimum_month: 4
  maximum_day: 31
  maximum_month: 7
In this example, the evaluation will consider only those pairs whose valid times (non-forecast sources) or reference times (forecast sources) fall between 0Z on 1 April and an instant before 0Z on 1 August (i.e., the very last time on 31 July).
The desired measurement units are declared as follows:
unit: m3/s
The unit may be any valid Unified Code for Units of Measure (UCUM). In addition, the WRES will accept several informal measurement units that are widely used in hydrology, such as CFS (cubic feet per second, formal UCUM unit `[ft_i]3/s`), CMS (cubic meters per second, formal UCUM unit `m3/s`) and IN (inches, formal UCUM unit `[in_i]`).
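For example, to conduct an evaluation in the informal unit for cubic feet per second:

```yaml
unit: CFS
```

The WRES will treat this as equivalent to the formal UCUM unit `[ft_i]3/s`.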
Further details on units of measurement can be found in a separate wiki, Units of measurement.
If a data source contains a measurement unit that is unrecognized by WRES, you may receive an `UnrecognizedUnitException` indicating that a measurement unit alias should be defined. A measurement unit alias is a mapping between an unrecognized or informal measurement unit, known as an `alias`, and a formal UCUM unit, known as a `unit`. For example, consider the following declaration:
unit: K
unit_aliases:
  - alias: °F
    unit: '[degF]'
  - alias: °C
    unit: '[cel]'
In this example, `°F` and `°C` are informal measurement units whose corresponding UCUM units are `[degF]` and `[cel]`, respectively. The desired measurement unit is `K`, or kelvin. By declaring `unit_aliases`, the WRES will understand that any references to `°F` should be interpreted as the formal unit `[degF]` and any references to `°C` should be interpreted as the formal unit `[cel]`. This will allow the software to convert the informal units of `°F` and `°C` to the formal unit of `K`.
Further information about units of measurement and aliases can be found in a separate wiki, Units of measurement.
In some cases, it is necessary to omit values that fall outside a particular range. For example, it may be desirable to only evaluate precipitation forecasts whose values are greater than an instrument detection limit. Restricting values to a particular range is achieved by declaring the `minimum` and/or `maximum` values that the evaluation should consider, as follows:
unit: mm
values:
  minimum: 0.0
  maximum: 100.0
In this example, only those values (`observed`, `predicted` and `baseline`) that fall within the range 0mm to 100mm will be considered. The values are always declared in evaluation units. Mechanically speaking, any values that fall outside this range will be assigned the default missing value identifier.
Optionally, however, values that fall outside of the nominated range may be assigned another value. For example:
unit: mm
values:
  minimum: 0.25
  maximum: 100.0
  below_minimum: 0.0
  above_maximum: 100.0
In this example, values that are less than 0.25mm will be assigned a value of 0mm (the `below_minimum` value) and values above 100mm will be assigned a value of 100mm (the `above_maximum` value).
There are three flavors of thresholds that may be declared:

- Ordinary thresholds (`thresholds`), which are real-valued. If not otherwise declared, the threshold values are assumed to be in the same measurement units as the evaluation;
- Probability thresholds (`probability_thresholds`), whose values must fall within the interval [0,1]. These are converted into real-valued thresholds by finding the corresponding quantile of the `observed` dataset; and
- Classifier thresholds (`classifier_thresholds`), whose values must fall within the interval [0,1]. These are used to convert probability forecasts into dichotomous (yes/no) forecasts.
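As a sketch of how these flavors can work together, the following hypothetical declaration uses an ordinary threshold to define the event of interest and a classifier threshold to convert the resulting probability forecasts into yes/no forecasts; the file names and values are illustrative:

```yaml
observed: some_observations.csv
predicted: some_ensemble_forecasts.csv
unit: ft
thresholds: 12.3
classifier_thresholds: [0.5]
metrics:
  - probability of detection
```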
The simplest use of thresholds may look like this, in context:
observed: some_observations.csv
predicted: some_forecasts.csv
unit: ft
thresholds: 12.3
In this case, the evaluation will consider only those pairs of `observed` and `predicted` values where the `observed` value exceeds 12.3 FT.
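Multiple ordinary thresholds may be supplied as a list, mirroring the list syntax shown below for `probability_thresholds`; the exact syntax and values here are an assumption:

```yaml
observed: some_observations.csv
predicted: some_forecasts.csv
unit: ft
thresholds: [12.3, 15.0, 17.5]
```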
There are several other attributes that may be declared alongside the threshold value(s). For example, consider this declaration:
observed: some_observations.csv
predicted: some_forecasts.csv
unit: m
thresholds:
  name: MAJOR FLOOD
  values:
    - { value: 23.0, feature: DRRC2 }
    - { value: 27.0, feature: DOLC2 }
  operator: greater equal
  apply_to: predicted
  unit: ft
In this example, the evaluation will consider only those pairs of `observed` and `predicted` values at `DRRC2` where the `predicted` value is greater than or equal to 23.0 FT and only those paired values at `DOLC2` where the `predicted` value is greater than or equal to 27.0 FT. Further, for both locations, this threshold will be labelled `MAJOR FLOOD`. The evaluation itself will be conducted in units of `m` (meters), so these thresholds will be converted from `ft` to `m` prior to evaluation.
The acceptable values for the `operator` include:

- `greater`;
- `greater equal`;
- `less`;
- `less equal`; and
- `equal`.
The acceptable values for the `apply_to` include:

- `observed`: include the pair when the condition is met for the `observed` value;
- `predicted`: include the pair when the condition is met for the `predicted` value (or the `baseline` predicted value for baseline pairs);
- `observed and predicted`: include the pair when the condition is met for both the `observed` and `predicted` values (or the `baseline` predicted value for baseline pairs);
- `any predicted`: include the pair when the condition is met for any of the `predicted` values within an ensemble (or the `baseline` predicted value for baseline pairs);
- `observed and any predicted`: include the pair when the condition is met for both the `observed` value and any of the `predicted` values within an ensemble (or the `baseline` predicted value for baseline pairs);
- `predicted mean`: include the pair when the condition is met for the ensemble mean of the `predicted` values (or the `baseline` predicted value for baseline pairs); and
- `observed and predicted mean`: include the pair when the condition is met for both the `observed` value and the ensemble mean of the `predicted` values (or the `baseline` predicted value for baseline pairs).
The `apply_to` is only relevant when filtering pairs for metrics that apply to continuous variables, such as the mean error (e.g., of streamflow predictions), and not when transforming pairs, such as converting continuous pairs to probabilistic or dichotomous pairs. For the latter, both sides of the pairing are always transformed, by definition.
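For example, a sketch of a threshold that filters on both sides of the pairing; the `DETECTION LIMIT` label and value are illustrative, and declaring a `value` without a `feature` is assumed to apply it everywhere:

```yaml
unit: mm
thresholds:
  name: DETECTION LIMIT
  values:
    - { value: 0.25 }
  operator: greater
  apply_to: observed and predicted
```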
The probability thresholds and classifier thresholds may be declared in a similar way. For example:
observed: some_observations.csv
predicted: some_forecasts.csv
unit: ft
probability_thresholds: [0.1,0.5,0.9]
In this example, the evaluation will consider only those pairs of `observed` and `predicted` values where the `observed` value is greater than each of the 10th, 50th and 90th percentiles of the `observed` values.
All of the declaration options for thresholds that are applied to the evaluation as a whole can be applied equally to individual metrics within the evaluation, if desired. For example, consider the following declaration:
observed: some_file.csv
predicted: another_file.csv
unit: ft
metrics:
  - name: mean square error skill score
    thresholds: 23
  - name: pearson correlation coefficient
    probability_thresholds:
      values: [0.1,0.2]
      operator: greater equal
In this example, the `mean square error skill score` will be computed for those pairs of `observed` and `predicted` values where the `observed` value exceeds 23.0 FT. Meanwhile, the `pearson correlation coefficient` will be computed for those pairs of `observed` and `predicted` values where the `observed` value is greater than or equal to the 10th percentile of `observed` values and, separately, the 20th percentile of `observed` values.
Yes. An evaluation may declare thresholds from one or both of these external sources:
- The Water Resources Data Service (WRDS) threshold service; and
- Comma separated values (CSV) from a file on the default filesystem.
For those users with access to the WRDS threshold service, the WRES will request thresholds from the WRDS when declared. Consider the following declaration:
observed:
  sources: data/CKLN6_STG.xml
  feature_authority: nws lid
predicted: data/CKLN6_HEFS_STG_forecasts.tgz
features:
  - observed: CKLN6
threshold_sources: https://[WRDS]/api/location/v3.0/nws_threshold/
Where `[WRDS]` is the host name for the WRDS production service (to be inserted). Note the use of `feature_authority`, which is important in this context. In particular, it allows WRES to pose a complete and accurate request to WRDS, namely "please provide the streamflow thresholds associated with an NWS LID of CKLN6". By default, the WRES will request streamflow thresholds unless otherwise declared.
Consider a more complicated declaration:
observed:
  sources:
    - uri: https://nwis.waterservices.usgs.gov/nwis/iv
      interface: usgs nwis
  variable:
    name: '00060'
predicted:
  sources:
    - uri: data/nwmVector/
      interface: nwm short range channel rt conus
  variable: streamflow
features:
  - {observed: '07140900', predicted: '21215289'}
  - {observed: '07141900', predicted: '941030274'}
threshold_sources:
  uri: https://[WRDS]/api/location/v3.0/nws_threshold/
  parameter: stage
  provider: NWS-NRLDB
  rating_provider: NRLDB
  missing_value: -999.0
  feature_name_from: predicted
In this example, the WRES will ask WRDS to provide all thresholds for the `parameter` of `stage`, the `provider` of `NWS-NRLDB` and the `rating_provider` of `NRLDB`, and for those geographic features with NWM feature IDs of `21215289` and `941030274`. Furthermore, the evaluation will consider any threshold values of -999.0 to be missing values.
Thresholds may be read from CSV files in a similar way to thresholds from the Water Resources Data Service (WRDS). For example, consider the following declaration:
threshold_sources: data/thresholds.csv
In this example, thresholds will be read from the path `data/thresholds.csv` on the default filesystem. By default, they will be treated as ordinary, real-valued thresholds in the same units as the evaluation and for the same variable.
The options available to qualify thresholds from WRDS are also available to qualify thresholds from CSV files. For example, consider the following declaration:
threshold_sources:
  - uri: data/thresholds.csv
    missing_value: -999.0
    feature_name_from: observed
  - uri: data/more_thresholds.csv
    missing_value: -999.0
    feature_name_from: predicted
    type: probability
In this example, thresholds will be read from two separate paths on the default filesystem, namely `data/thresholds.csv` and `data/more_thresholds.csv`. The thresholds from `data/thresholds.csv` will be treated as ordinary, real-valued thresholds whose feature names correspond to the `observed` dataset. Conversely, the thresholds from `data/more_thresholds.csv` will be treated as `probability` thresholds whose feature names correspond to the `predicted` dataset. In both cases, values of -999.0 are considered to be missing values.
By way of example, the CSV format should contain a location or geographic feature identifier in the first column, labelled `locationId`, and one conceptual threshold per column in the remaining columns, with each column header containing the name of that threshold, if appropriate (otherwise blank), and each row containing a separate location:
locationId, ACTION, MINOR FLOOD
CKLN6, 10, 12
WALN6, 7.5, 9.5
A “pool” is the atomic unit of paired data from which a statistic is computed. Typically, there are many pools of pairs in each evaluation. For example, consider pooling over time, or temporal pooling: if the goal is to evaluate a collection of forecasts at each forecast lead time, separately, and all of the forecasts contain 3-hourly lead times for 2 days, then there are 24/3*2=16 lead times and hence 16 pools of data to evaluate.
Pooling can be done temporally (over time) or spatially (over features), both of which are described here.
In general, an evaluation will require a regular sequence of pools along one or more of the timelines described in What timelines are understood by WRES and how do I constrain them?, namely:
- Valid times;
- Reference times (of forecasts); and
- Lead times (of forecasts).
There is a consistent grammar for declaring a regular sequence of pools along each of these timelines. In each case, the sequence begins at the `minimum` value and ends at the `maximum` value associated with the corresponding timeline described in What timelines are understood by WRES and how do I constrain them?.
For this reason, a sequence of pools requires both a constraint on the timeline and the pool sequence itself. For example:
reference_dates:
  minimum: 2023-03-17T00:00:00Z
  maximum: 2023-03-19T19:00:00Z
reference_date_pools:
  period: 13
  unit: hours
In this example, there is a regular sequence of reference time pools. The sequence begins at `2023-03-17T00:00:00Z` and ends at `2023-03-19T19:00:00Z`, inclusive. Each pool is 13 hours wide and a new pool begins every 13 hours. In other words, the pools are not overlapping, by default. Using interval notation, the above declaration would produce the following sequence of pools, where `(` means that the lower boundary is excluded and `]` means that the upper boundary is included:
- Pool `rp1`: (2023-03-17T00:00:00Z, 2023-03-17T13:00:00Z]
- Pool `rp2`: (2023-03-17T13:00:00Z, 2023-03-18T02:00:00Z]
- Pool `rp3`: (2023-03-18T02:00:00Z, 2023-03-18T15:00:00Z]
- Pool `rp4`: (2023-03-18T15:00:00Z, 2023-03-19T04:00:00Z]
- Pool `rp5`: (2023-03-19T04:00:00Z, 2023-03-19T17:00:00Z]
Note that there is no "Pool 6" because a pool cannot partially overlap the `minimum` or `maximum` dates on the timeline.
If we assume that four separate forecasts were issued, beginning at `2023-03-17T00:00:00Z` and repeating every 12 hours, then the timeline may be visualized as follows, where `fc` is a forecast whose reference time is denoted `0` and `rp` is a reference date pool:
fc1: 0 v v v v v v v v v v v v v v v v
fc2: 0 v v v v v v v v v v v v v v v v
fc3: 0 v v v v v v v v v v v v v v v v
fc4: 0 v v v v v v v v v v v v v v v v
time: ─┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼───
16th 17th 17th 17th 17th 18th 18th 18th 18th 19th 19th 19th 19th 20th 20th 20th 20th 21st
18Z 00Z 06Z 12Z 18Z 00Z 06Z 12Z 18Z 00Z 06Z 12Z 18Z 00Z 06Z 12Z 18Z 00Z
boundaries: ├ ┤
rp1: └────────────┘ rp3: └────────────┘ rp5: └────────────┘
rp2: └────────────┘ rp4: └────────────┘
In this example, `fc1` would fall in pool `rp1`, `fc2` would fall in pool `rp2`, and so on. Pool `rp5` would contain no data because there are no reference times that fall within it.
A regular sequence of valid time pools or lead time pools may be declared in a similar way. For example, the equivalent pools by valid time are:
valid_dates:
  minimum: 2023-03-17T00:00:00Z
  maximum: 2023-03-19T19:00:00Z
valid_date_pools:
  period: 13
  unit: hours
A similar sequence of lead time pools may be declared as follows:
lead_times:
  minimum: 0
  maximum: 44
  unit: hours
lead_time_pools:
  period: 13
  unit: hours
Yes, pools may overlap or underlap each other; in other words, the pool boundaries may not abut perfectly. This is achieved by declaring a `frequency`, which operates alongside the `period`. For example:
reference_dates:
  minimum: 2023-03-17T00:00:00Z
  maximum: 2023-03-19T19:00:00Z
reference_date_pools:
  period: 13
  frequency: 7
  unit: hours
In this case, a new reference time pool will begin every 7 hours and each pool will be 13 hours wide. To continue the above example and visualization:
fc1: 0 v v v v v v v v v v v v v v v v
fc2: 0 v v v v v v v v v v v v v v v v
fc3: 0 v v v v v v v v v v v v v v v v
fc4: 0 v v v v v v v v v v v v v v v v
time: ─┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼───
16th 17th 17th 17th 17th 18th 18th 18th 18th 19th 19th 19th 19th 20th 20th 20th 20th 21st
18Z 00Z 06Z 12Z 18Z 00Z 06Z 12Z 18Z 00Z 06Z 12Z 18Z 00Z 06Z 12Z 18Z 00Z
boundaries: ├ ┤
rp1: └────────────┘ rp4: └────────────┘ rp7: └────────────┘
rp2: └────────────┘ rp5: └────────────┘ rp8: └────────────┘
rp3: └────────────┘ rp6: └────────────┘
Here, pools `rp1` through `rp7` each contain one forecast and pool `rp8` contains no forecasts.
An evaluation answers a question (e.g., about forecast quality). When that question concerns a geographic area or region, it may be appropriate to gather and pool together data from several geographic features. However, there will be cases where pooling over geographic features is inappropriate, such as when evaluating land surface variables whose results may vary significantly between features.
The why and how of pooling over geographic features is described in Pooling geographic features.
The desired timescale associated with the evaluation is declarative, which means that it may be different than the timescale of the existing datasets. However, the WRES currently only supports limited forms of “upscaling” (increasing the timescale of existing datasets) and does not support “downscaling” (reducing the timescale of existing datasets). More information about the timescale and rescaling can be found here: Time Scale and Rescaling Time Series.
A fixed timescale contains three elements, namely:

- The `period`, which is the number of time units to which the value applies;
- The time `unit` associated with the `period`. Supported values include:
  - `seconds`;
  - `minutes`;
  - `hours`; and
  - `days`; and
- The `function`, which describes how the value is distributed over the `period`. Supported values include:
  - `mean`;
  - `minimum`;
  - `maximum`; and
  - `total`.
For example, to declare a desired timescale that represents a mean value over a 6-hour period, use the following:
time_scale:
  function: mean
  period: 6
  unit: hours
Yes. The desired timescale can span an explicit `period` that begins or ends on a particular date, or an implicit (and potentially varying) period that begins and ends on nominated dates. For example, to declare a timescale that represents a maximum value that occurs between 0Z on 1 April and the instant before 0Z on 1 August (i.e., the end of 31 July), declare the following:
time_scale:
  function: maximum
  minimum_day: 1
  minimum_month: 4
  maximum_day: 31
  maximum_month: 7
More information and examples can be found here: Time Scale and Rescaling Time Series.
In principle, you don’t. Recall the simplest possible evaluation described in What is the simplest possible evaluation I can declare?:
observed: observations.csv
predicted: predictions.csv
When no metrics are declared explicitly, the software will read the time-series data and evaluate all metrics that are appropriate for the types of data discovered. For example, if one of the data sources contains ensemble forecasts, then the software will include all metrics that are appropriate for ensemble forecasts.
While the metrics can be chosen by the software, it is often desirable to calculate only a subset of the metrics that are technically valid for a given type of data. A list of metrics may be declared as follows:
metrics:
- sample size
- mean error
- mean square error
The list of supported metrics is provided here: List of metrics available.
In rare cases, it may be necessary to declare parameter values for some metrics. For example, if graphics formats are required for some metrics and not others, you can indicate that specific graphics formats should be omitted for some metrics:
metrics:
  - sample size
  - mean error
  - name: ensemble quantile quantile diagram
    png: false
    svg: false
  - mean square error
In this example, the `png` and `svg` graphics formats would be omitted for the `ensemble quantile quantile diagram`. Note that, in order to distinguish the metric `name` from the parameter values, the `name` key is now declared explicitly for the `ensemble quantile quantile diagram`, but is not required for the other metrics, as they do not have parameters.
The currently supported parameter values are tabulated below.
Parameter | Applicable metrics | Purpose | Example in context |
---|---|---|---|
`png` | All. | A flag that allows for Portable Network Graphics (PNG) to be turned on (`true`) or off (`false`). | metrics: |
`svg` | All. | A flag that allows for Scalable Vector Graphics (SVG) to be turned on (`true`) or off (`false`). | metrics: |
`thresholds` | All. | Allows `thresholds` to be declared for a specific metric (rather than all metrics). To ensure that the metric is computed for the superset of pairs or "all data" only, and not for any other declared thresholds, you may use `thresholds: all data`. | metrics: |
`probability_thresholds` | All. | Allows `probability_thresholds` to be declared for a specific metric (rather than all metrics). | metrics: |
`classifier_thresholds` | All dichotomous metrics (e.g., `probability of detection`). | Allows `classifier_thresholds` to be declared for a specific, dichotomous metric (rather than all dichotomous metrics). | metrics: |
`ensemble_average` | All single-valued metrics as they relate to ensemble forecasts (e.g., `mean error`). | A function to use when deriving a single value from an ensemble of values. For example, to calculate the ensemble mean, the `ensemble_average` should be `mean`. The supported values are: `mean` and `median`. | metrics: |
`summary_statistics` | All time-series metrics (e.g., `time to peak error`). | A collection of summary statistics to calculate from the distribution of time-series errors. For example, when calculating the `time to peak error`, there is one error value for each forecast and hence a distribution of errors across all forecasts. When declaring the `median` in this context, the median `time to peak error` will be reported alongside the distribution of errors. The supported values are: `mean`, `median`, `minimum`, `maximum`, `mean absolute` and `standard deviation`. | metrics: |
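Since the example column above is abbreviated, here is a sketch of one of these parameters in context; the choice of metric is illustrative:

```yaml
metrics:
  - name: mean error
    ensemble_average: median
```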
Summary statistics can be used to describe or summarize a broader collection of evaluation statistics, such as the statistics associated with all geographic features in an evaluation. Further information about summary statistics is available here: Evaluation summary statistics.
Summary statistics are declared as a list of `summary_statistics`. For example:
summary_statistics:
- mean
- standard deviation
By default, summary statistics are calculated across all geographic features. Optionally, the `dimensions` to summarize may be declared explicitly. For example:
summary_statistics:
  statistics:
    - mean
    - standard deviation
  dimensions:
    - features
    - feature groups
In this example, the `features` option indicates that summary statistics should be calculated for all geographic features within the evaluation. These features may be declared explicitly as `features` or using a `feature_service` with one or more `group` whose `pool` option is set to `false`, or they may be declared implicitly with `sources` that contain time-series data for named features. In addition, the `feature groups` option indicates that summary statistics should be calculated for each geographic feature group separately. These feature groups may be declared as `feature_groups` or using a `feature_service` with one or more `group` whose `pool` option is set to `true`. When declaring summary statistics for `feature groups`, one or more feature groups must also be declared.
A few of the summary statistics support additional parameters, notably the `quantiles` and the `histogram`. In that case, the statistic `name` must be qualified separately from the parameters. For example:
summary_statistics:
  statistics:
    - mean
    - median
    - minimum
    - maximum
    - standard deviation
    - mean absolute
    - name: quantiles
      probabilities: [0.05,0.5,0.95]
    - name: histogram
      bins: 5
    - box plot
The default `probabilities` associated with the `quantiles` are 0.1, 0.5 and 0.9. The default number of `bins` in the `histogram` is 10.
The sampling uncertainties may be estimated using a resampling technique known as the "stationary bootstrap". The declaration requires a `sample_size` and a list of `quantiles` to estimate. For example:
sampling_uncertainty:
  sample_size: 1000
  quantiles: [0.05,0.95]
Care should be taken in choosing the `sample_size` because each additional sample requires that the pairs are resampled for every pool and the statistics recalculated each time, which is computationally expensive.
See: Sampling uncertainty assessment for more details.
The statistics output formats are declared by listing them. For example:
output_formats:
- csv2
- pairs
- png
When no `output_formats` are declared, the software will write the `csv2` format, by default. For example, when considering the simplest possible evaluation described in What is the simplest possible evaluation I can declare?, no `output_formats` are declared and `csv2` will be written.
The supported statistics formats include:

- `png`: Portable Network Graphics (PNG);
- `svg`: Scalable Vector Graphics (SVG);
- `csv2`: comma separated values with a single file per evaluation (see Output Format Description for CSV2 for more information);
- `netcdf2`: Network Common Data Form (NetCDF); and
- `protobuf`: Protocol buffers, an efficient binary format that produces one file per evaluation.
The following statistics formats are supported (for now), but are deprecated for removal and should be avoided:

- `csv`: comma separated values; and
- `netcdf`: Network Common Data Form (NetCDF).
In addition, to help with tracing statistics to the paired values that produced them, the following is supported:
- `pairs`: comma separated values of the paired time-series data from which statistics were produced (gzipped, by default).
Some of these formats support additional parameters, as follows:
Parameter | Applicable formats | Purpose | Example in context |
---|---|---|---|
`width` | All graphics formats (e.g., `png`). | An integer value (greater than 0) that prescribes the width of the graphics to produce. | metrics: |
`height` | All graphics formats (e.g., `png`). | An integer value (greater than 0) that prescribes the height of the graphics to produce. | metrics: |
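Since the example column above is abbreviated, the following sketch assumes that a graphics format with parameters is declared using an explicit `name` key, mirroring the pattern used for metric parameters; the exact syntax and values are an assumption:

```yaml
output_formats:
  - csv2
  - name: png
    width: 800
    height: 600
```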
Yes, there are several additional options for filtering or transforming data or otherwise refining the evaluation. These are listed below:
Option | Purpose | Example in context |
---|---|---|
`pair_frequency` | By default, all paired values are included. However, this option allows for paired values to be included only at a prescribed frequency, such as every 12 hours. | observed: some_observations.csv |
`cross_pair` | When calculating skill scores, all paired values are used by default. This can be misleading when the (`observed`, `predicted`) pairs contain many more or fewer pairs than the (`observed`, `baseline`) pairs. In order to mitigate this, cross pairing is supported. When using cross-pairing, only those pairs whose valid times appear in both sets of pairs will be included. In addition, the treatment of forecast reference times is prescribed by an option. The available options are: `exact`, which only admits those pairs whose forecast reference times appear in both sets of pairs; and `fuzzy`, which chooses the nearest forecast reference times in both sets of pairs and discards any others. In all cases, the resulting skill score statistics will always use the same number of (`observed`, `predicted`) pairs and (`observed`, `baseline`) pairs. In addition, when using `exact` cross-pairing, the valid times and reference times are both guaranteed to match exactly. | observed: some_observations.csv |
`minimum_sample_size` | An integer greater than zero that identifies the minimum sample size for which a statistic will be included. For continuous measures, this is the number of pairs. For dichotomous measures, it is the smaller of the number of occurrences and non-occurrences of the dichotomous event. If a statistic was computed from a smaller sample size than the `minimum_sample_size`, it will be discarded. | observed: some_observations.csv |
`decimal_format` | The decimal format to use when writing statistics to numeric formats. It also controls the format of tick labels for time-based domain axes in generated graphics. | observed: some_observations.csv |
`duration_format` | The duration format to use when writing statistics to numeric formats. It also controls the units of time-based domain axes in generated graphics. The supported values include: `seconds`, `minutes`, `hours` and `days`. | observed: some_observations.csv |
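Since the example column above is abbreviated, the following hypothetical declaration sketches two of these options in context; the file names and the `baseline` dataset are illustrative:

```yaml
observed: some_observations.csv
predicted: some_forecasts.csv
baseline: some_other_forecasts.csv
cross_pair: exact
minimum_sample_size: 10
```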
Yes, examples of complete declarations can be found in a separate wiki, Complete Examples of Evaluation Declarations TODO.
Yes, the declaration language uses a schema, which defines the superset of declarations that the WRES could accept. The schema uses the JSON Schema language.
The latest version of the schema is available in the code repository:
https://github.com/NOAA-OWP/wres/blob/master/wres-config/nonsrc/schema.yml
However, the schema is relatively permissive. In other words, there are some evaluations that are permitted by the schema that are not permitted by the WRES software itself. Indeed, a schema is best suited for simple validation. More comprehensive validation is performed by the software itself, once the declaration has been validated against the schema.
In practice, you may notice this when reading feedback from the software about validation failures. The earliest failures will occur when the declaration is inconsistent with the schema. The feedback that results from these failures will tend to be more abstract or less human readable because it will list a cascade of failures. In other cases, the failure will be straightforward. You should generally look for the simplest/most understandable among them. For example, a declaration like this:
observed: some_observations.csv
predicted: some_forecasts.csv
foo: bar.csv
will produce an error like this, because the `foo` key is not part of the schema and the schema does not permit additional properties:
wres.config.yaml.DeclarationException: When comparing the declared evaluation to the schema, encountered 1 errors, which must be fixed. Hint: some of these errors may have the same origin, so look for the most precise/informative error(s) among them. The errors are:
- $.foo: is not defined in the schema and the schema does not allow additional properties
You will sometimes encounter warnings or errors that relate to your declaration. For example, if an error is wrapped in a `DeclarationException`, the problem will originate from your declaration. These errors arise because the declaration is invalid for some reason. There are three main reasons why a declaration could be invalid:
- The declaration is not a valid YAML document. You can test whether your declaration is a valid YAML document using an online tool, such as: https://www.yamllint.com/
- The declaration contains options that are not understood or allowed by WRES (specifically, they are not consistent with the declaration schema, as described in Does the declaration language use a schema?). For example, if you include options that are misspelled or options that fall outside valid bounds, such as probabilities that fall outside [0,1], you can expect an error; or
- The declaration contains options that are disallowed by WRES in combination with other options. For example, if you add an ensemble-like metric and declare that none of the data types are ensemble-like, then you can expect an error.
In general, any warning or error messages should be straightforward and intuitive, indicating what you should do to fix them (or, in the case of warnings, what you should consider about the options you chose). Furthermore, if there are multiple warnings or errors, they should all be listed at once. For example, consider the following invalid declaration:
observed: some_observations.csv
predicted: some_predictions.csv
lead_time_pools:
  period: 13
  unit: hours
metrics:
  - probability of detection
This declaration produces the following errors:
```text
wres.config.yaml.DeclarationException: Encountered 2 error(s) in the declared evaluation, which must be fixed:
    - The declaration included 'lead_time_pools', which requires the 'lead_times' to be fully declared. Please remove the 'lead_time_pools' or fully declare the 'lead_times' and try again.
    - The declaration includes metrics that require either 'thresholds' or 'probability_thresholds' but none were found. Please remove the following metrics or add the required thresholds and try again: [PROBABILITY OF DETECTION].
```
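Following the guidance in those error messages, a repaired declaration might look like the sketch below. The `lead_times` bounds and the threshold value are illustrative assumptions, not prescribed values, and the field shapes follow the usual forms described elsewhere in this wiki.

```yaml
observed: some_observations.csv
predicted: some_predictions.csv
lead_times:
  minimum: 0
  maximum: 48
  unit: hours
lead_time_pools:
  period: 13
  unit: hours
probability_thresholds: [0.9]
metrics:
  - probability of detection
```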
If the errors are not intuitive, you should create a ticket asking for more clarity, and we will explain the failure and improve the error message. However, errors that fall within the first two categories are delegated to other tools and are not, therefore, fully within our control. In particular, when your declaration fails validation against the schema, you may be presented with a cascade of errors that are not immediately intuitive. For example, consider the following invalid declaration:
```yaml
observed: some_observations.csv
predicted: some_predictions.csv
metrics:
  - some metric
```
Since `some metric` is not an expected metric, this declaration will produce an error. However, the evaluation actually produces a cascade of errors, which occur because the `metrics` declaration is invalid against any known (sub)schema within the overall schema:
```text
wres.config.yaml.DeclarationException: When comparing the declared evaluation to the schema, encountered 5 errors, which must be fixed. Hint: some of these errors may have the same origin, so look for the most precise/informative error(s) among them. The errors are:
    - $.metrics[0]: does not have a value in the enumeration [box plot of errors by observed value, box plot of errors by forecast value, brier score, brier skill score, contingency table, continuous ranked probability score, continuous ranked probability skill score, ensemble quantile quantile diagram, maximum, mean, minimum, rank histogram, relative operating characteristic diagram, relative operating characteristic score, reliability diagram, sample size, standard deviation]
    - $.metrics[0]: does not have a value in the enumeration [bias fraction, box plot of errors, box plot of percentage errors, coefficient of determination, pearson correlation coefficient, index of agreement, kling gupta efficiency, mean absolute error, mean error, mean square error, mean square error skill score, mean square error skill score normalized, median error, quantile quantile diagram, root mean square error, root mean square error normalized, sample size, sum of square error, volumetric efficiency, mean absolute error skill score]
    - $.metrics[0]: does not have a value in the enumeration [contingency table, threat score, equitable threat score, frequency bias, probability of detection, probability of false detection, false alarm ratio, peirce skill score]
    - $.metrics[0]: string found, object expected
    - $.metrics[0]: does not have a value in the enumeration [time to peak relative error, time to peak error]
```
This cascade of errors is somewhat unintuitive but, at the time of writing, it cannot be improved easily. As suggested in the `Hint`, you should look for the most precise and informative error among the cascade. In this case, it should be reasonably clear that the metric in position "[0]" (meaning the first metric) is not a name that occurs within any known enumeration. As the schema includes several metric groups, each with a separate enumeration, this error is reported with respect to each group.
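The one-error-per-group behavior can be sketched as follows. This is an illustrative model, not the WRES implementation: the group names and abbreviated metric lists are assumptions, and the real schema validation is performed by a JSON Schema library. The point is that a name matching none of several alternative enumerations yields one error per alternative, mimicking "anyOf" semantics.

```python
# Illustrative sketch: several metric groups, each an enumeration. A metric
# name is valid if at least one group contains it; otherwise, the validator
# reports one error per group, producing a cascade.
metric_groups = {
    "ensemble": ["brier score", "rank histogram", "sample size"],
    "single-valued": ["mean error", "root mean square error", "sample size"],
    "categorical": ["probability of detection", "false alarm ratio"],
}

def validate_metric(name: str) -> list[str]:
    """Return an empty list if any group accepts the name, else one error
    message per group, as an 'anyOf' schema validator would report."""
    errors = [
        f"$.metrics[0]: does not have a value in the enumeration {names!r}"
        for names in metric_groups.values()
        if name not in names
    ]
    # Valid if at least one group accepted the name (anyOf semantics).
    return [] if len(errors) < len(metric_groups) else errors

print(validate_metric("some metric"))  # three errors, one per group
print(validate_metric("mean error"))   # [] (accepted by one group)
```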