@@ -73,8 +73,8 @@ the following:
7373Recipe section: ``datasets``
7474============================
7575
76- The ``datasets`` section includes dictionaries that, via key-value pairs, define standardized
77- data specifications:
76+ The ``datasets`` section includes dictionaries that, via key-value pairs or
77+ "facets", define standardized data specifications:
7878
7979- dataset name (key ``dataset``, value e.g. ``MPI-ESM-LR`` or ``UKESM1-0-LL``).
8080- project (key ``project``, value ``CMIP5`` or ``CMIP6`` for CMIP data,
@@ -114,6 +114,162 @@ For example, a datasets section could be:
114114 - {dataset: HadGEM3-GC31-MM, project: CMIP6, exp: dcppA-hindcast, ensemble: r1i1p1f1, sub_experiment: s2000, grid: gn, start_year: 2000, end_year: 2002}
115115 - {dataset: BCC-CSM2-MR, project: CMIP6, exp: dcppA-hindcast, ensemble: r1i1p1f1, sub_experiment: s2000, grid: gn, timerange: '*'}
116116
117+ .. _dataset_wildcards:
118+
119+ Automatically populating a recipe with all available datasets
120+ -------------------------------------------------------------
121+
122+ It is possible to use :obj:`glob` patterns or wildcards for certain facet
123+ values, to make it easy to find all available datasets locally and/or on ESGF.
124+ Note that ``project`` cannot be a wildcard.
125+
126+ The facet values for local files are retrieved from the directory tree where the
127+ directories represent the facet values.
128+ Reading facet values from file names is not yet supported.
129+ See :ref:`CMOR-DRS` for more information on this kind of file organization.
130+
131+ When (some) files are available locally, the tool will not automatically look
132+ for more files on ESGF. To populate a recipe with all available datasets from
133+ ESGF, ``offline`` should be set to ``false`` and ``always_search_esgf`` should
134+ be set to ``true`` in the
135+ :ref:`user configuration file<user configuration file>`.
136+
137+ For more control over which datasets are selected, it is recommended to use
138+ a Python script or `Jupyter notebook <https://jupyter.org/>`_ to compose
139+ the recipe.
140+ See :ref:`/notebooks/composing-recipes.ipynb` for an example.
141+ This is particularly useful when specific relations are required between
142+ datasets, e.g. when a dataset needs to be available for multiple variables
143+ or experiments.
144+
145+ An example recipe that will use all CMIP6 datasets and all ensemble members
146+ which have a ``'historical'`` experiment could look like this:
147+
148+ .. code-block:: yaml
149+
150+   datasets:
151+     - project: CMIP6
152+       exp: historical
153+       dataset: '*'
154+       institute: '*'
155+       ensemble: '*'
156+       grid: '*'
157+
158+ After running the recipe, a copy specifying exactly which datasets were used
159+ is available in the output directory in the ``run`` subdirectory.
160+ The filename of this recipe will end with ``_filled.yml``.
161+
162+ For the ``timerange`` facet, special syntax is available.
163+ See :ref:`timerange_examples` for more information.
164+
165+ If populating a recipe using wildcards does not work, this is because there
166+ were either no files found that match those facets, or the facets could not be
167+ read from the directory name or ESGF.
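
The wildcard matching described above can be illustrated with a minimal Python sketch. It is not the tool's implementation: the ``matches`` helper and the ``available_datasets`` list are hypothetical, and the standard library ``fnmatch`` module stands in for whatever matching the tool performs on facet values read from the directory tree or ESGF.

```python
from fnmatch import fnmatchcase


def matches(entry, available):
    """Return True if every facet pattern in ``entry`` matches ``available``."""
    return all(
        facet in available and fnmatchcase(str(available[facet]), str(pattern))
        for facet, pattern in entry.items()
    )


# Hypothetical facet values, as they might be read from a directory tree.
available_datasets = [
    {"project": "CMIP6", "exp": "historical", "dataset": "BCC-ESM1",
     "institute": "BCC", "ensemble": "r1i1p1f1", "grid": "gn"},
    {"project": "CMIP6", "exp": "historical", "dataset": "UKESM1-0-LL",
     "institute": "MOHC", "ensemble": "r2i1p1f2", "grid": "gn"},
    {"project": "CMIP6", "exp": "ssp585", "dataset": "UKESM1-0-LL",
     "institute": "MOHC", "ensemble": "r2i1p1f2", "grid": "gn"},
]

# The recipe entry from the example above, written as a dictionary.
entry = {"project": "CMIP6", "exp": "historical", "dataset": "*",
         "institute": "*", "ensemble": "*", "grid": "*"}

selected = [d for d in available_datasets if matches(entry, d)]
print([d["dataset"] for d in selected])
```

In this sketch only the two ``historical`` entries are selected, because ``exp`` is given literally while the other facets are wildcards.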
168+
169+ .. _supplementary_variables:
170+
171+ Defining supplementary variables (ancillary variables and cell measures)
172+ ------------------------------------------------------------------------
173+
174+ It is common practice to store ancillary variables (e.g. land/sea/ice masks)
175+ and cell measures (e.g. cell area, cell volume) in separate datasets that are
176+ described by slightly different facets.
177+ In ESMValCore, we call ancillary variables and cell measures "supplementary
178+ variables".
179+ Some :ref:`preprocessor functions <Preprocessors>` need this information to
180+ work.
181+ For example, the :ref:`area_statistics<area_statistics>` preprocessor function
182+ needs to know the area of each grid cell in order to compute a correctly weighted
183+ statistic.
184+
185+ To attach these variables to a dataset, the ``supplementary_variables`` keyword
186+ can be used.
187+ For example, to add cell area to a dataset, it can be specified as follows:
188+
189+ .. code-block:: yaml
190+
191+   datasets:
192+     - dataset: BCC-ESM1
193+       project: CMIP6
194+       exp: historical
195+       ensemble: r1i1p1f1
196+       grid: gn
197+       supplementary_variables:
198+         - short_name: areacella
199+           mip: fx
200+           exp: 1pctCO2
201+
202+ Note that the supplementary variable will inherit the facet values from the main
203+ dataset, so only those facet values that differ need to be specified.
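
The inheritance rule can be pictured as a dictionary merge. The following is a minimal sketch, assuming facets are held in plain dictionaries; ``resolve_supplementary`` is a hypothetical helper written for illustration, not a function of the tool.

```python
def resolve_supplementary(main, supplementary):
    """Combine the main dataset facets with the supplementary overrides."""
    # Everything except the list of supplementary variables is inherited.
    inherited = {
        facet: value
        for facet, value in main.items()
        if facet != "supplementary_variables"
    }
    # Facets given for the supplementary variable take precedence.
    return {**inherited, **supplementary}


# The recipe entry from the example above, written as a dictionary.
main = {
    "dataset": "BCC-ESM1",
    "project": "CMIP6",
    "exp": "historical",
    "ensemble": "r1i1p1f1",
    "grid": "gn",
    "supplementary_variables": [
        {"short_name": "areacella", "mip": "fx", "exp": "1pctCO2"},
    ],
}

resolved = [
    resolve_supplementary(main, supplementary)
    for supplementary in main["supplementary_variables"]
]
print(resolved[0])
```

Here ``exp`` is taken from the supplementary variable, while ``dataset``, ``project``, ``ensemble`` and ``grid`` come from the main dataset.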
204+
205+ .. _supplementary_dataset_wildcards:
206+
207+ Automatically selecting the supplementary dataset
208+ -------------------------------------------------
209+
210+ When using many datasets, it may be quite a bit of work to find out which facet
211+ values are required to find the corresponding supplementary data.
212+ The tool can automatically guess the best matching supplementary dataset.
213+ To use this feature, the supplementary dataset can be specified as:
214+
215+ .. code-block:: yaml
216+
217+   datasets:
218+     - dataset: BCC-ESM1
219+       project: CMIP6
220+       exp: historical
221+       ensemble: r1i1p1f1
222+       grid: gn
223+       supplementary_variables:
224+         - short_name: areacella
225+           mip: fx
226+           exp: '*'
227+           activity: '*'
228+           ensemble: '*'
229+
230+ With this syntax, the tool will search all available values of ``exp``,
231+ ``activity``, and ``ensemble`` and use the supplementary dataset that shares the
232+ most facet values with the main dataset.
233+ Note that this behaviour is different from
234+ :ref:`using wildcards in the main dataset <dataset_wildcards>`,
235+ where they will be expanded to generate all matching datasets.
236+ The available datasets are shown in the debug log messages when running a recipe
237+ with wildcards, so if a different supplementary dataset is preferred, these
238+ messages can be used to see what facet values are available.
239+ The facet values for local files are retrieved from the directory tree where the
240+ directories represent the facet values.
241+ Reading facet values from file names is not yet supported.
242+ If wildcard expansion fails, this is because there were either no files found
243+ that match those facets, or the facets could not be read from the directory
244+ name or ESGF.
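
The idea of choosing the candidate that shares the most facet values with the main dataset can be sketched as follows. This is an illustration only: ``shared_facets``, ``best_match`` and the candidate list are hypothetical, and returning the first candidate in case of a tie is an assumption of this sketch, not documented behaviour.

```python
def shared_facets(main, candidate):
    """Count the facets for which both datasets have the same value."""
    return sum(1 for facet, value in candidate.items() if main.get(facet) == value)


def best_match(main, candidates):
    """Return the candidate sharing the most facet values with ``main``."""
    # max() returns the first candidate on a tie (assumption of this sketch).
    return max(candidates, key=lambda candidate: shared_facets(main, candidate))


main = {"dataset": "BCC-ESM1", "project": "CMIP6", "activity": "CMIP",
        "exp": "historical", "ensemble": "r1i1p1f1", "grid": "gn"}

# Hypothetical supplementary datasets found after expanding the wildcards.
candidates = [
    {"dataset": "BCC-ESM1", "project": "CMIP6", "activity": "CMIP",
     "exp": "1pctCO2", "ensemble": "r1i1p1f1", "grid": "gn",
     "short_name": "areacella", "mip": "fx"},
    {"dataset": "BCC-ESM1", "project": "CMIP6", "activity": "ScenarioMIP",
     "exp": "ssp370", "ensemble": "r2i1p1f1", "grid": "gn",
     "short_name": "areacella", "mip": "fx"},
]

chosen = best_match(main, candidates)
print(chosen["exp"], chosen["ensemble"])
```

The first candidate is chosen here because it also agrees with the main dataset on ``activity`` and ``ensemble``.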
245+
246+ Automatic definition of supplementary variables
247+ -----------------------------------------------
248+
249+ If an ancillary variable or cell measure is
250+ :ref:`needed by a preprocessor function <preprocessors_using_supplementary_variables>`,
251+ but it is not specified in the recipe, the tool will automatically make a best
252+ guess using the syntax above.
253+ Usually this will work fine, but if it does not, it is recommended to explicitly
254+ define the supplementary variables in the recipe.
255+
256+ To disable this automatic addition, define the supplementary variable as usual,
257+ but add the special facet ``skip`` with value ``true``.
258+ See :ref:`preprocessors_using_supplementary_variables` for an example recipe.
259+
260+ Saving ancillary variables and cell measures
261+ --------------------------------------------
262+
263+ By default, ancillary variables and cell measures will be removed
264+ from the main variable before saving it to file because they can be as big as
265+ the main variable.
266+ To keep the supplementary variables, disable the preprocessor function that
267+ removes them by setting ``remove_supplementary_variables: false`` in the
268+ preprocessor profile in the recipe.
269+
270+ Concatenating data corresponding to multiple facets
271+ ---------------------------------------------------
272+
117273It is possible to define the experiment as a list to concatenate two experiments.
118274Here is an example concatenating the `historical` experiment with `rcp85`
119275
@@ -130,6 +286,9 @@ In this case, the specified datasets are concatenated into a single cube:
130286 datasets:
131287 - {dataset: CanESM2, project: CMIP5, exp: [historical, rcp85], ensemble: [r1i1p1, r1i2p1], start_year: 2001, end_year: 2004}
132288
289+ Short notation of ensemble members and sub-experiments
290+ ------------------------------------------------------
291+
133292ESMValTool also supports a simplified syntax to add multiple ensemble members from the same dataset.
134293In the ensemble key, any element in the form `(x:y)` will be replaced with all numbers from x to y (both inclusive),
135294adding a dataset entry for each replacement. For example, to add ensemble members r1i1p1 to r10i1p1
@@ -152,7 +311,7 @@ Please, bear in mind that this syntax can only be used in the ensemble tag.
152311Also, note that the combination of multiple experiments and ensembles, like
153312exp: [historical, rcp85], ensemble: [r1i1p1, "r(2:3)i1p1"] is not supported and will raise an error.
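
The ``(x:y)`` expansion can be illustrated with a short Python sketch. The ``expand`` function and its regular expression are hypothetical and only mirror the rule stated above (all numbers from x to y, both inclusive). Expanding several ``(x:y)`` elements in one tag into every combination is an assumption of this sketch.

```python
import re

# Matches one element of the form (x:y), where x and y are integers.
RANGE = re.compile(r"\((\d+):(\d+)\)")


def expand(tag):
    """Expand every (x:y) element in ``tag`` into a list of explicit tags."""
    match = RANGE.search(tag)
    if match is None:
        return [tag]
    start, end = int(match.group(1)), int(match.group(2))
    expanded = []
    for number in range(start, end + 1):  # both bounds are inclusive
        replaced = tag[:match.start()] + str(number) + tag[match.end():]
        # Recurse to handle any remaining (x:y) elements.
        expanded.extend(expand(replaced))
    return expanded


print(expand("r(1:3)i1p1"))
```

Each returned tag would correspond to one dataset entry. A tag without any ``(x:y)`` element is returned unchanged as a single-item list.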
154313
155- The same simplified syntax can be used to add multiple sub-experiment ids:
314+ The same simplified syntax can be used to add multiple sub-experiments:
156315
157316.. code-block:: yaml
158317
@@ -161,6 +320,9 @@ The same simplified syntax can be used to add multiple sub-experiment ids:
161320
162321 .. _timerange_examples:
163322
323+ Time ranges
324+ -----------
325+
164326When using the ``timerange`` tag to specify the start and end points, possible values can be as follows:
165327
166328
@@ -278,17 +440,15 @@ section will include:
278440- a description of the diagnostic and lists of themes and realms that it applies to;
279441- an optional ``additional_datasets `` section.
280442- an optional ``title`` and ``description``, used to generate the title and description
281- of the ``index.html`` output file.
443+ in the ``index.html`` output file.
282444
283445.. _tasks:
284446
285447The diagnostics section defines tasks
286448-------------------------------------
287449The diagnostic section(s) define the tasks that will be executed when running the recipe.
288450For each variable a preprocessing task will be defined and for each diagnostic script a
289- diagnostic task will be defined. If variables need to be derived
290- from other variables, a preprocessing task for each of the variables
291- needed to derive that variable will be defined as well. These tasks can be viewed
451+ diagnostic task will be defined. These tasks can be viewed
292452in the main_log_debug.txt file that is produced every run. Each task has a unique
293453name that defines the subdirectory where the results of that task are stored. Task
294454names start with the name of the diagnostic section followed by a '/' and then