-
Notifications
You must be signed in to change notification settings - Fork 44
Support wildcards in the recipe and improve support for ancillary variables and dataset versioning #1609
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Use configuration from experimental Output file is now a Path fx file facets are now correctly set when parsing the recipe, no need to do it in data finder anymore
|
It would be interesting to discuss in a wider circle including the @ESMValGroup/scientific-lead-development-team (maybe at a workshop?) where we want to go with these 2 new recipe formats to have a common understanding of which formats should be used for recipes in the main branch. Once this PR is merged (and v2.8 released), two new recipe formats will be allowed:
I understand these two new formats would be extremely useful to compose recipes and to ease reproducibility by recording dataset versions as stated in the docs with this PR. But I wonder about the readability of the new format, particularly the second one. For example, |
valeriupredoi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
many a test was done by me too, and I find this monster to be a good monster, like Godzilla - mega well done @bouweandela and all the other peeps who tested and suggested 🍻 x 10
|
Thanks to everyone who helped with testing and reviewing this! 🎉 |
Description
Add a dataset class that simplifies dealing with datasets as defined in the recipe.
Goals of this pull request are to:
rundirectory, e.g.run/recipe_example_filled.ymlwhen running a recipe calledrecipe_example.yml.esmvalcore.dataset.Datasetclass added in Addesmvalcore.datasetmodule #1877Example recipe demonstrating the new wildcard features
this will expand to all CMIP6 datasets and attach the
areacellavariable that is most similar to the main variable.Documentation
Temporary
use_legacy_supplementariescommand-line/config-user optionWhen the
fx_variableskeyword is used in a preprocessor function, the tool will automatically run with--use_legacy_supplementaries=Trueand use these definitions to define the supplementary datasets in the recipe. To try out the new behaviour, run the tool with--use_legacy_supplementaries=False.--use_legacy_supplementaries=Falseis the default when nofx_variablesare defined in the recipe. This option will be removed inv2.10.0. To upgrade your recipes, remove all entriesfx_variablesand if necessary, define the ancillaries as described inBackward incompatible changes
There is a new and much smarter system for automatically defining the supplementary variables (ancillary variables and cell measures) needed by preprocessor functions in the recipe. In v2.8 it will be used by default when there are no
fx_variablesdefined as preprocessor function arguments in the recipe, in v2.9 it will be used by default in all cases, and in v2.10 it will be the only way. This new system automatically defines supplementary variables using the new wildcards feature, so it will find many more supplementary variables. It will also add only a single supplementary variable, e.g. when a preprocessor function requires cell area, it will addareacelloif the main variable is frommodeling_realmocean,seaIce, orocnBgchem, andareacellain other cases. Previously, both variables were added and downloaded, but then only the one with the matching shape was used. This had the downside that unnecessary data was downloaded and errors in the supplementary variable were silently ignored.In some cases, the new way of adding supplementary variables may not work (yet), specifically when the facet values cannot be read from the path to the file or from ESGF. In that case, the supplementary variables can be specified in the recipe. Here are some examples of where things may not work and how to solve them. If you prefer to postpone upgrading to v2.10, there is also the option to run the tool with
--use-legacy-supplementaries=True, which will use the previous behaviour.Example with OBS data
When using preprocessor function
weighting_landsea_fractionwith variablecSoilfromLmonand a dataset from theOBSorOBS6projects with the defaultdrssetting. This is expected to start working only in v2.9 (see #1609 (comment) for details).the tool will automatically try to use
but that does not work yet because the tool cannot yet read the version from the filename, only from the directory name and it is not used in the directory name with the default
drs#1943. Therefore, the user needs to specify the supplementary variable like this:Example with a variable that is on an unusual grid
When using the preprocessor function
area_statisticswith variablefgco2fromOmonand no supplementary variables defined in the recipe. The tool will automatically add theareacellovariable becausefgco2is from an ocean realm, so it will replacewith
but that does not work because unusually, the
fgco2variable ofMIROC-ESMis not on an ocean grid. To make the recipe work, the user needs to specify the supplementary variable in the recipe for this dataset and tell the tool to not addareacelloautomatically using theskipflag:see Specifying supplementary variables in the recipe and Preprocessor functions that use ancillary variables and cell measures for more information.
Deprecations
fx_variablesfor various preprocessor functions that use ancillary variables or cell measures is deprecated and will be removed in v2.10.0. The recommended upgrade procedure is to remove thefx_variablessection. If automatically defining the required supplementary variables does not work, define them in the variable or (additional_)datasets section as described in the documentation.use_legacy_supplementarieson the command line/config-user.yml keeps support for the old ways of specifying ancillary variables and cell measures in the recipe, i.e. automatically adding some and customizing those with the preprocessor argumentfx_variables. It is currently enabled only iffx_variablesis used in the recipe, but will be disabled by default in v2.9 and is scheduled for removal in v2.10.add_fx_variablesandremove_fx_variablesare deprecated and will be removed in v2.10. Use the new preprocessor functionsadd_supplementary_variablesandremove_supplementary_variablesinstead.callbackargument of the functionesmvalcore.preprocessor.loadis deprecated and will be removed in v2.10 Proposal to deprecate thecallbackargument fromesmvalcore.preprocessor.loadin v2.8 and remove it in v2.10. #1800Please comment if you think this schedule is too fast.
Fixed issues
Closes #56
Closes #377
Closes #589
Closes #1138
Closes #1185
Closes #1297
Closes #1454
Closes #1896
Potential follow ups: #1760, #1891
Before you get started
Checklist
It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.
To help with the number pull requests: