Feature/c3s extract funs #110

Merged: 28 commits, Feb 1, 2022

bzah commented Jan 14, 2022

Pull Request for the issue "C3S integration" (not logged in github)

  • These changes were tested with real data and unit tests cover them as well.
  • The relevant documentation has been added (or updated).
  • A short description of the changes has been added to doc/source/references/release_notes.rst

Describe the changes you made

This PR covers multiple issues that c3s integration highlighted:

  • Refactored ecad_functions (removed duplicated code, simplified function signatures...)
  • Refactored IndexConfig to hide some technical knowledge which was leaked to other modules.
  • Made a basic integration of clix-meta yaml to populate the generated docstring for c3s.
    This makes pyyaml a required dependency of icclim.
  • Fixed an issue with aliasing of "icclim" module and "icclim" package
  • Added some metadata to qualify the ecad_indices and to identify the arguments necessary to compute them.
Example of generated code with index TG:
def tg(
    in_files: typing.Union[str, typing.List[str], xarray.Dataset],
    var_name: typing.Union[str, typing.List[str], None] = None,
    slice_mode: typing.Union[typing.Any, str, typing.List[typing.Union[str, typing.Tuple, int]]] = Frequency.YEAR,
    time_range: typing.List[datetime.datetime] = None,
    out_file: str = "icclim_out.nc",
    transfer_limit_Mbytes: float = None,
    ignore_Feb29th: bool = False,
    netcdf_version: typing.Union[str, icclim.models.netcdf_version.NetcdfVersion] = NetcdfVersion.NETCDF4,
    logs_verbosity: typing.Union[icclim.icclim_logger.Verbosity, str] = Verbosity.LOW) -> xarray.Dataset:
    """
    Mean of daily mean temperature
    Source: ECA&D, Algorithm Theoretical Basis Document (ATBD) v11

    Parameters
    ----------
    in_files : Union[str, List[str], Dataset]
        Absolute path(s) to NetCDF dataset(s) (including OPeNDAP URLs),
        or xarray.Dataset.
    var_name : str
        ``optional`` Target variable name to process corresponding to ``in_files``.
        If None (default) on ECA&D index, the variable is guessed based on the climate
        index wanted.
        Mandatory for a user index.
    slice_mode : str
        Type of temporal aggregation:
        {"year", "month", "DJF", "MAM", "JJA", "SON", "ONDJFM" or "AMJJAS"}.
        Default is "year".
        See :ref:`slice_mode` for details.
    time_range : List[datetime.datetime]
        ``optional`` Temporal range: upper and lower bounds for temporal subsetting.
        If ``None``, whole period of input files will be processed.
        Default is ``None``.
    out_file : str
        Output NetCDF file name (default: "icclim_out.nc" in the current directory).
        Default is "icclim_out.nc".
        If the input ``in_files`` is a ``Dataset``, ``out_file`` field is ignored.
        Use the function returned value instead to retrieve the computed value.
        If ``out_file`` already exists, icclim will overwrite it!
    transfer_limit_Mbytes : float
        ``optional`` Maximum Dask chunk size in memory.
        The value should be around 200 MB.
        If empty, no chunking is performed, the whole dataset will be in memory and the
        performance might be poor.
    ignore_Feb29th : bool
        ``optional`` Ignoring or not February 29th (default: False).
    netcdf_version : icclim.models.netcdf_version.NetcdfVersion
        ``optional`` NetCDF version to create (default: "NETCDF4").
    logs_verbosity : Union[str, Verbosity]
        ``optional`` Configure how verbose icclim is.
        Possible values: ``{"LOW", "HIGH", "SILENT"}`` (default: "LOW")

    Notes
    -----
    This function has been auto-generated.
    """
    return icclim.index(
        index_name="TG",
        in_files=in_files,
        var_name=var_name,
        slice_mode=slice_mode,
        time_range=time_range,
        out_file=out_file,
        transfer_limit_Mbytes=transfer_limit_Mbytes,
        ignore_Feb29th=ignore_Feb29th,
        netcdf_version=netcdf_version,
        logs_verbosity=logs_verbosity)

Work in progress:

  • unit tests for clix-meta module
  • in extractor, replace the concatenation with a template (jinja or equivalent)
  • we must consider integrating the generated code directly into icclim; it would certainly be useful to some users.
    It would also expose a similar API to users who first try icclim in the c3s toolbox.
  • We should consider extracting user_index as well
  • Add documentation links for clix-meta module
  • Add tutorial to extract icclim functions
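
The template idea from the work-in-progress list can be sketched with the standard library's `string.Template`, used here as the "equivalent" of jinja. This is only an illustration under assumptions: `WRAPPER_TEMPLATE`, `render_wrapper` and the stand-in `index` function are hypothetical names, and the generated signature is far shorter than the real one shown above.

```python
from string import Template

# Hypothetical template for one generated wrapper. The real extractor output
# carries the full signature and docstring; this one is deliberately reduced.
WRAPPER_TEMPLATE = Template('''\
def $fun_name(in_files, out_file="icclim_out.nc", **kwargs):
    """
    $description

    Notes
    -----
    This function has been auto-generated.
    """
    return index(
        index_name="$index_name", in_files=in_files, out_file=out_file, **kwargs
    )
''')


def render_wrapper(index_name: str, description: str) -> str:
    """Render the source code of a wrapper function for a single index."""
    return WRAPPER_TEMPLATE.substitute(
        fun_name=index_name.lower(),
        index_name=index_name,
        description=description,
    )


def index(**kwargs):
    """Stand-in for icclim.index (assumption): it only echoes its arguments."""
    return kwargs


source = render_wrapper("TG", "Mean of daily mean temperature")
namespace = {"index": index}
exec(source, namespace)  # checks that the rendered source is valid Python
result = namespace["tg"](in_files="tas.nc")
```

Compared with string concatenation, the whole shape of the generated function stays readable in one place. A jinja template would additionally allow loops and conditionals, for example to emit one parameter per line.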

- Removed duplicated code.
- Made use of qualified names for cf_variables.
This hides the expected order of input variables inside IndexConfig.
- Simplified the output type of funs.
No need to create additional types.
It was interfering with fully qualified names: Python got confused
and could not choose correctly between the package and the module.

For now, we store the yaml file internally.
This is based on version 0.3.0 of the yaml file.
We must follow updates from https://github.com/clix-meta/clix-meta

This qualifier metadata is used internally to figure out
the parameters necessary to compute the index. We need
it for the C3S auto-generated functions.

This module will be used by C3S to
extract icclim indices into individual
functions.
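
As a rough illustration of the qualifier metadata described in the commit messages above, the sketch below derives the parameters of a generated function from the qualifiers attached to an index. The `Qualifier` enum, the `EXTRA_PARAMETERS` mapping and the qualifiers passed in the example are all hypothetical; they do not reflect the metadata actually stored in icclim.

```python
from enum import Enum
from typing import Iterable, List


class Qualifier(Enum):
    """Hypothetical qualifiers describing what an index needs."""

    TEMPERATURE = "temperature"
    PERCENTILE_BASED = "percentile_based"
    WINDOW_BASED = "window_based"


# Parameters every generated function would expose (illustrative subset).
COMMON_PARAMETERS = ["in_files", "var_name", "slice_mode", "time_range", "out_file"]

# Hypothetical mapping from a qualifier to the extra parameters it requires.
EXTRA_PARAMETERS = {
    Qualifier.PERCENTILE_BASED: ["base_period_time_range", "only_leap_years"],
    Qualifier.WINDOW_BASED: ["window_width"],
}


def parameters_for(qualifiers: Iterable[Qualifier]) -> List[str]:
    """Return the ordered parameter names needed for the given qualifiers."""
    params = list(COMMON_PARAMETERS)
    for qualifier in qualifiers:
        for name in EXTRA_PARAMETERS.get(qualifier, []):
            if name not in params:
                params.append(name)
    return params


# A plain temperature index only needs the common parameters, whereas a
# percentile-based one also needs a reference period.
simple = parameters_for([Qualifier.TEMPERATURE])
percentile = parameters_for([Qualifier.TEMPERATURE, Qualifier.PERCENTILE_BASED])
```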

pagecp commented Jan 14, 2022

Very impressive changes!

  • We should consider extracting user_index as well: this will be very useful and will simplify the code structure to be even more understandable and modular I guess.

These changes are related to the c3s integration.
`transfer_limit_Mbytes` was used to control dask chunking.
However, this is not really good practice because all other
dask configurations are still left outside icclim.

For project-related reasons, this is not a
5.1.0 even though there have been many changes since
the initial 5.0 bump.
This should not be an issue as icclim-v5 is
not widely distributed yet.

- Overlapping years were not properly computed.
- Chunking was done at dataset level, but it is only needed at data-array level.

This function lists all available ecad indices.

This could be improved by destructuring the
`user_index` parameter into multiple
parameters on the generated function.
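
One possible reading of the destructuring suggested above: the generated function takes explicit keyword parameters and rebuilds the `user_index` dictionary itself. In this sketch `custom_index` and the stand-in `index` function are hypothetical, and the dictionary keys are assumed for illustration rather than taken from this PR.

```python
from typing import Any, Dict, Optional


def index(**kwargs: Any) -> Dict[str, Any]:
    """Stand-in for icclim.index (assumption): it only echoes its arguments."""
    return kwargs


def custom_index(
    in_files: str,
    index_name: str,
    calc_operation: str,
    logical_operation: Optional[str] = None,
    thresh: Optional[float] = None,
    **kwargs: Any,
) -> Dict[str, Any]:
    """Hypothetical generated function with `user_index` destructured."""
    user_index = {
        "index_name": index_name,
        "calc_operation": calc_operation,
        "logical_operation": logical_operation,
        "thresh": thresh,
    }
    # Drop the entries the caller left unset so the dictionary stays minimal.
    user_index = {key: value for key, value in user_index.items() if value is not None}
    return index(in_files=in_files, user_index=user_index, **kwargs)


result = custom_index(
    "tas.nc",
    index_name="hot_days",
    calc_operation="nb_events",
    logical_operation="gt",
    thresh=25.0,
)
```

With explicit parameters, each option can be documented and type-checked on its own instead of being hidden inside a free-form dictionary.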

All changes are now under 5.0.0rc3, which
is the targeted official release.

When the in_base and out_of_base periods
fully overlap, it is unnecessary
(and costly) to bootstrap percentiles.
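
The check described in the last commit message could look roughly like the sketch below. It rests on two assumptions that this PR does not confirm: "fully overlap" is read as both periods covering exactly the same years, and periods with no common year are also assumed not to need bootstrapping. `years_covered` and `needs_bootstrap` are hypothetical helpers, not icclim functions.

```python
from datetime import date
from typing import Set, Tuple

Period = Tuple[date, date]


def years_covered(start: date, end: date) -> Set[int]:
    """Return every calendar year touched by the period [start, end]."""
    return set(range(start.year, end.year + 1))


def needs_bootstrap(in_base: Period, out_of_base: Period) -> bool:
    """Decide whether percentiles should be bootstrapped (illustrative rule)."""
    base_years = years_covered(*in_base)
    study_years = years_covered(*out_of_base)
    if not base_years & study_years:
        # Assumption: disjoint periods do not need bootstrapping.
        return False
    if base_years == study_years:
        # Full overlap: bootstrapping is unnecessary and costly.
        return False
    return True


base = (date(1961, 1, 1), date(1990, 12, 31))
same_period = needs_bootstrap(base, (date(1961, 1, 1), date(1990, 12, 31)))
partial_overlap = needs_bootstrap(base, (date(1980, 1, 1), date(2010, 12, 31)))
```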
bzah merged commit 03d7b83 into master Feb 1, 2022
bzah deleted the feature/c3s_extract_funs branch February 1, 2022 15:14