Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dimensionless units as "1" - adapt to numpy 2 #1814

Merged
merged 40 commits into from
Jul 31, 2024
Merged

Dimensionless units as "1" - adapt to numpy 2 #1814

merged 40 commits into from
Jul 31, 2024

Conversation

aulemahal
Copy link
Collaborator

@aulemahal aulemahal commented Jun 27, 2024

Pull Request Checklist:

  • This PR addresses an already opened issue (for bug fixes / features)
  • Tests for the changes have been added (for bug fixes / features)
    • (If applicable) Documentation has been added / updated (for bug fixes / features)
  • CHANGES.rst has been updated (with summary of main changes)
    • Link to issue (:issue:number) and pull request (:pull:number) has been added

What kind of change does this PR introduce?

  • Changes NaN and NAN to nan, Inf to inf.
  • Changes a test so the new data type promotion of numpy 2 fits our expected values
  • Relaxes a test for the same reason
  • Change expected unit order in some cases (new cf_xarray + pint)
  • Dimensionless units are now printed as "1".
  • Simplify pint2cfunits.
  • Add ensure_absolute_tempetature to its module's __all__ and move ensure_delta up in the same module so both functions are near another in the file.

Does this PR introduce a breaking change?

Yes it does. Fixing numpy 2 issues made me fix pint 0.24.1 issues that made me fix cf_xarray issues which have solution that is not backwards-compatible and now pint and cf_xarray have updated pinned that imply a numpy >=2 pin.

Other information:

We will require 3 new pins :

UPDATE: No pins were added, but the behaviour of xclim will be different for dimensionless indicators depending on the cf_xarray/pint versions installed.

@github-actions github-actions bot added sdba Issues concerning the sdba submodule. indicators Climate indices and indicators labels Jun 27, 2024
@aulemahal
Copy link
Collaborator Author

Sadly, there is one last issue that I did not see how to fix. @coxipi the following tests fail with numpy 2:

FAILED tests/test_indices.py::TestStandardizedIndices::test_standardized_precipitation_index[D-1-gamma-ML-values8-0.02] - AssertionError: 
FAILED tests/test_indices.py::TestStandardizedIndices::test_standardized_precipitation_index[D-12-gamma-ML-values9-0.02] - AssertionError: 
FAILED tests/test_indices.py::TestStandardizedIndices::test_standardized_precipitation_index[D-1-fisk-ML-values10-0.02] - AssertionError: 

I'm guessing it's also a data type promotion issue, but I don't know where to look.

@coxipi
Copy link
Contributor

coxipi commented Jun 29, 2024

The SPI functions were only the canary in the mine... indices assigning "" (dimensionless units) are concerned, you can also test "test_ffdi.py" to see this. It comes from pint2cfunits, here is a minimal example with the problematic part:

import xclim 
value = xclim.core.units.units2pint("")
f"{value:cf}"

I guess this should be reported upstream? I'm not sure what body is in charge of operations like f"{value:cf}". Here is a dumdum change I tested in our codebase and checked it works, just change the return of pint2cfunits

return "" if value == units2pint("") else f"{value:cf}"

@aulemahal
Copy link
Collaborator Author

This is fixed when you install my branch from cf-xarray. If you do so (with pint 0.24.1 and xarray's master), you'll see that only SPI errors are left.

@Zeitsperre
Copy link
Collaborator

If you do so (with pint 0.24.1 and xarray's master), you'll see that only SPI errors are left.

How should we handle the new pint version? Will it support Python3.9+? Can we safely raise the lower pin to that upcoming version?

@aulemahal
Copy link
Collaborator Author

I guess I was eager to begin my weekend when I wrote this PR's top comment, I could have given more info ;).

My current PR to cf_xarray will fix the issue for pint 0.24.1. An older pint version can be used with any cf-xarray, and next cf-xarray will be usable with any pint version. Unless we want to write a somewhat complex dependency string, I suggest we pin cf-xarray only, not pint.

Note: I will do further changes to the cf-xarray PR, so the change will be a bit more breaking than initially thought, which goes in the direction of my suggestion.

Pint 0.24.1 supports python 3.9+. When all dependent PRs and releases are done, I'll test xclim with different versions of pint to be sure, before merging this PR.

@aulemahal
Copy link
Collaborator Author

aulemahal commented Jul 8, 2024

Oups. The units issue was fixed in cf-xarray, but with a change that is not retrocompatible. The newest cf-xarray will return "1" as the dimensionless unit with pint 0.24.1 (and only with this version). This is what the CF conventions specify, but previous pint made this difficult, so we used to return an empty string.

Changing xclim to return "1" thus requires cf-xarray 0.9.3 and pint >= 0.24.1, but the latter implies numpy >=21.
So. This PR that was meant to fix issues with numpy 2 is now pinning numpy to >= 2.

Footnotes

  1. To be clear xclim doesn't actually needs numpy >= 2, but I bumped the pin in the env to make solving envs easier for conda/mamba since it is implied by the pint's pin.

@aulemahal
Copy link
Collaborator Author

@coxipi FYI, these are the SPI errors on my side:

_____________________________________________ TestStandardizedIndices.test_standardized_precipitation_index[D-1-gamma-ML-values8-0.02] _____________________________________________
[gw1] linux -- Python 3.12.4 /home/pbourg/miniforge3/envs/xclim/bin/python3.12

self = <test_indices.TestStandardizedIndices object at 0x7f3c5fbf74a0>, open_dataset = <function open_dataset.<locals>._open_session_scoped_file at 0x7f3c5dd1c720>, freq = 'D'
window = 1, dist = 'gamma', method = 'ML', values = [-0.011698, 1.597031, 0.969714, 0.265561, -0.132654], diff_tol = 0.02

    @pytest.mark.slow
    @pytest.mark.parametrize(
        "freq, window, dist, method,  values, diff_tol",
        [
            (
                "MS",
                1,
                "gamma",
                "APP",
                [1.31664, 1.45069, 1.94609, -3.09, 0.850681],
                2e-2,
            ),
            (
                "MS",
                12,
                "gamma",
                "APP",
                [0.598209, 1.55976, 1.69309, 0.9964, 0.7028],
                2e-2,
            ),
            (
                "MS",
                1,
                "gamma",
                "ML",
                [1.460105, 1.602951, 2.072521, -3.09, 0.891468],
                0.04,
            ),
            ("MS", 12, "gamma", "ML", [0.59821, 1.5598, 1.6931, 0.9964, 0.7028], 0.04),
            (
                "MS",
                1,
                "fisk",
                "ML",
                [1.41236, 1.51192, 1.93324, -2.74089, 0.932674],
                2e-2,
            ),
            (
                "MS",
                12,
                "fisk",
                "ML",
                [0.683273, 1.51189, 1.61597, 1.03875, 0.72531],
                2e-2,
            ),
            (
                "D",
                1,
                "gamma",
                "APP",
                [-0.18618353, 1.44582971, 0.95985043, 0.15779587, -0.37801587],
                2e-2,
            ),
            (
                "D",
                12,
                "gamma",
                "APP",
                [-0.24417774, -0.11404418, 0.64997039, 1.07670517, 0.6462852],
                2e-2,
            ),
            (
                "D",
                1,
                "gamma",
                "ML",
                [-0.011698, 1.597031, 0.969714, 0.265561, -0.132654],
                2e-2,
            ),
            (
                "D",
                12,
                "gamma",
                "ML",
                [-0.158116, -0.049222, 0.672544, 1.08332, 0.660903],
                2e-2,
            ),
            (
                "D",
                1,
                "fisk",
                "ML",
                [-0.158949, 1.308225, 0.449846, 0.146699, -0.502737],
                2e-2,
            ),
            (
                "D",
                12,
                "fisk",
                "ML",
                [-0.14151269, -0.01914608, 0.7080277, 1.01510279, 0.6954002],
                2e-2,
            ),
            (
                "D",
                1,
                "fisk",
                "APP",
                [-0.417288, 1.275686, 1.002566, 0.206854, -0.672117],
                2e-2,
            ),
            (
                "D",
                12,
                "fisk",
                "APP",
                [-0.220136, -0.065782, 0.783319, 1.134428, 0.782857],
                2e-2,
            ),
            (
                None,
                1,
                "gamma",
                "APP",
                [-0.18618353, 1.44582971, 0.95985043, 0.15779587, -0.37801587],
                2e-2,
            ),
            (
                None,
                12,
                "gamma",
                "APP",
                [-0.24417774, -0.11404418, 0.64997039, 1.07670517, 0.6462852],
                2e-2,
            ),
        ],
    )
    def test_standardized_precipitation_index(
        self, open_dataset, freq, window, dist, method, values, diff_tol
    ):
        ds = open_dataset("sdba/CanESM2_1950-2100.nc").isel(location=1)
        if freq == "D":
            ds = ds.convert_calendar("366_day", missing=np.nan)
            # to compare with ``climate_indices``
        pr = ds.pr.sel(time=slice("1998", "2000"))
        pr_cal = ds.pr.sel(time=slice("1950", "1980"))
        fitkwargs = {}
        if method == "APP":
            fitkwargs["floc"] = 0
        params = xci.stats.standardized_index_fit_params(
            pr_cal,
            freq=freq,
            window=window,
            dist=dist,
            method=method,
            fitkwargs=fitkwargs,
            zero_inflated=True,
        )
        spi = xci.standardized_precipitation_index(pr, params=params)
        # Only a few moments before year 2000 are tested
        spi = spi.isel(time=slice(-11, -1, 2))
    
        # [Guttman, 1999]: The number of precipitation events (over a month/season or
        # other time period) is generally less than 100 in the US. This suggests that
        # bounds of ± 3.09 correspond to 0.999 and 0.001 probabilities. SPI indices outside
        # [-3.09, 3.09] might be non-statistically relevant. In `climate_indices` the SPI
        # index is clipped outside this region of value. In the values chosen above,
        # this doesn't play role, but let's clip it anyways to avoid future problems.
        # The last few values in time are tested
        spi = spi.clip(-3.09, 3.09)
    
>       np.testing.assert_allclose(spi.values, values, rtol=0, atol=diff_tol)

tests/test_indices.py:628: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

args = (<function assert_allclose.<locals>.compare at 0x7f3c5c017560>, array([-0.08378482,  1.45764713,  0.99329611,  0.27189447, -0.44968379]), array([-0.011698,  1.597031,  0.969714,  0.265561, -0.132654]))
kwds = {'equal_nan': True, 'err_msg': '', 'header': 'Not equal to tolerance rtol=0, atol=0.02', 'strict': False, ...}

    @wraps(func)
    def inner(*args, **kwds):
        with self._recreate_cm():
>           return func(*args, **kwds)
E           AssertionError: 
E           Not equal to tolerance rtol=0, atol=0.02
E           
E           Mismatched elements: 4 / 5 (80%)
E           Max absolute difference among violations: 0.31702979
E           Max relative difference among violations: 6.16232032
E            ACTUAL: array([-0.083785,  1.457647,  0.993296,  0.271894, -0.449684])
E            DESIRED: array([-0.011698,  1.597031,  0.969714,  0.265561, -0.132654])

../../miniforge3/envs/xclim/lib/python3.12/contextlib.py:81: AssertionError
____________________________________________ TestStandardizedIndices.test_standardized_precipitation_index[D-12-gamma-ML-values9-0.02] _____________________________________________
[gw1] linux -- Python 3.12.4 /home/pbourg/miniforge3/envs/xclim/bin/python3.12

self = <test_indices.TestStandardizedIndices object at 0x7f3c5fbf7530>, open_dataset = <function open_dataset.<locals>._open_session_scoped_file at 0x7f3c5dd1c720>, freq = 'D'
window = 12, dist = 'gamma', method = 'ML', values = [-0.158116, -0.049222, 0.672544, 1.08332, 0.660903], diff_tol = 0.02

    @pytest.mark.slow
    @pytest.mark.parametrize(
        "freq, window, dist, method,  values, diff_tol",
        [
            (
                "MS",
                1,
                "gamma",
                "APP",
                [1.31664, 1.45069, 1.94609, -3.09, 0.850681],
                2e-2,
            ),
            (
                "MS",
                12,
                "gamma",
                "APP",
                [0.598209, 1.55976, 1.69309, 0.9964, 0.7028],
                2e-2,
            ),
            (
                "MS",
                1,
                "gamma",
                "ML",
                [1.460105, 1.602951, 2.072521, -3.09, 0.891468],
                0.04,
            ),
            ("MS", 12, "gamma", "ML", [0.59821, 1.5598, 1.6931, 0.9964, 0.7028], 0.04),
            (
                "MS",
                1,
                "fisk",
                "ML",
                [1.41236, 1.51192, 1.93324, -2.74089, 0.932674],
                2e-2,
            ),
            (
                "MS",
                12,
                "fisk",
                "ML",
                [0.683273, 1.51189, 1.61597, 1.03875, 0.72531],
                2e-2,
            ),
            (
                "D",
                1,
                "gamma",
                "APP",
                [-0.18618353, 1.44582971, 0.95985043, 0.15779587, -0.37801587],
                2e-2,
            ),
            (
                "D",
                12,
                "gamma",
                "APP",
                [-0.24417774, -0.11404418, 0.64997039, 1.07670517, 0.6462852],
                2e-2,
            ),
            (
                "D",
                1,
                "gamma",
                "ML",
                [-0.011698, 1.597031, 0.969714, 0.265561, -0.132654],
                2e-2,
            ),
            (
                "D",
                12,
                "gamma",
                "ML",
                [-0.158116, -0.049222, 0.672544, 1.08332, 0.660903],
                2e-2,
            ),
            (
                "D",
                1,
                "fisk",
                "ML",
                [-0.158949, 1.308225, 0.449846, 0.146699, -0.502737],
                2e-2,
            ),
            (
                "D",
                12,
                "fisk",
                "ML",
                [-0.14151269, -0.01914608, 0.7080277, 1.01510279, 0.6954002],
                2e-2,
            ),
            (
                "D",
                1,
                "fisk",
                "APP",
                [-0.417288, 1.275686, 1.002566, 0.206854, -0.672117],
                2e-2,
            ),
            (
                "D",
                12,
                "fisk",
                "APP",
                [-0.220136, -0.065782, 0.783319, 1.134428, 0.782857],
                2e-2,
            ),
            (
                None,
                1,
                "gamma",
                "APP",
                [-0.18618353, 1.44582971, 0.95985043, 0.15779587, -0.37801587],
                2e-2,
            ),
            (
                None,
                12,
                "gamma",
                "APP",
                [-0.24417774, -0.11404418, 0.64997039, 1.07670517, 0.6462852],
                2e-2,
            ),
        ],
    )
    def test_standardized_precipitation_index(
        self, open_dataset, freq, window, dist, method, values, diff_tol
    ):
        ds = open_dataset("sdba/CanESM2_1950-2100.nc").isel(location=1)
        if freq == "D":
            ds = ds.convert_calendar("366_day", missing=np.nan)
            # to compare with ``climate_indices``
        pr = ds.pr.sel(time=slice("1998", "2000"))
        pr_cal = ds.pr.sel(time=slice("1950", "1980"))
        fitkwargs = {}
        if method == "APP":
            fitkwargs["floc"] = 0
        params = xci.stats.standardized_index_fit_params(
            pr_cal,
            freq=freq,
            window=window,
            dist=dist,
            method=method,
            fitkwargs=fitkwargs,
            zero_inflated=True,
        )
        spi = xci.standardized_precipitation_index(pr, params=params)
        # Only a few moments before year 2000 are tested
        spi = spi.isel(time=slice(-11, -1, 2))
    
        # [Guttman, 1999]: The number of precipitation events (over a month/season or
        # other time period) is generally less than 100 in the US. This suggests that
        # bounds of ± 3.09 correspond to 0.999 and 0.001 probabilities. SPI indices outside
        # [-3.09, 3.09] might be non-statistically relevant. In `climate_indices` the SPI
        # index is clipped outside this region of value. In the values chosen above,
        # this doesn't play role, but let's clip it anyways to avoid future problems.
        # The last few values in time are tested
        spi = spi.clip(-3.09, 3.09)
    
>       np.testing.assert_allclose(spi.values, values, rtol=0, atol=diff_tol)

tests/test_indices.py:628: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

args = (<function assert_allclose.<locals>.compare at 0x7f3c5c017ce0>, array([-0.15885414, -0.0491648 ,  0.67586258,  0.96024696,  0.66083077]), array([-0.158116, -0.049222,  0.672544,  1.08332 ,  0.660903]))
kwds = {'equal_nan': True, 'err_msg': '', 'header': 'Not equal to tolerance rtol=0, atol=0.02', 'strict': False, ...}

    @wraps(func)
    def inner(*args, **kwds):
        with self._recreate_cm():
>           return func(*args, **kwds)
E           AssertionError: 
E           Not equal to tolerance rtol=0, atol=0.02
E           
E           Mismatched elements: 1 / 5 (20%)
E           Max absolute difference among violations: 0.12307304
E           Max relative difference among violations: 0.11360728
E            ACTUAL: array([-0.158854, -0.049165,  0.675863,  0.960247,  0.660831])
E            DESIRED: array([-0.158116, -0.049222,  0.672544,  1.08332 ,  0.660903])

../../miniforge3/envs/xclim/lib/python3.12/contextlib.py:81: AssertionError
_____________________________________________ TestStandardizedIndices.test_standardized_precipitation_index[D-1-fisk-ML-values10-0.02] _____________________________________________
[gw1] linux -- Python 3.12.4 /home/pbourg/miniforge3/envs/xclim/bin/python3.12

self = <test_indices.TestStandardizedIndices object at 0x7f3c5fbf75c0>, open_dataset = <function open_dataset.<locals>._open_session_scoped_file at 0x7f3c5dd1c720>, freq = 'D'
window = 1, dist = 'fisk', method = 'ML', values = [-0.158949, 1.308225, 0.449846, 0.146699, -0.502737], diff_tol = 0.02

    @pytest.mark.slow
    @pytest.mark.parametrize(
        "freq, window, dist, method,  values, diff_tol",
        [
            (
                "MS",
                1,
                "gamma",
                "APP",
                [1.31664, 1.45069, 1.94609, -3.09, 0.850681],
                2e-2,
            ),
            (
                "MS",
                12,
                "gamma",
                "APP",
                [0.598209, 1.55976, 1.69309, 0.9964, 0.7028],
                2e-2,
            ),
            (
                "MS",
                1,
                "gamma",
                "ML",
                [1.460105, 1.602951, 2.072521, -3.09, 0.891468],
                0.04,
            ),
            ("MS", 12, "gamma", "ML", [0.59821, 1.5598, 1.6931, 0.9964, 0.7028], 0.04),
            (
                "MS",
                1,
                "fisk",
                "ML",
                [1.41236, 1.51192, 1.93324, -2.74089, 0.932674],
                2e-2,
            ),
            (
                "MS",
                12,
                "fisk",
                "ML",
                [0.683273, 1.51189, 1.61597, 1.03875, 0.72531],
                2e-2,
            ),
            (
                "D",
                1,
                "gamma",
                "APP",
                [-0.18618353, 1.44582971, 0.95985043, 0.15779587, -0.37801587],
                2e-2,
            ),
            (
                "D",
                12,
                "gamma",
                "APP",
                [-0.24417774, -0.11404418, 0.64997039, 1.07670517, 0.6462852],
                2e-2,
            ),
            (
                "D",
                1,
                "gamma",
                "ML",
                [-0.011698, 1.597031, 0.969714, 0.265561, -0.132654],
                2e-2,
            ),
            (
                "D",
                12,
                "gamma",
                "ML",
                [-0.158116, -0.049222, 0.672544, 1.08332, 0.660903],
                2e-2,
            ),
            (
                "D",
                1,
                "fisk",
                "ML",
                [-0.158949, 1.308225, 0.449846, 0.146699, -0.502737],
                2e-2,
            ),
            (
                "D",
                12,
                "fisk",
                "ML",
                [-0.14151269, -0.01914608, 0.7080277, 1.01510279, 0.6954002],
                2e-2,
            ),
            (
                "D",
                1,
                "fisk",
                "APP",
                [-0.417288, 1.275686, 1.002566, 0.206854, -0.672117],
                2e-2,
            ),
            (
                "D",
                12,
                "fisk",
                "APP",
                [-0.220136, -0.065782, 0.783319, 1.134428, 0.782857],
                2e-2,
            ),
            (
                None,
                1,
                "gamma",
                "APP",
                [-0.18618353, 1.44582971, 0.95985043, 0.15779587, -0.37801587],
                2e-2,
            ),
            (
                None,
                12,
                "gamma",
                "APP",
                [-0.24417774, -0.11404418, 0.64997039, 1.07670517, 0.6462852],
                2e-2,
            ),
        ],
    )
    def test_standardized_precipitation_index(
        self, open_dataset, freq, window, dist, method, values, diff_tol
    ):
        ds = open_dataset("sdba/CanESM2_1950-2100.nc").isel(location=1)
        if freq == "D":
            ds = ds.convert_calendar("366_day", missing=np.nan)
            # to compare with ``climate_indices``
        pr = ds.pr.sel(time=slice("1998", "2000"))
        pr_cal = ds.pr.sel(time=slice("1950", "1980"))
        fitkwargs = {}
        if method == "APP":
            fitkwargs["floc"] = 0
        params = xci.stats.standardized_index_fit_params(
            pr_cal,
            freq=freq,
            window=window,
            dist=dist,
            method=method,
            fitkwargs=fitkwargs,
            zero_inflated=True,
        )
        spi = xci.standardized_precipitation_index(pr, params=params)
        # Only a few moments before year 2000 are tested
        spi = spi.isel(time=slice(-11, -1, 2))
    
        # [Guttman, 1999]: The number of precipitation events (over a month/season or
        # other time period) is generally less than 100 in the US. This suggests that
        # bounds of ± 3.09 correspond to 0.999 and 0.001 probabilities. SPI indices outside
        # [-3.09, 3.09] might be non-statistically relevant. In `climate_indices` the SPI
        # index is clipped outside this region of value. In the values chosen above,
        # this doesn't play role, but let's clip it anyways to avoid future problems.
        # The last few values in time are tested
        spi = spi.clip(-3.09, 3.09)
    
>       np.testing.assert_allclose(spi.values, values, rtol=0, atol=diff_tol)

tests/test_indices.py:628: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

args = (<function assert_allclose.<locals>.compare at 0x7f3c5c017ba0>, array([-0.19423456,  1.30819834,  0.53076805,  0.2223399 , -0.50263457]), array([-0.158949,  1.308225,  0.449846,  0.146699, -0.502737]))
kwds = {'equal_nan': True, 'err_msg': '', 'header': 'Not equal to tolerance rtol=0, atol=0.02', 'strict': False, ...}

    @wraps(func)
    def inner(*args, **kwds):
        with self._recreate_cm():
>           return func(*args, **kwds)
E           AssertionError: 
E           Not equal to tolerance rtol=0, atol=0.02
E           
E           Mismatched elements: 3 / 5 (60%)
E           Max absolute difference among violations: 0.08092205
E           Max relative difference among violations: 0.51561972
E            ACTUAL: array([-0.194235,  1.308198,  0.530768,  0.22234 , -0.502635])
E            DESIRED: array([-0.158949,  1.308225,  0.449846,  0.146699, -0.502737])

../../miniforge3/envs/xclim/lib/python3.12/contextlib.py:81: AssertionError

This PR still requires xarray's master to run.

@aulemahal aulemahal changed the title Fix for Numpy 2 Dimensionless units as "1" - adapt to and pin numpy 2 Jul 8, 2024
@coxipi
Copy link
Contributor

coxipi commented Jul 9, 2024

Regarding the spi errors, let me show a case study on the first error ("ML-gamma-D-window==1").

First, here are many values, the error is relatively small but not negligible:
image
(yes a spi, especially daily, can have some weird peaks)

This error can be traced back to _fit_start:

xclim.indices.stats._fit_start(pr_cal_doy1.values, "gamma")
# old
>>> ((0.707242185969578,),  {'scale': 2.6744933340768465e-05, 'loc': 4.230436713933637e-08})
# new np2
>>> ((np.float32(0.70724225),),  {'scale': np.float32(2.674493e-05), 'loc': np.float32(4.2304368e-08)})

we have float32 in np2, float64 before. Taking np.float32 of old results, I now have closer results:
image

Bottom line: Very small difference in the output of _fit_start due float64 -> float32 in np2 propagates in the fitting routine, a somewhat different set of parameters can be found, which translates in different SPI values.

Why the remaining small errors in the second figure?

Probably weird rounding errors float64 -> float32. I took the float32 at the end of _fit_start. For instance, for doy==124, we have 0.458947 vs. 0.45894697 for the shape param

# e.g. doy==124
# np.float32(old)
>>> ((0.458947,), {'scale': 4.4886667e-05, 'loc': -6.8731404e-14})
# new np2
>>> ((np.float32(0.45894697),),  {'scale': np.float32(4.488667e-05), 'loc': np.float32(-6.87314e-14)})

Taking the float64 -> float32 earlier in _fit_start could probably help, but I don't know, I tried x = x.astype(np.float32) at the beginning of _fit_start but tt was still giving me float64 after... anyways. I think the point is still pretty clear.

environment.yml Outdated Show resolved Hide resolved
@Zeitsperre
Copy link
Collaborator

@aulemahal I'm tapping out here for now.

I needed to replace netcdf4 everywhere because the latest releases (>=1.7.0) are really unstable, but the older versions are not compatible with numpy>=2.0.0. This is causing issues with a handful of tests, mainly those that either:

  • Expect messages that are provided by the netcdf4 backend (e.g. progress statements).
  • Rely on synonymous opening of files (h5netcdf uses blocking more and has a harder time with this).

h5netcdf is now a core dependency, seeing that it is being used by default in all read/write operations. I also added the option for setting the --engine using the CLI (default is unsurprisingly h5netcdf).

Here are the issues I can see that needs adjusting:

  • Within the tests, there's a fixture called "open_dataset" that hard-codes the use of "h5netcdf". We should be using that more thoroughly throughout the test suite (realized this after the fact).
  • Perhaps we should see whether we can operate entirely without netcdf4 and investigate the acceptability of optional engines like pydap for OPeNDAP support?
  • The objects coming out of xclim.ensembles appear to be affected, though it isn't clear to me why that might be.

Ping me if you have questions.

@aulemahal
Copy link
Collaborator Author

Thanks for the work!

I think the errors you saw in the ensemble modules are those caused by the float64 issue with cftime in xarray's current release. In my env with xarray@master, I see no failures in this module.

The h5netcdf error "OSError: Unable to synchronously open file (file signature not found)" happens when you try to open a non-HDF5 netcdf with engine "h5netcdf". I.e. : if you wrote it with scipy and try to force engine='h5netcdf' on the opening. Is this the error you are referring to when you talk about tests that "Rely on synonymous opening of files " ?

In my env, the CLI tests failed because the outputs were written without a specified engine and xarray seemed to default to scipy. The subsequent opening with h5netcdf forced failed with this (sadly unclear) error.

I disagree with making h5netcdf a core dependency of xclim. My principal arguments would be that

  • xclim does not do I/O (except in the cli module, but even then it does very generic calls)
  • xarray itself does not have any engine as core dependencies. We should follow the same logic.

But it is a test dependency, of course. The best way to force tests not to use netCDF4 is to not have the package in the env in the first place. In my env, I uninstalled it and all tests pass. I'll push somehing and we'll see!

I have no strong feelings about forcing h5netcdf everywhere in the tests. But it seems to be slightly dangerous as seen with the cli tests above : a to_netcdf call without an engine might write with scipy, which would make an open_dataset(..., engine='h5netcdf') call fail.

@aulemahal aulemahal changed the title Dimensionless units as "1" - adapt to and pin numpy 2 Dimensionless units as "1" - adapt to numpy 2 Jul 31, 2024
@aulemahal aulemahal marked this pull request as ready for review July 31, 2024 16:20
CHANGELOG.rst Outdated Show resolved Hide resolved
@github-actions github-actions bot added the approved Approved for additional tests label Jul 31, 2024
CHANGELOG.rst Outdated Show resolved Hide resolved
@aulemahal
Copy link
Collaborator Author

@Zeitsperre FYI, I removed a test that was ensuring an error was raised on unfound opendap url. As we don't install netCDF4 in the CI anymore, the CI doesn't have opendap support anyway, so I thought the test wasn't necessary anymore.

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@github-actions github-actions bot added the docs Improvements to documenation label Jul 31, 2024
@aulemahal aulemahal merged commit d13cb6c into main Jul 31, 2024
21 checks passed
@aulemahal aulemahal deleted the fix-for-np2 branch July 31, 2024 19:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Interfacing and User Concerns approved Approved for additional tests CI Automation and Contiunous Integration docs Improvements to documenation indicators Climate indices and indicators sdba Issues concerning the sdba submodule.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

New numpy version breaks everything
3 participants