Skip to content

Optimization for adjust_3d #47

Closed
@riley-brady

Description

@riley-brady

Hello! Thank you for a fantastic offering with this package. This has been very helpful to get some quantile mapping up and running quickly. As a thank you, I wanted to offer some optimization I did on gridded bias correction that you might find useful. I specialize in vectorizing/optimizing gridded data, but do not have the capacity right now to open a full PR.

I used cm.adjust_3d on a ~111x111 lat lon grid with 40,000 time steps. The progress bar estimated around 2.5 hours for this but I didn't run it in full. With the below implementation, it ran in 1 minute.

You need a dask cluster running and a dask dataset of course to reap those benefits, but the implementation will speed up in-memory datasets too.


import numpy as np
import xarray as xr
from cmethods import CMethods as cm


def quantile_map_3d(
    obs: xr.DataArray,
    simh: xr.DataArray,
    simp: xr.DataArray,
    n_quantiles: int,
    kind: str,
):
    """Quantile mapping vectorized for 3D operations."""

    def qmap(
        obs: xr.DataArray,
        simh: xr.DataArray,
        simp: xr.DataArray,
        n_quantiles: int,
        kind: str,
    ) -> np.array:
        """Helper for apply ufunc to vectorize/parallelize the bias correction step."""
        return cm.quantile_mapping(
            obs=obs, simh=simh, simp=simp, n_quantiles=n_quantiles, kind=kind
        )

    result = xr.apply_ufunc(
        qmap,
        obs,
        simh,
        # Need to spoof a fake time axis since 'time' coord on full dataset is different
        # than 'time' coord on training dataset.
        simp.rename({"time": "t2"}),
        dask="parallelized",
        vectorize=True,
        # This will vectorize over the time dimension, so will submit each grid cell
        # independently
        input_core_dims=[["time"], ["time"], ["t2"]],
        # Need to denote that the final output dataset will be labeled with the
        # spoofed time coordinate
        output_core_dims=[["t2"]],
        kwargs={"n_quantiles": n_quantiles, "kind": kind},
    )

    # Rename to proper coordinate name.
    result = result.rename({"t2": "time"})

    # ufunc will put the core dimension to the end (time), so want to preserve original
    # order where time is commonly first.
    result = result.transpose(*obs.dims)
    return result

The nice thing about this is that it can handle 1D datasets without any issue. The limitation is they always have to be xarray objects. But it works with dask or in-memory datasets and any arbitrary dimensions as long as a labeled time dimension exists.

The other great thing is you could just implement the apply_ufunc wrapper to every single bias correction code without the need for a separate adjust_3d function. A user can pass in 1D or 2D+ data without any change in code.

Example:

obs = xr.DataArray(
    [[1, 2, 3, 4], [2, 3, 4, 5]],
    dims=["x", "time"],
    coords={"x": [0, 1], "time": pd.date_range("2023-10-25", freq="D", periods=4)},
).transpose("time", "x")
simh = xr.DataArray(
    [[2, 1, 5, 4], [3, 9, 1, 4]],
    dims=["x", "time"],
    coords={"x": [0, 1], "time": pd.date_range("2023-10-25", freq="D", periods=4)},
).transpose("time", "x")
simp = xr.DataArray(
    [[7, 9, 10, 14], [12, 13, 14, 15]],
    dims=["x", "time"],
    coords={"x": [0, 1], "time": pd.date_range("2040-10-25", freq="D", periods=4)},
).transpose("time", "x")

# 2D dataset
>>> quantile_map_3d(obs, simh, simp, 250, "*")
<xarray.DataArray (time: 4, x: 2)>
array([[5., 9.],
       [5., 9.],
       [5., 9.],
       [5., 9.]])
Coordinates:
  * x        (x) int64 0 1
  * time     (time) datetime64[ns] 2040-10-25 2040-10-26 2040-10-27 2040-10-28

# 1D dataset
>>> quantile_map_3d(obs.isel(x=0), simh.isel(x=0), simp.isel(x=0), 250, "*")
<xarray.DataArray (time: 4)>
array([5., 5., 5., 5.])
Coordinates:
    x        int64 0
  * time     (time) datetime64[ns] 2040-10-25 2040-10-26 2040-10-27 2040-10-28

Metadata

Metadata

Labels

FeatureNew feature or request

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions