
DataArrayCoarsen does not have a map or reduce function #3741

Closed
aidenprice opened this issue Feb 1, 2020 · 3 comments · Fixed by #4939

Comments

@aidenprice

I'm trying to count unique samples when resampling a 5x5 m input grid to a square kilometre. I'd like to be able to apply the dask.array.unique() function with return_counts=True to give me a new dimension with the original integer values and their counts.

In order to resample along spatial dimensions I assume I need to use .coarsen(). Unfortunately, the core.rolling.DataArrayCoarsen object does not yet implement either a .map() or a .reduce() method for applying an arbitrary function when coarsening.

MCVE Code Sample

import xarray as xr
from dask.array import unique

da = xr.DataArray([1, 1, 2, 3, 5, 3], [('x', range(0, 6))])
coarse = da.coarsen(dim={'x': 2}).map(unique, kwargs={'return_counts': True})
coarse

outputs:
AttributeError: 'DataArrayCoarsen' object has no attribute 'map'

N.B. core.groupby.DataArrayGroupBy has both .map() and .reduce() while core.rolling.DataArrayRolling has .reduce(). Would it make sense for all three to have the same interface?
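For comparison, the interfaces that do exist today (a quick sketch against xarray 0.15.0; np.max stands in here for an arbitrary reduction):

```python
import numpy as np
import xarray as xr

da = xr.DataArray([1, 1, 2, 3, 5, 3], [('x', range(6))])

# DataArrayRolling already exposes .reduce() for arbitrary functions:
rolled = da.rolling(x=2).reduce(np.max)   # first window is padded -> NaN

# DataArrayGroupBy exposes both .map() and .reduce():
mapped = da.groupby('x').map(lambda g: g)

# The analogous call on DataArrayCoarsen raises AttributeError in 0.15.0:
# da.coarsen(x=2).reduce(np.max)
```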

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.4 (default, Sep 7 2019, 18:27:02) [Clang 10.0.1 (clang-1001.0.46.4)]
python-bits: 64
OS: Darwin
OS-release: 19.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_AU.UTF-8
LOCALE: en_AU.UTF-8
libhdf5: None
libnetcdf: None

xarray: 0.15.0
pandas: 1.0.0
numpy: 1.18.1
scipy: 1.4.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.2
cfgrib: None
iris: None
bottleneck: None
dask: 2.10.1
distributed: 2.10.0
matplotlib: 3.1.2
cartopy: None
seaborn: None
numbagg: None
setuptools: 40.8.0
pip: 19.0.3
conda: None
pytest: 5.3.5
IPython: 7.11.1
sphinx: None

@raphaeldussin
Contributor

👍 that would be super useful!

@dcherian
Contributor

coarsen is pretty similar to rolling AFAIR so it may not be too hard to implement a .reduce method.

@oarcher

oarcher commented Dec 3, 2020

As a workaround, it's possible to use rolling and .sel to keep only non-overlapping (adjacent) windows:

ds
<xarray.Dataset>
Dimensions:  (x: 237, y: 69, z: 2)
Coordinates:
  * x        (x) int64 0 1 2 3 4 5 6 7 8 ... 228 229 230 231 232 233 234 235 236
  * y        (y) int64 0 1 2 3 4 5 6 7 8 9 10 ... 59 60 61 62 63 64 65 66 67 68
  * z        (z) int64 0 1
Data variables:
    data2D   (x, y) float64 dask.array<chunksize=(102, 42), meta=np.ndarray>
    data3D   (x, y, z) float64 dask.array<chunksize=(102, 42, 2), meta=np.ndarray>

# window size
window = {'x': 51, 'y': 21}
# window dims, prefixed by 'k_'
window_dims = {k: "k_%s" % k for k in window.keys()}
# dataset with the windows as new dims; .sel drops the overlapping windows,
# keeping only non-overlapping (adjacent) ones
ds_win = ds.rolling(window, center=True).construct(window_dims).sel(
            {k: slice(window[k] // 2, None, window[k]) for k in window.keys()})

<xarray.Dataset>
Dimensions:  (k_x: 51, k_y: 21, x: 5, y: 3, z: 2)
Coordinates:
  * x        (x) int64 25 76 127 178 229
  * y        (y) int64 10 31 52
  * z        (z) int64 0 1
Dimensions without coordinates: k_x, k_y
Data variables:
    data2D   (x, y, k_x, k_y) float64 dask.array<chunksize=(2, 2, 51, 21), meta=np.ndarray>
    data3D   (x, y, z, k_x, k_y) float64 dask.array<chunksize=(2, 2, 2, 51, 21), meta=np.ndarray>

# now, use reduce on a standard dataset, with the window k_ dims as the reduced dimensions
ds_red = ds_win.reduce(np.mean, dim=list(window_dims.values()))

<xarray.Dataset>
Dimensions:  (x: 5, y: 3, z: 2)
Coordinates:
  * x        (x) int64 25 76 127 178 229
  * y        (y) int64 10 31 52
  * z        (z) int64 0 1
Data variables:
    data2D   (x, y) float64 dask.array<chunksize=(2, 2), meta=np.ndarray>
    data3D   (x, y, z) float64 dask.array<chunksize=(2, 2, 2), meta=np.ndarray>

Note that I was unable to use unique, because the size of the result depends on the data.
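A fixed-size alternative (my own sketch, not from the thread): reducing each window to the *number* of distinct values always yields one scalar per window, so it fits the windowed-reduce pattern. The count_unique helper below is hypothetical:

```python
import numpy as np
import xarray as xr

def count_unique(a, axis):
    """Count distinct values along the reduced axis/axes (hypothetical helper).

    xarray's .reduce() calls func(values, axis=axis), where axis may be an
    int or a tuple of ints when several dims are reduced at once.
    """
    axes = axis if isinstance(axis, tuple) else (axis,)
    # move the reduced axes to the end and flatten them into one
    a = np.moveaxis(a, axes, range(-len(axes), 0))
    a = a.reshape(a.shape[:-len(axes)] + (-1,))
    counts = np.array([len(np.unique(row)) for row in a.reshape(-1, a.shape[-1])])
    return counts.reshape(a.shape[:-1])

da = xr.DataArray([1, 1, 2, 3, 5, 3], [('x', range(6))])
# non-overlapping windows of 2, via the same rolling/construct trick
win = da.rolling(x=2).construct('k_x').sel(x=slice(1, None, 2))
n_unique = win.reduce(count_unique, dim='k_x')
# windows [1, 1], [2, 3], [5, 3]  ->  counts [1, 2, 2]
```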

4 participants