
Unexpected chunking of 3d DataArray in polyfit() #4554

Closed
@paigem


What happened:
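When running polyfit() on a 3D chunked xarray DataArray, the output is chunked differently from the input array along dimensions that are not being fitted. Depending on the input, chunks are either merged or split.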

What you expected to happen:
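I expect the output to keep the input's chunking: the polyfit coefficients should keep the input chunks along every non-fitted dimension, and the polyval result should be chunked exactly like the input.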
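The expectation can be stated precisely as a function of the input. The minimal sketch below assumes that the coefficient array gets a `degree` dimension of length `deg + 1` placed first (as the output in the second example suggests) and otherwise keeps the non-fitted dimensions in input order. `expected_coefficient_chunks` is a hypothetical helper written for this report, not part of xarray.

```python
def expected_coefficient_chunks(dims, chunks, fit_dim, deg):
    """Chunks we would expect on polyfit_coefficients (hypothetical helper).

    dims:    dimension names of the input, e.g. ("z", "y", "x")
    chunks:  the input's .chunks, one tuple of block sizes per dimension
    fit_dim: the dimension being fitted along (it is absent from the output)
    deg:     polynomial degree; the output gains a degree axis of length deg + 1
    """
    if fit_dim not in dims:
        raise ValueError(f"{fit_dim!r} is not one of {dims}")
    # Assumption: the degree axis comes first and is a single chunk.
    expected = [(deg + 1,)]
    # Every other dimension should keep exactly the input's block sizes.
    for dim, blocks in zip(dims, chunks):
        if dim != fit_dim:
            expected.append(tuple(blocks))
    return tuple(expected)


# First example below: chunks=(1, 5, nx) on a (10, 20, 30) array, fit along x.
dims = ("z", "y", "x")
chunks = ((1,) * 10, (5, 5, 5, 5), (30,))
print(expected_coefficient_chunks(dims, chunks, "x", 1))
# -> ((2,), (1, 1, 1, 1, 1, 1, 1, 1, 1, 1), (5, 5, 5, 5))
```

Comparing this against what polyfit actually returns makes the regression easy to spot: here the `y` entry should stay `(5, 5, 5, 5)` rather than collapse into a single chunk.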

Minimal Complete Verifiable Example:
(from @rabernat in xgcm/xrft#116)

Example: number of chunks decreases

import dask.array as dsa
import xarray as xr

nz, ny, nx = (10, 20, 30)
data = dsa.ones((nz, ny, nx), chunks=(1, 5, nx))
da = xr.DataArray(data, dims=['z', 'y', 'x'])
da.chunks
# -> ((1, 1, 1, 1, 1, 1, 1, 1, 1, 1), (5, 5, 5, 5), (30,))

pf = da.polyfit('x', 1)
pf.polyfit_coefficients.chunks
# -> ((2,), (1, 1, 1, 1, 1, 1, 1, 1, 1, 1), (20,))
# chunks on the y dimension have been consolidated!

pv = xr.polyval(da.x, pf.polyfit_coefficients).transpose('z', 'y', 'x')
pv.chunks
# -> ((1, 1, 1, 1, 1, 1, 1, 1, 1, 1), (20,), (30,))
# and this propagates to polyval

# align back against the original data
(da - pv).chunks
# -> ((1, 1, 1, 1, 1, 1, 1, 1, 1, 1), (5, 5, 5, 5), (30,))
# this hides the chunk consolidation that happened upstream
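The subtraction masks the problem because of how chunks are aligned when `da` and `pv` are combined. Our reading of dask's block alignment (an assumption, not a documented guarantee) is that the result is split at the union of both operands' chunk boundaries, a common refinement, so the finer `(5, 5, 5, 5)` chunking of `da` overrides the consolidated `(20,)` of `pv`. `common_refinement` is a hypothetical pure-Python helper illustrating that rule for a single axis:

```python
from itertools import accumulate


def common_refinement(*chunk_tuples):
    """Merge block sizes along one axis so that every input boundary is kept.

    Sketch of the alignment rule assumed above: when chunked arrays are
    combined elementwise, the result is split at the union of all chunk
    boundaries, so the finer chunking wins.
    """
    totals = {sum(c) for c in chunk_tuples}
    if len(totals) != 1:
        raise ValueError(f"chunks do not add up to the same length: {chunk_tuples}")
    # End offset of every block in every input, e.g. (5, 5) -> {5, 10}.
    boundaries = sorted({end for c in chunk_tuples for end in accumulate(c)})
    # Turn the merged end offsets back into block sizes.
    return tuple(b - a for a, b in zip([0] + boundaries[:-1], boundaries))


# y axis of the first example: da has (5, 5, 5, 5), pv has (20,).
print(common_refinement((5, 5, 5, 5), (20,)))
# -> (5, 5, 5, 5)
```

Under this rule the same alignment also explains the second example, where `z` ends up with six chunks in `da - pv` because `pv` carries the finer `(1, 1, 1, 1, 1, 1)` split.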

Example: number of chunks increases

nz, ny, nx = (6, 10, 4)
data = dsa.ones((nz, ny, nx), chunks=(2, 10, 2))
da = xr.DataArray(data, dims=['z', 'y', 'x'])
da.chunks
# -> ((2, 2, 2), (10,), (2, 2))

pf = da.polyfit('y', 1)
pf.polyfit_coefficients.chunks
# -> ((2,), (1, 1, 1, 1, 1, 1), (4,))

pv = xr.polyval(da.y, pf.polyfit_coefficients).transpose('z', 'y', 'x')
pv.chunks
# -> ((1, 1, 1, 1, 1, 1), (10,), (4,))

(da - pv).chunks
# -> ((1, 1, 1, 1, 1, 1), (10,), (2, 2))
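Both examples show the same symptom in opposite directions: chunks on the non-fitted dimensions are merged in one case and split in the other. When testing further cases, a small diagnostic like the hypothetical `describe_chunk_changes` below can summarise which dimensions changed. It is a pure-Python sketch that works on plain dicts of block sizes, such as `dict(zip(arr.dims, arr.chunks))` for a chunked DataArray:

```python
def describe_chunk_changes(before, after):
    """Compare per-dimension chunks of two arrays (hypothetical diagnostic).

    before, after: dicts mapping dimension name -> tuple of block sizes.
    Only dimensions present in both dicts are compared.
    """
    changes = {}
    for dim in before:
        if dim not in after or before[dim] == after[dim]:
            continue
        n_before, n_after = len(before[dim]), len(after[dim])
        if n_after < n_before:
            kind = "consolidated"
        elif n_after > n_before:
            kind = "split"
        else:
            kind = "resized"  # same number of chunks, different sizes
        changes[dim] = f"{kind}: {n_before} -> {n_after} chunks"
    return changes


# Second example: input da versus the polyval result pv (both ordered z, y, x).
da_chunks = {"z": (2, 2, 2), "y": (10,), "x": (2, 2)}
pv_chunks = {"z": (1, 1, 1, 1, 1, 1), "y": (10,), "x": (4,)}
print(describe_chunk_changes(da_chunks, pv_chunks))
# -> {'z': 'split: 3 -> 6 chunks', 'x': 'consolidated: 2 -> 1 chunks'}
```

An empty result would mean polyval preserved the input chunking, which is the behaviour expected in this report.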

(This discussion started in xgcm/xrft#116 with @rabernat and @navidcy.)

Environment:

Running on Pangeo Cloud

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.8.6 | packaged by conda-forge | (default, Oct 7 2020, 19:08:05)
[GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 4.19.112+
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C.UTF-8
LANG: C.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.7.4

xarray: 0.16.1
pandas: 1.1.3
numpy: 1.19.2
scipy: 1.5.2
netCDF4: 1.5.4
pydap: installed
h5netcdf: 0.8.1
h5py: 2.10.0
Nio: None
zarr: 2.4.0
cftime: 1.2.1
nc_time_axis: 1.2.0
PseudoNetCDF: None
rasterio: 1.1.7
cfgrib: 0.9.8.4
iris: None
bottleneck: 1.3.2
dask: 2.30.0
distributed: 2.30.0
matplotlib: 3.3.2
cartopy: 0.18.0
seaborn: None
numbagg: None
pint: 0.16.1
setuptools: 49.6.0.post20201009
pip: 20.2.3
conda: None
pytest: 6.1.1
IPython: 7.18.1
sphinx: 3.2.1
