### What happened?
When using `rolling(...).construct(...)` in coiled/benchmarks#1552, I noticed that my Dask workers died from running out of memory because the chunk sizes get blown up.
### What did you expect to happen?
Naively, I would expect `rolling(...).construct(...)` to try to keep chunk sizes roughly constant instead of blowing them up quadratically with the window size; see the arithmetic sketch below.
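Back-of-the-envelope arithmetic (my own illustration using the numbers from the example below, not xarray code) for why the blow-up is quadratic: with `time` chunks of size 1 and a window of 100, each output chunk grows by a factor of the window size along both the `time` dimension and the new `window` dimension:

```python
window = 100
input_chunk_elems = 400 * 400 * 1                 # one (400, 400, 1) input chunk
output_chunk_elems = 400 * 400 * window * window  # one (400, 400, 100, 100) output chunk
print(output_chunk_elems // input_chunk_elems)    # 10000 == window ** 2
```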
### Minimal Complete Verifiable Example
```python
import dask.array as da
import xarray as xr

# Construct a dataset with chunks of size (400, 400, 1), i.e. 1.22 MiB each
ds = xr.Dataset(
    dict(
        foo=(
            ["latitude", "longitude", "time"],
            da.random.random((400, 400, 400), chunks=(-1, -1, 1)),
        ),
    )
)

# The dataset now has chunks of size (400, 400, 100, 100), i.e. 11.92 GiB each
ds = ds.rolling(time=100, center=True).construct("window")
```
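As a quick sanity check (my addition, not part of the original reproducer), the per-chunk size can be read off the rolled array directly:

```python
import numpy as np
from dask.utils import format_bytes

chunk = ds.foo.data.chunksize  # (400, 400, 100, 100)
print(format_bytes(np.prod(chunk) * ds.foo.dtype.itemsize))  # 11.92 GiB
```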
### MVCE confirmation
- Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- Complete example — the example is self-contained, including all data and the text of any traceback.
- Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- New issue — a search of GitHub Issues suggests this is not a duplicate.
- Recent environment — the issue occurs with the latest version of xarray and its dependencies.
### Relevant log output
No response
### Anything else we need to know?
No response
### Environment

```
xarray: 2024.7.0
pandas: 2.2.2
numpy: 1.26.4
scipy: 1.14.0
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
zarr: 2.18.2
cftime: 1.6.4
nc_time_axis: None
iris: None
bottleneck: 1.4.0
dask: 2024.9.0
distributed: 2024.9.0
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2024.6.1
cupy: None
pint: None
sparse: 0.15.4
flox: 0.9.9
numpy_groupies: 0.11.2
setuptools: 73.0.1
pip: 24.2
conda: 24.7.1
pytest: 8.3.3
mypy: None
IPython: 8.27.0
sphinx: None
```