Description
I was testing the latest version of xarray (0.12.3) from the conda-forge channel, and it broke some code of mine. With the defaults channel installation (xarray=0.12.1, not conda-forge), the following code works correctly and produces the desired output:
Test code
import pandas as pd
import xarray as xr
import numpy as np

s_date = '1990-01-01'
e_date = '2019-05-01'

# Business-day index and 300 string-labelled items
days = pd.date_range(start=s_date, end=e_date, freq='B', name='day')
items = pd.Index([str(i) for i in range(300)], name='item')

# Random data, chunked along 'item' only (single chunk along 'day')
dat = xr.DataArray(np.random.rand(len(days), len(items)), coords=[days, items])
dat_chunk = dat.chunk({'item': 20})

# First rolling operation
dat_mean = dat_chunk.rolling(day=10).mean()
print(dat_chunk)
print(' ')
print(dat_mean)

# Second rolling operation, on the result of the first
dat_std_avg = dat_mean.rolling(day=250).std()
print(' ')
print(dat_std_avg)
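For what it's worth, the chunking can also be checked programmatically via the `.chunks` attribute rather than reading the repr. A small sketch (the array sizes here are made up, just mirroring the structure above):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Small array with the same structure, chunked only along 'item'
days = pd.date_range('2000-01-01', periods=100, freq='B', name='day')
items = pd.Index([str(i) for i in range(30)], name='item')
small = xr.DataArray(np.random.rand(len(days), len(items)), coords=[days, items])
small_chunked = small.chunk({'item': 10})

# Before rolling: a single chunk along 'day'
print(small_chunked.chunks)

# After rolling: under 0.12.3 the 'day' axis comes back split into many chunks
small_mean = small_chunked.rolling(day=10).mean()
print(small_mean.chunks)
```

Under 0.12.1 both printed chunk tuples agree along `day`; under 0.12.3 the second shows the `day` axis split.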
Output (correct) with xarray=0.12.1 - note the chunk sizes
<xarray.DataArray (day: 7653, item: 300)>
dask.array<shape=(7653, 300), dtype=float64, chunksize=(7653, 20)>
Coordinates:
* day (day) datetime64[ns] 1990-01-01 1990-01-02 ... 2019-05-01
* item (item) object '0' '1' '2' '3' '4' ... '295' '296' '297' '298' '299'
<xarray.DataArray '_trim-8c9287bf114d61cb3ad74780465cd19f' (day: 7653, item: 300)>
dask.array<shape=(7653, 300), dtype=float64, chunksize=(7653, 20)>
Coordinates:
* day (day) datetime64[ns] 1990-01-01 1990-01-02 ... 2019-05-01
* item (item) object '0' '1' '2' '3' '4' ... '295' '296' '297' '298' '299'
<xarray.DataArray '_trim-2ee90b6c2f29f71a7798a204a4ad3305' (day: 7653, item: 300)>
dask.array<shape=(7653, 300), dtype=float64, chunksize=(7653, 20)>
Coordinates:
* day (day) datetime64[ns] 1990-01-01 1990-01-02 ... 2019-05-01
* item (item) object '0' '1' '2' '3' '4' ... '295' '296' '297' '298' '299'
Output (now failing) with xarray=0.12.3 - note the chunk sizes
<xarray.DataArray (day: 7653, item: 300)>
dask.array<shape=(7653, 300), dtype=float64, chunksize=(7653, 20)>
Coordinates:
* day (day) datetime64[ns] 1990-01-01 1990-01-02 ... 2019-05-01
* item (item) object '0' '1' '2' '3' '4' ... '295' '296' '297' '298' '299'
<xarray.DataArray (day: 7653, item: 300)>
dask.array<shape=(7653, 300), dtype=float64, chunksize=(5, 20)>
Coordinates:
* day (day) datetime64[ns] 1990-01-01 1990-01-02 ... 2019-05-01
* item (item) object '0' '1' '2' '3' '4' ... '295' '296' '297' '298' '299'
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
...
ValueError: For window size 250, every chunk should be larger than 125, but the smallest chunk size is 5. Rechunk your array
with a larger chunk size or a chunk size that
more evenly divides the shape of your array.
Problem Description
Using dask + rolling with xarray=0.12.3 appears to introduce undesirable chunking along the rolling dimension ('day' goes from a single chunk of 7653 to chunks of 5), which was not the case with xarray=0.12.1. This additional chunking makes queuing a further rolling operation fail with a ValueError, since the second window (250) is larger than twice the smallest resulting chunk (5). At the very least, this makes queuing dask-based delayed operations difficult when multiple rolling operations are chained.
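As a stopgap, rechunking the intermediate result back to a single block along 'day' before the second rolling seems to avoid the ValueError. This is a workaround sketch, not a fix for the underlying behaviour; it assumes `chunk({'day': -1})` (dask's "one chunk along this dimension" convention) is passed through to dask's rechunk:

```python
import numpy as np
import pandas as pd
import xarray as xr

days = pd.date_range('1990-01-01', '2019-05-01', freq='B', name='day')
items = pd.Index([str(i) for i in range(300)], name='item')
dat = xr.DataArray(np.random.rand(len(days), len(items)), coords=[days, items])
dat_chunk = dat.chunk({'item': 20})

dat_mean = dat_chunk.rolling(day=10).mean()

# Rechunk back to a single block along 'day' so the second window (250)
# can never exceed twice the smallest chunk along that dimension
dat_rechunked = dat_mean.chunk({'day': -1})
dat_std_avg = dat_rechunked.rolling(day=250).std()
print(dat_std_avg)
```

This keeps everything lazy, but the extra rechunk adds graph overhead, so it is not a substitute for restoring the 0.12.1 behaviour.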
Output of xr.show_versions()
for the failing version
xarray: 0.12.3
pandas: 0.25.1
numpy: 1.16.4
scipy: 1.3.1
netCDF4: 1.4.2
pydap: None
h5netcdf: None
h5py: 2.9.0
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 2.3.0
distributed: 2.3.2
matplotlib: 3.1.1
cartopy: None
seaborn: 0.9.0
numbagg: None
setuptools: 41.0.1
pip: 19.2.2
conda: 4.7.11
pytest: None
IPython: 7.8.0
sphinx: None
Apologies if this issue has already been reported; I was unable to find an equivalent case.