xarray, chunking and rolling operation adds chunking along new dimension (previously worked) #3277

Closed
@p-d-moore

I was testing the latest version of xarray (0.12.3) from the conda-forge channel, and it broke some of my code. With the default installation that does not use conda-forge (xarray=0.12.1), the following code works correctly and produces the desired output:

Test code

import pandas as pd
import xarray as xr
import numpy as np

s_date = '1990-01-01'
e_date = '2019-05-01'
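# Business-day index and 300 string-labelled items
days = pd.date_range(start=s_date, end=e_date, freq='B', name='day')
items = pd.Index([str(i) for i in range(300)], name='item')
dat = xr.DataArray(np.random.rand(len(days), len(items)), coords=[days, items])
# Chunk along 'item' only; 'day' stays in a single chunk
dat_chunk = dat.chunk({'item': 20})
# First rolling operation along the unchunked 'day' dimension
dat_mean = dat_chunk.rolling(day=10).mean()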

print(dat_chunk)
print(' ')
print(dat_mean)

dat_std_avg = dat_mean.rolling(day=250).std()

print(' ')
print(dat_std_avg)

Output (correct) with xarray=0.12.1 - note the chunksizes

<xarray.DataArray (day: 7653, item: 300)>
dask.array<shape=(7653, 300), dtype=float64, chunksize=(7653, 20)>
Coordinates:
  * day      (day) datetime64[ns] 1990-01-01 1990-01-02 ... 2019-05-01
  * item     (item) object '0' '1' '2' '3' '4' ... '295' '296' '297' '298' '299'
 
<xarray.DataArray '_trim-8c9287bf114d61cb3ad74780465cd19f' (day: 7653, item: 300)>
dask.array<shape=(7653, 300), dtype=float64, chunksize=(7653, 20)>
Coordinates:
  * day      (day) datetime64[ns] 1990-01-01 1990-01-02 ... 2019-05-01
  * item     (item) object '0' '1' '2' '3' '4' ... '295' '296' '297' '298' '299'
 
<xarray.DataArray '_trim-2ee90b6c2f29f71a7798a204a4ad3305' (day: 7653, item: 300)>
dask.array<shape=(7653, 300), dtype=float64, chunksize=(7653, 20)>
Coordinates:
  * day      (day) datetime64[ns] 1990-01-01 1990-01-02 ... 2019-05-01
  * item     (item) object '0' '1' '2' '3' '4' ... '295' '296' '297' '298' '299'

Output (now failing) with xarray=0.12.3 (note the chunksizes)

<xarray.DataArray (day: 7653, item: 300)>
dask.array<shape=(7653, 300), dtype=float64, chunksize=(7653, 20)>
Coordinates:
  * day      (day) datetime64[ns] 1990-01-01 1990-01-02 ... 2019-05-01
  * item     (item) object '0' '1' '2' '3' '4' ... '295' '296' '297' '298' '299'
 
<xarray.DataArray (day: 7653, item: 300)>
dask.array<shape=(7653, 300), dtype=float64, chunksize=(5, 20)>
Coordinates:
  * day      (day) datetime64[ns] 1990-01-01 1990-01-02 ... 2019-05-01
  * item     (item) object '0' '1' '2' '3' '4' ... '295' '296' '297' '298' '299'

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
...

ValueError: For window size 250, every chunk should be larger than 125, but the smallest chunk size is 5. Rechunk your array
with a larger chunk size or a chunk size that
more evenly divides the shape of your array.

Problem Description

Using dask + rolling with xarray=0.12.3 appears to add undesirable chunking along the rolling dimension (`day`), which was not the case with xarray=0.12.1. In the output above, the result of the first rolling operation has chunksize (5, 20) instead of (7653, 20). This additional chunking makes the queuing of a further rolling operation fail with a ValueError. At the very least, this makes it difficult to queue dask-based delayed operations when multiple rolling operations are used.
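The error message itself suggests rechunking, so one possible workaround is to merge `day` back into a single chunk between the two rolling operations. The following is a minimal sketch under that assumption. It has not been verified against xarray 0.12.3 specifically, it uses a smaller array than the test code to keep it quick, and the name `dat_mean_rechunked` is introduced here for illustration only.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Smaller stand-in for the array in the test code above
days = pd.date_range(start="1990-01-01", periods=600, freq="B", name="day")
items = pd.Index([str(i) for i in range(40)], name="item")
dat = xr.DataArray(
    np.random.rand(len(days), len(items)),
    coords={"day": days, "item": items},
    dims=("day", "item"),
)

dat_mean = dat.chunk({"item": 20}).rolling(day=10).mean()

# Assumed workaround: force 'day' back into one chunk (-1 means "whole
# dimension") so the next rolling window never sees a too-small chunk.
dat_mean_rechunked = dat_mean.chunk({"day": -1})
dat_std_avg = dat_mean_rechunked.rolling(day=250).std()

# Chunk sizes along each dimension, in dimension order ('day', 'item')
print(dat_mean_rechunked.chunks)
```

Rechunking like this adds a step to the dask graph, so it is a stopgap rather than a fix for the changed chunking behaviour described above.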

Output of xr.show_versions() for the failing version

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.3 | packaged by conda-forge | (default, Jul 1 2019, 22:01:29) [MSC v.1900 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: AMD64 Family 23 Model 113 Stepping 0, AuthenticAMD
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
libhdf5: 1.10.4
libnetcdf: 4.6.1

xarray: 0.12.3
pandas: 0.25.1
numpy: 1.16.4
scipy: 1.3.1
netCDF4: 1.4.2
pydap: None
h5netcdf: None
h5py: 2.9.0
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 2.3.0
distributed: 2.3.2
matplotlib: 3.1.1
cartopy: None
seaborn: 0.9.0
numbagg: None
setuptools: 41.0.1
pip: 19.2.2
conda: 4.7.11
pytest: None
IPython: 7.8.0
sphinx: None

Apologies if this issue has already been reported; I was unable to find a case that appeared equivalent.
