
Rolling operations lose chunking with dask and bottleneck #2943

Closed
@ScottWales

Description

Code Sample, a copy-pastable example if possible

A "Minimal, Complete and Verifiable Example" will make it much easier for maintainers to help you:
http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports

import bottleneck
import xarray
import dask.array  # import the submodule explicitly so dask.array.ones is available

data = dask.array.ones((100,), chunks=(10,))
da = xarray.DataArray(data, dims=['time'])

rolled = da.rolling(time=15).mean()

# Expect the 'rolled' dataset to be chunked approximately the same as 'data',
# however there is only one chunk in 'rolled' instead of 10
assert len(rolled.chunks[0]) > 1

Problem description

Rolling operations lose chunking along the rolled dimension when using Dask-backed data with bottleneck installed. This is a problem for large datasets, where we don't want to load the entire array into memory at once.

The issue appears to be caused by xarray.core.dask_array_ops.dask_rolling_wrapper calling dask.array.overlap.overlap on a DataArray instead of a Dask array. Possibly #2940 is related?
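To illustrate why an overlap step should not need to collapse the chunks, here is a minimal pure-Python sketch of a chunk-preserving rolling mean. It is not xarray's or Dask's implementation: the names `rolling_mean` and `chunked_rolling_mean` are hypothetical, and plain lists stand in for Dask chunks. The assumption is a trailing window that yields NaN until a full window is available, so each chunk only needs the last `window - 1` values that precede it.

```python
import math


def rolling_mean(values, window):
    """Trailing rolling mean; NaN until a full window is available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(math.nan)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out


def chunked_rolling_mean(chunks, window):
    """Roll each chunk separately, borrowing a halo from earlier chunks.

    Hypothetical helper: returns one output chunk per input chunk,
    each with the same length as its input.
    """
    halo = window - 1
    history = []  # up to `halo` values immediately preceding the current chunk
    results = []
    for chunk in chunks:
        padded = history + list(chunk)
        rolled = rolling_mean(padded, window)
        # Trim the halo so the output chunk lines up with the input chunk.
        results.append(rolled[len(history):])
        # Keep only the trailing `halo` values for the next chunk.
        history = padded[max(0, len(padded) - halo):]
    return results


if __name__ == "__main__":
    # Same shape as the example above: 100 values in 10 chunks, window of 15.
    chunks = [[1.0] * 10 for _ in range(10)]
    rolled = chunked_rolling_mean(chunks, window=15)
    print([len(c) for c in rolled])
```

The output keeps ten chunks of ten values each, which is the layout the example above expects from `rolled.chunks`. Note that the halo (14 values) is larger than a single chunk (10 values), so values from more than one preceding chunk are needed.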

Expected Output

Chunks should be preserved through .rolling().mean()

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.7 | packaged by conda-forge | (default, Feb 28 2019, 09:07:38) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-862.14.4.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_AU.utf8
LANG: C
LOCALE: en_AU.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.2

xarray: 0.12.1
pandas: 0.24.2
numpy: 1.16.3
scipy: 1.2.1
netCDF4: 1.5.0.1
pydap: installed
h5netcdf: None
h5py: 2.9.0
Nio: None
zarr: 2.3.1
cftime: 1.0.3.4
nc_time_axis: 1.2.0
PseudonetCDF: None
rasterio: None
cfgrib: None
iris: 2.2.0
bottleneck: 1.2.1
dask: 1.2.0
distributed: 1.27.1
matplotlib: 3.0.3
cartopy: 0.17.0
seaborn: 0.9.0
setuptools: 41.0.1
pip: 19.1
conda: None
pytest: 4.4.1
IPython: 7.5.0
sphinx: None
