rolling: bottleneck still not working properly with dask arrays #3165

Closed
@peterhob

MCVE Code Sample

import numpy as np
import xarray as xr

# 5000 x 50000 float64 array (about 2 GB), chunked along "y" only
temp = xr.DataArray(np.zeros((5000, 50000)), dims=("x", "y")).chunk({"y": 100})
# Rolling mean along the unchunked "x" dimension; this raises MemoryError
temp.rolling(x=100).mean()

Expected Output

Problem Description

I was thrilled to find that the new releases (both 0.12.2 and 0.12.3) fixed the rolling window issue. However, when I tried them, the problem still seems to be there. Previously, the above code ran with bottleneck installed. With the new versions, with or without bottleneck, it simply fails with the memory error shown below.

I have tried both old and new versions of dask and pandas, but it made little difference. However, the dask DataFrame version of the code (shown below) runs fine.

import dask.dataframe as dd
import dask.array as da
import numpy as np

da_array=da.from_array(np.zeros((5000, 50000)), chunks=(5000,100))
df = dd.from_dask_array(da_array)
df.rolling(window=100,axis=0).mean()

I have also tried the same operation on a dataset loaded from netCDF files; it started consuming a very large portion of memory and gave similar errors.
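Since the array is chunked only along "y" while the window runs along "x", every chunk holds complete columns, so in principle the rolling mean can be computed one chunk at a time. The following is a minimal NumPy-only sketch of that idea, not what xarray, dask, or bottleneck actually do internally; the helper name `rolling_mean_axis0` and the small stand-in array are assumptions made for illustration.

```python
import numpy as np

def rolling_mean_axis0(block, window):
    """Trailing rolling mean along axis 0 (hypothetical helper).

    The first window-1 rows are NaN because their window is incomplete.
    """
    block = np.asarray(block, dtype=float)
    out = np.full(block.shape, np.nan)
    if block.shape[0] < window:
        return out
    csum = np.cumsum(block, axis=0)
    # First complete window: sum of rows 0..window-1
    out[window - 1] = csum[window - 1] / window
    # Later windows: difference of cumulative sums
    out[window:] = (csum[window:] - csum[:-window]) / window
    return out

# Small stand-in for the (5000, 50000) array, split into column chunks
data = np.arange(60, dtype=float).reshape(12, 5)
chunks = np.array_split(data, [2, 4], axis=1)

# Each chunk is processed independently, then the results are joined along "y"
result = np.concatenate(
    [rolling_mean_axis0(c, window=3) for c in chunks], axis=1
)
```

With this layout, peak memory is roughly the size of one chunk plus its result, rather than the size of the whole array.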

Any help is appreciated.

/miniconda3/envs/xarray/lib/python3.7/site-packages/xarray/core/merge.py:17: FutureWarning: The Panel class is removed from pandas. Accessing it from the top-level namespace will also be removed in the next version
  PANDAS_TYPES = (pd.Series, pd.DataFrame, pd.Panel)
/miniconda3/envs/xarray/lib/python3.7/site-packages/xarray/core/dataarray.py:219: FutureWarning: The Panel class is removed from pandas. Accessing it from the top-level namespace will also be removed in the next version
  elif isinstance(data, pd.Panel):
Traceback (most recent call last):
  File "rolltest.py", line 5, in <module>
    temp.rolling(x=100).mean()
  File "/miniconda3/envs/xarray/lib/python3.7/site-packages/xarray/core/rolling.py", line 245, in wrapped_func
    return self.reduce(func, **kwargs)
  File "/miniconda3/envs/xarray/lib/python3.7/site-packages/xarray/core/rolling.py", line 217, in reduce
    result = windows.reduce(func, dim=rolling_dim, **kwargs)
  File "/miniconda3/envs/xarray/lib/python3.7/site-packages/xarray/core/dataarray.py", line 1636, in reduce
    var = self.variable.reduce(func, dim, axis, keep_attrs, **kwargs)
  File "/miniconda3/envs/xarray/lib/python3.7/site-packages/xarray/core/variable.py", line 1369, in reduce
    input_data = self.data if allow_lazy else self.values
  File "/miniconda3/envs/xarray/lib/python3.7/site-packages/xarray/core/variable.py", line 392, in values
    return _as_array_or_item(self._data)
  File "/miniconda3/envs/xarray/lib/python3.7/site-packages/xarray/core/variable.py", line 213, in _as_array_or_item
    data = np.asarray(data)
  File "/miniconda3/envs/xarray/lib/python3.7/site-packages/numpy/core/numeric.py", line 538, in asarray
    return array(a, dtype, copy=False, order=order)
  File "/miniconda3/envs/xarray/lib/python3.7/site-packages/dask/array/core.py", line 1047, in __array__
    x = self.compute()
  File "/miniconda3/envs/xarray/lib/python3.7/site-packages/dask/base.py", line 156, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/miniconda3/envs/xarray/lib/python3.7/site-packages/dask/base.py", line 399, in compute
    return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
  File "/miniconda3/envs/xarray/lib/python3.7/site-packages/dask/base.py", line 399, in <listcomp>
    return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
  File "/miniconda3/envs/xarray/lib/python3.7/site-packages/dask/array/core.py", line 828, in finalize
    return concatenate3(results)
  File "/miniconda3/envs/xarray/lib/python3.7/site-packages/dask/array/core.py", line 3621, in concatenate3
    result = np.empty(shape=shape, dtype=dtype(deepfirst(arrays)))
MemoryError
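A rough size estimate may explain why this fails. The traceback shows `rolling.reduce` calling `windows.reduce`, which ends up in `self.values` and `np.asarray`, so the dask array is computed in full. Assuming `windows` is the input with an extra window dimension of length 100 (an assumption inferred from the traceback, not confirmed here), the arithmetic looks like this:

```python
# Sizes taken from the MCVE: shape (5000, 50000), float64 zeros, rolling(x=100)
n_x, n_y, window = 5000, 50000, 100
bytes_per_value = 8  # np.zeros defaults to float64

base_bytes = n_x * n_y * bytes_per_value
# Assumption: the windowed array has one extra dimension of length `window`
windowed_bytes = base_bytes * window

base_gb = base_bytes / 1e9
windowed_gb = windowed_bytes / 1e9
print(f"original array: {base_gb:.0f} GB")   # original array: 2 GB
print(f"windowed array: {windowed_gb:.0f} GB")  # windowed array: 200 GB
```

Under that assumption, materializing the windowed array would need about 200 GB, which is consistent with `np.empty` raising `MemoryError` in `concatenate3`.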

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.3 (default, Mar 27 2019, 22:11:17)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.15.0-51-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.2

xarray: 0.12.2
pandas: 0.24.2
numpy: 1.16.4
scipy: 1.3.0
netCDF4: 1.5.1.2
pydap: None
h5netcdf: 0.7.3
h5py: 2.9.0
Nio: None
zarr: 2.3.2
cftime: 1.0.3.4
nc_time_axis: None
PseudonetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 1.2.2
distributed: 1.28.1
matplotlib: 3.1.0
cartopy: None
seaborn: None
numbagg: None
setuptools: 41.0.0
pip: 19.1.1
conda: 4.7.5
pytest: None
IPython: 7.5.0
sphinx: None
