
Resample is ~100x slower than Pandas resample; Speed is related to resample period (unlike Pandas) #4498

Closed
@mankoff

Description

What happened:

I have a time series with a 10-minute frequency. Resampling it to hourly with xarray is slow; resampling it to daily is fast. If I drop to Pandas and resample there, it is ~100x faster than xarray, and it takes about the same time regardless of the resample period. I've posted this to SO: https://stackoverflow.com/questions/64282393/

What you expected to happen:

I expect xarray to be within an order of magnitude of Pandas' speed, not more than 2 orders of magnitude slower.

Minimal Complete Verifiable Example:

import numpy as np
import xarray as xr
import pandas as pd
import time

size = 10000
times = pd.date_range('2000-01-01', periods=size, freq="10Min")
da = xr.DataArray(data = np.random.random(size), dims = ['time'], coords = {'time': times}, name='foo')

start = time.time()
da_ = da.resample({'time':"1H"}).mean()
print("1H", 'xr', str(time.time() - start))

start = time.time()
da_ = da.to_dataframe().resample("1H").mean()
print("1H", 'pd', str(time.time() - start), "\n")


start = time.time()
da_ = da.resample({'time':"1D"}).mean()
print("1D", 'xr', str(time.time() - start))

start = time.time()
da_ = da.to_dataframe().resample("1D").mean()
print("1D", 'pd', str(time.time() - start))

Output/timings

1H xr 0.1761918067932129
1H pd 0.0021948814392089844

1D xr 0.00958395004272461
1D pd 0.001646280288696289
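
To put numbers on the timings above, here is a small sketch that computes the xarray/pandas slowdown for each period and counts how many output bins each resample produces on the same time axis. The timings are copied from the output above; the idea that cost tracks the number of output bins is only a guess on my part, not something confirmed here. The names `timings`, `bins` and `slowdown` are just for illustration.

```python
import pandas as pd

# Timings copied from the output above, in seconds.
timings = {
    ("1H", "xr"): 0.1761918067932129,
    ("1H", "pd"): 0.0021948814392089844,
    ("1D", "xr"): 0.00958395004272461,
    ("1D", "pd"): 0.001646280288696289,
}

# Same time axis as the example: 10000 samples at 10-minute spacing.
times = pd.date_range("2000-01-01", periods=10000, freq="10min")
series = pd.Series(0.0, index=times)

bins = {}      # number of output bins per resample period
slowdown = {}  # xarray time divided by pandas time
# "60min" is used instead of "1H" only to avoid alias differences between pandas versions.
for label, pandas_freq in (("1H", "60min"), ("1D", "1D")):
    bins[label] = len(series.resample(pandas_freq).mean())
    slowdown[label] = timings[(label, "xr")] / timings[(label, "pd")]
    print(f"{label}: {bins[label]} bins, xarray/pandas ratio {slowdown[label]:.1f}")
```

This gives 1667 hourly bins versus 70 daily bins (a factor of about 24), while the xarray time drops by a factor of about 18 between the two runs and the pandas time barely changes. The slowdown works out to roughly 80x for hourly and roughly 6x for daily.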

Anything else we need to know?:
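
For reference, "dropping to Pandas" can be wrapped up as a stopgap, sketched below under some assumptions: the array is 1-D along the time dimension, and losing and re-attaching attributes by hand is acceptable. `resample_mean_via_pandas` is a hypothetical helper name, not part of xarray; it only uses `DataArray.to_series` and `DataArray.from_series`.

```python
import numpy as np
import pandas as pd
import xarray as xr


def resample_mean_via_pandas(da, freq):
    """Hypothetical helper: resample a 1-D DataArray with pandas, then wrap it again."""
    series = da.to_series()                    # pandas Series indexed by the time coordinate
    resampled = series.resample(freq).mean()   # the fast pandas path
    out = xr.DataArray.from_series(resampled)  # dimension name comes from the index name
    out.attrs = dict(da.attrs)                 # attributes do not survive the round trip
    return out


size = 10000
times = pd.date_range("2000-01-01", periods=size, freq="10min")
da = xr.DataArray(np.random.random(size), dims=["time"], coords={"time": times}, name="foo")

daily = resample_mean_via_pandas(da, "1D")
print(daily.sizes)
```

This is a workaround only; it does not address why the xarray path is slow, and it would need more care for multi-dimensional data.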

Environment:

Output of xr.show_versions()

xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.8.5 | packaged by conda-forge | (default, Aug 21 2020, 18:21:27)
[GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-48-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.7.4

xarray: 0.16.0
pandas: 1.1.1
numpy: 1.19.1
scipy: 1.5.2
netCDF4: 1.5.4
pydap: None
h5netcdf: 0.8.1
h5py: 2.10.0
Nio: None
zarr: None
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.5
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.3.1
cartopy: None
seaborn: None
numbagg: None
pint: 0.15
setuptools: 49.6.0.post20200814
pip: 20.2.2
conda: None
pytest: None
IPython: 7.17.0
sphinx: None
