Description
Code Sample
Hello,
I ran a quick comparison of the newly introduced `interp()` method against a (draft) adapter to the SDF (Scientific Data Format) library: https://gist.github.com/gwin-zegal/b955c3ef63f5ad51eec6329dd2e620be#file-array_sdf_interp-py
Code for a micro benchmark (2D array) in Python (run the code from the gist above first):
import numpy as np
import xarray as xr

# 30 x 4 array with values sorted along both dimensions
arr = xr.DataArray(np.sort(np.sort(np.random.RandomState(123).rand(30, 4), axis=0), axis=1),
                   coords=[('tension', np.arange(10, 40)),
                           ('resistance', np.linspace(100, 500, 4))])

res = {'xarray': [], 'c_sdf': []}
x = np.logspace(1, 4, num=10, dtype=np.int16)  # sample sizes from 10 up to 10_000

for size in x:
    # random query points inside the coordinate ranges
    new_tension = arr.tension[0].data + np.random.random_sample(size=size) * (arr.tension[-1].data - arr.tension[0].data)
    new_resistance = arr.resistance[0].data + np.random.random_sample(size=size) * (arr.resistance[-1].data - arr.resistance[0].data)

    interp_xr = %timeit -qo arr.interp({'tension': new_tension, 'resistance': new_resistance})
    res['xarray'].append(interp_xr)

    interp_c_sdf = %timeit -qo arr(new_tension, new_resistance)  # C-SDF adapter from the gist
    res['c_sdf'].append(interp_c_sdf)
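For anyone without IPython at hand, here is a rough standalone sketch of the xarray half of the benchmark (plain timeit instead of the %timeit magic, and no SDF adapter), just to make the timings easy to reproduce:

import timeit
import numpy as np
import xarray as xr

arr = xr.DataArray(np.sort(np.sort(np.random.RandomState(123).rand(30, 4), axis=0), axis=1),
                   coords=[('tension', np.arange(10, 40)),
                           ('resistance', np.linspace(100, 500, 4))])

for size in np.logspace(1, 4, num=10, dtype=np.int16):
    new_tension = arr.tension[0].data + np.random.random_sample(size=size) * (arr.tension[-1].data - arr.tension[0].data)
    new_resistance = arr.resistance[0].data + np.random.random_sample(size=size) * (arr.resistance[-1].data - arr.resistance[0].data)

    t = min(timeit.repeat(
        lambda: arr.interp({'tension': new_tension, 'resistance': new_resistance}),
        repeat=3, number=1))
    print(f'{int(size):>6} points: {t:.3f} s')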
Problem description
The time spent in `arr.interp()` grows far faster than linearly with the number of interpolation points: over 2 minutes (xarray's internal interp) on my old machine versus 9 ms (C-SDF wrapper) for 10,000 points.
The C-SDF code is itself slow (it copies the array and its algorithms are not particularly optimized), yet the xarray implementation is not usable for everyday work on my machine!
{'xarray': [<TimeitResult : 6.46 ms ± 223 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)>,
<TimeitResult : 6.46 ms ± 88.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)>,
<TimeitResult : 6.99 ms ± 141 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)>,
<TimeitResult : 8.91 ms ± 52 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)>,
<TimeitResult : 19.8 ms ± 183 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)>,
<TimeitResult : 112 ms ± 638 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)>,
<TimeitResult : 584 ms ± 19.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)>,
<TimeitResult : 2.63 s ± 43.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)>,
<TimeitResult : 15.5 s ± 147 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)>,
<TimeitResult : 2min 23s ± 18.7 s per loop (mean ± std. dev. of 7 runs, 1 loop each)>],
'c_sdf': [<TimeitResult : 1.08 ms ± 7.96 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)>,
<TimeitResult : 1.09 ms ± 7.91 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)>,
<TimeitResult : 1.1 ms ± 12.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)>,
<TimeitResult : 1.13 ms ± 9.95 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)>,
<TimeitResult : 1.19 ms ± 7.76 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)>,
<TimeitResult : 1.32 ms ± 9.47 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)>,
<TimeitResult : 1.59 ms ± 10.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)>,
<TimeitResult : 2.19 ms ± 31.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)>,
<TimeitResult : 3.51 ms ± 34.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)>,
<TimeitResult : 9.27 ms ± 307 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)>]}
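To see the growth more clearly, the res dict above can be reduced to mean times and plotted on a log-log scale. A minimal sketch, assuming the TimeitResult objects from the run above (which expose an average attribute in IPython 6.x):

import numpy as np
import matplotlib.pyplot as plt

sizes = np.logspace(1, 4, num=10, dtype=np.int16)  # same sizes as in the benchmark
plt.loglog(sizes, [t.average for t in res['xarray']], 'o-', label='xarray interp()')
plt.loglog(sizes, [t.average for t in res['c_sdf']], 's-', label='C-SDF wrapper')
plt.xlabel('number of interpolation points')
plt.ylabel('mean time per call [s]')
plt.legend()
plt.show()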
Is this a performance issue specific to my machine, or can others confirm it?
Output of xr.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: fr_FR.UTF-8
xarray: 0.10.7
pandas: 0.23.0
numpy: 1.14.3
scipy: 1.1.0
netCDF4: None
h5netcdf: None
h5py: 2.8.0
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.17.5
distributed: 1.21.8
matplotlib: 2.2.2
cartopy: 0.16.0
seaborn: 0.8.1
setuptools: 39.2.0
pip: 10.0.1
conda: 4.5.4
pytest: 3.4.1
IPython: 6.4.0
sphinx: 1.7.5