Closed
Description
This wasn't entirely surprising to me, but nbytes currently doesn't return the right value for sparse data. At least, I think nbytes should show the actual size in memory?
It is computed from size here:
xarray/xarray/core/variable.py, line 349 in a0c71c1
rather than from something like data.nnz, which of course only exists for sparse arrays.
I'm not sure if there's a sparse flag or something, or whether you'd have to do a typecheck?
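One possible way around an explicit typecheck is duck typing: ask the wrapped array for its own nbytes and fall back to size times itemsize only when it has none. The sketch below is only an illustration of that idea, not xarray's actual code. TinyCOO is a hypothetical stand-in for a sparse COO array (it is not the real sparse.COO class), and reported_nbytes is a made-up helper name.

```python
import math

import numpy as np


class TinyCOO:
    """Hypothetical stand-in for a sparse COO array (not the real sparse.COO)."""

    def __init__(self, coords, data, shape):
        self.coords = np.asarray(coords, dtype=np.int64)  # shape (ndim, nnz)
        self.data = np.asarray(data)                      # shape (nnz,)
        self.shape = tuple(shape)
        self.dtype = self.data.dtype

    @property
    def size(self):
        # Logical number of elements, including the implicit fill values.
        return math.prod(self.shape)

    @property
    def nnz(self):
        # Number of explicitly stored values.
        return self.data.shape[0]

    @property
    def nbytes(self):
        # Memory actually held: stored values plus their coordinates.
        return self.data.nbytes + self.coords.nbytes


def reported_nbytes(array):
    """Prefer the array's own nbytes; fall back to size * itemsize."""
    nbytes = getattr(array, "nbytes", None)
    if nbytes is not None:
        return int(nbytes)
    return int(array.size) * array.dtype.itemsize


dense = np.zeros((10, 100, 100))
sparse_like = TinyCOO(
    coords=[[0, 1], [2, 3], [4, 5]],  # two stored points in a 3-D array
    data=[10.0, 10.0],
    shape=(10, 100, 100),
)

print(reported_nbytes(dense))        # dense array: every element is stored
print(reported_nbytes(sparse_like))  # sparse stand-in: only stored values count
```

With this approach no sparse flag is needed: any array type that exposes nbytes reports its own footprint, and everything else keeps the current size-based behaviour.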
Minimal Complete Verifiable Example:
import pandas as pd
import numpy as np
import xarray as xr
df = pd.DataFrame()
df["x"] = np.repeat(np.random.rand(10_000), 10)
df["y"] = np.repeat(np.random.rand(10_000), 10)
df["time"] = np.tile(pd.date_range("2000-01-01", "2000-03-10", freq="W"), 10_000)
df["rate"] = 10.0
df = df.set_index(["time", "y", "x"])
sparse_ds = xr.Dataset.from_dataframe(df, sparse=True)
print(sparse_ds["rate"].nbytes)
8000000000
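The reported figure matches the dense size of the full time/y/x grid, not what is stored. The arithmetic can be sketched as below; the sparse estimate assumes a COO layout with one int64 index per dimension for each stored value, which is an assumption about the storage format and not something measured in this report.

```python
# Dimension lengths implied by the example above.
n_time = 10    # weekly dates between 2000-01-01 and 2000-03-10
n_y = 10_000   # distinct y values (random floats, assumed unique)
n_x = 10_000   # distinct x values (random floats, assumed unique)
itemsize = 8   # bytes per float64 value

# What nbytes reports today: every cell of the dense grid.
dense_nbytes = n_time * n_y * n_x * itemsize

# Rough sparse footprint under the stated COO assumption.
nnz = 100_000        # one stored value per DataFrame row
ndim = 3             # time, y, x
index_itemsize = 8   # assumed int64 coordinates
coo_nbytes = nnz * (itemsize + ndim * index_itemsize)

print(dense_nbytes)  # 8000000000
print(coo_nbytes)    # 3200000
```

Under these assumptions the stored data is on the order of a few megabytes, while nbytes reports 8 GB.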
Environment:
Output of xr.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.9 (default, Aug 31 2020, 17:10:11) [MSC v.1916 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None
libhdf5: 1.10.5
libnetcdf: 4.7.3
xarray: 0.16.1
pandas: 1.1.2
numpy: 1.19.1
scipy: 1.5.2
netCDF4: 1.5.3
pydap: None
h5netcdf: 0.8.0
h5py: 2.10.0
Nio: None
zarr: 2.4.0
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.2
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2.27.0
distributed: 2.30.1
matplotlib: 3.3.1
cartopy: None
seaborn: 0.11.0
numbagg: None
pint: None
setuptools: 49.6.0.post20201009
pip: 20.3.3
conda: None
pytest: 6.1.0
IPython: 7.19.0
sphinx: 3.2.1