nbytes does not return the true size for sparse variables #4842

Closed
@Huite

Description

This wasn't entirely surprising to me, but nbytes currently doesn't return the right value for sparse data. At least, I think nbytes should report the actual size in memory?

Since it uses size here:

return self.size * self.dtype.itemsize

Rather than something like data.nnz, which of course only exists for sparse arrays...
I'm not sure whether there's a sparse flag or something similar, or whether you'd have to do a type check?
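
One option that avoids both a flag and a type check is duck typing: ask the wrapped array for its own nbytes and only fall back to size * itemsize when it has none. The sketch below is an illustration, not xarray's implementation. `ToyCOO` and `reported_nbytes` are hypothetical names, and `ToyCOO` is a NumPy-only stand-in for a COO-style array. As far as I know the real `sparse.COO` also exposes `nbytes` and `nnz`, but treat that as an assumption to verify.

```python
import numpy as np


class ToyCOO:
    """Hypothetical stand-in for a COO-style sparse array (not the real sparse.COO)."""

    def __init__(self, coords, data, shape):
        self.coords = np.asarray(coords)  # (ndim, nnz) integer indices
        self.data = np.asarray(data)      # (nnz,) stored values
        self.shape = tuple(shape)
        self.dtype = self.data.dtype

    @property
    def size(self):
        # Number of logical elements, whether stored or not.
        return int(np.prod(self.shape, dtype=np.int64))

    @property
    def nnz(self):
        # Number of explicitly stored values.
        return self.data.shape[0]

    @property
    def nbytes(self):
        # Bytes actually held: stored values plus their coordinates.
        return self.data.nbytes + self.coords.nbytes


def reported_nbytes(data):
    """Prefer the array's own nbytes; fall back to size * itemsize."""
    own = getattr(data, "nbytes", None)
    if own is not None:
        return int(own)
    return data.size * data.dtype.itemsize


dense = np.zeros((100, 100), dtype=np.float64)
toy = ToyCOO(
    coords=np.array([[0, 5], [1, 7]], dtype=np.int64),
    data=np.array([10.0, 10.0]),
    shape=(100, 100),
)

dense_bytes = reported_nbytes(dense)              # dense array: size * itemsize
toy_bytes = reported_nbytes(toy)                  # two values plus their coordinates
naive_toy_bytes = toy.size * toy.dtype.itemsize   # what the current formula would give

print(dense_bytes, toy_bytes, naive_toy_bytes)
```

With this approach dense NumPy arrays report the same number as before, while any array type that knows its real footprint gets to say so.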

Minimal Complete Verifiable Example:

import pandas as pd
import numpy as np
import xarray as xr


df = pd.DataFrame()
df["x"] = np.repeat(np.random.rand(10_000), 10)
df["y"] = np.repeat(np.random.rand(10_000), 10)
df["time"] = np.tile(pd.date_range("2000-01-01", "2000-03-10", freq="W"), 10_000)
df["rate"] = 10.0
df = df.set_index(["time", "y", "x"])

sparse_ds = xr.Dataset.from_dataframe(df, sparse=True)
print(sparse_ds["rate"].nbytes)
# Output: 8000000000
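
For scale, here is a back-of-the-envelope comparison of the reported figure with a rough estimate of what a COO layout would hold for this example. The assumptions are mine and are not measured: the random x and y values are all distinct (so 10,000 each), the date range yields 10 weekly timestamps, every DataFrame row becomes one stored value, and coordinates are stored as 64-bit integers.

```python
# Shape implied by the example: 10 weekly timestamps, 10,000 y values, 10,000 x values.
n_time, n_y, n_x = 10, 10_000, 10_000
itemsize = 8  # bytes per float64 value

# What nbytes currently reports: every logical element counted as stored.
dense_bytes = n_time * n_y * n_x * itemsize

# Rough COO estimate (assumptions, not measurements).
nnz = 100_000        # one stored value per DataFrame row
ndim = 3             # time, y, x
index_itemsize = 8   # assumed 64-bit integer coordinates
coo_bytes = nnz * itemsize + ndim * nnz * index_itemsize

print(dense_bytes)  # 8000000000
print(coo_bytes)    # 3200000
```

Under these assumptions the reported 8 GB is about 2,500 times larger than the roughly 3.2 MB the sparse array would actually occupy.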

Anything else we need to know?:

Environment:

Output of xr.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.9 (default, Aug 31 2020, 17:10:11) [MSC v.1916 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None
libhdf5: 1.10.5
libnetcdf: 4.7.3

xarray: 0.16.1
pandas: 1.1.2
numpy: 1.19.1
scipy: 1.5.2
netCDF4: 1.5.3
pydap: None
h5netcdf: 0.8.0
h5py: 2.10.0
Nio: None
zarr: 2.4.0
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.2
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2.27.0
distributed: 2.30.1
matplotlib: 3.3.1
cartopy: None
seaborn: 0.11.0
numbagg: None
pint: None
setuptools: 49.6.0.post20201009
pip: 20.3.3
conda: None
pytest: 6.1.0
IPython: 7.19.0
sphinx: 3.2.1
