nbytes not available for lazy loaded array and so can't print(ds) #9185

@TimothyCera-NOAA

What happened?

We use the grib2io backend to read GRIB2-formatted files. Since the v2024.02.0 release, printing the summary of a dataset to the screen has failed. I suspect the cause is #8702.

Printing a dataset fails while xarray tries to find nbytes.

The grib2io backend opens the file lazily, so the summary is printed for a MemoryCachedArray, which has no nbytes and cannot calculate it.

After loading the data into memory, print(ds1) works fine.

import xarray as xr
filters = {
    "productDefinitionTemplateNumber": 0,
    "typeOfFirstFixedSurface": 1,
    "shortName": "TMP",
}
ds1 = xr.open_dataset(
    "gfs_20221107/gfs.t00z.pgrb2.1p00.f012_subset",
    engine="grib2io",
    filters=filters,
)
print(ds1)
TypeError                                 Traceback (most recent call last)
Cell In[6], line 1
----> 1 print(ds1)

File ~/anaconda3/envs/default311/lib/python3.11/site-packages/xarray/core/dataset.py:2569, in Dataset.__repr__(self)
   2568 def __repr__(self) -> str:
-> 2569     return formatting.dataset_repr(self)

File ~/anaconda3/envs/default311/lib/python3.11/reprlib.py:21, in recursive_repr.<locals>.decorating_function.<locals>.wrapper(self)
     19 repr_running.add(key)
     20 try:
---> 21     result = user_function(self)
     22 finally:
     23     repr_running.discard(key)

File ~/anaconda3/envs/default311/lib/python3.11/site-packages/xarray/core/formatting.py:717, in dataset_repr(ds)
    715 @recursive_repr("<recursive Dataset>")
    716 def dataset_repr(ds):
--> 717     nbytes_str = render_human_readable_nbytes(ds.nbytes)
    718     summary = [f"<xarray.{type(ds).__name__}> Size: {nbytes_str}"]
    720     col_width = _calculate_col_width(ds.variables)

File ~/anaconda3/envs/default311/lib/python3.11/site-packages/xarray/core/dataset.py:1544, in Dataset.nbytes(self)
   1536 @property
   1537 def nbytes(self) -> int:
   1538     """
   1539     Total bytes consumed by the data arrays of all variables in this dataset.
   1540 
   1541     If the backend array for any variable does not include ``nbytes``, estimates
   1542     the total bytes for that array based on the ``size`` and ``dtype``.
   1543     """
-> 1544     return sum(v.nbytes for v in self.variables.values())

File ~/anaconda3/envs/default311/lib/python3.11/site-packages/xarray/core/dataset.py:1544, in <genexpr>(.0)
   1536 @property
   1537 def nbytes(self) -> int:
   1538     """
   1539     Total bytes consumed by the data arrays of all variables in this dataset.
   1540 
   1541     If the backend array for any variable does not include ``nbytes``, estimates
   1542     the total bytes for that array based on the ``size`` and ``dtype``.
   1543     """
-> 1544     return sum(v.nbytes for v in self.variables.values())

File ~/anaconda3/envs/default311/lib/python3.11/site-packages/xarray/namedarray/core.py:491, in NamedArray.nbytes(self)
    489         itemsize = xp.finfo(self.dtype).bits // 8
    490 else:
--> 491     raise TypeError(
    492         "cannot compute the number of bytes (no array API nor nbytes / itemsize)"
    493     )
    495 return self.size * itemsize

TypeError: cannot compute the number of bytes (no array API nor nbytes / itemsize)
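
The docstring shown in the traceback says that nbytes is estimated from size and dtype when the backend array does not provide it. Below is a minimal sketch of that estimate, not the xarray or grib2io implementation. It assumes a hypothetical `LazyArray` class that exposes only `shape` and `dtype`, and it uses `np.dtype` to normalize whatever dtype-like value a backend might report:

```python
import math

import numpy as np


class LazyArray:
    """Hypothetical stand-in for a lazily loaded backend array (no nbytes)."""

    def __init__(self, shape, dtype):
        self.shape = shape
        self.dtype = dtype  # may be a string, a Python type, or an np.dtype


def estimate_nbytes(arr) -> int:
    """Estimate bytes as size * itemsize without loading any data."""
    size = math.prod(arr.shape)  # number of elements; 1 for a scalar shape ()
    itemsize = np.dtype(arr.dtype).itemsize  # normalize dtype-like values
    return size * itemsize


print(estimate_nbytes(LazyArray((181, 360), "float32")))  # 260640
```

For a 181 x 360 float32 variable such as TMP this gives 260640 bytes, which is consistent with the 261kB reported in the summary once the data is loaded.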

Forcing the data to load makes printing work:

print(ds1["TMP"].values[0][0])
253.28014

print(ds1)
<xarray.Dataset> Size: 1MB
Dimensions:                   (y: 181, x: 360)
Coordinates:
    refDate                   datetime64[ns] 8B ...
    leadTime                  timedelta64[ns] 8B ...
    valueOfFirstFixedSurface  float64 8B ...
    latitude                  (y, x) float64 521kB ...
    longitude                 (y, x) float64 521kB ...
    validDate                 datetime64[ns] 8B ...
Dimensions without coordinates: y, x
Data variables:
    TMP                       (y, x) float32 261kB 253.3 253.3 ... 240.2 240.2
Attributes:
    engine:   grib2io

What did you expect to happen?

I expected print(ds1) to print the summary of the dataset:

<xarray.Dataset> Size: 1MB
Dimensions:                   (y: 181, x: 360)
Coordinates:
    refDate                   datetime64[ns] 8B ...
    leadTime                  timedelta64[ns] 8B ...
    valueOfFirstFixedSurface  float64 8B ...
    latitude                  (y, x) float64 521kB ...
    longitude                 (y, x) float64 521kB ...
    validDate                 datetime64[ns] 8B ...
Dimensions without coordinates: y, x
Data variables:
    TMP                       (y, x) float32 261kB 253.3 253.3 ... 240.2 240.2
Attributes:
    engine:   grib2io
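
One way a summary could degrade gracefully instead of raising is to treat the size as unknown when nbytes cannot be computed. The following is only a sketch of that idea. `render_size`, `NoNbytes`, and `WithNbytes` are hypothetical names, not xarray functions, and this is not presented as how xarray handles the case:

```python
class NoNbytes:
    """Hypothetical object whose nbytes property raises, like the lazy array here."""

    @property
    def nbytes(self):
        raise TypeError("cannot compute the number of bytes")


class WithNbytes:
    """Hypothetical object with a known size in bytes."""

    nbytes = 260640


def render_size(obj) -> str:
    """Return a size label, falling back to 'unknown' if nbytes is unavailable."""
    try:
        return f"Size: {obj.nbytes} bytes"
    except (AttributeError, TypeError):
        return "Size: unknown"


print(render_size(WithNbytes()))
print(render_size(NoNbytes()))
```

With a fallback like this, the rest of the summary (dimensions, coordinates, variables) could still be printed for a lazily opened dataset.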

Minimal Complete Verifiable Example

# Download the GRIB2 file first from:
# https://github.com/NOAA-MDL/grib2io/blob/master/tests/data/gfs_20221107/gfs.t00z.pgrb2.1p00.f012_subset
import xarray as xr

filters = {
    "productDefinitionTemplateNumber": 0,
    "typeOfFirstFixedSurface": 1,
    "shortName": "TMP",
}
ds1 = xr.open_dataset(
    "gfs_20221107/gfs.t00z.pgrb2.1p00.f012_subset",
    engine="grib2io",
    filters=filters,
)
print(ds1)

MCVE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.11.0 | packaged by conda-forge | (main, Oct 25 2022, 06:24:40) [GCC 10.4.0]
python-bits: 64
OS: Linux
OS-release: 5.15.0-112-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2

xarray: 2024.6.0
pandas: 2.2.1
numpy: 1.26.4
scipy: 1.12.0
netCDF4: 1.6.5
pydap: None
h5netcdf: None
h5py: None
zarr: 2.17.1
cftime: 1.6.3
nc_time_axis: None
iris: None
bottleneck: None
dask: 2024.3.1
distributed: 2024.3.1
matplotlib: 3.8.4
cartopy: 0.22.0
seaborn: None
numbagg: None
fsspec: 2024.3.1
cupy: None
pint: 0.23
sparse: None
flox: None
numpy_groupies: None
setuptools: 69.2.0
pip: 24.0
conda: 24.3.0
pytest: 8.1.1
mypy: None
IPython: 8.22.2
sphinx: 7.3.7
