
Opening a dataset doesn't display groups. #4840

Closed
@dklink

Problem

I know xarray doesn't support netCDF4 Group functionality. That's fine, I bet it's incredibly thorny. My issue is that when you open the root group of a netCDF4 file that contains groups, xarray doesn't even tell you the groups exist; they are completely invisible. This seems like a big flaw: you've opened a file, so shouldn't you at least be told what's in it?

Solution

When you open a dataset with the netcdf4-python library, you get something like this:

>>> netCDF4.Dataset(path)
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
some global attribute: some value
dimensions(sizes): ...
variables(dimensions): ...
groups: group1, group2

"groups" shows up sort of like an auto-generated attribute. Surely xarray can do something similar:

>>> xr.open_dataset(path)
<xarray.Dataset>
Dimensions: ...
Coordinates: ...
Data variables: ...
Attributes: ...
Groups: group1, group2
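
To make the proposed output concrete, here is a minimal sketch of appending a "Groups:" line to an existing repr string. The helper name `repr_with_groups` is hypothetical and not part of xarray; it only illustrates the formatting and assumes the group names have already been read from the file.

```python
def repr_with_groups(base_repr, group_names):
    """Append a "Groups:" line to a repr string (hypothetical helper, not xarray API)."""
    # With no groups, leave the repr untouched so ordinary files look the same.
    if not group_names:
        return base_repr
    return f"{base_repr.rstrip()}\nGroups: {', '.join(group_names)}"


print(repr_with_groups("<xarray.Dataset>\nAttributes: ...", ["group1", "group2"]))
```

Skipping the line when there are no groups keeps the output unchanged for files that don't use them.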

Workaround

The workaround I am considering is to add an attribute to my root group that lists the groups in the file, so people using xarray will see that the file contains more than the root group. However, this is redundant, since the information is already in the netCDF file, and brittle, since nothing guarantees the attribute matches the groups actually in the file.
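
A rough sketch of this workaround using netcdf4-python, and of why it is brittle. The attribute name `group_names` and all three function names are assumptions made for illustration; nothing here is an xarray or netCDF convention.

```python
def group_list_attribute(group_names):
    """Build the attribute value: a comma-separated list of group names."""
    return ", ".join(group_names)


def attribute_matches_groups(attr_value, group_names):
    """Return True if the recorded attribute still lists exactly the real groups."""
    recorded = [name.strip() for name in attr_value.split(",") if name.strip()]
    return sorted(recorded) == sorted(group_names)


def record_groups(path, attr_name="group_names"):
    """Write the list of root-level groups into a global attribute of the file."""
    # Imported lazily so the pure helpers above work without netCDF4 installed.
    import netCDF4

    with netCDF4.Dataset(path, "a") as root:
        # root.groups is a dict keyed by group name.
        root.setncattr(attr_name, group_list_attribute(root.groups))
```

If a group is added or removed after `record_groups` runs, `attribute_matches_groups` returns False, which is exactly the staleness problem described above.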

Conclusion

Considering that xr.open_dataset has a group parameter to open groups, it seems unfortunate that when you open a file, you don't see what groups are in it. Instead, you have to use an external tool to get information on the file's groups, then open them with xarray. Since this is only a matter of extracting group data and printing it, surely this is a simple (and imo, valuable) addition. I'd be happy to implement it and submit a PR if people are on board. I might need some direction, though: this is my first time digging into the xarray source code, and I don't see a __str__ method on the Dataset class, which is where I expected to make this addition.
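
For reference, the two-step process described above (discover the groups with netcdf4-python, then open each one with xarray) can be sketched roughly as follows. The function names are hypothetical; the only library features relied on are the `.groups` mapping on netCDF4 datasets and groups, and the `group` parameter of `xr.open_dataset`.

```python
def walk_group_paths(node, prefix=""):
    """Return the paths of all groups below `node`, depth first.

    `node` is any object exposing a `.groups` mapping of name -> child,
    which is how netCDF4.Dataset and netCDF4.Group expose their subgroups.
    """
    paths = []
    for name, child in node.groups.items():
        path = f"{prefix}/{name}"
        paths.append(path)
        paths.extend(walk_group_paths(child, path))
    return paths


def open_all_groups(path):
    """Open every group in the file at `path` as its own xarray.Dataset."""
    # Imported lazily so walk_group_paths can be used without these packages.
    import netCDF4
    import xarray as xr

    # Step 1: use netcdf4-python to find out which groups exist.
    with netCDF4.Dataset(path) as root:
        group_paths = walk_group_paths(root)

    # Step 2: open each group separately through xarray's `group` parameter.
    return {g: xr.open_dataset(path, group=g) for g in group_paths}
```

Calling `open_all_groups(path)` returns a plain dict keyed by group path, which sidesteps the missing group listing but still requires a second library just to learn what the file contains.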

Labels: topic-DataTree (Related to the implementation of a DataTree class)