Backends descriptions #7200

Merged
merged 5 commits into from
Oct 26, 2022
66 changes: 66 additions & 0 deletions doc/api-hidden.rst
@@ -483,6 +483,12 @@
backends.NetCDF4DataStore.is_remote
backends.NetCDF4DataStore.lock

backends.NetCDF4BackendEntrypoint.available
backends.NetCDF4BackendEntrypoint.description
backends.NetCDF4BackendEntrypoint.url
backends.NetCDF4BackendEntrypoint.guess_can_open
backends.NetCDF4BackendEntrypoint.open_dataset

backends.H5NetCDFStore.autoclose
backends.H5NetCDFStore.close
backends.H5NetCDFStore.encode
@@ -510,6 +516,27 @@
backends.H5NetCDFStore.sync
backends.H5NetCDFStore.ds

backends.H5netcdfBackendEntrypoint.available
backends.H5netcdfBackendEntrypoint.description
backends.H5netcdfBackendEntrypoint.url
backends.H5netcdfBackendEntrypoint.guess_can_open
backends.H5netcdfBackendEntrypoint.open_dataset

backends.PseudoNetCDFDataStore.close
backends.PseudoNetCDFDataStore.get_attrs
backends.PseudoNetCDFDataStore.get_dimensions
backends.PseudoNetCDFDataStore.get_encoding
backends.PseudoNetCDFDataStore.get_variables
backends.PseudoNetCDFDataStore.open
backends.PseudoNetCDFDataStore.open_store_variable
backends.PseudoNetCDFDataStore.ds

backends.PseudoNetCDFBackendEntrypoint.available
backends.PseudoNetCDFBackendEntrypoint.description
backends.PseudoNetCDFBackendEntrypoint.url
backends.PseudoNetCDFBackendEntrypoint.guess_can_open
backends.PseudoNetCDFBackendEntrypoint.open_dataset

backends.PydapDataStore.close
backends.PydapDataStore.get_attrs
backends.PydapDataStore.get_dimensions
@@ -519,6 +546,12 @@
backends.PydapDataStore.open
backends.PydapDataStore.open_store_variable

backends.PydapBackendEntrypoint.available
backends.PydapBackendEntrypoint.description
backends.PydapBackendEntrypoint.url
backends.PydapBackendEntrypoint.guess_can_open
backends.PydapBackendEntrypoint.open_dataset

backends.ScipyDataStore.close
backends.ScipyDataStore.encode
backends.ScipyDataStore.encode_attribute
@@ -541,6 +574,39 @@
backends.ScipyDataStore.sync
backends.ScipyDataStore.ds

backends.ScipyBackendEntrypoint.available
backends.ScipyBackendEntrypoint.description
backends.ScipyBackendEntrypoint.url
backends.ScipyBackendEntrypoint.guess_can_open
backends.ScipyBackendEntrypoint.open_dataset

backends.ZarrStore.close
backends.ZarrStore.encode_attribute
backends.ZarrStore.encode_variable
backends.ZarrStore.get_attrs
backends.ZarrStore.get_dimensions
backends.ZarrStore.get_variables
backends.ZarrStore.open_group
backends.ZarrStore.open_store_variable
backends.ZarrStore.set_attributes
backends.ZarrStore.set_dimensions
backends.ZarrStore.set_variables
backends.ZarrStore.store
backends.ZarrStore.sync
backends.ZarrStore.ds

backends.ZarrBackendEntrypoint.available
backends.ZarrBackendEntrypoint.description
backends.ZarrBackendEntrypoint.url
backends.ZarrBackendEntrypoint.guess_can_open
backends.ZarrBackendEntrypoint.open_dataset

backends.StoreBackendEntrypoint.available
backends.StoreBackendEntrypoint.description
backends.StoreBackendEntrypoint.url
backends.StoreBackendEntrypoint.guess_can_open
backends.StoreBackendEntrypoint.open_dataset

backends.FileManager.acquire
backends.FileManager.acquire_context
backends.FileManager.close
16 changes: 16 additions & 0 deletions doc/api.rst
@@ -1109,12 +1109,28 @@ arguments for the ``load_store`` and ``dump_to_store`` Dataset methods:

backends.NetCDF4DataStore
backends.H5NetCDFStore
backends.PseudoNetCDFDataStore
backends.PydapDataStore
backends.ScipyDataStore
backends.ZarrStore
backends.FileManager
backends.CachingFileManager
backends.DummyFileManager

These BackendEntrypoints provide a basic interface to the most commonly
used filetypes in the xarray universe.

.. autosummary::
   :toctree: generated/

   backends.NetCDF4BackendEntrypoint
   backends.H5netcdfBackendEntrypoint
   backends.PseudoNetCDFBackendEntrypoint
   backends.PydapBackendEntrypoint
   backends.ScipyBackendEntrypoint
   backends.StoreBackendEntrypoint
   backends.ZarrBackendEntrypoint

Deprecated / Pending Deprecation
================================

2 changes: 2 additions & 0 deletions doc/whats-new.rst
@@ -66,6 +66,8 @@ Documentation
  By `Tom Nicholas <https://github.com/TomNicholas>`_.
- Rename ``axes`` to ``axs`` in plotting to align with ``matplotlib.pyplot.subplots``. (:pull:`7194`)
  By `Jimmy Westling <https://github.com/illviljan>`_.
- Add documentation of specific BackendEntrypoints (:pull:`7200`).
  By `Michael Niklas <https://github.com/headtr1ck>`_.

Internal Changes
~~~~~~~~~~~~~~~~
20 changes: 14 additions & 6 deletions xarray/backends/__init__.py
@@ -6,15 +6,16 @@
 from .cfgrib_ import CfGribDataStore
 from .common import AbstractDataStore, BackendArray, BackendEntrypoint
 from .file_manager import CachingFileManager, DummyFileManager, FileManager
-from .h5netcdf_ import H5NetCDFStore
+from .h5netcdf_ import H5netcdfBackendEntrypoint, H5NetCDFStore
 from .memory import InMemoryDataStore
-from .netCDF4_ import NetCDF4DataStore
+from .netCDF4_ import NetCDF4BackendEntrypoint, NetCDF4DataStore
 from .plugins import list_engines
-from .pseudonetcdf_ import PseudoNetCDFDataStore
-from .pydap_ import PydapDataStore
+from .pseudonetcdf_ import PseudoNetCDFBackendEntrypoint, PseudoNetCDFDataStore
+from .pydap_ import PydapBackendEntrypoint, PydapDataStore
 from .pynio_ import NioDataStore
-from .scipy_ import ScipyDataStore
-from .zarr import ZarrStore
+from .scipy_ import ScipyBackendEntrypoint, ScipyDataStore
+from .store import StoreBackendEntrypoint
+from .zarr import ZarrBackendEntrypoint, ZarrStore

 __all__ = [
     "AbstractDataStore",
@@ -32,5 +33,12 @@
     "H5NetCDFStore",
     "ZarrStore",
     "PseudoNetCDFDataStore",
+    "H5netcdfBackendEntrypoint",
+    "NetCDF4BackendEntrypoint",
+    "PseudoNetCDFBackendEntrypoint",
+    "PydapBackendEntrypoint",
+    "ScipyBackendEntrypoint",
+    "StoreBackendEntrypoint",
+    "ZarrBackendEntrypoint",
     "list_engines",
 ]
25 changes: 25 additions & 0 deletions xarray/backends/h5netcdf_.py
@@ -352,7 +352,32 @@ def close(self, **kwargs):


class H5netcdfBackendEntrypoint(BackendEntrypoint):
    """
    Backend for netCDF files based on the h5netcdf package.

    It can open ".nc", ".nc4", ".cdf" files but will only be
    selected as the default if the "netcdf4" engine is not available.

    Additionally it can open valid HDF5 files, see
    https://h5netcdf.org/#invalid-netcdf-files for more info.
    It will not be detected as a valid backend for such files, so make
    sure to specify ``engine="h5netcdf"`` in ``open_dataset``.

    For more information about the underlying library, visit:
    https://h5netcdf.org

    See Also
    --------
    backends.H5NetCDFStore
    backends.NetCDF4BackendEntrypoint
    backends.ScipyBackendEntrypoint
    """

    available = has_h5netcdf
    description = (
        "Open netCDF (.nc, .nc4 and .cdf) and most HDF5 files using h5netcdf in Xarray"
    )
    url = "https://docs.xarray.dev/en/stable/generated/xarray.backends.H5netcdfBackendEntrypoint.html"
Collaborator

Interesting idea. I wonder whether these should be in docstrings then be linked by Sphinx, vs. in code. But ofc fine to try from my perspective.

Contributor

Just one comment here, h5netcdf (and netcdf4) can also open hdf5 files which are not strict NetCDF4. Not sure how to better phrase this, though.

Collaborator Author

> Interesting idea. I wonder whether these should be in docstrings then be linked by Sphinx, vs. in code. But ofc fine to try from my perspective.

No idea how that would work, could you give an example?

Collaborator Author

> Just one comment here, h5netcdf (and netcdf4) can also open hdf5 files which are not strict NetCDF4. Not sure how to better phrase this, though.

Should ".h5" be added to guess_can_open?
Or would you rather not specify this because non-strict netCDF files might not work with xarray?

Contributor

> Should ".h5" be added to guess_can_open?

I'd say that this is not necessary. Those who want to open such files will use the engine kwarg.

There are only a few hdf5-specifics which can't be read by netcdf4 (see https://h5netcdf.org/#invalid-netcdf-files; the list might not be up to date, as netcdf-c is evolving too). So xarray is able to digest most normal hdf5 files (via netcdf4 and h5netcdf).

My concern is that users might conclude that only netCDF4 files are readable and get confused. There is a good chance that my impression will not hold and users know what they are doing.

Anyway, I really like these enhancements to the backends.
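
The outcome of this thread, that ".h5" stays out of automatic detection and callers opt in through the engine keyword, can be illustrated with a small stand-alone sketch. Everything in it is hypothetical: `NETCDF_EXTENSIONS`, `guess_by_extension` and `pick_engine` are made-up names, not xarray functions, and the real `guess_can_open` also reads the file's magic number (as the truncated method in this diff suggests), which this sketch leaves out.

```python
import os
from typing import Optional

# Extensions named in the new docstrings; ".h5" is deliberately absent.
NETCDF_EXTENSIONS = {".nc", ".nc4", ".cdf"}


def guess_by_extension(path: str) -> bool:
    """Hypothetical extension-only check (not xarray's implementation)."""
    _, ext = os.path.splitext(path)
    return ext in NETCDF_EXTENSIONS


def pick_engine(path: str, engine: Optional[str] = None) -> str:
    """Return the explicit engine if given, otherwise guess from the extension."""
    if engine is not None:
        # An explicit choice always wins, mirroring the ``engine=`` keyword.
        return engine
    if guess_by_extension(path):
        # Assumption: "netcdf4" is installed and is the default for these files.
        return "netcdf4"
    raise ValueError(f"cannot guess an engine for {path!r}; pass engine=...")
```

Under these assumptions, `pick_engine("data.nc")` resolves on its own, while `pick_engine("data.h5")` raises until the caller passes `engine="h5netcdf"` or `engine="netcdf4"`.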


    def guess_can_open(self, filename_or_obj):
        magic_number = try_read_magic_number_from_file_or_path(filename_or_obj)

25 changes: 25 additions & 0 deletions xarray/backends/netCDF4_.py
@@ -516,7 +516,32 @@ def close(self, **kwargs):


class NetCDF4BackendEntrypoint(BackendEntrypoint):
    """
    Backend for netCDF files based on the netCDF4 package.

    It can open ".nc", ".nc4", ".cdf" files and will be chosen
    as the default for these files.

    Additionally it can open valid HDF5 files, see
    https://h5netcdf.org/#invalid-netcdf-files for more info.
    It will not be detected as a valid backend for such files, so make
    sure to specify ``engine="netcdf4"`` in ``open_dataset``.

    For more information about the underlying library, visit:
    https://unidata.github.io/netcdf4-python

    See Also
    --------
    backends.NetCDF4DataStore
    backends.H5netcdfBackendEntrypoint
    backends.ScipyBackendEntrypoint
    """

    available = has_netcdf4
    description = (
        "Open netCDF (.nc, .nc4 and .cdf) and most HDF5 files using netCDF4 in Xarray"
    )
    url = "https://docs.xarray.dev/en/stable/generated/xarray.backends.NetCDF4BackendEntrypoint.html"

    def guess_can_open(self, filename_or_obj):
        if isinstance(filename_or_obj, str) and is_remote_uri(filename_or_obj):

28 changes: 28 additions & 0 deletions xarray/backends/pseudonetcdf_.py
@@ -104,7 +104,35 @@ def close(self):


class PseudoNetCDFBackendEntrypoint(BackendEntrypoint):
    """
    Backend for netCDF-like data formats in the air quality field
    based on the PseudoNetCDF package.

    It can open:
    - CAMx
    - RACM2 box-model outputs
    - Kinetic Pre-Processor outputs
    - ICARTT Data files (ffi1001)
    - CMAQ Files
    - GEOS-Chem Binary Punch/NetCDF files
    - and many more

    This backend is not selected by default for any files, so make
    sure to specify ``engine="pseudonetcdf"`` in ``open_dataset``.

    For more information about the underlying library, visit:
    https://pseudonetcdf.readthedocs.io

    See Also
    --------
    backends.PseudoNetCDFDataStore
    """

    available = has_pseudonetcdf
    description = (
        "Open many atmospheric science data formats using PseudoNetCDF in Xarray"
    )
    url = "https://docs.xarray.dev/en/stable/generated/xarray.backends.PseudoNetCDFBackendEntrypoint.html"

    # *args and **kwargs are not allowed in open_backend_dataset_ kwargs,
    # unless the open_dataset_parameters are explicitly defined like this:

17 changes: 17 additions & 0 deletions xarray/backends/pydap_.py
@@ -139,7 +139,24 @@ def get_dimensions(self):


class PydapBackendEntrypoint(BackendEntrypoint):
    """
    Backend for streaming datasets over the internet using
    the Data Access Protocol, also known as DODS or OPeNDAP,
    based on the pydap package.

    This backend is selected by default for URLs.

    For more information about the underlying library, visit:
    https://www.pydap.org

    See Also
    --------
    backends.PydapDataStore
    """

    available = has_pydap
    description = "Open remote datasets via OPeNDAP using pydap in Xarray"
    url = "https://docs.xarray.dev/en/stable/generated/xarray.backends.PydapBackendEntrypoint.html"

    def guess_can_open(self, filename_or_obj):
        return isinstance(filename_or_obj, str) and is_remote_uri(filename_or_obj)
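
The `guess_can_open` method above delegates to `is_remote_uri`, whose body is not part of this diff. As a rough illustration of the idea only, the stand-in below treats a string as remote when it parses as an `http` or `https` URL with a host. `looks_like_remote_url` is a hypothetical name and the scheme list is an assumption, not xarray's actual rule.

```python
from urllib.parse import urlparse


def looks_like_remote_url(filename_or_obj) -> bool:
    """Hypothetical stand-in for a remote-URI check (not xarray's is_remote_uri)."""
    if not isinstance(filename_or_obj, str):
        # File objects, paths and stores are never treated as URLs here.
        return False
    parsed = urlparse(filename_or_obj)
    # Assumption: only http(s) URLs with a host part count as remote.
    return parsed.scheme in {"http", "https"} and bool(parsed.netloc)
```

A check of this kind explains why a plain local path such as `data.nc` would not be routed to a URL-oriented backend, while an `https://` address would.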
22 changes: 22 additions & 0 deletions xarray/backends/scipy_.py
@@ -241,7 +241,29 @@ def close(self):


class ScipyBackendEntrypoint(BackendEntrypoint):
    """
    Backend for netCDF files based on the scipy package.

    It can open ".nc", ".nc4", ".cdf" and ".gz" files but will only be
    selected as the default if the "netcdf4" and "h5netcdf" engines are
    not available. It has the advantage that it is a lightweight engine
    that has no system requirements (unlike netcdf4 and h5netcdf).

    Additionally it can open gzip-compressed (".gz") files.

    For more information about the underlying library, visit:
    https://docs.scipy.org/doc/scipy/reference/generated/scipy.io.netcdf_file.html

    See Also
    --------
    backends.ScipyDataStore
    backends.NetCDF4BackendEntrypoint
    backends.H5netcdfBackendEntrypoint
    """

    available = has_scipy
    description = "Open netCDF files (.nc, .nc4, .cdf and .gz) using scipy in Xarray"
    url = "https://docs.xarray.dev/en/stable/generated/xarray.backends.ScipyBackendEntrypoint.html"

    def guess_can_open(self, filename_or_obj):
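
The three netCDF docstrings in this PR describe a fallback order for ".nc", ".nc4" and ".cdf" files: netcdf4 first, then h5netcdf, then scipy. A minimal sketch of that order follows, assuming availability is supplied as a plain mapping of booleans. `NETCDF_ENGINE_ORDER` and `default_netcdf_engine` are hypothetical names; xarray's own selection logic lives elsewhere and may differ in detail.

```python
from typing import Mapping

# Priority order as described in the docstrings above.
NETCDF_ENGINE_ORDER = ("netcdf4", "h5netcdf", "scipy")


def default_netcdf_engine(available: Mapping[str, bool]) -> str:
    """Return the first installed engine in priority order (illustrative only)."""
    for name in NETCDF_ENGINE_ORDER:
        if available.get(name, False):
            return name
    raise ValueError("no netCDF engine is installed")
```

With this sketch, scipy is only returned when both other engines are reported as unavailable, which matches the wording of the scipy docstring.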
2 changes: 2 additions & 0 deletions xarray/backends/store.py
@@ -7,6 +7,8 @@

class StoreBackendEntrypoint(BackendEntrypoint):
    available = True
    description = "Open AbstractDataStore instances in Xarray"
    url = "https://docs.xarray.dev/en/stable/generated/xarray.backends.StoreBackendEntrypoint.html"

    def guess_can_open(self, filename_or_obj):
        return isinstance(filename_or_obj, AbstractDataStore)

13 changes: 13 additions & 0 deletions xarray/backends/zarr.py
@@ -808,7 +808,20 @@ def open_zarr(


class ZarrBackendEntrypoint(BackendEntrypoint):
    """
    Backend for ".zarr" files based on the zarr package.

    For more information about the underlying library, visit:
    https://zarr.readthedocs.io/en/stable

    See Also
    --------
    backends.ZarrStore
    """

    available = has_zarr
    description = "Open zarr files (.zarr) using zarr in Xarray"
    url = "https://docs.xarray.dev/en/stable/generated/xarray.backends.ZarrBackendEntrypoint.html"

    def guess_can_open(self, filename_or_obj):
        try:
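
Taken together, the PR gives each entrypoint three class attributes: `available`, `description` and `url`. The sketch below shows one way such attributes could be consumed, using a hypothetical stand-in class rather than xarray's `BackendEntrypoint`. `EntrypointSketch` and `describe_engines` are illustrative names, and the output format is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class EntrypointSketch:
    """Stand-in carrying the three attributes added in this PR."""

    available: bool
    description: str
    url: str


def describe_engines(engines: Dict[str, EntrypointSketch]) -> List[str]:
    """Build one summary line per installed engine, sorted by engine name."""
    lines = []
    for name in sorted(engines):
        entry = engines[name]
        if not entry.available:
            # Engines whose dependency is missing are skipped.
            continue
        lines.append(f"{name}: {entry.description} ({entry.url})")
    return lines
```

Keeping the description and documentation link on the entrypoint itself means any listing of engines can show them without importing the underlying library.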