Docs page on interoperability #7992

Merged
merged 32 commits into from
Oct 26, 2023
198f67b
add page on internal design
TomNicholas Jul 17, 2023
9fe6635
add xarray-datatree to intersphinx mapping
TomNicholas Jul 17, 2023
c231f58
typo
TomNicholas Jul 17, 2023
957675d
add subheadings to the accessors page
TomNicholas Jul 17, 2023
9fd2fc5
Revert "add page on internal design"
TomNicholas Jul 17, 2023
04fcab2
rename page on variables
TomNicholas Jul 17, 2023
40fc8c5
whatsnew
TomNicholas Jul 17, 2023
011ab25
page on interoperability
TomNicholas Jul 17, 2023
869363a
add interoperability page to index
TomNicholas Jul 17, 2023
3abe029
fix whatsnew
TomNicholas Jul 17, 2023
39437fb
Merge branch 'main' into docs_internal_design
TomNicholas Jul 17, 2023
2ab30d6
Merge branch 'main' into docs_interoperability
TomNicholas Jul 17, 2023
e688526
Merge branch 'main' into docs_interoperability
TomNicholas Jul 17, 2023
3809d50
sel->isel
TomNicholas Jul 17, 2023
1e98361
Merge branch 'docs_internal_design' of https://github.com/TomNicholas…
TomNicholas Jul 17, 2023
3ad0722
Merge branch 'docs_internal_design' into docs_interoperability
TomNicholas Jul 17, 2023
4afea37
add section on lazy indexing
TomNicholas Jul 18, 2023
84e9aa2
actually show lazy indexing example
TomNicholas Jul 18, 2023
0e0a240
Merge branch 'main' into docs_internal_design
TomNicholas Jul 18, 2023
4e98b58
Merge branch 'docs_internal_design' into docs_interoperability
TomNicholas Jul 18, 2023
1c2a5b7
link to custom indexes page
TomNicholas Jul 18, 2023
c8b2653
fix some formatting
TomNicholas Jul 18, 2023
2964c60
put encoding last
TomNicholas Jul 18, 2023
4e41c42
Merge branch 'main' into docs_interoperability
TomNicholas Jul 25, 2023
6e1240f
attrs and encoding are not ordered dicts
TomNicholas Jul 25, 2023
8bc561a
Merge branch 'main' into docs_interoperability
TomNicholas Sep 13, 2023
40e799a
Merge branch 'main' into docs_interoperability
TomNicholas Oct 4, 2023
b2b8338
reword lack of support for subclassing
TomNicholas Oct 4, 2023
1f9c17c
Merge branch 'main' into docs_interoperability
TomNicholas Oct 4, 2023
a4a72cd
remove duplicate word
TomNicholas Oct 4, 2023
bc0d55d
encourage contributions to supporting subclassing
TomNicholas Oct 4, 2023
ae4619b
Merge branch 'main' into docs_interoperability
TomNicholas Oct 26, 2023
2 changes: 2 additions & 0 deletions doc/internals/how-to-create-custom-index.rst
@@ -1,5 +1,7 @@
.. currentmodule:: xarray

.. _internals.custom indexes:

How to create a custom index
============================

3 changes: 2 additions & 1 deletion doc/internals/index.rst
@@ -19,9 +19,10 @@ The pages in this section are intended for:
:hidden:

internal-design
interoperability
duck-arrays-integration
chunked-arrays
extending-xarray
zarr-encoding-spec
how-to-add-new-backend
how-to-create-custom-index
zarr-encoding-spec
8 changes: 4 additions & 4 deletions doc/internals/internal-design.rst
@@ -59,9 +59,9 @@ which is used as the basic building block behind xarray's
- ``data``: The N-dimensional array (typically a NumPy or Dask array) storing
the Variable's data. It must have the same number of dimensions as the length
of ``dims``.
- ``attrs``: An ordered dictionary of metadata associated with this array. By
- ``attrs``: A dictionary of metadata associated with this array. By
convention, xarray's built-in operations never use this metadata.
- ``encoding``: Another ordered dictionary used to store information about how
- ``encoding``: Another dictionary used to store information about how
this variable's data is represented on disk. See :ref:`io.encoding` for more
details.

@@ -95,7 +95,7 @@ all of which are implemented by forwarding on to the underlying ``Variable`` obj

In addition, a :py:class:`~xarray.DataArray` stores additional ``Variable`` objects stored in a dict under the private ``_coords`` attribute,
each of which is referred to as a "Coordinate Variable". These coordinate variable objects are only allowed to have ``dims`` that are a subset of the data variable's ``dims``,
and each dim has a specific length. This means that the full :py:attr:`~xarray.DataArray.sizes` of the dataarray can be represented by a dictionary mapping dimension names to integer sizes.
and each dim has a specific length. This means that the full :py:attr:`~xarray.DataArray.size` of the dataarray can be represented by a dictionary mapping dimension names to integer sizes.
The underlying data variable has this exact same size, and the attached coordinate variables have sizes which are some subset of the size of the data variable.
Another way of saying this is that all coordinate variables must be "alignable" with the data variable.
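
As a small illustration of the alignment rule described above, the following sketch builds a ``DataArray`` and inspects its public ``sizes`` mapping. The array contents and the coordinate names (``x``, ``y_label``) are made up for this example.

```python
import numpy as np
import xarray as xr

# A 2x3 data variable with one dimension coordinate ("x") and one
# non-dimension coordinate ("y_label") that lies along dimension "y".
da = xr.DataArray(
    np.zeros((2, 3)),
    dims=("x", "y"),
    coords={"x": [10, 20], "y_label": ("y", ["a", "b", "c"])},
)

# Mapping from dimension name to integer length.
sizes = dict(da.sizes)

# Every coordinate variable only uses dims that the data variable also has.
coord_dims_ok = all(set(c.dims) <= set(da.dims) for c in da.coords.values())
```

Here ``sizes`` maps each dimension name to its length, and ``coord_dims_ok`` checks the "subset of dims" condition for every attached coordinate.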

@@ -124,7 +124,7 @@ The :py:class:`~xarray.Dataset` class is a generalization of the :py:class:`~xar
Internally all data variables and coordinate variables are stored under a single ``variables`` dict, and coordinates are
specified by storing their names in a private ``_coord_names`` dict.

The dataset's dimensions are the set of all dims present across any variable, but (similar to in dataarrays) coordinate
The dataset's ``dims`` are the set of all dims present across any variable, but (similar to in dataarrays) coordinate
variables cannot have a dimension that is not present on any data variable.

When a data variable or coordinate variable is accessed, a new ``DataArray`` is again constructed from all compatible
45 changes: 45 additions & 0 deletions doc/internals/interoperability.rst
Really cool page!

@@ -0,0 +1,45 @@
.. _interoperability:

Interoperability of Xarray
==========================

Xarray is designed to be extremely interoperable, in many orthogonal ways.
Making xarray as flexible as possible is the common theme of most of the goals on our :ref:`roadmap`.

This interoperability comes via a set of flexible abstractions that users can plug into. The current full list is:

- :ref:`Custom file backends <add_a_backend>` via the :py:class:`~xarray.backends.BackendEntrypoint` system,
- NumPy-like :ref:`"duck" array wrapping <internals.duckarrays>`, which supports the `Python Array API Standard <https://data-apis.org/array-api/latest/>`_,
- :ref:`Chunked distributed array computation <internals.chunkedarrays>` via the :py:class:`~xarray.core.parallelcompat.ChunkManagerEntrypoint` system,
- Custom :py:class:`~xarray.Index` objects for :ref:`flexible label-based lookups <internals.custom indexes>`,
- Extending xarray objects with domain-specific methods via :ref:`custom accessors <internals.accessors>`.
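
As a minimal sketch of the last item in the list above, a custom accessor can be registered with ``xarray.register_dataarray_accessor``. The accessor name ``stats_demo`` and its ``value_range`` method are hypothetical and exist only for this illustration.

```python
import numpy as np
import xarray as xr


# "stats_demo" is a hypothetical accessor name chosen for this sketch.
@xr.register_dataarray_accessor("stats_demo")
class StatsDemoAccessor:
    def __init__(self, xarray_obj):
        # xarray passes in the DataArray the accessor is attached to.
        self._obj = xarray_obj

    def value_range(self):
        """Return max minus min over all elements as a Python float."""
        return float(self._obj.max() - self._obj.min())


da = xr.DataArray(np.array([1.0, 4.0, 2.5]), dims="x")
spread = da.stats_demo.value_range()
```

The domain-specific method lives in its own namespace (``da.stats_demo``), so it does not require subclassing ``DataArray``.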

.. warning::

    One obvious way in which xarray could be more flexible is that whilst subclassing xarray objects is possible, we
    currently don't support it in most transformations, instead recommending composition over inheritance. See the
    :ref:`internal design page <internal design.subclassing>` for the rationale and look at the corresponding `GH issue <https://github.com/pydata/xarray/issues/3980>`_
    if you're interested in improving support for subclassing!
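
To make "composition over inheritance" concrete, here is one possible shape such a wrapper could take. The ``Temperature`` class and its methods are hypothetical and not part of xarray; this is a sketch of the pattern, not a recommended API.

```python
import numpy as np
import xarray as xr


class Temperature:
    """Hypothetical domain wrapper that holds a DataArray instead of subclassing it."""

    def __init__(self, data: xr.DataArray):
        self._data = data

    @property
    def data(self) -> xr.DataArray:
        return self._data

    def to_celsius(self) -> "Temperature":
        # The arithmetic returns a plain DataArray, so re-wrapping is explicit.
        return Temperature(self._data - 273.15)


kelvin = Temperature(xr.DataArray(np.array([273.15, 300.0]), dims="time"))
celsius = kelvin.to_celsius()
```

Because the wrapper decides when to re-wrap results, it does not depend on xarray preserving a subclass type through its transformations.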

.. note::

    If you think there is another way in which xarray could become more generically flexible then please
    tell us your ideas by `raising an issue to request the feature <https://github.com/pydata/xarray/issues/new/choose>`_!


Whilst xarray was originally designed specifically to open ``netCDF4`` files as :py:class:`numpy.ndarray` objects labelled by :py:class:`pandas.Index` objects,
it is entirely possible today to:

- lazily open an xarray object directly from a custom binary file format (e.g. using ``xarray.open_dataset(path, engine='my_custom_format')``),
- handle the data as any API-compliant numpy-like array type (e.g. sparse or GPU-backed),
- distribute out-of-core computation across that array type in parallel (e.g. via :ref:`dask`),
- track the physical units of the data through computations (e.g. via `pint-xarray <https://pint-xarray.readthedocs.io/en/stable/>`_),
- query the data via custom index logic optimized for specific applications (e.g. an :py:class:`~xarray.Index` object backed by a KDTree structure),
- attach domain-specific logic via accessor methods (e.g. to understand geographic Coordinate Reference System metadata),
- organize hierarchical groups of xarray data in a :py:class:`~datatree.DataTree` (e.g. to treat heterogeneous simulation and observational data together during analysis).
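
The first item in the list above can be sketched with a toy backend. The ``RawFloat64Backend`` class and the flat float64 file layout are assumptions invented for this example, and for brevity the sketch reads the file eagerly rather than lazily. It passes the backend class directly as ``engine`` instead of registering an entry point under a string name.

```python
import os
import tempfile

import numpy as np
import xarray as xr
from xarray.backends import BackendEntrypoint


class RawFloat64Backend(BackendEntrypoint):
    """Hypothetical backend: treats a file as a flat array of float64 values."""

    def open_dataset(self, filename_or_obj, *, drop_variables=None):
        # Eager read for simplicity; a real backend would usually wrap the
        # file in a lazily indexed array instead.
        values = np.fromfile(filename_or_obj, dtype="float64")
        ds = xr.Dataset({"values": ("index", values)})
        if drop_variables:
            ds = ds.drop_vars(drop_variables, errors="ignore")
        return ds


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.bin")
    np.arange(4, dtype="float64").tofile(path)
    ds = xr.open_dataset(path, engine=RawFloat64Backend)
    loaded = ds["values"].values.copy()
```

Once opened, ``ds`` is an ordinary ``Dataset``, so the other abstractions in the list (duck arrays, custom indexes, accessors) can be layered on top of it.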

All of these features can be provided simultaneously, using libraries compatible with the rest of the scientific Python ecosystem.
In this situation xarray would be essentially a thin wrapper acting as a pure-Python framework, providing a common interface and
separation of concerns via various domain-agnostic abstractions.

Most of the remaining pages in the documentation of xarray's internals describe these various types of interoperability in more detail.
2 changes: 2 additions & 0 deletions doc/whats-new.rst
@@ -137,6 +137,8 @@ Bug fixes
Documentation
~~~~~~~~~~~~~

- Added page on the interoperability of xarray objects.
(:pull:`7992`) By `Tom Nicholas <https://github.com/TomNicholas>`_.
- Added xarray-regrid to the list of xarray related projects (:pull:`8272`).
By `Bart Schilperoort <https://github.com/BSchilperoort>`_.
