Commit 4a945ec

Iris ❤ Xarray docs page. (#5025)
* Iris Xarray docs page.
* Add links.
* Xarray page styling.
* What's New entry.
* Minor docs fixes.
* Overall experience section.
* Xarray supports other plotting backends through external packages.
  Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
* Section on converting between Iris and Xarray.
* Clearer language around laziness and multi-processing.
* To-do note about dates and fill values.
* Move iris_xarray page into a new Community section.
* Language fixes from @bjlittle review.

Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
1 parent a3b3560 commit 4a945ec

12 files changed: +244 -18 lines changed

docs/src/common_links.inc

Lines changed: 1 addition & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -40,6 +40,7 @@
4040
.. _CF-UGRID: https://ugrid-conventions.github.io/ugrid-conventions/
4141
.. _issues on GitHub: https://github.com/SciTools/iris/issues?q=is%3Aopen+is%3Aissue+sort%3Areactions-%2B1-desc
4242
.. _python-stratify: https://github.com/SciTools/python-stratify
43+
.. _iris-esmf-regrid: https://github.com/SciTools-incubator/iris-esmf-regrid
4344

4445

4546
.. comment

docs/src/community/index.rst

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
1+
.. include:: ../common_links.inc
2+
3+
.. todo:
4+
consider scientific-python.org
5+
consider scientific-python.org/specs/
6+
7+
Iris in the Community
8+
=====================
9+
10+
Iris aims to be a valuable member of the open source scientific Python
11+
community.
12+
13+
We listen out for developments in our dependencies and neighbouring projects,
14+
and we reach out to them when we can solve problems together; please feel free
15+
to reach out to us!
16+
17+
We are aware of our place in the user's wider 'toolbox' - offering unique
18+
functionality and interoperating smoothly with other packages.
19+
20+
We welcome contributions from all; whether that's an opinion, a 1-line
21+
clarification, or a whole new feature 🙂
22+
23+
Quick Links
24+
-----------
25+
26+
* `GitHub Discussions`_
27+
* :ref:`Getting involved<development_where_to_start>`
28+
* `Twitter <https://twitter.com/scitools_iris>`_
29+
30+
Interoperability
31+
----------------
32+
33+
There's a big choice of Python tools out there! Each one has strengths and
34+
weaknesses in different areas, so we don't want to force a single choice for your
35+
whole workflow - we'd much rather make it easy for you to choose the right tool
36+
for the moment, switching whenever you need. Below are our ongoing efforts at
37+
smoother interoperability:
38+
39+
.. not using toctree due to combination of child pages and cross-references.
40+
41+
* The :mod:`iris.pandas` module
42+
* :doc:`iris_xarray`
43+
44+
.. toctree::
45+
:maxdepth: 1
46+
:hidden:
47+
48+
iris_xarray

docs/src/community/iris_xarray.rst

Lines changed: 154 additions & 0 deletions
@@ -0,0 +1,154 @@
1+
.. include:: ../common_links.inc
2+
3+
======================
4+
Iris ❤️ :term:`Xarray`
5+
======================
6+
7+
There is a lot of overlap between Iris and :term:`Xarray`, but some important
8+
differences too. Below is a summary of the most important differences, so that
9+
you can be prepared, and to help you choose the best package for your use case.
10+
11+
Overall Experience
12+
------------------
13+
14+
Iris is the more specialised package, focussed on making it as easy
15+
as possible to work with meteorological and climatological data. Iris
16+
is built to natively handle many key concepts, such as the CF conventions,
17+
coordinate systems and bounded coordinates. Iris offers a smaller toolkit of
18+
operations compared to Xarray, particularly around APIs for sophisticated
19+
computation such as array manipulation and multi-processing.
20+
21+
Xarray's more generic data model and community-driven development give it a
22+
richer range of operations and broader possible uses. Using Xarray
23+
specifically for meteorology/climatology may require deeper knowledge
24+
compared to using Iris, and you may prefer to add Xarray plugins
25+
such as :ref:`cfxarray` to get the best experience. Advanced users can likely
26+
achieve better performance with Xarray than with Iris.
27+
28+
Conversion
29+
----------
30+
There are multiple ways to convert between Iris and Xarray objects.
31+
32+
* Xarray includes the :meth:`~xarray.DataArray.to_iris` and
33+
:meth:`~xarray.DataArray.from_iris` methods - detailed in the
34+
`Xarray IO notes on Iris`_. Since Iris evolves independently of Xarray, be
35+
vigilant for concepts that may be lost during the conversion.
36+
* Because both packages are closely linked to the :term:`NetCDF Format`, it is
37+
feasible to save a NetCDF file using one package then load that file using
38+
the other package. This will be lossy in places, as both Iris and Xarray
39+
are opinionated on how certain NetCDF concepts relate to their data models.
40+
* The Iris development team are exploring an improved 'bridge' between the two
41+
packages. Follow the conversation on GitHub: `iris#4994`_. This project is
42+
expressly intended to be as lossless as possible.
43+
44+
Regridding
45+
----------
46+
Iris and Xarray offer a range of regridding methods - both natively and via
47+
additional packages such as `iris-esmf-regrid`_ and `xESMF`_ - which overlap
48+
in places
49+
but tend to cover a different set of use cases (e.g. Iris handles unstructured
50+
meshes but offers access to fewer ESMF methods). The behaviour of these
51+
regridders also differs slightly (even between different regridders attached to
52+
the same package) so the appropriate package to use depends highly on the
53+
particulars of the use case.
54+
55+
Plotting
56+
--------
57+
Xarray and Iris have a large overlap of functionality when creating
58+
:term:`Matplotlib` plots and both support the plotting of multidimensional
59+
coordinates. This means the experience is largely similar using either package.
60+
61+
Xarray supports further plotting backends through external packages (e.g. Bokeh through `hvPlot`_)
62+
and, if a user is already familiar with `pandas`_, the interface should be
63+
familiar. It also supports some different plot types to Iris, and therefore can
64+
be used for a wider variety of plots. It also makes quick, "out of the box"
65+
customisations to plots easier. However, if further customisation is
66+
needed, knowledge of Matplotlib is still required.
67+
68+
In both cases, :term:`Cartopy` can be used. Iris does more work
69+
automatically for the user here, creating Cartopy
70+
:class:`~cartopy.mpl.geoaxes.GeoAxes` for latitude and longitude coordinates,
71+
whereas the user has to do this manually in Xarray.
72+
73+
Statistics
74+
----------
75+
Both libraries are quite comparable with generally similar capabilities,
76+
performance and laziness. Iris is more specialised in some cases, offering
77+
some functions unique to Iris and a tolerance for masked data in most statistics.
78+
Xarray seems more approachable, however, with some less specialised but more
79+
convenient solutions (these tend to be wrappers to :term:`Dask` functions).
80+
81+
Laziness and Multi-Processing with :term:`Dask`
82+
-----------------------------------------------
83+
Iris and Xarray both support lazy data and out-of-core processing through
84+
utilisation of Dask.
85+
86+
While both Iris and Xarray expose :term:`NumPy` conveniences at the API level
87+
(e.g. the ``ndim`` property), only Xarray exposes Dask conveniences. For example
88+
:meth:`xarray.DataArray.chunk`, which gives the user direct control
89+
over the underlying Dask array chunks. The Iris API instead takes control of
90+
such concepts and user control is only possible by manipulating the underlying
91+
Dask array directly (accessed via :meth:`iris.cube.Cube.core_data`).
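
The difference can be sketched as follows, assuming ``xarray`` and ``dask`` are installed. The array shapes and chunk sizes are arbitrary, and a plain Dask array stands in for the object that ``Cube.core_data()`` would return for a lazy cube, so this is an illustration rather than Iris code.

```python
import dask.array as da
import numpy as np
import xarray as xr

# Xarray: chunking is controlled through the DataArray API itself.
arr = xr.DataArray(np.zeros((4, 6)), dims=("y", "x"))
lazy = arr.chunk({"y": 2, "x": 3})
xarray_chunks = lazy.chunks

# Iris: no Cube-level equivalent, so the Dask array is rechunked directly.
# ``core`` is a stand-in (assumption) for the result of ``cube.core_data()``.
core = da.zeros((4, 6), chunks=(4, 6))
rechunked = core.rechunk((2, 3))
dask_chunks = rechunked.chunks
```

Both routes end with the same chunk layout; the difference is only in which object the user has to reach for.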
92+
93+
:class:`xarray.DataArray`\ s comply with `NEP-18`_, allowing NumPy arrays to be
94+
based on them, and they also include the necessary extra members for Dask
95+
arrays to be based on them too. Neither of these is currently possible with
96+
Iris :class:`~iris.cube.Cube`\ s, although it is an ambition for the future.
97+
98+
NetCDF File Control
99+
-------------------
100+
(More info: :term:`NetCDF Format`)
101+
102+
Unlike Iris, Xarray generally provides full control of major file structures,
103+
i.e. dimensions + variables, including their order in the file. It mostly
104+
respects these in a file input, and can reproduce them on output.
105+
However, attribute handling is not so complete: like Iris, it interprets and
106+
modifies some recognised aspects, and can add some extra attributes not in the
107+
input.
108+
109+
.. todo:
110+
More detail on dates and fill values (@pp-mo suggestion).
111+
112+
Handling of dates and fill values has some special problems here.
113+
114+
Ultimately, nearly everything wanted in a particular result file can
115+
be achieved in Xarray, via provided override mechanisms (`loading keywords`_
116+
and the '`encoding`_' dictionaries).
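
To illustrate how a loading keyword changes what Xarray interprets, the sketch below uses ``xarray.decode_cf`` on an in-memory dataset, which applies the same CF decoding that ``open_dataset`` performs on a file. The variable names and values are made up for the example; ``decode_times`` is one of the loading keywords.

```python
import numpy as np
import xarray as xr

# A hypothetical "as stored" dataset: time is held as numbers plus a units string.
raw = xr.Dataset(
    {"temp": (("time",), np.array([280.0, 281.5]))},
    coords={
        "time": ("time", np.array([0, 1]), {"units": "days since 2000-01-01"})
    },
)

# Default behaviour: the time units are interpreted, the values become
# datetimes, and the original units string moves into ``.encoding``.
decoded = xr.decode_cf(raw)

# Override: leave the time variable exactly as stored.
untouched = xr.decode_cf(raw, decode_times=False)
```

The ``.encoding`` dictionary on ``decoded["time"]`` is what Xarray consults when writing the variable back out, which is why editing it gives control over the resulting file.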
117+
118+
Missing Data
119+
------------
120+
Xarray uses :data:`numpy.nan` to represent missing values and this will support
121+
many simple use cases assuming the data are floats. Iris enables more
122+
sophisticated missing data handling by representing missing values as masks
123+
(:class:`numpy.ma.MaskedArray` for real data and :class:`dask.array.Array`
124+
for lazy data), which allows data to be of any data type and to include either or both of
125+
a mask and :data:`~numpy.nan`\ s.
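
The practical difference shows up with integer data. The NumPy-only sketch below uses made-up values and is not Iris or Xarray code: a mask leaves the dtype alone, while a NaN marker forces a cast to float.

```python
import numpy as np

# Integer data with one missing point, represented with a mask.
values = np.array([1, 2, 3, 4], dtype=np.int32)
masked = np.ma.masked_array(values, mask=[False, True, False, False])

# The dtype is still integer, and the mean skips the masked point.
masked_mean = masked.mean()

# NaN only exists for floating point types, so the data must be cast first.
as_float = values.astype(np.float64)
as_float[1] = np.nan

# An ordinary mean is poisoned by the NaN; a NaN-aware function is needed.
plain_mean = as_float.mean()
nan_mean = np.nanmean(as_float)
```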
126+
127+
.. _cfxarray:
128+
129+
`cf-xarray`_
130+
-------------
131+
Iris has a data model entirely based on :term:`CF Conventions`. Xarray has a
132+
data model based on :term:`NetCDF Format`, with cf-xarray acting as a translation
133+
layer into CF. Xarray/cf-xarray methods can be
134+
called and data accessed with CF-like arguments (e.g. axis, standard name) and
135+
there are some CF-specific utilities (similar
136+
to Iris utilities). Iris tends to cover more of CF and to be stricter about it.
137+
138+
139+
.. seealso::
140+
141+
* `Xarray IO notes on Iris`_
142+
* `Xarray notes on other NetCDF libraries`_
143+
144+
.. _Xarray IO notes on Iris: https://docs.xarray.dev/en/stable/user-guide/io.html#iris
145+
.. _Xarray notes on other NetCDF libraries: https://docs.xarray.dev/en/stable/getting-started-guide/faq.html#what-other-netcdf-related-python-libraries-should-i-know-about
146+
.. _loading keywords: https://docs.xarray.dev/en/stable/generated/xarray.open_dataset.html#xarray.open_dataset
147+
.. _encoding: https://docs.xarray.dev/en/stable/user-guide/io.html#writing-encoded-data
148+
.. _xESMF: https://github.com/pangeo-data/xESMF/
149+
.. _seaborn: https://seaborn.pydata.org/
150+
.. _hvPlot: https://hvplot.holoviz.org/
151+
.. _pandas: https://pandas.pydata.org/
152+
.. _NEP-18: https://numpy.org/neps/nep-0018-array-function-protocol.html
153+
.. _cf-xarray: https://github.com/xarray-contrib/cf-xarray
154+
.. _iris#4994: https://github.com/SciTools/iris/issues/4994

docs/src/conf.py

Lines changed: 1 addition & 0 deletions
@@ -223,6 +223,7 @@ def _dotv(version):
223223
"python": ("https://docs.python.org/3/", None),
224224
"scipy": ("https://docs.scipy.org/doc/scipy/", None),
225225
"pandas": ("https://pandas.pydata.org/docs/", None),
226+
"dask": ("https://docs.dask.org/en/stable/", None),
226227
}
227228

228229
# The name of the Pygments (syntax highlighting) style to use.

docs/src/further_topics/ugrid/partner_packages.rst

Lines changed: 2 additions & 1 deletion
@@ -1,3 +1,5 @@
1+
.. include:: ../../common_links.inc
2+
13
.. _ugrid partners:
24

35
Iris' Mesh Partner Packages
@@ -97,4 +99,3 @@ Applications
9799
98100
.. _GeoVista: https://github.com/bjlittle/geovista
99101
.. _PyVista: https://docs.pyvista.org/index.html
100-
.. _iris-esmf-regrid: https://github.com/SciTools-incubator/iris-esmf-regrid

docs/src/index.rst

Lines changed: 9 additions & 0 deletions
@@ -136,6 +136,15 @@ The legacy support resources:
136136
developers_guide/contributing_getting_involved
137137

138138

139+
.. toctree::
140+
:caption: Community
141+
:maxdepth: 1
142+
:name: community_index
143+
:hidden:
144+
145+
Community <community/index>
146+
147+
139148
.. toctree::
140149
:caption: Iris API
141150
:maxdepth: 1

docs/src/whatsnew/latest.rst

Lines changed: 8 additions & 0 deletions
@@ -82,6 +82,14 @@ This document explains the changes made to Iris for this release
8282
and removed an ECMWF link in the ``v1.0`` What's New that was failing the
8383
linkcheck CI. (:pull:`5109`)
8484

85+
#. `@trexfeathers`_ added a new top-level :doc:`/community/index` section,
86+
as a one-stop place to find out about getting involved, and how we relate
87+
to other projects. (:pull:`5025`)
88+
89+
#. The **Iris community**, with help from the **Xarray community**, produced
90+
the :doc:`/community/iris_xarray` page, highlighting the similarities and
91+
differences between the two packages. (:pull:`5025`)
92+
8593
💼 Internal
8694
===========
8795

lib/iris/_lazy_data.py

Lines changed: 11 additions & 9 deletions
@@ -39,7 +39,7 @@ def is_lazy_data(data):
3939
"""
4040
Return whether the argument is an Iris 'lazy' data array.
4141
42-
At present, this means simply a Dask array.
42+
At present, this means simply a :class:`dask.array.Array`.
4343
We determine this by checking for a "compute" property.
4444
4545
"""
@@ -67,7 +67,8 @@ def _optimum_chunksize_internals(
6767
* shape (tuple of int):
6868
The full array shape of the target data.
6969
* limit (int):
70-
The 'ideal' target chunk size, in bytes. Default from dask.config.
70+
The 'ideal' target chunk size, in bytes. Default from
71+
:mod:`dask.config`.
7172
* dtype (np.dtype):
7273
Numpy dtype of target data.
7374
@@ -77,7 +78,7 @@ def _optimum_chunksize_internals(
7778
7879
.. note::
7980
The purpose of this is very similar to
80-
`dask.array.core.normalize_chunks`, when called as
81+
:func:`dask.array.core.normalize_chunks`, when called as
8182
`(chunks='auto', shape, dtype=dtype, previous_chunks=chunks, ...)`.
8283
Except, the operation here is optimised specifically for a 'c-like'
8384
dimension order, i.e. outer dimensions first, as for netcdf variables.
@@ -174,13 +175,13 @@ def _optimum_chunksize(
174175

175176
def as_lazy_data(data, chunks=None, asarray=False):
176177
"""
177-
Convert the input array `data` to a dask array.
178+
Convert the input array `data` to a :class:`dask.array.Array`.
178179
179180
Args:
180181
181182
* data (array-like):
182183
An indexable object with 'shape', 'dtype' and 'ndim' properties.
183-
This will be converted to a dask array.
184+
This will be converted to a :class:`dask.array.Array`.
184185
185186
Kwargs:
186187
@@ -192,7 +193,7 @@ def as_lazy_data(data, chunks=None, asarray=False):
192193
Set to False (default) to pass passed chunks through unchanged.
193194
194195
Returns:
195-
The input array converted to a dask array.
196+
The input array converted to a :class:`dask.array.Array`.
196197
197198
.. note::
198199
The result chunk size is a multiple of 'chunks', if given, up to the
@@ -284,15 +285,16 @@ def multidim_lazy_stack(stack):
284285
"""
285286
Recursively build a multidimensional stacked dask array.
286287
287-
This is needed because dask.array.stack only accepts a 1-dimensional list.
288+
This is needed because :func:`dask.array.stack` only accepts a
289+
1-dimensional list.
288290
289291
Args:
290292
291293
* stack:
292-
An ndarray of dask arrays.
294+
An ndarray of :class:`dask.array.Array`.
293295
294296
Returns:
295-
The input array converted to a lazy dask array.
297+
The input array converted to a lazy :class:`dask.array.Array`.
296298
297299
"""
298300
if stack.ndim == 0:

lib/iris/cube.py

Lines changed: 2 additions & 1 deletion
@@ -884,7 +884,8 @@ def __init__(
884884
This object defines the shape of the cube and the phenomenon
885885
value in each cell.
886886
887-
``data`` can be a dask array, a NumPy array, a NumPy array
887+
``data`` can be a :class:`dask.array.Array`, a
888+
:class:`numpy.ndarray`, a NumPy array
888889
subclass (such as :class:`numpy.ma.MaskedArray`), or
889890
array_like (as described in :func:`numpy.asarray`).
890891

lib/iris/experimental/ugrid/mesh.py

Lines changed: 2 additions & 2 deletions
@@ -131,7 +131,7 @@ def __init__(
131131
132132
Args:
133133
134-
* indices (numpy.ndarray or numpy.ma.core.MaskedArray or dask.array.Array):
134+
* indices (:class:`numpy.ndarray` or :class:`numpy.ma.core.MaskedArray` or :class:`dask.array.Array`):
135135
2D array giving the topological connection relationship between
136136
:attr:`location` elements and :attr:`connected` elements.
137137
The :attr:`location_axis` dimension indexes over the
@@ -501,7 +501,7 @@ def core_indices(self):
501501
NumPy array or a Dask array.
502502
503503
Returns:
504-
numpy.ndarray or numpy.ma.core.MaskedArray or dask.array.Array
504+
:class:`numpy.ndarray` or :class:`numpy.ma.core.MaskedArray` or :class:`dask.array.Array`
505505
506506
"""
507507
return super()._core_values()
