Re-implement map_over_datasets using group_subtrees (#9636)
* Add zip_subtrees for paired iteration over DataTrees
This should be used for implementing DataTree arithmetic inside
map_over_datasets, so the result does not depend on the order in which
child nodes are defined.
I have also added a minimal implementation of breadth-first search with
an explicit queue to replace the current recursion-based solution in
xarray.core.iterators (which has been removed). The new implementation
is also slightly faster in my microbenchmark:
In [1]: import xarray as xr
In [2]: tree = xr.DataTree.from_dict({f"/x{i}": None for i in range(100)})
In [3]: %timeit _ = list(tree.subtree)
# on main
87.2 μs ± 394 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
# with this branch
55.1 μs ± 294 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
* fix pytype error
* Re-implement map_over_datasets
The main changes:
- It is implemented using zip_subtrees, which means it should properly
handle DataTrees where the nodes are defined in a different order.
- For simplicity, I removed handling of `**kwargs`, in order to preserve
some flexibility for adding keyword arguments.
- I removed automatic skipping of empty nodes, because there are almost
certainly cases where applying the function to empty nodes makes sense.
This could be restored with an optional keyword argument.
* fix typing of map_over_datasets
* add group_subtrees
* wip fixes
* update isomorphic
* documentation and API change for map_over_datasets
* mypy fixes
* fix test
* diff formatting
* more mypy
* doc fix
* more doc fix
* add api docs
* add utility for joining path on windows
* docstring
* add an overload for two return values from map_over_datasets
* partial fixes per review
* fixes per review
* remove a couple of xfails
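The first bullet describes breadth-first search with an explicit queue in place of recursion. The sketch below shows that technique on a toy tree. It assumes a hypothetical `Node` dataclass whose `children` dict maps names to child nodes. It is not the code added in this commit.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Node:
    """Hypothetical toy tree node (not xarray's DataTree)."""

    name: str
    children: dict[str, Node] = field(default_factory=dict)


def iter_subtree(root: Node) -> Iterator[Node]:
    """Yield ``root`` and all of its descendants in breadth-first order."""
    # A FIFO queue of nodes still to visit replaces the recursive call stack.
    queue: deque[Node] = deque([root])
    while queue:
        node = queue.popleft()  # take the oldest pending node first
        yield node
        # Children go behind every node already waiting, so each level is
        # fully visited before the next level starts.
        queue.extend(node.children.values())


root = Node("root", {"x0": Node("x0", {"y": Node("y")}), "x1": Node("x1")})
print([node.name for node in iter_subtree(root)])
# ['root', 'x0', 'x1', 'y']
```

Each node is popped once and its children are appended once, so the traversal is linear in the number of nodes. It also avoids nesting one generator per tree level.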
DATATREE_MIGRATION_GUIDE.md (3 additions, 1 deletion)
@@ -7,7 +7,7 @@ This guide is for previous users of the prototype `datatree.DataTree` class in t
 > [!IMPORTANT]
 > There are breaking changes! You should not expect that code written with `xarray-contrib/datatree` will work without any modifications. At the absolute minimum you will need to change the top-level import statement, but there are other changes too.
 
-We have made various changes compared to the prototype version. These can be split into three categories: data model changes, which affect the hierarchal structure itself, integration with xarray's IO backends; and minor API changes, which mostly consist of renaming methods to be more self-consistent.
+We have made various changes compared to the prototype version. These can be split into three categories: data model changes, which affect the hierarchal structure itself; integration with xarray's IO backends; and minor API changes, which mostly consist of renaming methods to be more self-consistent.
 
 ### Data model changes
 
@@ -17,6 +17,8 @@ These alignment checks happen at tree construction time, meaning there are some
 
 The alignment checks allowed us to add "Coordinate Inheritance", a much-requested feature where indexed coordinate variables are now "inherited" down to child nodes. This allows you to define common coordinates in a parent group that are then automatically available on every child node. The distinction between a locally-defined coordinate variables and an inherited coordinate that was defined on a parent node is reflected in the `DataTree.__repr__`. Generally if you prefer not to have these variables be inherited you can get more similar behaviour to the old `datatree` package by removing indexes from coordinates, as this prevents inheritance.
 
+Tree structure checks between multiple trees (i.e., `DataTree.isomorphic`) and pairing of nodes in arithmetic have also changed. Nodes are now matched (with `xarray.group_subtrees`) based on their relative paths, without regard to the order in which child nodes are defined.
+
 For further documentation see the page in the user guide on Hierarchical Data.
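The migration guide says nodes are matched by relative path rather than by position. A stdlib-only sketch can illustrate this. The nested-dict trees, `group_by_relative_path`, and `TreeMismatchError` are hypothetical names for this illustration. `xarray.group_subtrees` itself works on `DataTree` objects, and its exact iteration order and error type may differ.

```python
from __future__ import annotations

from collections import deque
from typing import Iterator

# Hypothetical toy representation: each node is a dict mapping child names to
# child nodes, e.g. {"a": {}, "b": {"c": {}}}. Node data is omitted for brevity.
Tree = dict


class TreeMismatchError(ValueError):
    """Raised when corresponding nodes do not have the same child names."""


def group_by_relative_path(*trees: Tree) -> Iterator[tuple[str, tuple[Tree, ...]]]:
    """Yield (relative_path, nodes) pairs, one node per input tree, breadth-first."""
    if not trees:
        raise TypeError("group_by_relative_path() needs at least one tree")
    queue: deque[tuple[str, tuple[Tree, ...]]] = deque([(".", trees)])
    while queue:
        path, nodes = queue.popleft()
        yield path, nodes
        child_names = [set(node) for node in nodes]
        if any(names != child_names[0] for names in child_names[1:]):
            raise TreeMismatchError(f"child names differ at {path!r}: {child_names}")
        # Children are matched by name, not position, so definition order is
        # irrelevant; the first tree only decides the order groups are yielded in.
        for name in nodes[0]:
            child_path = name if path == "." else f"{path}/{name}"
            queue.append((child_path, tuple(node[name] for node in nodes)))


tree1 = {"a": {}, "b": {"c": {}}}
tree2 = {"b": {"c": {}}, "a": {}}  # same structure, children defined in another order
for path, nodes in group_by_relative_path(tree1, tree2):
    print(path, len(nodes))
```

Because children are looked up by name, `tree1` and `tree2` pair up node for node even though their children were inserted in different orders.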
doc/user-guide/hierarchical-data.rst (63 additions, 18 deletions)
@@ -362,21 +362,26 @@ This returns an iterable of nodes, which yields them in depth-first order.
     for node in vertebrates.subtree:
         print(node.path)
 
-A very useful pattern is to use :py:class:`~xarray.DataTree.subtree` conjunction with the :py:class:`~xarray.DataTree.path` property to manipulate the nodes however you wish,
-then rebuild a new tree using :py:meth:`xarray.DataTree.from_dict()`.
+Similarly, :py:class:`~xarray.DataTree.subtree_with_keys` returns an iterable of
+relative paths and corresponding nodes.
 
+A very useful pattern is to iterate over :py:class:`~xarray.DataTree.subtree_with_keys`
+to manipulate nodes however you wish, then rebuild a new tree using
+:py:meth:`xarray.DataTree.from_dict()`.
 For example, we could keep only the nodes containing data by looping over all nodes,
 checking if they contain any data using :py:class:`~xarray.DataTree.has_data`,
 then rebuilding a new tree using only the paths of those nodes:
 
 .. ipython:: python
 
-    non_empty_nodes = {node.path: node.dataset for node in dt.subtree if node.has_data}
+    non_empty_nodes = {
+        path: node.dataset for path, node in dt.subtree_with_keys if node.has_data
+    }
     xr.DataTree.from_dict(non_empty_nodes)
 
 You can see this tree is similar to the ``dt`` object above, except that it is missing the empty nodes ``a/c`` and ``a/c/d``.
 
-(If you want to keep the name of the root node, you will need to add the ``name`` kwarg to :py:class:`~xarray.DataTree.from_dict`, i.e. ``DataTree.from_dict(non_empty_nodes, name=dt.root.name)``.)
+(If you want to keep the name of the root node, you will need to add the ``name`` kwarg to :py:class:`~xarray.DataTree.from_dict`, i.e. ``DataTree.from_dict(non_empty_nodes, name=dt.name)``.)
 
 .. _manipulating trees:
 
@@ -573,38 +578,78 @@ Then calculate the RMS value of these signals:
 
 .. _multiple trees:
 
-We can also use the :py:meth:`~xarray.map_over_datasets` decorator to promote a function which accepts datasets into one which
-accepts datatrees.
+We can also use :py:func:`~xarray.map_over_datasets` to apply a function over
+the data in multiple trees, by passing the trees as positional arguments.
 
 Operating on Multiple Trees
 ---------------------------
 
 The examples so far have involved mapping functions or methods over the nodes of a single tree,
 but we can generalize this to mapping functions over multiple trees at once.
 
+Iterating Over Multiple Trees
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To iterate over the corresponding nodes in multiple trees, use
+:py:func:`~xarray.group_subtrees` instead of
+:py:class:`~xarray.DataTree.subtree_with_keys`. This combines well with
+:py:meth:`xarray.DataTree.from_dict()` to build a new tree:
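The pattern above pairs corresponding nodes and then rebuilds a tree from a path-keyed dict. It can be sketched without xarray. Here each tree is flattened to a hypothetical mapping of relative path to node data. `map_over_paths` is an illustrative helper, not `xarray.map_over_datasets`. In xarray itself you would iterate with `group_subtrees` and pass the resulting dict to `DataTree.from_dict`.

```python
from __future__ import annotations

from typing import Callable, Mapping, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def map_over_paths(func: Callable[..., R], *trees: Mapping[str, T]) -> dict[str, R]:
    """Apply ``func`` to the entries that share a relative path in every tree.

    Each tree is a hypothetical flat mapping of relative path -> node data,
    like the pairs produced by iterating over ``subtree_with_keys``. The
    returned dict has the same shape, ready for a ``from_dict``-style rebuild.
    """
    if not trees:
        raise TypeError("map_over_paths() needs at least one tree")
    paths = set(trees[0])
    for other in trees[1:]:
        if set(other) != paths:
            raise ValueError(f"paths differ: {sorted(paths ^ set(other))}")
    # Entries are looked up by key, so each mapping's insertion order is irrelevant.
    return {path: func(*(tree[path] for tree in trees)) for path in trees[0]}


first = {".": 1, "x": 2, "x/y": 3}
second = {"x/y": 30, ".": 10, "x": 20}  # same paths, different order
print(map_over_paths(lambda a, b: a + b, first, second))
# {'.': 11, 'x': 22, 'x/y': 33}
```

The structure check runs before `func` is called. A mismatch is therefore reported before any partial result is built.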