DataTree access to variables in parent groups #9056

### Motivation

Accessing variables from parent groups in a tree would be useful. This has come up before in #1982 and https://github.com/xarray-contrib/datatree/issues/297. Here I summarize some discussion from recent [datatree meetings](https://github.com/pydata/xarray/issues/8747).

One use case is sharing coordinate variables between multiple sub-groups. For example, this multi-resolution datatree has a `time` coordinate that is conceptually common to both child groups:

```python
DataTree('None', parent=None)
│   Dimensions:  (time: 4)
│   Coordinates:
│     * time     (time) int64 32B 0 1 2 3
│   Data variables:
│       *empty*
├── DataTree('low')
│       Dimensions:  (x: 3, time: 4)
│       Coordinates:
│         * x        (x) float64 24B 1.0 5.0 9.0
│       Dimensions without coordinates: time
│       Data variables:
│           a        (x, time) int64 96B 0 1 2 3 4 5 6 7 8 9 10 11
└── DataTree('high')
        Dimensions:  (x: 9, time: 4)
        Coordinates:
          * x        (x) float64 72B 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0
        Dimensions without coordinates: time
        Data variables:
            a        (x, time) int64 288B 0 1 2 3 4 5 6 7 8 ... 28 29 30 31 32 33 34 35
```
It would be useful to be able to access the `time` coordinate variable from either child group, i.e. `dt['/high'].time`.

Indeed, the [CF conventions](https://cfconventions.org/Data/cf-conventions/cf-conventions-1.9/cf-conventions.html#_search_by_proximity) explicitly describe this type of behaviour, in terms of searching for variables outside of the current group:

> **Search by proximity**
>
> A variable or dimension specified with no path (for example, `lat`) refers to the variable or dimension of that name, if there is one, in the referring group. If not, the ancestors of the referring group are searched for it, starting from the direct ancestor and proceeding toward the root group, until it is found.
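
To make the quoted rule concrete, here is a minimal sketch of search by proximity on a toy group hierarchy. The `Group` class and `find_by_proximity` function are hypothetical names invented for illustration; they are not part of xarray, datatree, or any netCDF library.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Group:
    """Toy stand-in for a group in a hierarchy (hypothetical, not an xarray class)."""

    name: str
    variables: dict[str, list] = field(default_factory=dict)
    parent: Group | None = None


def find_by_proximity(group: Group, name: str) -> list:
    """Look in the referring group first, then in each ancestor, nearest first."""
    node = group
    while node is not None:
        if name in node.variables:
            return node.variables[name]
        node = node.parent
    raise KeyError(name)


root = Group("/", {"time": [0, 1, 2, 3]})
high = Group("high", {"x": [1.0, 2.0, 3.0]}, parent=root)

print(find_by_proximity(high, "x"))     # defined locally on "high"
print(find_by_proximity(high, "time"))  # not on "high", so taken from the root
```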

### Problem

We could imagine changing the interface of `DataTree` to allow users to access any compatible variables on parent groups, where compatible means alignable.

There are three issues with this:

1. Not all users will want to inherit all such variables.
2. It would be a breaking change compared to the behaviour of the original datatree package.
3. Mapping operations (e.g. `.mean()`) over multiple nodes become really confusing, because copies of the same variable would effectively be present in multiple nodes.

### Proposal

Here is a concrete proposal for discussion, with the following features:

1. Keep `.ds`, `.__getitem__` etc. on `DataTree` as-is. This means no breaking of backwards compatibility. This also means that we don't have to wait to implement all the details of this before releasing datatree in xarray `main`.

2. A clear definition of "compatible variables" for inheritance. These are alignable variables that exist on a parent (or grandparent, etc.). Q: Should these be just coordinate variables? Or all variables?

3. Add additional API which allows access to inherited variables, via a new `.inherit` accessor on `DataTree` objects. (The name is not great, please feel free to suggest alternatives.)
    - Whilst `dt[...]` will never give access to inherited vars, `dt.inherit[...]` would allow `__getitem__` access to inherited vars
    - `dt.inherit.ds` would return a `DatasetView` of that node with extra inherited variables in it
    - `dt.inherit.to_dataset()` -> `xr.Dataset` containing inherited vars
    - Explicit API for propagating / shallow-copying all variables to child nodes?
        - `dt.inherit()`? -> `DataTree`

4. Don't change `map_over_subtree` (again for backwards compatibility)
    - A separate `map_over_inherited_subtree` would isolate the concept of mapping over a tree with inherited variables
        - issues: e.g. mapping over the tree would see the same variable multiple times (in its "local" group and in all its child groups)
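
The duplicate-visit issue in the last bullet can be illustrated with a toy model. This is only a sketch under stated assumptions: the tree is a plain dictionary and `local_vars`, `inherited_vars` and `visits` are hypothetical helpers, not `map_over_subtree` or any real datatree function.

```python
from collections import Counter

# Toy tree (hypothetical): path -> (parent path, names of variables defined there).
tree = {
    "/": (None, ["time"]),
    "/low": ("/", ["x", "a"]),
    "/high": ("/", ["x", "a"]),
}


def local_vars(path):
    """Variables defined on this node only, tagged with where they are defined."""
    return [(path, name) for name in tree[path][1]]


def inherited_vars(path):
    """Local variables plus those found on ancestors; the nearest definition wins."""
    seen, result = set(), []
    node = path
    while node is not None:
        parent, names = tree[node]
        for name in names:
            if name not in seen:
                seen.add(name)
                result.append((node, name))
        node = parent
    return result


def visits(view):
    """Count how often each (defining node, name) pair is seen when mapping over all nodes."""
    return Counter(var for path in tree for var in view(path))


local_counts = visits(local_vars)
inherited_counts = visits(inherited_vars)

print(local_counts[("/", "time")])      # 1: only seen on the root
print(inherited_counts[("/", "time")])  # 3: seen on the root and on both children
```

With the inherited view, a reduction such as `.mean()` would be applied to the root's `time` three times, which is why keeping the two mapping functions separate seems safer.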

This will be a new feature, to be done in a separate release (i.e. it is not a blocker right now).

### Implementation

`dt.inherit` returns an `InheritedNode`, which at construction time creates and caches a mapping of all inherited variables (`._inherited_variables`). This then acts like a normal `DataTree` node except that it consults the inherited variables instead of the normal list of variables. 

Creating the list of inherited variables is done by walking up the tree from the current node, examining new variables as they are encountered.
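
A rough sketch of that walk, on a toy model rather than real `DataTree` objects. Here `Node` is a hypothetical stand-in, variables are reduced to `(dims, shape)` pairs, and "compatible" is simplified to "no conflicting dimension sizes"; real alignment would also need to compare indexes.

```python
class Node:
    """Toy tree node (hypothetical); variables map name -> (dims, shape)."""

    def __init__(self, variables, parent=None):
        self.variables = dict(variables)
        self.parent = parent

    def sizes(self):
        """Dimension sizes implied by this node's own variables."""
        out = {}
        for dims, shape in self.variables.values():
            out.update(zip(dims, shape))
        return out


class InheritedNode:
    """Wraps a node and caches local plus compatible ancestor variables."""

    def __init__(self, node):
        self._node = node
        self._inherited_variables = self._collect(node)

    @staticmethod
    def _collect(node):
        found = dict(node.variables)
        sizes = node.sizes()
        ancestor = node.parent
        while ancestor is not None:
            for name, (dims, shape) in ancestor.variables.items():
                if name in found:
                    continue  # a nearer definition shadows this one
                # Simplified compatibility check: shared dimensions must agree in size.
                if all(sizes.get(d, n) == n for d, n in zip(dims, shape)):
                    found[name] = (dims, shape)
                    sizes.update(zip(dims, shape))
            ancestor = ancestor.parent
        return found

    def __getitem__(self, name):
        return self._inherited_variables[name]


root = Node({"time": (("time",), (4,)), "y": (("x",), (5,))})
high = Node({"x": (("x",), (9,)), "a": (("x", "time"), (9, 4))}, parent=root)

inherited = InheritedNode(high)
print(inherited["time"])                        # inherited from the root
print("y" in inherited._inherited_variables)    # "y" conflicts on the size of x, so it is skipped
```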

Q: Does this design handle coordinate names?

EDIT: Actually there's an even simpler idea: `dt.inherit` -> `DataTree` which has a shallow copy of all compatible variables inherited onto that node. Then `.ds`, `.__getitem__` etc. will automatically behave as expected, as you will just have a new `DataTree` object with more valid keys.
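
The simpler variant might look roughly like this on a toy model. Again `Node` and `inherit` are hypothetical names for illustration, and the compatibility filtering is left out to keep the sketch short.

```python
class Node:
    """Toy tree node (hypothetical); variables map name -> value."""

    def __init__(self, variables, parent=None):
        self.variables = dict(variables)
        self.parent = parent

    def __getitem__(self, name):
        return self.variables[name]


def inherit(node):
    """Return a new node of the same type with ancestor variables shallow-copied in."""
    chain = []
    current = node
    while current is not None:
        chain.append(current)
        current = current.parent
    merged = {}
    for ancestor in reversed(chain):  # root first, so nearer nodes overwrite
        merged.update(ancestor.variables)
    return Node(merged, parent=node.parent)


time = [0, 1, 2, 3]
root = Node({"time": time})
high = Node({"x": [1.0, 2.0, 3.0]}, parent=root)

expanded = inherit(high)
print(expanded["time"] is time)   # True: the value is shared, not copied
print("time" in high.variables)   # False: the original node is unchanged
```

Because the result is an ordinary node with extra keys, no wrapper class is needed and the existing access methods work on it unchanged.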

### Describe alternatives you've considered

1. Not add any support for inheriting variables

That's what we currently have. With this proposal, we could eventually remove the new API again if it turned out no-one liked it.

2. Integrate support into the existing API (i.e. change `dt.__getitem__` to access inherited variables)

It's not possible to do this without breaking changes. It's also not clear that there is a general one-size-fits-all answer to when variables should or shouldn't be inherited. This proposal provides both behaviours.

3. Allow users to change behaviour of objects

Some kind of switch (on the specific object instances, globally, or with a context manager) could be used to switch between the two behaviours. But this seems extremely error-prone, and means that user code becomes ambiguous without knowing the state of the switch.

cc @shoyer @keewis @flamingbear @owenlittlejohns @eni-awowale 

also @alexamici @benbovy I would love to hear your thoughts too.
