Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make illegal path-like variable names when constructing a DataTree from a Dataset #9378

3 changes: 3 additions & 0 deletions doc/whats-new.rst
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,9 @@ Deprecations
Bug fixes
~~~~~~~~~

- Make illegal path-like variable names when constructing a DataTree from a Dataset
(:issue:`9339`, :pull:`9378`)
By `Etienne Schalk <https://github.com/etienneschalk>`_.
- Fix bug with rechunking to a frequency when some periods contain no data (:issue:`9360`).
By `Deepak Cherian <https://github.com/dcherian>`_.
- Fix bug causing `DataTree.from_dict` to be sensitive to insertion order (:issue:`9276`, :pull:`9292`).
Expand Down
14 changes: 14 additions & 0 deletions xarray/core/datatree.py
Original file line number Diff line number Diff line change
Expand Up @@ -158,6 +158,19 @@ def _check_alignment(
_check_alignment(child_path, child_ds, base_ds, child.children)


def _check_for_slashes_in_names(variables: Iterable[Hashable]) -> None:
offending_variable_names = [
name for name in variables if isinstance(name, str) and "/" in name
]
if len(offending_variable_names) > 0:
raise KeyError(
etienneschalk marked this conversation as resolved.
Show resolved Hide resolved
"Given Dataset contains variable names with '/': "
f"{offending_variable_names}. "
"A Dataset represents a group, and a single group "
"cannot have path-like variable names with '/' characters in them. "
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good, but @keewis you mentioned you had a comment didn't you?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the improvement to be made here is to clarify that currently slashes are allowed in variable names on xr.Dataset, but are forbidden in variable names in xr.DataTree.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this has to be considered here, but NetCDF4 and HDF5 do not allow forward slashes in variable/dataset names.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. I think that doesn't change anything for this PR, so long as that restriction is already explicitly accounted for in the respective backend save methods.

etienneschalk marked this conversation as resolved.
Show resolved Hide resolved


class DatasetView(Dataset):
"""
An immutable Dataset-like view onto the data in a single DataTree node.
Expand Down Expand Up @@ -464,6 +477,7 @@ def __init__(
self.children = children

def _set_node_data(self, ds: Dataset):
_check_for_slashes_in_names(ds.variables)
TomNicholas marked this conversation as resolved.
Show resolved Hide resolved
data_vars, coord_vars = _collect_data_and_coord_variables(ds)
self._data_variables = data_vars
self._node_coord_variables = coord_vars
Expand Down
17 changes: 17 additions & 0 deletions xarray/tests/test_datatree.py
Original file line number Diff line number Diff line change
Expand Up @@ -72,6 +72,23 @@ def test_child_gets_named_on_attach(self):
mary: DataTree = DataTree(children={"Sue": sue}) # noqa
assert sue.name == "Sue"

def test_dataset_containing_slashes(self):
xda: xr.DataArray = xr.DataArray(
[[1, 2]],
coords={"label": ["a"], "R30m/y": [30, 60]},
)
xds: xr.Dataset = xr.Dataset({"group/subgroup/my_variable": xda})
with pytest.raises(
KeyError,
etienneschalk marked this conversation as resolved.
Show resolved Hide resolved
match=re.escape(
"Given Dataset contains variable names with '/': "
"['R30m/y', 'group/subgroup/my_variable']. "
"A Dataset represents a group, and a single group "
"cannot have path-like variable names with '/' characters in them. "
),
):
DataTree(xds)


class TestPaths:
def test_path_property(self):
Expand Down
Loading