Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

also apply combine_attrs to the attrs of the variables #4902

Merged
merged 31 commits into from
May 5, 2021
Merged
Show file tree
Hide file tree
Changes from 29 commits
Commits
Show all changes
31 commits
Select commit Hold shift + click to select a range
28b857e
enable the tests
keewis Feb 12, 2021
4ab0013
clear all attrs first
keewis Feb 12, 2021
c53c71a
implement the merging of variable attrs when merging
keewis Feb 12, 2021
df8354b
Merge branch 'master' into variable-combine_attrs
keewis Feb 12, 2021
a0a2265
implement the merging of variable attrs when concatenating
keewis Feb 12, 2021
9af2c3f
add tests for combine_attrs on variables with combine_*
keewis Feb 13, 2021
78f9ef9
modify tests in test_combine which have wrong expectation about attrs
keewis Feb 13, 2021
77d0208
move the attrs preserving into the try / except
keewis Feb 13, 2021
b234ddf
clear variable attrs where necessary
keewis Feb 13, 2021
0cbf1a4
rewrite the main object merge_attrs tests
keewis Feb 13, 2021
71d605a
clear the variable attrs first
keewis Feb 13, 2021
fb1fcb9
add combine_attrs="no_conflict"
keewis Feb 13, 2021
dad9c38
Merge branch 'master' into variable-combine_attrs
keewis Feb 27, 2021
3ee536d
add a entry to whats-new.rst
keewis Feb 27, 2021
f03c5e8
Merge branch 'master' into variable-combine_attrs
keewis Mar 28, 2021
78a3efd
Merge branch 'master' into variable-combine_attrs
keewis Apr 3, 2021
abacdc9
Update doc/whats-new.rst
keewis Apr 19, 2021
b4b64c1
dedent a hidden test
keewis Apr 5, 2021
cd3fd0e
fix whats-new.rst
keewis Apr 19, 2021
8ca5e44
conditionally exclude attrs
keewis Apr 20, 2021
34d6615
use add_attrs=False or construct manually instead of clearing attrs
keewis Apr 20, 2021
8ca3aeb
also merge attrs for indexed variables
keewis Apr 20, 2021
6e06dc4
use pytest.raises instead of raises_regex
keewis Apr 20, 2021
2c5c1be
Merge branch 'master' into variable-combine_attrs
keewis Apr 20, 2021
cc08b53
switch the default for merge's combine_attrs to override
keewis Apr 29, 2021
94c7896
Merge branch 'master' into variable-combine_attrs
keewis Apr 29, 2021
dffda43
use pytest.raises
keewis Apr 29, 2021
db0fc56
update whats-new.rst [skip-ci]
keewis Apr 29, 2021
a1f1dda
fix whats-new.rst [skip-ci]
keewis May 1, 2021
59f0732
Merge branch 'master' into variable-combine_attrs
keewis May 5, 2021
065e757
provide more context for the change of the default value [skip-ci]
keewis May 5, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 6 additions & 1 deletion doc/whats-new.rst
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,9 @@ v0.17.1 (unreleased)

New Features
~~~~~~~~~~~~

- apply ``combine_attrs`` on data variables and coordinate variables when concatenating
and merging datasets and dataarrays (:pull:`4902`).
By `Justus Magin <https://github.com/keewis>`_.
- Add ``safe_chunks`` option to :py:meth:`Dataset.to_zarr` which allows overriding
checks made to ensure Dask and Zarr chunk compatibility (:issue:`5056`).
By `Ryan Abernathey <https://github.com/rabernat>`_
Expand Down Expand Up @@ -109,6 +111,9 @@ Breaking changes
``ds.coarsen(...).mean(keep_attrs=False)`` instead of ``ds.coarsen(..., keep_attrs=False).mean()``.
Further, coarsen now keeps attributes per default (:pull:`5227`).
By `Mathias Hauser <https://github.com/mathause>`_.
- switch the default of the :py:func`merge` ``combine_attrs`` parameter to ``"override"``
(:pull:`4902`)
keewis marked this conversation as resolved.
Show resolved Hide resolved
By `Justus Magin <https://github.com/keewis>`_.
keewis marked this conversation as resolved.
Show resolved Hide resolved

Deprecations
~~~~~~~~~~~~
Expand Down
4 changes: 2 additions & 2 deletions xarray/core/concat.py
Original file line number Diff line number Diff line change
Expand Up @@ -508,7 +508,7 @@ def ensure_common_dims(vars):
vars = ensure_common_dims([ds[k].variable for ds in datasets])
except KeyError:
raise ValueError("%r is not present in all datasets." % k)
combined = concat_vars(vars, dim, positions)
combined = concat_vars(vars, dim, positions, combine_attrs=combine_attrs)
assert isinstance(combined, Variable)
result_vars[k] = combined
elif k in result_vars:
Expand Down Expand Up @@ -572,7 +572,7 @@ def _dataarray_concat(
positions,
fill_value=fill_value,
join=join,
combine_attrs="drop",
combine_attrs=combine_attrs,
)

merged_attrs = merge_attrs([da.attrs for da in arrays], combine_attrs)
Expand Down
16 changes: 13 additions & 3 deletions xarray/core/merge.py
Original file line number Diff line number Diff line change
Expand Up @@ -164,6 +164,7 @@ def merge_collected(
grouped: Dict[Hashable, List[MergeElement]],
prioritized: Mapping[Hashable, MergeElement] = None,
compat: str = "minimal",
combine_attrs="override",
) -> Tuple[Dict[Hashable, Variable], Dict[Hashable, pd.Index]]:
"""Merge dicts of variables, while resolving conflicts appropriately.

Expand Down Expand Up @@ -222,11 +223,18 @@ def merge_collected(
% (name, variable.attrs, other_variable.attrs)
)
merged_vars[name] = variable
merged_vars[name].attrs = merge_attrs(
[var.attrs for var, _ in indexed_elements],
combine_attrs=combine_attrs,
)
merged_indexes[name] = index
else:
variables = [variable for variable, _ in elements_list]
try:
merged_vars[name] = unique_variable(name, variables, compat)
merged_vars[name].attrs = merge_attrs(
[var.attrs for var in variables], combine_attrs=combine_attrs
)
except MergeError:
if compat != "minimal":
# we need more than "minimal" compatibility (for which
Expand Down Expand Up @@ -613,7 +621,9 @@ def merge_core(
collected = collect_variables_and_indexes(aligned)

prioritized = _get_priority_vars_and_indexes(aligned, priority_arg, compat=compat)
variables, out_indexes = merge_collected(collected, prioritized, compat=compat)
variables, out_indexes = merge_collected(
collected, prioritized, compat=compat, combine_attrs=combine_attrs
)
assert_unique_multiindex_level_names(variables)

dims = calculate_dimensions(variables)
Expand Down Expand Up @@ -649,7 +659,7 @@ def merge(
compat: str = "no_conflicts",
join: str = "outer",
fill_value: object = dtypes.NA,
combine_attrs: str = "drop",
combine_attrs: str = "override",
) -> "Dataset":
"""Merge any number of xarray objects into a single Dataset as variables.

Expand Down Expand Up @@ -688,7 +698,7 @@ def merge(
variable names to fill values. Use a data array's name to
refer to its values.
combine_attrs : {"drop", "identical", "no_conflicts", "drop_conflicts", \
"override"}, default: "drop"
"override"}, default: "override"
String indicating how to combine attrs of the objects being merged:

- "drop": empty attrs on returned Dataset.
Expand Down
67 changes: 59 additions & 8 deletions xarray/core/variable.py
Original file line number Diff line number Diff line change
Expand Up @@ -1771,7 +1771,14 @@ def reduce(
return Variable(dims, data, attrs=attrs)

@classmethod
def concat(cls, variables, dim="concat_dim", positions=None, shortcut=False):
def concat(
cls,
variables,
dim="concat_dim",
positions=None,
shortcut=False,
combine_attrs="override",
):
"""Concatenate variables along a new or existing dimension.

Parameters
Expand All @@ -1794,13 +1801,27 @@ def concat(cls, variables, dim="concat_dim", positions=None, shortcut=False):
This option is used internally to speed-up groupby operations.
If `shortcut` is True, some checks of internal consistency between
arrays to concatenate are skipped.
combine_attrs : {"drop", "identical", "no_conflicts", "drop_conflicts", \
"override"}, default: "override"
String indicating how to combine attrs of the objects being merged:

- "drop": empty attrs on returned Dataset.
- "identical": all attrs must be the same on every object.
- "no_conflicts": attrs from all objects are combined, any that have
the same name must also have the same value.
- "drop_conflicts": attrs from all objects are combined, any that have
the same name but different values are dropped.
- "override": skip comparing and copy attrs from the first dataset to
the result.

Returns
-------
stacked : Variable
Concatenated Variable formed by stacking all the supplied variables
along the given dimension.
"""
from .merge import merge_attrs

if not isinstance(dim, str):
(dim,) = dim.dims

Expand All @@ -1825,7 +1846,9 @@ def concat(cls, variables, dim="concat_dim", positions=None, shortcut=False):
dims = (dim,) + first_var.dims
data = duck_array_ops.stack(arrays, axis=axis)

attrs = dict(first_var.attrs)
attrs = merge_attrs(
[var.attrs for var in variables], combine_attrs=combine_attrs
)
encoding = dict(first_var.encoding)
if not shortcut:
for var in variables:
Expand Down Expand Up @@ -2581,12 +2604,21 @@ def __setitem__(self, key, value):
raise TypeError("%s values cannot be modified" % type(self).__name__)

@classmethod
def concat(cls, variables, dim="concat_dim", positions=None, shortcut=False):
def concat(
cls,
variables,
dim="concat_dim",
positions=None,
shortcut=False,
combine_attrs="override",
):
"""Specialized version of Variable.concat for IndexVariable objects.

This exists because we want to avoid converting Index objects to NumPy
arrays, if possible.
"""
from .merge import merge_attrs

if not isinstance(dim, str):
(dim,) = dim.dims

Expand All @@ -2613,12 +2645,13 @@ def concat(cls, variables, dim="concat_dim", positions=None, shortcut=False):
# keep as str if possible as pandas.Index uses object (converts to numpy array)
data = maybe_coerce_to_str(data, variables)

attrs = dict(first_var.attrs)
attrs = merge_attrs(
[var.attrs for var in variables], combine_attrs=combine_attrs
)
if not shortcut:
for var in variables:
if var.dims != first_var.dims:
raise ValueError("inconsistent dimensions")
utils.remove_incompatible_items(attrs, var.attrs)

return cls(first_var.dims, data, attrs)

Expand Down Expand Up @@ -2792,7 +2825,13 @@ def _broadcast_compat_data(self, other):
return self_data, other_data, dims


def concat(variables, dim="concat_dim", positions=None, shortcut=False):
def concat(
variables,
dim="concat_dim",
positions=None,
shortcut=False,
combine_attrs="override",
):
"""Concatenate variables along a new or existing dimension.

Parameters
Expand All @@ -2815,6 +2854,18 @@ def concat(variables, dim="concat_dim", positions=None, shortcut=False):
This option is used internally to speed-up groupby operations.
If `shortcut` is True, some checks of internal consistency between
arrays to concatenate are skipped.
combine_attrs : {"drop", "identical", "no_conflicts", "drop_conflicts", \
"override"}, default: "override"
String indicating how to combine attrs of the objects being merged:

- "drop": empty attrs on returned Dataset.
- "identical": all attrs must be the same on every object.
- "no_conflicts": attrs from all objects are combined, any that have
the same name must also have the same value.
- "drop_conflicts": attrs from all objects are combined, any that have
the same name but different values are dropped.
- "override": skip comparing and copy attrs from the first dataset to
the result.

Returns
-------
Expand All @@ -2824,9 +2875,9 @@ def concat(variables, dim="concat_dim", positions=None, shortcut=False):
"""
variables = list(variables)
if all(isinstance(v, IndexVariable) for v in variables):
return IndexVariable.concat(variables, dim, positions, shortcut)
return IndexVariable.concat(variables, dim, positions, shortcut, combine_attrs)
else:
return Variable.concat(variables, dim, positions, shortcut)
return Variable.concat(variables, dim, positions, shortcut, combine_attrs)


def assert_unique_multiindex_level_names(variables):
Expand Down
Loading