Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add a drop_conflicts strategy for merging attrs #4827

Merged
merged 25 commits into from
Feb 10, 2021

Conversation

keewis
Copy link
Collaborator

@keewis keewis commented Jan 19, 2021

As described in #4749, this adds a "drop_conflicts" strategy for merging attrs. For now, only merge works, combine_by_attrs, combine_nested, and concat still fail because they rely on variable.concat to always use "override".

Overriding CI behaviors

By default, the upstream dev CI is disabled on pull request and push events. You can override this behavior per commit by adding a [test-upstream] tag to the first line of the commit message. For documentation-only commits, you can skip the CI per commit by adding a [skip-ci] tag to the first line of the commit message

@keewis
Copy link
Collaborator Author

keewis commented Jan 23, 2021

this seems a bit tricky to get right: combine_attrs is only applied to the main object, not to its variables:

variables, out_indexes = merge_collected(collected, prioritized, compat=compat)

and similar for concat (and probably combine_*). For now, I'll mark the tests as "skipped".

@keewis
Copy link
Collaborator Author

keewis commented Jan 23, 2021

I only modified merge, concat and combine_* (binary arithmetic and apply_ufunc only support a boolean (keep_attrs), which means the choice is between "drop" and "override"). Dataset.merge currently does not have a combine_attrs kwarg (I think it uses the default of merge_core, which is "override"), and I will leave where and Dataset.where to #4687.

As mentioned above, merge_attrs is currently only applied to the objects themselves, the attrs of variables and coordinates are merged using "override" (I think?).

@keewis
Copy link
Collaborator Author

keewis commented Jan 23, 2021

Unless I'm missing something this should be ready for review

Copy link
Collaborator

@max-sixty max-sixty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I don't have that much context — the code looks excellent, and the tests very complete!

Copy link
Collaborator

@mathause mathause left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

),
],
)
def test_concat_combine_attrs_kwarg_variables(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def test_concat_combine_attrs_kwarg_variables(
def test_concat_combine_attrs_kwarg_dataarrays(

or data_vars (took me a while to figure out what was different from above)

Copy link
Collaborator Author

@keewis keewis Feb 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, I forgot to check the coords. The purpose of these tests is to check whether combine_attrs is applied to the variables (coords and data variables) in addition to the top-level Dataset / DataArray. Would it help to have a comment explaining the difference?

),
],
)
def test_merge_arrays_attrs_variables(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dito

@dcherian
Copy link
Contributor

Thanks @keewis

@dcherian dcherian merged commit 47889ee into pydata:master Feb 10, 2021
@keewis keewis deleted the keep_attrs-drop_conflicts branch February 10, 2021 21:34
dcherian added a commit to DWesl/xarray that referenced this pull request Feb 11, 2021
…_and_bounds_as_coords

* upstream/master: (51 commits)
  Ensure maximum accuracy when encoding and decoding cftime.datetime values (pydata#4758)
  Fix `bounds_error=True` ignored with 1D interpolation (pydata#4855)
  add a drop_conflicts strategy for merging attrs (pydata#4827)
  update pre-commit hooks (mypy) (pydata#4883)
  ensure warnings cannot become errors in assert_ (pydata#4864)
  update pre-commit hooks (pydata#4874)
  small fixes for the docstrings of swap_dims and integrate (pydata#4867)
  Modify _encode_datetime_with_cftime for compatibility with cftime > 1.4.0 (pydata#4871)
  vélin (pydata#4872)
  don't skip the doctests CI (pydata#4869)
  fix da.pad example for numpy 1.20 (pydata#4865)
  temporarily pin dask (pydata#4873)
  Add units if "unit" is in the attrs. (pydata#4850)
  speed up the repr for big MultiIndex objects (pydata#4846)
  dim -> coord in DataArray.integrate (pydata#3993)
  WIP: backend interface, now it uses subclassing  (pydata#4836)
  weighted: small improvements (pydata#4818)
  Update related-projects.rst (pydata#4844)
  iris update doc url (pydata#4845)
  Faster unstacking (pydata#4746)
  ...
dcherian added a commit to dcherian/xarray that referenced this pull request Feb 12, 2021
* upstream/master: (24 commits)
  Compatibility with dask 2021.02.0 (pydata#4884)
  Ensure maximum accuracy when encoding and decoding cftime.datetime values (pydata#4758)
  Fix `bounds_error=True` ignored with 1D interpolation (pydata#4855)
  add a drop_conflicts strategy for merging attrs (pydata#4827)
  update pre-commit hooks (mypy) (pydata#4883)
  ensure warnings cannot become errors in assert_ (pydata#4864)
  update pre-commit hooks (pydata#4874)
  small fixes for the docstrings of swap_dims and integrate (pydata#4867)
  Modify _encode_datetime_with_cftime for compatibility with cftime > 1.4.0 (pydata#4871)
  vélin (pydata#4872)
  don't skip the doctests CI (pydata#4869)
  fix da.pad example for numpy 1.20 (pydata#4865)
  temporarily pin dask (pydata#4873)
  Add units if "unit" is in the attrs. (pydata#4850)
  speed up the repr for big MultiIndex objects (pydata#4846)
  dim -> coord in DataArray.integrate (pydata#3993)
  WIP: backend interface, now it uses subclassing  (pydata#4836)
  weighted: small improvements (pydata#4818)
  Update related-projects.rst (pydata#4844)
  iris update doc url (pydata#4845)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Option for combine_attrs with conflicting values silently dropped
4 participants