Add drop_duplicates for dims (pydata#5239)
* Add drop_duplicates for dims

* Update PR # and fix lint

* Remove dataset

* Remove references to ds

* Update dataarray.py

* Update xarray/core/dataarray.py

Co-authored-by: keewis <keewis@users.noreply.github.com>

* Update dataarray.py

* Single dim

* Update xarray/core/dataarray.py

Co-authored-by: Maximilian Roos <5635139+max-sixty@users.noreply.github.com>

* Update xarray/core/dataarray.py

Co-authored-by: Maximilian Roos <5635139+max-sixty@users.noreply.github.com>

* Update xarray/core/dataarray.py

Co-authored-by: Maximilian Roos <5635139+max-sixty@users.noreply.github.com>

* [skip-ci]

Co-authored-by: ahuang11 <ahuang11@illinois.edu>
Co-authored-by: keewis <keewis@users.noreply.github.com>
Co-authored-by: Maximilian Roos <5635139+max-sixty@users.noreply.github.com>
Co-authored-by: dcherian <deepak@cherian.net>
5 people authored May 15, 2021
1 parent d8f759c commit b1bd6c8
Showing 4 changed files with 53 additions and 0 deletions.
1 change: 1 addition & 0 deletions doc/api.rst
@@ -292,6 +292,7 @@ DataArray contents
   DataArray.swap_dims
   DataArray.expand_dims
   DataArray.drop_vars
   DataArray.drop_duplicates
   DataArray.reset_coords
   DataArray.copy

4 changes: 4 additions & 0 deletions doc/whats-new.rst
@@ -21,6 +21,10 @@ v0.18.1 (unreleased)

New Features
~~~~~~~~~~~~

- Implement :py:meth:`DataArray.drop_duplicates`
  to remove duplicate dimension values (:pull:`5239`).
  By `Andrew Huang <https://github.com/ahuang11>`_.
- allow passing ``combine_attrs`` strategy names to the ``keep_attrs`` parameter of
  :py:func:`apply_ufunc` (:pull:`5041`)
  By `Justus Magin <https://github.com/keewis>`_.
27 changes: 27 additions & 0 deletions xarray/core/dataarray.py
@@ -4572,6 +4572,33 @@ def curvefit(
            kwargs=kwargs,
        )

    def drop_duplicates(
        self,
        dim: Hashable,
        keep: Union[
            str,
            bool,
        ] = "first",
    ):
        """Returns a new DataArray with duplicate dimension values removed.

        Parameters
        ----------
        dim : dimension label
            Dimension whose index is checked for duplicate values.
        keep : {"first", "last", False}, default: "first"
            Determines which duplicates (if any) to keep.

            - ``"first"`` : Drop duplicates except for the first occurrence.
            - ``"last"`` : Drop duplicates except for the last occurrence.
            - False : Drop all duplicates.

        Returns
        -------
        DataArray
        """
        if dim not in self.dims:
            raise ValueError(f"'{dim}' not found in dimensions")
        # Boolean mask that is True for the index entries to keep.
        indexes = {dim: ~self.get_index(dim).duplicated(keep=keep)}
        return self.isel(indexes)

    # this needs to be at the end, or mypy will confuse with `str`
    # https://mypy.readthedocs.io/en/latest/common_issues.html#dealing-with-conflicting-names
    str = utils.UncachedAccessor(StringAccessor)
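
The new method inverts a "duplicated" mask over the dimension's index and passes it to `isel`. The sketch below only illustrates the three `keep` modes in plain Python. It is not the pandas or xarray implementation, and the helper names `duplicated_mask` and `drop_duplicate_labels` are hypothetical, introduced here for illustration.

```python
from typing import Hashable, List, Sequence, Tuple, Union


def duplicated_mask(
    labels: Sequence[Hashable], keep: Union[str, bool] = "first"
) -> List[bool]:
    """Return True for every label that counts as a duplicate.

    Hypothetical helper that mirrors the ``keep`` options in the docstring.
    """
    if keep not in ("first", "last", False):
        raise ValueError('keep must be "first", "last" or False')

    if keep is False:
        # Every label that occurs more than once is marked.
        counts = {}
        for label in labels:
            counts[label] = counts.get(label, 0) + 1
        return [counts[label] > 1 for label in labels]

    # Walk forwards for "first", backwards for "last"; the first label
    # met in that direction is the one that survives.
    n = len(labels)
    order = range(n) if keep == "first" else range(n - 1, -1, -1)
    seen = set()
    mask = [False] * n
    for i in order:
        if labels[i] in seen:
            mask[i] = True
        else:
            seen.add(labels[i])
    return mask


def drop_duplicate_labels(
    labels: Sequence[Hashable], data: Sequence, keep: Union[str, bool] = "first"
) -> Tuple[list, list]:
    """Keep only positions whose label is not marked as a duplicate."""
    mask = duplicated_mask(labels, keep)
    kept = [i for i, is_dup in enumerate(mask) if not is_dup]
    return [labels[i] for i in kept], [data[i] for i in kept]


time = [0, 0, 1, 2]
values = [0, 5, 6, 7]
for option in ("first", "last", False):
    print(option, drop_duplicate_labels(time, values, keep=option))
```

The inputs are the same as in the test added by this commit, so the three printed results can be compared with the expected `data` and `time` lists there.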
21 changes: 21 additions & 0 deletions xarray/tests/test_dataarray.py
@@ -7434,3 +7434,24 @@ def test_clip(da):
    # Unclear whether we want this to work, OK to adjust the test when we have decided.
    with pytest.raises(ValueError, match="arguments without labels along dimension"):
        result = da.clip(min=da.mean("x"), max=da.mean("a").isel(x=[0, 1]))


@pytest.mark.parametrize("keep", ["first", "last", False])
def test_drop_duplicates(keep):
    ds = xr.DataArray(
        [0, 5, 6, 7], dims="time", coords={"time": [0, 0, 1, 2]}, name="test"
    )

    if keep == "first":
        data = [0, 6, 7]
        time = [0, 1, 2]
    elif keep == "last":
        data = [5, 6, 7]
        time = [0, 1, 2]
    else:
        data = [6, 7]
        time = [1, 2]

    expected = xr.DataArray(data, dims="time", coords={"time": time}, name="test")
    result = ds.drop_duplicates("time", keep=keep)
    assert_equal(expected, result)
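
Outside the test suite, the same behaviour can be tried interactively. This is a minimal usage sketch. It assumes an xarray release that includes this commit (the changelog entry sits under v0.18.1) and reuses the array from the test. The variable names are illustrative only.

```python
import xarray as xr

# Same array as in test_drop_duplicates: the label 0 appears twice on "time".
da = xr.DataArray(
    [0, 5, 6, 7], dims="time", coords={"time": [0, 0, 1, 2]}, name="test"
)

first = da.drop_duplicates("time")  # keep="first" is the default
last = da.drop_duplicates("time", keep="last")
unique_only = da.drop_duplicates("time", keep=False)

for label, result in [("first", first), ("last", last), ("False", unique_only)]:
    print(label, result["time"].values.tolist(), result.values.tolist())
```

Passing a name that is not one of the array's dimensions raises the `ValueError` added in `dataarray.py`.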
