Commit b1bd6c8

ahuang11, keewis, max-sixty, and dcherian authored
Add drop_duplicates for dims (#5239)
* Add drop_duplicates for dims
* Update PR # and fix lint
* Remove dataset
* Remove references to ds
* Update dataarray.py
* Update xarray/core/dataarray.py

Co-authored-by: keewis <keewis@users.noreply.github.com>

* Update dataarray.py
* Single dim
* Update xarray/core/dataarray.py

Co-authored-by: Maximilian Roos <5635139+max-sixty@users.noreply.github.com>

* Update xarray/core/dataarray.py

Co-authored-by: Maximilian Roos <5635139+max-sixty@users.noreply.github.com>

* Update xarray/core/dataarray.py

Co-authored-by: Maximilian Roos <5635139+max-sixty@users.noreply.github.com>

* [skip-ci]

Co-authored-by: ahuang11 <ahuang11@illinois.edu>
Co-authored-by: keewis <keewis@users.noreply.github.com>
Co-authored-by: Maximilian Roos <5635139+max-sixty@users.noreply.github.com>
Co-authored-by: dcherian <deepak@cherian.net>
1 parent d8f759c commit b1bd6c8

4 files changed, 53 additions and 0 deletions

doc/api.rst

Lines changed: 1 addition & 0 deletions
@@ -292,6 +292,7 @@ DataArray contents
    DataArray.swap_dims
    DataArray.expand_dims
    DataArray.drop_vars
+   DataArray.drop_duplicates
    DataArray.reset_coords
    DataArray.copy

doc/whats-new.rst

Lines changed: 4 additions & 0 deletions
@@ -21,6 +21,10 @@ v0.18.1 (unreleased)
 
 New Features
 ~~~~~~~~~~~~
+
+- Implement :py:meth:`DataArray.drop_duplicates`
+  to remove duplicate dimension values (:pull:`5239`).
+  By `Andrew Huang <https://github.com/ahuang11>`_.
 - allow passing ``combine_attrs`` strategy names to the ``keep_attrs`` parameter of
   :py:func:`apply_ufunc` (:pull:`5041`)
   By `Justus Magin <https://github.com/keewis>`_.

xarray/core/dataarray.py

Lines changed: 27 additions & 0 deletions
@@ -4572,6 +4572,33 @@ def curvefit(
             kwargs=kwargs,
         )
 
+    def drop_duplicates(
+        self,
+        dim: Hashable,
+        keep: Union[
+            str,
+            bool,
+        ] = "first",
+    ):
+        """Returns a new DataArray with duplicate dimension values removed.
+        Parameters
+        ----------
+        dim : dimension label, optional
+        keep : {"first", "last", False}, default: "first"
+            Determines which duplicates (if any) to keep.
+            - ``"first"`` : Drop duplicates except for the first occurrence.
+            - ``"last"`` : Drop duplicates except for the last occurrence.
+            - False : Drop all duplicates.
+
+        Returns
+        -------
+        DataArray
+        """
+        if dim not in self.dims:
+            raise ValueError(f"'{dim}' not found in dimensions")
+        indexes = {dim: ~self.get_index(dim).duplicated(keep=keep)}
+        return self.isel(indexes)
+
     # this needs to be at the end, or mypy will confuse with `str`
     # https://mypy.readthedocs.io/en/latest/common_issues.html#dealing-with-conflicting-names
     str = utils.UncachedAccessor(StringAccessor)
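
The added method builds a boolean mask from the dimension's index with `duplicated(keep=keep)`, inverts it, and passes it to `isel`. The same mask-and-select step can be sketched with pandas and NumPy alone, outside xarray. This is only an illustration under that assumption: the helper name `drop_duplicate_labels` is hypothetical and is not part of the commit.

```python
import numpy as np
import pandas as pd


def drop_duplicate_labels(labels, values, keep="first"):
    """Hypothetical helper mirroring the mask-and-select step in the diff."""
    index = pd.Index(labels)
    # Index.duplicated() marks repeated labels as True; inverting the
    # result gives a mask of the positions to keep.
    mask = ~index.duplicated(keep=keep)
    # Boolean indexing plays the role that `isel` plays in the method.
    return index[mask], np.asarray(values)[mask]


kept_labels, kept_values = drop_duplicate_labels(
    [0, 0, 1, 2], [0, 5, 6, 7], keep="last"
)
print(list(kept_labels), kept_values.tolist())
```

With `keep=False`, `duplicated` flags every occurrence of a repeated label, so the inverted mask keeps only labels that appear exactly once.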

xarray/tests/test_dataarray.py

Lines changed: 21 additions & 0 deletions
@@ -7434,3 +7434,24 @@ def test_clip(da):
     # Unclear whether we want this work, OK to adjust the test when we have decided.
     with pytest.raises(ValueError, match="arguments without labels along dimension"):
         result = da.clip(min=da.mean("x"), max=da.mean("a").isel(x=[0, 1]))
+
+
+@pytest.mark.parametrize("keep", ["first", "last", False])
+def test_drop_duplicates(keep):
+    ds = xr.DataArray(
+        [0, 5, 6, 7], dims="time", coords={"time": [0, 0, 1, 2]}, name="test"
+    )
+
+    if keep == "first":
+        data = [0, 6, 7]
+        time = [0, 1, 2]
+    elif keep == "last":
+        data = [5, 6, 7]
+        time = [0, 1, 2]
+    else:
+        data = [6, 7]
+        time = [1, 2]
+
+    expected = xr.DataArray(data, dims="time", coords={"time": time}, name="test")
+    result = ds.drop_duplicates("time", keep=keep)
+    assert_equal(expected, result)
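
The expected values in the parametrized test can be cross-checked by hand with a standard-library model of the three `keep` modes described in the docstring. This is a sketch for reasoning about the test data only; `keep_mask` is a hypothetical name, and the function is not how xarray or pandas implements the feature.

```python
from collections import Counter


def keep_mask(labels, keep="first"):
    """Hypothetical model: True marks positions that survive de-duplication."""
    labels = list(labels)
    if keep is False:
        # Drop every occurrence of any label that appears more than once.
        counts = Counter(labels)
        return [counts[label] == 1 for label in labels]
    if keep not in ("first", "last"):
        raise ValueError(f"unsupported keep value: {keep!r}")
    # Walk forwards for "first" and backwards for "last", keeping the
    # first position at which each label is encountered.
    positions = range(len(labels))
    if keep == "last":
        positions = reversed(positions)
    seen = set()
    mask = [False] * len(labels)
    for i in positions:
        if labels[i] not in seen:
            seen.add(labels[i])
            mask[i] = True
    return mask


time = [0, 0, 1, 2]
data = [0, 5, 6, 7]
for keep in ("first", "last", False):
    mask = keep_mask(time, keep)
    print(keep, [d for d, m in zip(data, mask) if m])
```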
