Commit 68438a1

Merge pull request #507 from shoyer/sel_points
Add sel_points for point-wise indexing by label
2 parents 9a6354e + 3ffa8ed commit 68438a1

8 files changed, +143 -22 lines changed

doc/api.rst

Lines changed: 2 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -94,6 +94,7 @@ Indexing
9494
Dataset.isel
9595
Dataset.sel
9696
Dataset.isel_points
97+
Dataset.sel_points
9798
Dataset.squeeze
9899
Dataset.reindex
99100
Dataset.reindex_like
@@ -206,6 +207,7 @@ Indexing
206207
DataArray.isel
207208
DataArray.sel
208209
DataArray.isel_points
210+
DataArray.sel_points
209211
DataArray.squeeze
210212
DataArray.reindex
211213
DataArray.reindex_like

doc/indexing.rst

Lines changed: 16 additions & 3 deletions
@@ -57,7 +57,7 @@ DataArray:
5757

5858
Positional indexing deviates from NumPy's behavior when indexing with multiple
5959
arrays like ``arr[[0, 1], [0, 1]]``, as described in :ref:`indexing details`.
60-
See :ref:`pointwise indexing` and :py:meth:`~xray.Dataset.isel_points` for more on this functionality.
60+
See :ref:`pointwise indexing` for how to achieve this functionality in xray.
6161

6262
xray also supports label-based indexing, just like pandas. Because
6363
we use a :py:class:`pandas.Index` under the hood, label based indexing is very
@@ -123,7 +123,8 @@ __ http://legacy.python.org/dev/peps/pep-0472/
123123

124124
.. warning::
125125

126-
Do not try to assign values when using ``isel``, ``isel_points`` or ``sel``::
126+
Do not try to assign values when using any of the indexing methods ``isel``,
127+
``isel_points``, ``sel`` or ``sel_points``::
127128

128129
# DO NOT do this
129130
arr.isel(space=0) = 0
@@ -143,7 +144,8 @@ Pointwise indexing
143144
xray pointwise indexing supports the indexing along multiple labeled dimensions
144145
using list-like objects. While :py:meth:`~xray.DataArray.isel` performs
145146
orthogonal indexing, the :py:meth:`~xray.DataArray.isel_points` method
146-
provides similar numpy indexing behavior as if you were using multiple lists to index an array (e.g. `arr[[0, 1], [0, 1]]` ):
147+
provides similar numpy indexing behavior as if you were using multiple
148+
lists to index an array (e.g. ``arr[[0, 1], [0, 1]]`` ):
147149

148150
.. ipython:: python
149151
@@ -152,6 +154,17 @@ provides similar numpy indexing behavior as if you were using multiple lists to
152154
da
153155
da.isel_points(x=[0, 1, 6], y=[0, 1, 0])
154156
157+
There is also :py:meth:`~xray.DataArray.sel_points`, which analogously
158+
allows you to do point-wise indexing by label:
159+
160+
.. ipython:: python
161+
162+
times = pd.to_datetime(['2000-01-03', '2000-01-02', '2000-01-01'])
163+
arr.sel_points(space=['IA', 'IL', 'IN'], time=times)
164+
165+
The equivalent pandas method to ``sel_points`` is
166+
:py:meth:`~pandas.DataFrame.lookup`.
167+
155168
Dataset indexing
156169
----------------
157170

doc/whats-new.rst

Lines changed: 20 additions & 7 deletions
@@ -17,13 +17,16 @@ v0.5.3 (unreleased)
1717

1818
- Dataset variables are now written to netCDF files in order of appearance
1919
when using the netcdf4 backend (:issue:`479`).
20-
- Added :py:meth:`~xray.Dataset.isel_points` and :py:meth:`~xray.DataArray.isel_points` to support pointwise indexing of Datasets and DataArrays (:issue:`475`).
20+
- Added :py:meth:`~xray.Dataset.isel_points` and :py:meth:`~xray.Dataset.sel_points`
21+
to support pointwise indexing of Datasets and DataArrays (:issue:`475`).
2122

2223
.. ipython::
2324
:verbatim:
2425

2526
In [1]: da = xray.DataArray(np.arange(56).reshape((7, 8)),
26-
dims=['x', 'y'])
27+
...: coords={'x': list('abcdefg'),
28+
...: 'y': 10 * np.arange(8)},
29+
...: dims=['x', 'y'])
2730

2831
In [2]: da
2932
Out[2]:
@@ -36,16 +39,27 @@ v0.5.3 (unreleased)
3639
[40, 41, 42, 43, 44, 45, 46, 47],
3740
[48, 49, 50, 51, 52, 53, 54, 55]])
3841
Coordinates:
39-
* x (x) int64 0 1 2 3 4 5 6
40-
* y (y) int64 0 1 2 3 4 5 6 7
42+
* y (y) int64 0 10 20 30 40 50 60 70
43+
* x (x) |S1 'a' 'b' 'c' 'd' 'e' 'f' 'g'
4144
45+
# we can index by position along each dimension
4246
In [3]: da.isel_points(x=[0, 1, 6], y=[0, 1, 0], dim='points')
4347
Out[3]:
4448
<xray.DataArray (points: 3)>
4549
array([ 0, 9, 48])
4650
Coordinates:
47-
x (points) int64 0 1 6
48-
y (points) int64 0 1 0
51+
y (points) int64 0 10 0
52+
x (points) |S1 'a' 'b' 'g'
53+
* points (points) int64 0 1 2
54+
55+
# or equivalently by label
56+
In [9]: da.sel_points(x=['a', 'b', 'g'], y=[0, 10, 0], dim='points')
57+
Out[9]:
58+
<xray.DataArray (points: 3)>
59+
array([ 0, 9, 48])
60+
Coordinates:
61+
y (points) int64 0 10 0
62+
x (points) |S1 'a' 'b' 'g'
4963
* points (points) int64 0 1 2
5064
5165
- New :py:meth:`~xray.Dataset.where` method for masking xray objects according
@@ -59,7 +73,6 @@ v0.5.3 (unreleased)
5973
@savefig where_example.png width=4in height=4in
6074
ds.distance.where(ds.distance < 100).plot()
6175
62-
6376
Bug fixes
6477
~~~~~~~~~
6578

xray/core/alignment.py

Lines changed: 1 addition & 1 deletion
@@ -119,7 +119,7 @@ def reindex_variables(variables, indexes, indexers, method=None, copy=True):
119119
method : {None, 'nearest', 'pad'/'ffill', 'backfill'/'bfill'}, optional
120120
Method to use for filling index values in ``indexers`` not found in
121121
this dataset:
122-
* default: don't fill gaps
122+
* None (default): don't fill gaps
123123
* pad / ffill: propagate last valid index value forward
124124
* backfill / bfill: propagate next valid index value backward
125125
* nearest: use nearest valid index value
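
Note on the ``method`` options listed above: the following is a minimal pure-Python sketch of how each option could resolve a label against a sorted index. The helper ``locate`` is hypothetical and is not part of xray or pandas. It returns ``None`` where a gap would be left unfilled, which is an assumption made for illustration only.

```python
from bisect import bisect_left, bisect_right


def locate(index, label, method=None):
    """Return the position in the sorted list ``index`` that matches ``label``.

    Hypothetical helper for illustration; returns None when nothing matches.
    """
    if method is None:
        # Exact matches only: anything else is left as a gap.
        pos = bisect_left(index, label)
        if pos < len(index) and index[pos] == label:
            return pos
        return None
    if method in ('pad', 'ffill'):
        # Last index value that is <= label.
        pos = bisect_right(index, label) - 1
        return pos if pos >= 0 else None
    if method in ('backfill', 'bfill'):
        # First index value that is >= label.
        pos = bisect_left(index, label)
        return pos if pos < len(index) else None
    if method == 'nearest':
        # Compare the neighbours on either side of the insertion point.
        right = bisect_left(index, label)
        candidates = [p for p in (right - 1, right) if 0 <= p < len(index)]
        if not candidates:
            return None
        return min(candidates, key=lambda p: abs(index[p] - label))
    raise ValueError('unknown method: %r' % (method,))


index = [0, 1, 2]
exact = locate(index, 1.2)                  # no exact match
padded = locate(index, 1.2, 'pad')          # falls back to label 1
backfilled = locate(index, 1.2, 'bfill')    # moves forward to label 2
nearest = locate(index, 1.2, 'nearest')     # label 1 is closest
```

With ``index = [0, 1, 2]`` and the label ``1.2``, the default finds nothing, ``pad`` resolves to position 1, ``bfill`` to position 2, and ``nearest`` to position 1.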

xray/core/dataarray.py

Lines changed: 13 additions & 2 deletions
@@ -562,6 +562,17 @@ def isel_points(self, dim='points', **indexers):
562562
ds = self._dataset.isel_points(dim=dim, **indexers)
563563
return self._with_replaced_dataset(ds)
564564

565+
def sel_points(self, dim='points', method=None, **indexers):
566+
"""Return a new DataArray whose dataset is given by pointwise selection
567+
of index labels along the specified dimension(s).
568+
569+
See Also
570+
--------
571+
Dataset.sel_points
572+
"""
573+
ds = self._dataset.sel_points(dim=dim, method=method, **indexers)
574+
return self._with_replaced_dataset(ds)
575+
565576
def reindex_like(self, other, method=None, copy=True):
566577
"""Conform this object onto the indexes of another object, filling
567578
in missing values with NaN.
@@ -579,7 +590,7 @@ def reindex_like(self, other, method=None, copy=True):
579590
Method to use for filling index values from other not found on this
580591
data array:
581592
582-
* default: don't fill gaps
593+
* None (default): don't fill gaps
583594
* pad / ffill: propagate last valid index value forward
584595
* backfill / bfill: propagate next valid index value backward
585596
* nearest: use nearest valid index value (requires pandas>=0.16)
@@ -615,7 +626,7 @@ def reindex(self, method=None, copy=True, **indexers):
615626
Method to use for filling index values in ``indexers`` not found on
616627
this data array:
617628
618-
* default: don't fill gaps
629+
* None (default): don't fill gaps
619630
* pad / ffill: propagate last valid index value forward
620631
* backfill / bfill: propagate next valid index value backward
621632
* nearest: use nearest valid index value (requires pandas>=0.16)

xray/core/dataset.py

Lines changed: 60 additions & 8 deletions
@@ -962,8 +962,9 @@ def isel(self, **indexers):
962962
See Also
963963
--------
964964
Dataset.sel
965+
Dataset.sel_points
966+
Dataset.isel_points
965967
DataArray.isel
966-
DataArray.sel
967968
"""
968969
invalid = [k for k in indexers if not k in self.dims]
969970
if invalid:
@@ -988,7 +989,7 @@ def sel(self, method=None, **indexers):
988989
In contrast to `Dataset.isel`, indexers for this method should use
989990
labels instead of integers.
990991
991-
Under the hood, this method is powered by using Panda's powerful Index
992+
Under the hood, this method is powered by using pandas's powerful Index
992993
objects. This makes label based indexing essentially just as fast as
993994
using integer indexing.
994995
@@ -1003,7 +1004,7 @@ def sel(self, method=None, **indexers):
10031004
method : {None, 'nearest', 'pad'/'ffill', 'backfill'/'bfill'}, optional
10041005
Method to use for inexact matches (requires pandas>=0.16):
10051006
1006-
* default: only exact matches
1007+
* None (default): only exact matches
10071008
* pad / ffill: propagate last valid index value forward
10081009
* backfill / bfill: propagate next valid index value backward
10091010
* nearest: use nearest valid index value
@@ -1023,7 +1024,8 @@ def sel(self, method=None, **indexers):
10231024
See Also
10241025
--------
10251026
Dataset.isel
1026-
DataArray.isel
1027+
Dataset.sel_points
1028+
Dataset.isel_points
10271029
DataArray.sel
10281030
"""
10291031
return self.isel(**indexing.remap_label_indexers(self, indexers,
@@ -1062,8 +1064,8 @@ def isel_points(self, dim='points', **indexers):
10621064
See Also
10631065
--------
10641066
Dataset.sel
1065-
DataArray.isel
1066-
DataArray.sel
1067+
Dataset.isel
1068+
Dataset.sel_points
10671069
DataArray.isel_points
10681070
"""
10691071
indexer_dims = set(indexers)
@@ -1116,6 +1118,56 @@ def relevant_keys(mapping):
11161118
zip(*[v for k, v in indexers])]],
11171119
dim=dim, coords=coords, data_vars=data_vars)
11181120

1121+
def sel_points(self, dim='points', method=None, **indexers):
1122+
"""Returns a new dataset with each array indexed pointwise by tick
1123+
labels along the specified dimension(s).
1124+
1125+
In contrast to `Dataset.isel_points`, indexers for this method should
1126+
use labels instead of integers.
1127+
1128+
In contrast to `Dataset.sel`, this method selects points along the
1129+
diagonal of multi-dimensional arrays, not the intersection.
1130+
1131+
Parameters
1132+
----------
1133+
dim : str or DataArray or pandas.Index or other list-like object, optional
1134+
Name of the dimension to concatenate along. If dim is provided as a
1135+
string, it must be a new dimension name, in which case it is added
1136+
along axis=0. If dim is provided as a DataArray or Index or
1137+
list-like object, its name, which must not be present in the
1138+
dataset, is used as the dimension to concatenate along and the
1139+
values are added as a coordinate.
1140+
method : {None, 'nearest', 'pad'/'ffill', 'backfill'/'bfill'}, optional
1141+
Method to use for inexact matches (requires pandas>=0.16):
1142+
1143+
* None (default): only exact matches
1144+
* pad / ffill: propagate last valid index value forward
1145+
* backfill / bfill: propagate next valid index value backward
1146+
* nearest: use nearest valid index value
1147+
**indexers : {dim: indexer, ...}
1148+
Keyword arguments with names matching dimensions and values given
1149+
by array-like objects. All indexers must be the same length and
1150+
1 dimensional.
1151+
1152+
Returns
1153+
-------
1154+
obj : Dataset
1155+
A new Dataset with the same contents as this dataset, except each
1156+
array and dimension is indexed by the appropriate indexers. With
1157+
pointwise indexing, the new Dataset will always be a copy of the
1158+
original.
1159+
1160+
See Also
1161+
--------
1162+
Dataset.sel
1163+
Dataset.isel
1164+
Dataset.isel_points
1165+
DataArray.sel_points
1166+
"""
1167+
pos_indexers = indexing.remap_label_indexers(self, indexers,
1168+
method=method)
1169+
return self.isel_points(dim=dim, **pos_indexers)
1170+
11191171
def reindex_like(self, other, method=None, copy=True):
11201172
"""Conform this object onto the indexes of another object, filling
11211173
in missing values with NaN.
@@ -1133,7 +1185,7 @@ def reindex_like(self, other, method=None, copy=True):
11331185
Method to use for filling index values from other not found in this
11341186
dataset:
11351187
1136-
* default: don't fill gaps
1188+
* None (default): don't fill gaps
11371189
* pad / ffill: propagate last valid index value forward
11381190
* backfill / bfill: propagate next valid index value backward
11391191
* nearest: use nearest valid index value (requires pandas>=0.16)
@@ -1170,7 +1222,7 @@ def reindex(self, indexers=None, method=None, copy=True, **kw_indexers):
11701222
Method to use for filling index values in ``indexers`` not found in
11711223
this dataset:
11721224
1173-
* default: don't fill gaps
1225+
* None (default): don't fill gaps
11741226
* pad / ffill: propagate last valid index value forward
11751227
* backfill / bfill: propagate next valid index value backward
11761228
* nearest: use nearest valid index value (requires pandas>=0.16)
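
Note on ``sel_points``: as its body in this diff shows, the method first remaps labels to integer positions and then delegates to ``isel_points``. The sketch below mimics those two steps on plain nested lists, using the 7x8 example from doc/whats-new.rst. The helpers ``remap_labels``, ``isel_points_2d`` and ``isel_2d`` are hypothetical stand-ins, not xray functions, and only exact label matches are handled.

```python
def remap_labels(coords, indexers):
    """Translate {dim: labels} into {dim: integer positions} (exact matches only)."""
    return {dim: [coords[dim].index(label) for label in labels]
            for dim, labels in indexers.items()}


def isel_points_2d(data, x, y):
    """Pointwise selection: pair the i-th x position with the i-th y position."""
    if len(x) != len(y):
        raise ValueError('all indexers must be the same length')
    return [data[i][j] for i, j in zip(x, y)]


def isel_2d(data, x, y):
    """Orthogonal selection: combine every x position with every y position."""
    return [[data[i][j] for j in y] for i in x]


# Values 0..55 in a 7x8 grid; x is labelled 'a'..'g', y is labelled 0, 10, ..., 70.
data = [[8 * i + j for j in range(8)] for i in range(7)]
coords = {'x': list('abcdefg'), 'y': [10 * j for j in range(8)]}

# Step 1: labels -> positions. Step 2: pointwise selection by position.
pos = remap_labels(coords, {'x': ['a', 'b', 'g'], 'y': [0, 10, 0]})
points = isel_points_2d(data, **pos)   # [0, 9, 48]
grid = isel_2d(data, **pos)            # 3x3 block, for contrast
```

The pointwise result has one value per (x, y) pair, whereas the orthogonal result has one value for every combination of the selected rows and columns.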

xray/test/test_dataarray.py

Lines changed: 11 additions & 1 deletion
@@ -382,7 +382,7 @@ def test_sel_method(self):
382382
actual = data.sel(x=[0.9, 1.9], method='backfill')
383383
self.assertDataArrayIdentical(expected, actual)
384384

385-
def test_isel_points_method(self):
385+
def test_isel_points(self):
386386
shape = (10, 5, 6)
387387
np_array = np.random.random(shape)
388388
da = DataArray(np_array, dims=['time', 'y', 'x'])
@@ -442,6 +442,16 @@ def test_isel_points_method(self):
442442
actual = da.isel_points(y=[1, 2], x=[1, 2], dim=['A', 'B'])
443443
assert 'points' in actual.coords
444444

445+
def test_sel_points(self):
446+
shape = (10, 5, 6)
447+
np_array = np.random.random(shape)
448+
da = DataArray(np_array, dims=['time', 'y', 'x'])
449+
y = [1, 3]
450+
x = [3, 0]
451+
expected = da.isel_points(x=x, y=y)
452+
actual = da.sel_points(x=x, y=y)
453+
self.assertDataArrayIdentical(expected, actual)
454+
445455
def test_loc(self):
446456
self.ds['x'] = ('x', np.array(list('abcdefghij')))
447457
da = self.ds['foo']

xray/test/test_dataset.py

Lines changed: 20 additions & 0 deletions
@@ -729,6 +729,26 @@ def test_isel_points(self):
729729
dim2=stations['dim2s'],
730730
dim=np.array([4, 5, 6]))
731731

732+
def test_sel_points(self):
733+
data = create_test_data()
734+
735+
pdim1 = [1, 2, 3]
736+
pdim2 = [4, 5, 1]
737+
pdim3 = [1, 2, 3]
738+
expected = data.isel_points(dim1=pdim1, dim2=pdim2, dim3=pdim3,
739+
dim='test_coord')
740+
actual = data.sel_points(dim1=data.dim1[pdim1], dim2=data.dim2[pdim2],
741+
dim3=data.dim3[pdim3], dim='test_coord')
742+
self.assertDatasetIdentical(expected, actual)
743+
744+
data = Dataset({'foo': (('x', 'y'), np.arange(9).reshape(3, 3))})
745+
expected = Dataset({'foo': ('points', [0, 4, 8])},
746+
{'x': ('points', range(3)),
747+
'y': ('points', range(3))})
748+
actual = data.sel_points(x=[0.1, 1.1, 2.5], y=[0, 1.2, 2.0],
749+
method='pad')
750+
self.assertDatasetIdentical(expected, actual)
751+
732752
def test_sel_method(self):
733753
data = create_test_data()
734754
