Commit bb51a8e

Clarify rules for copies vs views when indexing
1 parent 4348958 commit bb51a8e

4 files changed: +131 -122 lines changed

doc/indexing.rst

Lines changed: 112 additions & 118 deletions
Original file line number | Diff line number | Diff line change
@@ -53,10 +53,12 @@ DataArray:
5353
arr[0, 0]
5454
arr[:, [2, 1]]
5555
56+
Attributes are persisted in all indexing operations.
57+
5658
.. warning::
5759

5860
Positional indexing deviates from NumPy when indexing with multiple
59-
arrays like ``arr[[0, 1], [0, 1]]``, as described in :ref:`indexing details`.
61+
arrays like ``arr[[0, 1], [0, 1]]``, as described in :ref:`orthogonal`.
6062
See :ref:`pointwise indexing` and :py:meth:`~xray.Dataset.isel_points` for more on this functionality.
6163

6264
xray also supports label-based indexing, just like pandas. Because
@@ -81,6 +83,7 @@ Setting values with label based indexing is also supported:
8183
arr.loc['2000-01-01', ['IL', 'IN']] = -10
8284
arr
8385
86+
8487
Indexing with labeled dimensions
8588
--------------------------------
8689

@@ -191,39 +194,126 @@ index labels along a dimension dropped:
191194
192195
``drop`` is both a ``Dataset`` and ``DataArray`` method.
193196

194-
.. _indexing details:
197+
.. _nearest neighbor lookups:
195198

196-
Indexing details
197-
----------------
199+
Nearest neighbor lookups
200+
------------------------
198201

199-
Like pandas, whether array indexing returns a view or a copy of the underlying
200-
data depends entirely on numpy:
202+
The label based selection methods :py:meth:`~xray.Dataset.sel`,
203+
:py:meth:`~xray.Dataset.reindex` and :py:meth:`~xray.Dataset.reindex_like` all
204+
support a ``method`` keyword argument. The method parameter allows for
205+
enabling nearest neighbor (inexact) lookups by use of the methods ``'pad'``,
206+
``'backfill'`` or ``'nearest'``:
201207

202-
* Indexing with a single label or a slice returns a view.
203-
* Indexing with a vector of array labels returns a copy.
208+
.. ipython:: python
209+
210+
data = xray.DataArray([1, 2, 3], dims='x')
211+
data.sel(x=[1.1, 1.9], method='nearest')
212+
data.sel(x=0.1, method='backfill')
213+
data.reindex(x=[0.5, 1, 1.5, 2, 2.5], method='pad')
204214
205-
Attributes are persisted in array indexing:
215+
Using ``method='nearest'`` or a scalar argument with ``.sel()`` requires pandas
216+
version 0.16 or newer.
217+
218+
The method parameter is not yet supported if any of the arguments
219+
to ``.sel()`` is a ``slice`` object:
220+
221+
.. ipython::
222+
:verbatim:
223+
224+
In [1]: data.sel(x=slice(1, 3), method='nearest')
225+
NotImplementedError
226+
227+
However, you don't need to use ``method`` to do inexact slicing. Slicing
228+
already returns all values inside the range (inclusive), as long as the index
229+
labels are monotonic increasing:
230+
231+
.. ipython:: python
232+
233+
data.sel(x=slice(0.9, 3.1))
234+
235+
Indexing axes with monotonic decreasing labels also works, as long as the
236+
``slice`` or ``.loc`` arguments are also decreasing:
206237

207238
.. ipython:: python
208239
209-
arr2 = arr.copy()
210-
arr2.attrs['units'] = 'meters'
211-
arr2[0, 0].attrs
240+
reversed_data = data[::-1]
241+
reversed_data.loc[3.1:0.9]
242+
243+
.. _masking with where:
244+
245+
Masking with ``where``
246+
----------------------
247+
248+
Indexing methods on xray objects generally return a subset of the original data.
249+
However, it is sometimes useful to select an object with the same shape as the
250+
original data, but with some elements masked. To do this type of selection in
251+
xray, use :py:meth:`~xray.DataArray.where`:
252+
253+
.. ipython:: python
254+
255+
arr = xray.DataArray(np.arange(16).reshape(4, 4), dims=['x', 'y'])
256+
arr.where(arr.x + arr.y < 4)
257+
258+
This is particularly useful for ragged indexing of multi-dimensional data,
259+
e.g., to apply a 2D mask to an image. Note that ``where`` follows all the
260+
usual xray broadcasting and alignment rules for binary operations (e.g.,
261+
``+``) between the object being indexed and the condition, as described in
262+
:ref:`comput`:
263+
264+
.. ipython:: python
265+
266+
arr.where(arr.y < 2)
267+
268+
Multi-dimensional indexing
269+
--------------------------
270+
271+
Xray does not yet support efficient routines for generalized multi-dimensional
272+
indexing or regridding. However, we are definitely interested in adding support
273+
for this in the future (see :issue:`475` for the ongoing discussion).
274+
275+
.. _copies vs views:
276+
277+
Copies vs. views
278+
----------------
279+
280+
Whether array indexing returns a view or a copy of the underlying
281+
data depends on the nature of the labels. For positional (integer)
282+
indexing, xray follows the same rules as NumPy:
283+
284+
* Positional indexing with only integers and slices returns a view.
285+
* Positional indexing with arrays or lists returns a copy.
286+
287+
The rules for label based indexing are more complex:
288+
289+
* Label-based indexing with only slices returns a view.
290+
* Label-based indexing with arrays returns a copy.
291+
* Label-based indexing with scalars returns a view or a copy, depending
292+
upon if the corresponding positional indexer can be represented as an
293+
integer or a slice object.
294+
295+
.. _orthogonal:
296+
297+
Orthogonal (outer) vs. vectorized indexing
298+
------------------------------------------
212299

213300
Indexing with xray objects has one important difference from indexing numpy
214301
arrays: you can only use one-dimensional arrays to index xray objects, and
215302
each indexer is applied "orthogonally" along independent axes, instead of
216-
using numpy's advanced broadcasting. This means you can do indexing like this,
217-
which would require slightly more awkward syntax with numpy arrays:
303+
using numpy's broadcasting rules to vectorize indexers. This means you can do
304+
indexing like this, which would require slightly more awkward syntax with
305+
numpy arrays:
218306

219307
.. ipython:: python
220308
221309
arr[arr['time.day'] > 1, arr['space'] != 'IL']
222310
223-
This is a much simpler model than numpy's `advanced indexing`__,
224-
and is basically the only model that works for labeled arrays. If you would
225-
like to do array indexing, you can always index ``.values`` directly
226-
instead:
311+
This is a much simpler model than numpy's `advanced indexing`__. If you would
312+
like to do advanced-style array indexing in xray, you have several options:
313+
314+
* :ref:`pointwise indexing`
315+
* :ref:`masking with where`
316+
* Index the underlying NumPy array directly using ``.values``:
227317

228318
__ http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
229319

@@ -242,6 +332,10 @@ original values are subset to the index labels still found in the new labels,
242332
and values corresponding to new labels not found in the original object are
243333
in-filled with `NaN`.
244334

335+
Xray operations that combine multiple xray objects generally automatically
336+
align their arguments. However, manual alignment can be useful for greater
337+
control.
338+
245339
To reindex a particular dimension, use :py:meth:`~xray.DataArray.reindex`:
246340

247341
.. ipython:: python
@@ -289,103 +383,3 @@ Both ``reindex_like`` and ``align`` work interchangeably between
289383
other = xray.DataArray(['a', 'b', 'c'], dims='other')
290384
# this is a no-op, because there are no shared dimension names
291385
ds.reindex_like(other)
292-
293-
.. _nearest neighbor lookups:
294-
295-
Nearest neighbor lookups
296-
------------------------
297-
298-
The label based selection methods :py:meth:`~xray.Dataset.sel`,
299-
:py:meth:`~xray.Dataset.reindex` and :py:meth:`~xray.Dataset.reindex_like` all
300-
support a ``method`` keyword argument. The method parameter allows for
301-
enabling nearest neighbor (inexact) lookups by use of the methods ``'pad'``,
302-
``'backfill'`` or ``'nearest'``:
303-
304-
.. use verbatim because I can't seem to install pandas 0.16.1 on RTD :(
305-
306-
.. .. ipython::
307-
:verbatim:
308-
In [35]: data = xray.DataArray([1, 2, 3], dims='x')
309-
In [36]: data.sel(x=[1.1, 1.9], method='nearest')
310-
Out[36]:
311-
<xray.DataArray (x: 2)>
312-
array([2, 3])
313-
Coordinates:
314-
* x (x) int64 1 2
315-
In [37]: data.sel(x=0.1, method='backfill')
316-
Out[37]:
317-
<xray.DataArray ()>
318-
array(2)
319-
Coordinates:
320-
x int64 1
321-
In [38]: data.reindex(x=[0.5, 1, 1.5, 2, 2.5], method='pad')
322-
Out[38]:
323-
<xray.DataArray (x: 5)>
324-
array([1, 2, 2, 3, 3])
325-
Coordinates:
326-
* x (x) float64 0.5 1.0 1.5 2.0 2.5
327-
328-
.. ipython:: python
329-
330-
data = xray.DataArray([1, 2, 3], dims='x')
331-
data.sel(x=[1.1, 1.9], method='nearest')
332-
data.sel(x=0.1, method='backfill')
333-
data.reindex(x=[0.5, 1, 1.5, 2, 2.5], method='pad')
334-
335-
Using ``method='nearest'`` or a scalar argument with ``.sel()`` requires pandas
336-
version 0.16 or newer.
337-
338-
The method parameter is not yet supported if any of the arguments
339-
to ``.sel()`` is a ``slice`` object:
340-
341-
.. ipython::
342-
:verbatim:
343-
344-
In [1]: data.sel(x=slice(1, 3), method='nearest')
345-
NotImplementedError
346-
347-
However, you don't need to use ``method`` to do inexact slicing. Slicing
348-
already returns all values inside the range (inclusive), as long as the index
349-
labels are monotonic increasing:
350-
351-
.. ipython:: python
352-
353-
data.sel(x=slice(0.9, 3.1))
354-
355-
Indexing axes with monotonic decreasing labels also works, as long as the
356-
``slice`` or ``.loc`` arguments are also decreasing:
357-
358-
.. ipython:: python
359-
360-
reversed_data = data[::-1]
361-
reversed_data.loc[3.1:0.9]
362-
363-
Masking with ``where``
364-
----------------------
365-
366-
Indexing methods on xray objects generally return a subset of the original data.
367-
However, it is sometimes useful to select an object with the same shape as the
368-
original data, but with some elements masked. To do this type of selection in
369-
xray, use :py:meth:`~xray.DataArray.where`:
370-
371-
.. ipython:: python
372-
373-
arr = xray.DataArray(np.arange(16).reshape(4, 4), dims=['x', 'y'])
374-
arr.where(arr.x + arr.y < 4)
375-
376-
This is particularly useful for ragged indexing of multi-dimensional data,
377-
e.g., to apply a 2D mask to an image. Note that ``where`` follows all the
378-
usual xray broadcasting and alignment rules for binary operations (e.g.,
379-
``+``) between the object being indexed and the condition, as described in
380-
:ref:`comput`:
381-
382-
.. ipython:: python
383-
384-
arr.where(arr.y < 2)
385-
386-
Multi-dimensional indexing
387-
--------------------------
388-
389-
Xray does not yet support efficient routines for generalized multi-dimensional
390-
indexing or regridding. However, we are definitely interested in adding support
391-
for this in the future (see :issue:`475` for the ongoing discussion).
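
Reviewer note on doc/indexing.rst: the new "Copies vs. views" and "Orthogonal (outer) vs. vectorized indexing" sections both lean on NumPy's own indexing rules. The sketch below uses plain NumPy only (no xray) to illustrate those two rules; the variable names are illustrative, and ``np.shares_memory`` is assumed to be available (it ships with recent NumPy releases).

```python
import numpy as np

base = np.arange(12).reshape(3, 4)

# Positional indexing with only integers and slices returns a view.
row_slice = base[0, 1:3]
print(np.shares_memory(base, row_slice))    # True

# Positional indexing with a list or array returns a copy.
picked_cols = base[:, [2, 1]]
print(np.shares_memory(base, picked_cols))  # False

# Writes through the view reach the original; writes to the copy do not.
row_slice[...] = -1
picked_cols[...] = 99
print(base[0].tolist())  # [0, -1, -1, 3]

# NumPy's vectorized indexing pairs the indexers elementwise:
# this selects elements (0, 0) and (1, 1).
vectorized = base[[0, 1], [0, 1]]
print(vectorized.shape)  # (2,)

# Orthogonal (outer) indexing applies each indexer along its own axis:
# rows [0, 1] crossed with columns [0, 1]. In NumPy this needs np.ix_.
orthogonal = base[np.ix_([0, 1], [0, 1])]
print(orthogonal.shape)  # (2, 2)
```

The ``np.ix_`` call is the "slightly more awkward syntax" the docs allude to; the docs state that xray applies the orthogonal behavior directly when given one-dimensional indexers.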

doc/whats-new.rst

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -14,7 +14,8 @@ v0.5.3 (unreleased)
1414

1515
- Variables in netCDF files with multiple missing values are now decoded as NaN
1616
after issuing a warning if open_dataset is called with mask_and_scale=True.
17-
17+
- We clarified our rules for when the result from an xray operation is a copy
18+
vs. a view (see :ref:`copies vs views` for more details).
1819
- Dataset variables are now written to netCDF files in order of appearance
1920
when using the netcdf4 backend (:issue:`479`).
2021
- Added :py:meth:`~xray.Dataset.isel_points` and :py:meth:`~xray.DataArray.isel_points` to support pointwise indexing of Datasets and DataArrays (:issue:`475`).

xray/core/indexing.py

Lines changed: 8 additions & 2 deletions
@@ -140,6 +140,12 @@ def convert_label_indexer(index, label, index_name='', method=None):
140140
indexer = index.slice_indexer(_try_get_item(label.start),
141141
_try_get_item(label.stop),
142142
_try_get_item(label.step))
143+
if not isinstance(indexer, slice):
144+
# unlike pandas, in xray we never want to silently convert a slice
145+
# indexer into an array indexer
146+
raise KeyError('cannot represent labeled-based slice indexer for '
147+
'dimension %r with a slice over integer positions; '
148+
'the index is unsorted or non-unique')
143149
else:
144150
label = np.asarray(label)
145151
if label.ndim == 0:
@@ -149,8 +155,8 @@ def convert_label_indexer(index, label, index_name='', method=None):
149155
else:
150156
indexer = index.get_indexer(label, method=method)
151157
if np.any(indexer < 0):
152-
raise ValueError('not all values found in index %r'
153-
% index_name)
158+
raise KeyError('not all values found in index %r'
159+
% index_name)
154160
return indexer
155161

156162

xray/test/test_indexing.py

Lines changed: 9 additions & 1 deletion
@@ -70,11 +70,19 @@ def test_orthogonal_indexer(self):
7070
def test_convert_label_indexer(self):
7171
# TODO: add tests that aren't just for edge cases
7272
index = pd.Index([1, 2, 3])
73-
with self.assertRaisesRegexp(ValueError, 'not all values found'):
73+
with self.assertRaisesRegexp(KeyError, 'not all values found'):
7474
indexing.convert_label_indexer(index, [0])
7575
with self.assertRaises(KeyError):
7676
indexing.convert_label_indexer(index, 0)
7777

78+
def test_convert_unsorted_datetime_index_raises(self):
79+
index = pd.to_datetime(['2001', '2000', '2002'])
80+
with self.assertRaises(KeyError):
81+
# pandas will try to convert this into an array indexer. We should
82+
# raise instead, so we can be sure the result of indexing with a
83+
# slice is always a view.
84+
indexing.convert_label_indexer(index, slice('2001', '2002'))
85+
7886
def test_remap_label_indexers(self):
7987
# TODO: fill in more tests!
8088
data = Dataset({'x': ('x', [1, 2, 3])})
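
Reviewer note on the tests above: the two ``KeyError`` paths can be reproduced with pandas alone. The function below is a hypothetical, simplified stand-in for ``convert_label_indexer`` (the name ``label_to_positions`` is made up for illustration); it copies only the slice and array branches visible in this diff and routes scalars through the array branch, which is an assumption rather than what the real function does.

```python
import numpy as np
import pandas as pd


def label_to_positions(index, label, index_name='', method=None):
    """Hypothetical, simplified stand-in for convert_label_indexer."""
    if isinstance(label, slice):
        indexer = index.slice_indexer(label.start, label.stop, label.step)
        if not isinstance(indexer, slice):
            # Refuse to fall back to an array indexer, so that a
            # label-based slice can always be served as a view.
            raise KeyError('cannot represent slice indexer for dimension '
                           '%r with a slice over integer positions'
                           % index_name)
        return indexer
    # Simplification: scalars are treated as one-element arrays here.
    indexer = index.get_indexer(np.atleast_1d(label), method=method)
    if np.any(indexer < 0):
        # pandas marks labels it could not find with -1.
        raise KeyError('not all values found in index %r' % index_name)
    return indexer


index = pd.Index([10, 20, 30])

print(label_to_positions(index, slice(10, 20)))   # slice(0, 2, None)
print(label_to_positions(index, [10, 30]).tolist())  # [0, 2]
print(label_to_positions(index, [12, 29], method='nearest').tolist())  # [0, 2]

try:
    label_to_positions(index, [0], index_name='x')
except KeyError as err:
    print('KeyError:', err)
```

Note that the label-based slice is inclusive of both endpoints, which is why ``slice(10, 20)`` maps to positions 0 and 1. The unsorted ``DatetimeIndex`` case from ``test_convert_unsorted_datetime_index_raises`` is deliberately left out, because what pandas returns there may differ between pandas versions.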
