unexpected positional indexing behavior with Dataset and DataArray "isel" #411

Closed
@jhamman


I may be missing something here, but I think the indexing behavior of isel differs surprisingly from numpy's and is inconsistent with the xray documentation. Either this is a bug, or it is a feature that I don't understand.

From the xray docs on positional indexing:

Indexing a DataArray directly works (mostly) just like it does for numpy arrays, except that the returned object is always another DataArray

My example below uses two 1d numpy arrays to select from a 3d numpy array. When using pure numpy, I get a 2d array back. In my view, this is the expected behavior. When using the xray.Dataset or xray.DataArray, I get an oddly shaped 3d array back with a duplicate dimension.

import numpy as np
import xray
import sys

print('python version:', sys.version)
print('numpy version:', np.version.full_version)
print('xray version:', xray.version.version)
python version: 3.4.3 |Anaconda 2.2.0 (x86_64)| (default, Mar  6 2015, 12:07:41) 
[GCC 4.2.1 (Apple Inc. build 5577)]
numpy version: 1.9.2
xray version: 0.4.1
# A few numpy arrays
time = np.arange(100)
lons = np.arange(40, 60)
lats = np.arange(25, 70)
np_data = np.random.random(size=(len(time), len(lats), len(lons)))

# pick some random points to select
ys, xs = np.nonzero(np_data[0] > 0.8)
print(len(ys))
176
# create a xray.DataArray and xray.Dataset
xr_data = xray.DataArray(np_data, [('time', time), ('y', lats), ('x', lons)])  # DataArray
xr_ds = xr_data.to_dataset(name='data')  # Dataset

# numpy indexing 
print('numpy: ', np_data[:, ys, xs].shape)

# xray positional indexing
print('xray1: ', xr_data.isel(y=ys, x=xs).shape)
print('xray2: ', xr_data[:, ys, xs].shape)
print('xray3: ', xr_ds.isel(y=ys, x=xs))
numpy:  (100, 176)
xray1:  (100, 176, 176)
xray2:  (100, 176, 176)
xray3:  <xray.Dataset>
Dimensions:  (time: 100, x: 176, y: 176)
Coordinates:
  * x        (x) int64 46 47 57 45 48 50 51 54 57 59 48 52 49 50 52 53 55 57 43 46 47 48 53 ...
  * time     (time) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 ...
  * y        (y) int64 25 25 25 26 26 26 26 26 26 26 27 27 28 28 28 28 28 28 29 29 29 29 29 ...
Data variables:
    data     (time, y, x) float64 0.9343 0.8311 0.8842 0.3188 0.02052 0.4506 0.04177 ...
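
The shapes above look like what one would get if each index array were applied independently along its own dimension (often called orthogonal or outer indexing), rather than paired element-wise as numpy does. That reading is an assumption drawn from the output, not something the xray docs quoted here confirm. A minimal numpy-only sketch of the two behaviors, using a small made-up array and hypothetical index values:

```python
import numpy as np

# Small stand-in for the (time, y, x) array; the values are arbitrary.
rng = np.random.RandomState(0)
data = rng.random_sample((4, 5, 6))

# Three (y, x) points given as paired index arrays (hypothetical values).
ys = np.array([0, 2, 4])
xs = np.array([1, 3, 5])

# numpy pairs the two arrays element-wise: one column per (y, x) point.
pointwise = data[:, ys, xs]
print(pointwise.shape)  # (4, 3)

# Outer indexing applies each array along its own axis, so the result
# holds every combination of ys and xs.
outer = data[np.ix_(np.arange(data.shape[0]), ys, xs)]
print(outer.shape)  # (4, 3, 3)

# The pointwise selection is the diagonal of the outer selection.
diag = outer[:, np.arange(len(ys)), np.arange(len(xs))]
print(np.array_equal(diag, pointwise))  # True
```

If that is indeed what isel does, the (100, 176, 176) result contains the wanted (100, 176) selection on its diagonal, at the cost of building the full outer product first.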
