MultiIndex and data selection

[Edited for more clarity]

First of all, I find the MultiIndex very useful, and I'm looking forward to seeing the TODOs in #719 implemented in upcoming releases, especially the first three in the list!

Apart from these issues, I think some other aspects could be improved, notably data selection. Or maybe I haven't correctly understood how to work with a MultiIndex when selecting data...

To illustrate this, I use some fake spectral data with two discontinuous bands of different lengths / resolutions:

```
In [1]: import numpy as np
   ...: import pandas as pd

In [2]: import xarray as xr

In [3]: band = np.array(['foo', 'foo', 'bar', 'bar', 'bar'])

In [4]: wavenumber = np.array([4050.2, 4050.3, 4100.1, 4100.3, 4100.5])

In [5]: spectrum = np.array([1.7e-4, 1.4e-4, 1.2e-4, 1.0e-4, 8.5e-5])

In [6]: s = pd.Series(spectrum, index=[band, wavenumber])

In [7]: s.index.names = ('band', 'wavenumber')

In [8]: da = xr.DataArray(s, dims='band_wavenumber')

In [9]: da
Out[9]:
<xarray.DataArray (band_wavenumber: 5)>
array([  1.70000000e-04,   1.40000000e-04,   1.20000000e-04,
         1.00000000e-04,   8.50000000e-05])
Coordinates:
  * band_wavenumber  (band_wavenumber) object ('foo', 4050.2) ...
```

I extract the band 'bar' using `sel`:

```
In [10]: da_bar = da.sel(band_wavenumber='bar')

In [11]: da_bar
Out[11]:
<xarray.DataArray (band_wavenumber: 3)>
array([  1.20000000e-04,   1.00000000e-04,   8.50000000e-05])
Coordinates:
  * band_wavenumber  (band_wavenumber) object ('bar', 4100.1) ...
```

It selects the data the way I want, although using the dimension name is confusing in this case. It would be nice if we could also use the `MultiIndex` level names as arguments of the `sel` method, e.g. `da.sel(band='bar')`, although I don't know how easy that would be to implement.
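
A minimal sketch of what selection by level name could do under the hood, written with plain NumPy/pandas rather than xarray. `level_mask` is a hypothetical helper, not an existing xarray or pandas API; it only relies on pandas' `MultiIndex.get_level_values`, which accepts a level name:

```python
import numpy as np
import pandas as pd

# Same fake spectral data as above
band = np.array(['foo', 'foo', 'bar', 'bar', 'bar'])
wavenumber = np.array([4050.2, 4050.3, 4100.1, 4100.3, 4100.5])
spectrum = np.array([1.7e-4, 1.4e-4, 1.2e-4, 1.0e-4, 8.5e-5])
midx = pd.MultiIndex.from_arrays([band, wavenumber], names=('band', 'wavenumber'))


def level_mask(index, **labels):
    """Hypothetical helper: boolean mask of the entries whose named
    levels equal the given labels, e.g. level_mask(midx, band='bar')."""
    mask = np.ones(len(index), dtype=bool)
    for name, label in labels.items():
        # get_level_values returns one value per entry for the named level
        mask &= np.asarray(index.get_level_values(name) == label)
    return mask


mask = level_mask(midx, band='bar')
spectrum_bar = spectrum[mask]  # the three 'bar' values
```

Several levels can be combined in one call, e.g. `level_mask(midx, band='bar', wavenumber=4100.3)` keeps a single entry.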

Furthermore, `da_bar` still has the 'band_wavenumber' dimension and the 'band' index level, which are no longer very useful. Ideally, I'd like to obtain a `DataArray` object with a 'wavenumber' dimension / coordinate, with the 'bar' band name dropped from the multi-index, i.e., something that would require automatic index-level removal and/or automatic unstacking when selecting data.

Extracting the band 'bar' from the pandas `Series` object gives something closer to what I need (see below), but using pandas is not an option as my spectral data involves other dimensions (e.g., time, scans, iterations...) not shown here for simplicity.

```
In [12]: s_bar = s.loc['bar']

In [13]: s_bar
Out[13]:
wavenumber
4100.1    0.000120
4100.3    0.000100
4100.5    0.000085
dtype: float64
```
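
With several dimensions a pandas `Series` no longer fits, but the same selection can be applied along one axis of a NumPy array. This sketch assumes a hypothetical extra `time` axis of length 2 filled with synthetic values. pandas' `MultiIndex.get_loc_level` returns both the positions matching 'bar' on the 'band' level and the index with that level dropped; the name of the remaining level is what the new dimension could be called:

```python
import numpy as np
import pandas as pd

band = np.array(['foo', 'foo', 'bar', 'bar', 'bar'])
wavenumber = np.array([4050.2, 4050.3, 4100.1, 4100.3, 4100.5])
midx = pd.MultiIndex.from_arrays([band, wavenumber], names=('band', 'wavenumber'))

# Hypothetical 2-D data: a 'time' axis of length 2 (synthetic values),
# followed by the stacked band/wavenumber axis of length 5.
data = np.arange(10.0).reshape(2, 5)

# loc is a slice or boolean mask selecting 'bar' on the 'band' level;
# new_index is the remaining single-level index, with 'band' dropped.
loc, new_index = midx.get_loc_level('bar', level='band')

data_bar = data[..., loc]   # select along the last (spectral) axis
new_dim = new_index.name    # name of the remaining level
```

Here `data_bar` has shape `(2, 3)` and `new_dim` is `'wavenumber'`, which is the kind of result I'd expect from a level-aware `sel`: the 'band_wavenumber' dimension replaced by a 'wavenumber' dimension.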

Another problem is that unstacking the `DataArray` object resulting from the selection gives the same dimensions and size as unstacking the original `DataArray` object. The only difference is that the unselected values are replaced by `nan`.

```
In [14]: da.unstack('band_wavenumber')
Out[14]:
<xarray.DataArray (band: 2, wavenumber: 5)>
array([[             nan,              nan,   1.20000000e-04,
          1.00000000e-04,   8.50000000e-05],
       [  1.70000000e-04,   1.40000000e-04,              nan,
                     nan,              nan]])
Coordinates:
  * band        (band) object 'bar' 'foo'
  * wavenumber  (wavenumber) float64 4.05e+03 4.05e+03 4.1e+03 4.1e+03 4.1e+03

In [15]: da_bar.unstack('band_wavenumber')
Out[15]:
<xarray.DataArray (band: 2, wavenumber: 5)>
array([[             nan,              nan,   1.20000000e-04,
          1.00000000e-04,   8.50000000e-05],
       [             nan,              nan,              nan,
                     nan,              nan]])
Coordinates:
  * band        (band) object 'bar' 'foo'
  * wavenumber  (wavenumber) float64 4.05e+03 4.05e+03 4.1e+03 4.1e+03 4.1e+03
```
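
A possible explanation, assuming `unstack` reindexes onto the full product of the MultiIndex levels (which would match the 2 x 5 shape shown above): selecting rows from a pandas `MultiIndex` keeps all the original level values, including 'foo' and the unused wavenumbers. pandas' `MultiIndex.remove_unused_levels` drops them. The sketch below compares the size of the full grid before and after (`product_size` is a hypothetical helper):

```python
import numpy as np
import pandas as pd

band = np.array(['foo', 'foo', 'bar', 'bar', 'bar'])
wavenumber = np.array([4050.2, 4050.3, 4100.1, 4100.3, 4100.5])
midx = pd.MultiIndex.from_arrays([band, wavenumber], names=('band', 'wavenumber'))

# Selecting the 'bar' rows keeps the original levels ('bar' and 'foo',
# plus all five wavenumbers) even though only three entries remain.
sub = midx[np.asarray(midx.get_level_values('band') == 'bar')]


def product_size(index):
    """Hypothetical helper: size of the full grid spanned by the levels."""
    return len(pd.MultiIndex.from_product(index.levels, names=index.names))


before = product_size(sub)                        # 2 bands x 5 wavenumbers
after = product_size(sub.remove_unused_levels())  # 1 band x 3 wavenumbers
```

If that assumption holds, cleaning the levels of the selected index before unstacking would give a 1 x 3 result instead of 2 x 5.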

MultiIndex and data selection #767