Description
While trying to use different coordinates as the index of a dimension, I noticed the new API in v0.9 provided by DataArray.set_index:
>>> import numpy as np
>>> import xarray as xr
>>> arr = xr.DataArray(np.r_[:16].reshape(4,4),dims=('t','x'))
>>> arr['x_str'] = "x",["a","b","c","d"]
>>> arr["x_num"] = "x",[1,2,3,4]; arr
<xarray.DataArray (t: 4, x: 4)>
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
Coordinates:
x_str (x) <U1 'a' 'b' 'c' 'd'
x_num (x) int64 1 2 3 4
Dimensions without coordinates: t, x
>>> arr = arr.set_index(x='x_str'); arr
<xarray.DataArray (t: 4, x: 4)>
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
Coordinates:
x_num (x) int64 1 2 3 4
* x (x) object 'a' 'b' 'c' 'd'
Dimensions without coordinates: t
>>> #... some code with "arr[{'x':[... some str index ...]}]" ...
That's really convenient, what a nice API.
But when I want to switch to another coordinate, I find that I cannot restore arr to its state before calling set_index:
>>> arr=arr.reset_index('x'); arr # why does the coordinate that was used as the index lose its name "x_str"?
<xarray.DataArray (t: 4, x: 4)>
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
Coordinates:
x_num (x) int64 1 2 3 4
x_ (x) object 'a' 'b' 'c' 'd'
Dimensions without coordinates: t, x
>>> arr=arr.set_index(x="x_num");arr # anyway, continue going to the code use coordinate "x_num" as index
>>> #... some code with "arr[{'x':[... some int index ...]}]" ...
>>> arr=arr.reset_index('x');arr # now I need the "x_str" coordinate as the index, here we go
<xarray.DataArray (t: 4, x: 4)>
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
Coordinates:
x_ (x) int64 1 2 3 4
Dimensions without coordinates: t, x
>>> # Oh no! The "x_num" coordinate has overwritten the "x_str" coordinate (both were renamed to "x_"), so I can't access the latter any more :(
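Until the API changes, one possible workaround is to avoid the set_index/reset_index round trip and copy the values of the wanted coordinate into the dimension index with assign_coords, so "x_str" and "x_num" keep their names. This is only a sketch: use_as_index is a hypothetical helper name, not part of xarray, and it assumes the coordinate is one-dimensional along the given dim.

```python
import numpy as np
import xarray as xr

# Same data as above, with both candidate coordinates attached to dim "x".
arr = xr.DataArray(
    np.arange(16).reshape(4, 4),
    dims=("t", "x"),
    coords={"x_str": ("x", ["a", "b", "c", "d"]), "x_num": ("x", [1, 2, 3, 4])},
)


def use_as_index(da, dim, coord_name):
    """Hypothetical helper: copy the values of `coord_name` into the index of `dim`.

    The source coordinate is left in place, so its name is never lost.
    """
    return da.assign_coords(**{dim: (dim, da[coord_name].values)})


# Index "x" by the string labels and select with them.
by_str = use_as_index(arr, "x", "x_str")
picked_str = by_str.sel(x=["a", "c"])

# Switch to the integer labels; "x_str" is still available afterwards.
by_num = use_as_index(by_str, "x", "x_num")
picked_num = by_num.sel(x=[2, 4])
```

The cost is that the index values are duplicated (once in "x", once in the named coordinate), which is the same kind of redundancy discussed for the options below.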
To solve this problem, I can think of the following options:
- Add a new function to do this.
  - Probably the most direct way.
  - But it means a new function to implement and a new API to design.
- Stop DataArray.set_index from changing the coordinate name.
  - The benefit is that no new function is needed, so it is the most convenient option.
  - But it may break the existing contract of DataArray.set_index, so it may not be acceptable.
  - Here is an example of how it could be used:
>>> arr = arr.set_index({'x':'x_str'})
>>> #... some code with "arr[{'x':[... some str index ...]}]" ...
>>> arr = arr.set_index({'x':'x_num'})
>>> #... some code with "arr[{'x':[... some int index ...]}]" ...
>>> arr = arr.set_index({'x':'x_str'}) ...
- Store the coordinate name in the DataArray when calling DataArray.set_index and restore it when calling DataArray.reset_index.
  - Only slightly more complex than the previous option.
  - But it requires storing redundant data inside the DataArray, so it may not be acceptable either.
  - Example:
>>> arr = arr.set_index({'x':'x_str'})
>>> #... some code with "arr[{'x':[... some str index ...]}]" ...
>>> arr = arr.reset_index('x').set_index({'x':'x_num'})
>>> #... some code with "arr[{'x':[... some int index ...]}]" ...
>>> arr = arr.reset_index('x').set_index({'x':'x_str'}) ...
- Let DataArray.reset_index accept a Mapping for its names parameter, using the keys as the dims whose indices are reset and the values as the names of the coordinates converted from those indices.
  - More complex.
  - Probably the option that requires the fewest changes, so I prefer it.
  - Example:
>>> arr = arr.set_index({'x':'x_str'})
>>> #... some code with "arr[{'x':[... some str index ...]}]" ...
>>> arr = arr.reset_index({'x':'x_str'}).set_index({'x':'x_num'})
>>> #... some code with "arr[{'x':[... some int index ...]}]" ...
>>> arr = arr.reset_index({'x':'x_num'}).set_index({'x':'x_str'}) ...
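To make the last option more concrete, here is a rough user-level emulation of what a Mapping-aware reset_index might do. It is a sketch under assumptions, not xarray's implementation: reset_index_as is a hypothetical function name, it only handles simple (non-multi) indexes, and it relies on drop_vars, which exists only in newer xarray releases (0.14 and later), not in v0.9.

```python
import numpy as np
import xarray as xr


def reset_index_as(da, names):
    """Hypothetical helper emulating reset_index with a {dim: new_name} mapping.

    Each index listed in `names` is removed from its dimension and kept as a
    plain coordinate under the requested name.
    """
    for dim, coord_name in names.items():
        values = da[dim].values  # remember the index labels
        # Drop the index, then re-attach its labels under the chosen name.
        da = da.drop_vars(dim).assign_coords(**{coord_name: (dim, values)})
    return da


# An array whose "x" dimension is currently indexed by the string labels.
indexed = xr.DataArray(
    np.arange(16).reshape(4, 4),
    dims=("t", "x"),
    coords={"x": ["a", "b", "c", "d"], "x_num": ("x", [1, 2, 3, 4])},
)

# The index goes back to being the "x_str" coordinate instead of "x_".
restored = reset_index_as(indexed, {"x": "x_str"})
```

After this call, "x" is again a dimension without coordinates and both "x_str" and "x_num" are available, so either one could be passed to set_index next.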