Need a way to specify the names of the coordinates created from indices dropped by DataArray.reset_index. #5874

Closed
@weipeng1999

Description

When I tried to use several different coordinates as the index of a dim, I noticed the new API introduced in v0.9, DataArray.set_index:

>>> import numpy as np
>>> import xarray as xr
>>> arr = xr.DataArray(np.r_[:16].reshape(4,4),dims=('t','x'))
>>> arr['x_str'] = "x",["a","b","c","d"]
>>> arr["x_num"] = "x",[1,2,3,4]; arr
<xarray.DataArray (t: 4, x: 4)>
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
Coordinates:
    x_str    (x) <U1 'a' 'b' 'c' 'd'
    x_num    (x) int64 1 2 3 4
Dimensions without coordinates: t, x
>>> arr = arr.set_index(x='x_str'); arr
<xarray.DataArray (t: 4, x: 4)>
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
Coordinates:
    x_num    (x) int64 1 2 3 4
  * x        (x) object 'a' 'b' 'c' 'd'
Dimensions without coordinates: t
>>> #... some code with "arr[{'x':[... some str index ...]}]" ...

That's really convenient, what a nice API.
But when I want to switch to another coordinate, I find that I cannot restore arr to the state it was in before I called set_index:

>>> arr=arr.reset_index('x'); arr # why does the coordinate that was used as the index now lose its name "x_str"?
<xarray.DataArray (t: 4, x: 4)>
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
Coordinates:
    x_num    (x) int64 1 2 3 4
    x_       (x) object 'a' 'b' 'c' 'd'
Dimensions without coordinates: t, x
>>> arr=arr.set_index(x="x_num");arr # anyway, carry on with the code that uses coordinate "x_num" as the index
>>> #... some code with "arr[{'x':[... some int index ...]}]" ...
>>> arr=arr.reset_index('x');arr # now I need the "x_str" coordinate as the index, here we go
<xarray.DataArray (t: 4, x: 4)>
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
Coordinates:
    x_       (x) int64 1 2 3 4
Dimensions without coordinates: t, x
>>> # Oops!!! the "x_num" coordinate OVERWROTE the "x_str" coordinate, I can't access the latter any more :(

To solve this problem, I can think of the following approaches:

  • Add a new function to do this.
    • Probably the most direct way.
    • But it requires implementing a new function and designing a new API.
  • Stop DataArray.set_index from changing the coordinate name.
    • The benefit is that no new function is needed, so it is the most convenient option.
    • But it may break the existing contract of DataArray.set_index, so it may be impractical.
    • Here is an example showing how it could be used:
>>> arr = arr.set_index({'x':'x_str'})
>>> #... some code with "arr[{'x':[... some str index ...]}]" ...
>>> arr = arr.set_index({'x':'x_num'})
>>> #... some code with "arr[{'x':[... some int index ...]}]" ...
>>> arr = arr.set_index({'x':'x_str'}) ...
  • Store the coordinate name in the DataArray when calling DataArray.set_index and restore it when calling DataArray.reset_index.
    • Only slightly more complex than the previous one.
    • But it requires storing redundant data inside the DataArray, so it may be impractical too.
    • Example:
>>> arr = arr.set_index({'x':'x_str'})
>>> #... some code with "arr[{'x':[... some str index ...]}]" ...
>>> arr = arr.reset_index('x').set_index({'x':'x_num'})
>>> #... some code with "arr[{'x':[... some int index ...]}]" ...
>>> arr = arr.reset_index('x').set_index({'x':'x_str'}) ...
  • Let DataArray.reset_index accept a Mapping as its names parameter, using the keys as the dims whose indices are reset and the values as the names of the coordinates converted from those indices.
    • More complex.
    • Probably the one that causes the least change, so I prefer it.
    • Example:
>>> arr = arr.set_index({'x':'x_str'})
>>> #... some code with "arr[{'x':[... some str index ...]}]" ...
>>> arr = arr.reset_index({'x':'x_str'}).set_index({'x':'x_num'})
>>> #... some code with "arr[{'x':[... some int index ...]}]" ...
>>> arr = arr.reset_index({'x':'x_num'}).set_index({'x':'x_str'}) ...
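
For illustration, the behaviour proposed in the last option can be approximated today with a small user-side helper. This is only a sketch: `reset_index_as` is a hypothetical name, not part of xarray, and it assumes each dim carries a simple (non-MultiIndex) index and that `DataArray.drop_vars` is available. It reads the index labels, drops the index coordinate, and re-attaches the labels as a plain coordinate under the requested name.

```python
import numpy as np
import xarray as xr


def reset_index_as(arr, names):
    """Hypothetical helper (not an xarray API): drop the index on each dim
    in `names` and keep its labels as a plain coordinate with the given name.
    Assumes a simple, non-MultiIndex index on each dim."""
    out = arr
    for dim, name in names.items():
        values = out[dim].values                        # labels held by the index
        out = out.drop_vars(dim)                        # remove the index coordinate
        out = out.assign_coords({name: (dim, values)})  # re-attach under `name`
    return out


arr = xr.DataArray(np.arange(16).reshape(4, 4), dims=("t", "x"))
arr["x_str"] = "x", ["a", "b", "c", "d"]
arr["x_num"] = "x", [1, 2, 3, 4]

# Index by "x_str", switch to "x_num", then switch back without losing a name.
by_str = arr.set_index(x="x_str")
by_num = reset_index_as(by_str, {"x": "x_str"}).set_index(x="x_num")
back = reset_index_as(by_num, {"x": "x_num"}).set_index(x="x_str")
```

Because the name is supplied explicitly on every reset, neither coordinate is renamed to "x_" and neither overwrites the other, which is the round trip the examples above are asking for.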
