Add .sel and .isel equivalent that returns all vars on all grids related to the indexers? #200

willirath · 2020-04-30T12:39:35Z

I'm working on a project that aims at providing observation-like sampling for Ocean model data living on a C grid. What we aim at is keeping the C grid logic intact as long as possible. For this, something like

ds.xgcm.isel(X=3, Y=5)

returning a full dataset restricted to x_c=3, x_l=3, x_r=3, y_c=5, y_l=5, y_r=5and a correspondingds.xgcm.sel(...)` would be ideal.

A related question is: Which grid points are "related" to a given position?

If we consider the tracer grid of a finite volume model the basis of our selection, then for a complete description of the dynamics of the selection, we'd not only need the vars defined on the faces for one side (per dim), but for both sides of the box.

(Thanks @jbusecke for the chat yesterday. Definitely helped me getting a clearer idea on what I'm looking for.)

The text was updated successfully, but these errors were encountered:

jbusecke · 2020-04-30T13:15:51Z

I think this would be a really nice addition to xgcm. I have been looking for this functionality for a long time to subset large datasets. This could also be a partial solution to e.g. the issue @miniufo mentioned in #193.

I would propose that a call could look like this

grid_sub, ds_sub = grid.isel(ds, x_c=3, y_c=slice(0,5), to=‘outer’, boundary=‘fill’)

By using the actual dimension (x_c) and a specifier (not sure if to is the most intuitive), the user has maximum flexibility.
In your case you want to select based on tracer points but get both neighboring velocity points. ‘outer’ will do the trick and also keep existing xgcm lingo for clarity.
If for some reason, another user wants to select based on velocity points, and the select only the tracer point to the left, that is also possible.

xgcm should set some reasonable defaults for to and boundary based on the original grid object to keep a simple sub setting command short.

Internally I’d like to wrap all of the isel/sel logic from xarray (profiting from possible future improvements) and just getting a logical index from there. Xgcm then would have to deal with constructing matching indicies for other dimensions along the same axis.

rabernat · 2020-04-30T13:26:32Z

Good idea. this would work well in combination with #197 (xarray accessor). Then we could do

ds.grid.isel(X=slice(5, 10))

I slightly prefer @willirath's API over @jbusecke's. The whole point of this is to not have to worry about the different dimension names and be able to select on an axis.

jbusecke · 2020-04-30T13:33:14Z

Fair point. Could we maybe add some more keyword to allow the flexibility if wanted? e.g. from could default to center but also be left/right?

This would be hidden for the casual use, but still make the use case, I described possible?

Would love to see this working with the accessor. What would the output of that call be? Just a dataset or both a new grid and ds?

dcherian · 2020-04-30T14:44:24Z

This lines up with the cf xarray discussion: pangeo-data/pangeo#771.

dcherian · 2020-06-14T17:29:32Z

If you have the axis attribute set, you can now do this with cf-xarray: https://cf-xarray.readthedocs.io/en/latest/examples/introduction.html#Slicing

EDIT: that's not exactly right. It won't take into account the grid layout. cf_xarray implements the grid-unaware version of this feature

jbusecke · 2020-06-14T23:56:54Z

Oh this is super cool. I will try to have a closer look soon. Do you think this will overlap in functionality with xgcm? I will have to dive deeper into cf-xarray, but want to make sure that we are not duplicating efforts here.

jbusecke · 2021-01-20T19:47:45Z

I am still very keen on having this feature. Just wanted to drop a usecase that I encountered the other day:
It would be awesome if xgcm.sel() could retroactively figure out how to select fields on other grid locations. Say I have a tracer field and selected a subset, I think it would be amazing if there would be a way to select the velocity fields in a coherent way. This might however be a bit more complicated, depending on how the dimension labels are given. Just wanted to drop this here, so it doesnt get lost.

dcherian · 2021-03-05T16:43:45Z

See pydata/xarray#4979 (comment)

github-actions · 2021-06-11T12:03:20Z

This issue has been marked 'stale' due to lack of recent activity. If there is no further activity, the issue will be closed in another 30 days. Thank you for your contribution!

github-actions · 2021-06-18T12:03:28Z

This issue has been closed due to inactivity. If you feel this is in error, please reopen the issue or file a new issue with the relevant details.

jbusecke · 2021-06-24T15:29:33Z

I think this is still relevant. Actually this might be interesting for your research project @jdldeauna. I think you wanted to subset a global model in a certain region?

jdldeauna · 2021-06-24T16:40:36Z

Yeah I do a lot of subsetting, so it would be great to integrate that with xgcm. :)

jdldeauna · 2021-07-29T18:58:19Z

I think this would be a great feature to add to xgcm! I had a few questions:

To clarify, this would be a method under the Grid class (i.e., usage would be grid.sel() and/or grid.isel())?
Maybe we could also consider using lat/lon as inputs for defining subset limits (aside from axis indices)? But then make sure that the right lat/lon grids are being accessed. So for example, grid.sel(ds.uvelocity, lat=[25,40]) would automatically access lat_u, while tracers would access defaults (grid.sel(ds.temperature,lat=[25,40]) would use lat). I realize this would involve using a lot of xarray methods "under the hood" but I think its worth making it more intuitive for simple-case users (like me 😅 ).
There is an available test for using GCMs (test_gcm_dataset.py) but it's marked as deprecated. Could it still be used, and is the GCMDataset still available? Alternatively, is it advisable to write tests which use intake to download GCM datasets for testing, or would that add complexity we wouldn't want?

jbusecke · 2021-07-30T15:32:09Z

Hey @jdldeauna,

Yes I think this should live in the grid object as a method
I think it would be even more powerful if we can implement this on a full dataset (see example below). Regarding the lat/lon selection: I am not sure how to even implement a general lon/lat selection without a mask. On a regular lon/lat grid this is super easy, and you can figure out the indicies before passing them to grid.isel() but once you have curvilinear coordinates there have to be 'gaps' in your resulting data.

I envision the API to be something like this ds_new, grid_new = grid.isel({'X':slice(a,b), 'Y':slice(c,d)}). I think the default should always apply the selection based on the center position and select all other positions to result in an outer:

Lets say we have a setup like this along a single axis X

Tracer(center)
 |------o-------|------o-------|------o-------|------o-------|
       [0]            [1]            [2]            [3]

Velocity(left)
 |------o-------|------o-------|------o-------|------o-------|
[0]            [1]            [2]            [3]

I think grid.isel({'X':slice(1,2)}) should return:

Tracer(center)
 |------o-------|------o-------|------o-------|------o-------|
                      [1]            [2]            

Velocity(outer)
 |------o-------|------o-------|------o-------|------o-------|
               [1]            [2]            [3]

This to me is the most intuitive default because it creates velocity cells 'surrounding' the tracer.

Of course this can be modified:
grid.isel({'X':slice(1,2)}, pos_mapping={'center':'center', 'left':'left'})

Would results in:

Tracer(center)
 |------o-------|------o-------|------o-------|------o-------|
                      [1]            [2]            

Velocity(left)
 |------o-------|------o-------|------o-------|------o-------|
               [1]            [2]

These choices are then automatically parsed into the new grid object.

I think this would be a huuuge first step to accomplish. The lon/lat selection could then maybe build on top of that, but this needs to work first IMO.

jbusecke · 2021-07-30T15:35:06Z

There is an available test for using GCMs (test_gcm_dataset.py) but it's marked as deprecated. Could it still be used, and is the GCMDataset still available? Alternatively, is it advisable to write tests which use intake to download GCM datasets for testing, or would that add complexity we wouldn't want?

This might be worth raising another issue? just to keep the discussion here focussed?

dcherian · 2021-07-30T17:10:13Z

I agree. let's do isel first. sel can then call isel.

pos_mapping={'center':'center', 'left':'left'}

I feel like this should be the default. i.e. preserve the positions. For your first case, shouldn't this be {"X": {"left": "outer"}} i.e. we should have to specify this per-axis and positions that don't change need not be specified.

jdldeauna · 2021-07-30T18:18:54Z

Thanks for laying this out Julius! And Deepak, would specifying that apply to both velocity and tracer?

dcherian · 2021-07-30T19:35:07Z

I read it as saying: take any variable currently on x: left and make it x: outer on the returned grid and dataset.

jbusecke · 2021-08-02T17:28:00Z

I think that is right. We just want to operate based on grid position. The variable can be whatever is located on that grid position, it should not matter for this feature.

jdldeauna · 2021-09-07T06:06:07Z

Hi! I'm trying using isel with a test grid object:

from xgcm import Grid
from xgcm.test.datasets import datasets_grid_metric

ds, coords, metrics = datasets_grid_metric("C")
grid = Grid(ds, coords=coords, metrics=metrics)
test = grid._ds.tracer.isel(xt=slice(1,2))

print(ds.tracer.coords['xt'])
print(test.coords['xt'])

Output:

Coordinates:
  * xt       (xt) int64 0 1 2 3
Coordinates:
  * xt       (xt) int64 1

To confirm, ideally this is what the grid.isel method should be able do:

grid_new = grid.isel({'X':slice(1,2)})
print(grid_new._ds.tracer.coords['xt'])
Coordinates:
  * xt       (xt) int64 1 2 # should include outer value?

So how can the xgcm standard positions ({"X": {"left": "outer"}}) be combined with the xarray isel and the slice functions when subsetting variables?

jbusecke · 2021-09-07T16:52:16Z

Hey @jdldeauna, I believe that this is the expected behavior when passing a slice object, and I would be hesitant to change this. I think you could pass a list of indicies instead ...isel(xt=[1,2]) to achieve what you want here.

For the more general question So how can the xgcm standard positions ({"X": {"left": "outer"}}) be combined with the xarray isel and the slice functions when subsetting variables? We should consider all the possible inputs for .isel in xarray. From the xarray docs we get:

A dict with keys matching dimensions and values given by integers, slice objects or arrays. indexer can be a integer, slice, array-like or DataArray.

As a start, I would try to come up with some experimental logic to convert all of these inputs to the desired output. E.g. some function like this (pseudo-code):

def get_grid_indexer(grid, ds, axis_idx_dict, reference_position='center'):
    """ grid is just a placeholder for self later, ds is an input dataset, axis_idx_dict is something like you outlined above {'X': slice(1,2), 'Y':4}
    and reference_position is either a str or dict (per axis) that tells this function which dimension should be used for the initial selection (xt in your above example)"""
    for axis, indexer in axis_idx_dict.items():
         # check if axis in grid object
         # figure out all the dimensions and grid positions in `ds` that are associated with this axis
         dimensions, grid_positions = ....()
         # loop over all positions and apply appropriate selection
         for di, grid_position in zip(dimensions, grid_positions):
               if grid_position == reference position:
                     # simplest case, just apply the selection as is
                     ds = ds.isel({dim:indexer}
               else:
                     # Implement a 'translation' logic to convert `indexer` to other grid positions (needs to cover all possible combinations)
                     if reference_position == 'center' and grid_position == 'outer':
                          modified_indexer = ... # if you pass [1, 2] as the initial indexer you want to get something like [0,1,2] out here
                          # apply the selection
                          ds = ds.isel({dim: modified_indexer})

You will have to see how to make this work with all the inputs from above. Maybe xarrays internal logic has some things we could adapt. I am personally not very familiar with the indexing logic of xarray, but maybe @dcherian knows more and has some ideas how to make this easier (this is a pretty brute force approach but should work IMO).

As a sidenote, I think we might want to disallow indexing with Dataarrays for now, and keep that for later?

jdldeauna · 2021-09-08T17:29:39Z

I see, thanks for the pseudo-code Julius!

As a sidenote, I think we might want to disallow indexing with Dataarrays for now, and keep that for later?

Sure! I was trying it out as part of testing code.

dcherian · 2024-08-26T19:24:56Z

Here is some related work: https://xarray-subset-grid.readthedocs.io/en/latest/index.html

dcherian mentioned this issue Jun 3, 2020

What to do with multiple axis variables? xarray-contrib/cf-xarray#13

Closed

willirath mentioned this issue Jun 27, 2020

Implement 2d sel via Xarray accessor? xarray-contrib/xoak#1

Closed

jbusecke added the enhancement label Sep 3, 2020

jbusecke mentioned this issue Oct 14, 2020

Make _assign_metrics public #199

Closed

dcherian mentioned this issue Mar 4, 2021

Flexible indexes refactoring notes pydata/xarray#4979

Merged

dcherian mentioned this issue May 17, 2021

Internal refactor of label-based data selection pydata/xarray#5322

Merged

github-actions bot added the Stale label Jun 11, 2021

github-actions bot closed this as completed Jun 18, 2021

jbusecke reopened this Jun 24, 2021

jbusecke added keepOpen and removed Stale labels Jun 24, 2021

jbusecke mentioned this issue Jul 20, 2021

apply Grid.average to subset of data #193

Closed

jdldeauna mentioned this issue Jul 30, 2021

[BUG] test_gcm_dataset.py is deprecated #365

Closed

jbusecke mentioned this issue Oct 31, 2022

Roadmap for ROMS/CROCO analysis using xarray/xgcm xoceanmodel/xroms#27

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add .sel and .isel equivalent that returns all vars on all grids related to the indexers? #200

Add .sel and .isel equivalent that returns all vars on all grids related to the indexers? #200

willirath commented Apr 30, 2020

jbusecke commented Apr 30, 2020 •

edited

Loading

rabernat commented Apr 30, 2020

jbusecke commented Apr 30, 2020

dcherian commented Apr 30, 2020

dcherian commented Jun 14, 2020 •

edited

Loading

jbusecke commented Jun 14, 2020

jbusecke commented Jan 20, 2021

dcherian commented Mar 5, 2021

github-actions bot commented Jun 11, 2021

github-actions bot commented Jun 18, 2021

jbusecke commented Jun 24, 2021

jdldeauna commented Jun 24, 2021

jdldeauna commented Jul 29, 2021

jbusecke commented Jul 30, 2021

jbusecke commented Jul 30, 2021

dcherian commented Jul 30, 2021

jdldeauna commented Jul 30, 2021

dcherian commented Jul 30, 2021

jbusecke commented Aug 2, 2021

jdldeauna commented Sep 7, 2021

jbusecke commented Sep 7, 2021

jdldeauna commented Sep 8, 2021

dcherian commented Aug 26, 2024

Add .sel and .isel equivalent that returns all vars on all grids related to the indexers? #200

Add .sel and .isel equivalent that returns all vars on all grids related to the indexers? #200

Comments

willirath commented Apr 30, 2020

jbusecke commented Apr 30, 2020 • edited Loading

rabernat commented Apr 30, 2020

jbusecke commented Apr 30, 2020

dcherian commented Apr 30, 2020

dcherian commented Jun 14, 2020 • edited Loading

jbusecke commented Jun 14, 2020

jbusecke commented Jan 20, 2021

dcherian commented Mar 5, 2021

github-actions bot commented Jun 11, 2021

github-actions bot commented Jun 18, 2021

jbusecke commented Jun 24, 2021

jdldeauna commented Jun 24, 2021

jdldeauna commented Jul 29, 2021

jbusecke commented Jul 30, 2021

jbusecke commented Jul 30, 2021

dcherian commented Jul 30, 2021

jdldeauna commented Jul 30, 2021

dcherian commented Jul 30, 2021

jbusecke commented Aug 2, 2021

jdldeauna commented Sep 7, 2021

jbusecke commented Sep 7, 2021

jdldeauna commented Sep 8, 2021

dcherian commented Aug 26, 2024

jbusecke commented Apr 30, 2020 •

edited

Loading

dcherian commented Jun 14, 2020 •

edited

Loading