Add .sel and .isel equivalent that returns all vars on all grids related to the indexers? #200
Comments
I think this would be a really nice addition to xgcm. I have been looking for this functionality for a long time to subset large datasets. This could also be a partial solution to e.g. the issue @miniufo mentioned in #193. I would propose that a call could look like this:
grid_sub, ds_sub = grid.isel(ds, x_c=3, y_c=slice(0,5), to='outer', boundary='fill')
By using the actual dimension ( xgcm should set some reasonable defaults for Internally I’d like to wrap all of the isel/sel logic from xarray (profiting from possible future improvements) and just get a logical index from there. xgcm would then have to deal with constructing matching indices for other dimensions along the same axis. |
Good idea. This would work well in combination with #197 (xarray accessor). Then we could do ds.grid.isel(X=slice(5, 10)). I slightly prefer @willirath's API over @jbusecke's. The whole point of this is to not have to worry about the different dimension names and to be able to select on an axis. |
Fair point. Could we maybe add another keyword to allow that flexibility if wanted? e.g. This would be hidden for casual use, but would still make the use case I described possible. Would love to see this working with the accessor. What would the output of that call be? Just a dataset, or both a new grid and ds? |
This lines up with the cf xarray discussion: pangeo-data/pangeo#771. |
If you have the EDIT: that's not exactly right. It won't take into account the grid layout. cf_xarray implements the grid-unaware version of this feature |
Oh this is super cool. I will try to have a closer look soon. Do you think this will overlap in functionality with xgcm? I will have to dive deeper into cf-xarray, but want to make sure that we are not duplicating efforts here. |
I am still very keen on having this feature. Just wanted to drop a use case that I encountered the other day: |
This issue has been marked 'stale' due to lack of recent activity. If there is no further activity, the issue will be closed in another 30 days. Thank you for your contribution! |
This issue has been closed due to inactivity. If you feel this is in error, please reopen the issue or file a new issue with the relevant details. |
I think this is still relevant. Actually this might be interesting for your research project @jdldeauna. I think you wanted to subset a global model in a certain region? |
Yeah I do a lot of subsetting, so it would be great to integrate that with xgcm. :) |
I think this would be a great feature to add to xgcm! I had a few questions:
|
Hey @jdldeauna,
I envision the API to be something like this Let's say we have a setup like this along a single axis
I think
This to me is the most intuitive default because it creates velocity cells 'surrounding' the tracer. Of course this can be modified: This would result in:
These choices are then automatically parsed into the new grid object. I think this would be a huuuge first step to accomplish. The lon/lat selection could then maybe build on top of that, but this needs to work first IMO. |
This might be worth raising another issue? just to keep the discussion here focussed? |
I agree. let's do
I feel like this should be the default. i.e. preserve the positions. For your first case, shouldn't this be |
Thanks for laying this out Julius! And Deepak, would specifying that apply to both |
I read it as saying: take any variable currently on |
I think that is right. We just want to operate based on grid position. The variable can be whatever is located on that grid position, it should not matter for this feature. |
Hi! I'm trying using
from xgcm import Grid
from xgcm.test.datasets import datasets_grid_metric
ds, coords, metrics = datasets_grid_metric("C")
grid = Grid(ds, coords=coords, metrics=metrics)
test = grid._ds.tracer.isel(xt=slice(1,2))
print(ds.tracer.coords['xt'])
print(test.coords['xt'])
Output:
Coordinates:
  * xt (xt) int64 0 1 2 3
Coordinates:
  * xt (xt) int64 1
To confirm, ideally this is what the
grid_new = grid.isel({'X':slice(1,2)})
print(grid_new._ds.tracer.coords['xt'])
Coordinates:
  * xt (xt) int64 1 2 # should include outer value?
So how can the xgcm standard positions ( |
Hey @jdldeauna, I believe that this is the expected behavior when passing a slice object, and I would be hesitant to change this. I think you could pass a list of indices instead. For the more general question
As a start, I would try to come up with some experimental logic to convert all of these inputs to the desired output. E.g. some function like this (pseudo-code):
def get_grid_indexer(grid, ds, axis_idx_dict, reference_position='center'):
    """grid is just a placeholder for self later, ds is an input dataset,
    axis_idx_dict is something like you outlined above, {'X': slice(1,2), 'Y': 4},
    and reference_position is either a str or dict (per axis) that tells this function
    which dimension should be used for the initial selection (xt in your above example)."""
    for axis, indexer in axis_idx_dict.items():
        # check if axis in grid object
        # figure out all the dimensions and grid positions in `ds` that are associated with this axis
        dimensions, grid_positions = ....()
        # loop over all positions and apply appropriate selection
        for dim, grid_position in zip(dimensions, grid_positions):
            if grid_position == reference_position:
                # simplest case, just apply the selection as is
                ds = ds.isel({dim: indexer})
            else:
                # Implement a 'translation' logic to convert `indexer` to other grid positions (needs to cover all possible combinations)
                if reference_position == 'center' and grid_position == 'outer':
                    modified_indexer = ...  # if you pass [1, 2] as the initial indexer you want to get something like [0,1,2] out here
                # apply the selection
                ds = ds.isel({dim: modified_indexer})
    return ds
You will have to see how to make this work with all the inputs from above. Maybe xarray's internal logic has some things we could adapt. I am personally not very familiar with the indexing logic of xarray, but maybe @dcherian knows more and has some ideas for how to make this easier (this is a pretty brute-force approach, but it should work IMO). As a side note, I think we might want to disallow indexing with DataArrays for now, and keep that for later? |
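To make the 'translation' step in the pseudo-code above a bit more concrete, here is a minimal pure-Python sketch of the center-to-outer case only. The function names normalize_indexer and center_to_outer are hypothetical and not part of xgcm. It assumes the convention that an 'outer' dimension has one more point than the 'center' dimension, and that outer point i is the left face of center cell i (so cell i is bounded by outer points i and i+1). Under a different offset convention the resulting indices would shift accordingly.

```python
def normalize_indexer(indexer, size):
    """Convert an int, slice, or iterable of ints into a sorted list of
    non-negative indices along a dimension of length `size`."""
    if isinstance(indexer, slice):
        # slice.indices() resolves None and negative bounds for us
        return sorted(range(*indexer.indices(size)))
    if isinstance(indexer, int):
        indexer = [indexer]
    out = set()
    for i in indexer:
        if not -size <= i < size:
            raise IndexError(f"index {i} out of bounds for size {size}")
        out.add(i % size)  # map negative indices to positive ones
    return sorted(out)


def center_to_outer(indexer, n_center):
    """Translate a selection of center cells into the outer points that
    surround them.

    ASSUMPTION (not taken from xgcm): outer point i is the left face of
    center cell i, so cell i is bounded by outer points i and i + 1.
    """
    faces = set()
    for i in normalize_indexer(indexer, n_center):
        faces.update((i, i + 1))
    return sorted(faces)


print(center_to_outer(slice(1, 2), 4))  # a slice that selects only cell 1
print(center_to_outer([1, 2], 4))       # an explicit list of two cells
```

Note that slice(1, 2) selects only cell 1, consistent with the xt output shown earlier in the thread; passing the list [1, 2] selects both cells and therefore three outer points. The same pattern would need one branch per pair of grid positions.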
I see, thanks for the pseudo-code Julius!
Sure! I was trying it out as part of testing code. |
Here is some related work: https://xarray-subset-grid.readthedocs.io/en/latest/index.html |
I'm working on a project that aims at providing observation-like sampling for Ocean model data living on a C grid. What we aim at is keeping the C grid logic intact as long as possible. For this, something like
returning a full dataset restricted to x_c=3, x_l=3, x_r=3, y_c=5, y_l=5, y_r=5 and a corresponding ds.xgcm.sel(...) would be ideal.
A related question is: Which grid points are "related" to a given position?
If we consider the tracer grid of a finite volume model the basis of our selection, then for a complete description of the dynamics of the selection, we'd not only need the vars defined on the faces for one side (per dim), but for both sides of the box.
(Thanks @jbusecke for the chat yesterday. Definitely helped me get a clearer idea of what I'm looking for.)
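One possible reading of "both sides of the box" can be sketched in plain Python. This is only an illustration under stated assumptions, not xgcm behavior: it assumes a single face dimension in the 'left' position (face i is the left face of tracer cell i, with as many faces as cells), so the right face of the last cell only exists if the axis is periodic. The helper name faces_around_cells is hypothetical.

```python
def faces_around_cells(cells, n, periodic=True):
    """Return the sorted face indices that bound the given tracer cells.

    ASSUMPTION: faces live on a 'left'-positioned dimension of length n,
    i.e. face i is the left face of cell i and face i + 1 is its right face.
    """
    faces = set()
    for i in cells:
        if not 0 <= i < n:
            raise IndexError(f"cell index {i} out of bounds for size {n}")
        faces.add(i)  # left face of cell i
        if i + 1 < n:
            faces.add(i + 1)  # right face of cell i
        elif periodic:
            faces.add(0)  # right face of the last cell wraps around
        else:
            raise ValueError(
                "right face of the last cell is not stored on a "
                "non-periodic 'left'-positioned dimension"
            )
    return sorted(faces)


print(faces_around_cells([3], 10))  # both faces of tracer cell 3
print(faces_around_cells([9], 10))  # last cell on a periodic axis
```

For the non-periodic case a real implementation would have to decide between raising, dropping the missing face, or padding according to a boundary option; the sketch simply raises.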