Description
Is your feature request related to a problem?
I'm trying to use xindex more. Currently, trying to select values using coordinates that haven't been explicitly indexed via set_xindex() raises:
ds = xr.tutorial.open_dataset("air_temperature").assign_coords(lat2=lambda x: x.lat)
ds
# Output:
<xarray.Dataset> Size: 31MB
Dimensions: (lat: 25, time: 2920, lon: 53)
Coordinates:
* lat (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0
* lon (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
* time (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00
lat2 (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0
Data variables:
air (time, lat, lon) float64 31MB ...
Attributes:
Conventions: COARDS
title: 4x daily NMC reanalysis (1948)
description: Data is from NMC initialized reanalysis\n(4x/day). These a...
platform: Model
references: http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...
# Attempting to select using the unindexed coordinate raises an error:
ds.sel(lat2=75)
# Output:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Cell In[20], line 1
----> 1 ds.sel(lat2=75)
File ~/workspace/xarray/xarray/core/dataset.py:3223, in Dataset.sel(self, indexers, method, tolerance, drop, **indexers_kwargs)
3155 """Returns a new dataset with each array indexed by tick labels
3156 along the specified dimension(s).
3157
(...)
3220
3221 """
3222 indexers = either_dict_or_kwargs(indexers, indexers_kwargs, "sel")
-> 3223 query_results = map_index_queries(
3224 self, indexers=indexers, method=method, tolerance=tolerance
3225 )
3227 if drop:
3228 no_scalar_variables = {}
File ~/workspace/xarray/xarray/core/indexing.py:186, in map_index_queries(obj, indexers, method, tolerance, **indexers_kwargs)
183 options = {"method": method, "tolerance": tolerance}
185 indexers = either_dict_or_kwargs(indexers, indexers_kwargs, "map_index_queries")
--> 186 grouped_indexers = group_indexers_by_index(obj, indexers, options)
188 results = []
189 for index, labels in grouped_indexers:
File ~/workspace/xarray/xarray/core/indexing.py:145, in group_indexers_by_index(obj, indexers, options)
143 grouped_indexers[index_id][key] = label
144 elif key in obj.coords:
--> 145 raise KeyError(f"no index found for coordinate {key!r}")
146 elif key not in obj.dims:
147 raise KeyError(
148 f"{key!r} is not a valid dimension or coordinate for "
149 f"{obj.__class__.__name__} with dimensions {obj.dims!r}"
150 )
KeyError: "no index found for coordinate 'lat2'"
After explicitly setting the index, it works as expected:
ds.set_xindex('lat2').sel(lat2=75)
# Output:
<xarray.Dataset> Size: 1MB
Dimensions: (time: 2920, lon: 53)
Coordinates:
lat float32 4B 75.0
* lon (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
* time (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00
lat2 float32 4B 75.0
Data variables:
air (time, lon) float64 1MB ...
Attributes:
Conventions: COARDS
title: 4x daily NMC reanalysis (1948)
description: Data is from NMC initialized reanalysis\n(4x/day). These a...
platform: Model
references: http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...
It's a bit annoying: frequently I attempt to select something, realize it doesn't have an index, and add the .set_xindex call. I then try to remember to add each one at object creation, and end up feeling like xarray isn't being as helpful as it could be.
Describe the solution you'd like
Could we instead set the xindex automatically when calling .sel?
Possibly we want to force the user to create this once, rather than paying the cost of creating a new index on each call? But OTOH it seems relatively cheap?
%timeit ds.assign_coords(lat2=ds.lat + 2).set_xindex('lat2')
349 µs ± 6.97 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
(I guess it could be possible to update a cache in place, and then creating a new index from the cache would be very cheap. Though that's also possibly a source of quite confusing behavior if our implementation is in any way wrong, or if people are sharing objects across threads etc.; i.e. the principle of "don't update in place" is useful.)
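To make the proposal concrete, here is a rough user-level sketch of the behavior, not a proposed implementation inside xarray. The helper name `sel_with_auto_index` is hypothetical, and it assumes only 1-D coordinates should get a default index. It builds the index on every call, which is the cost question raised above.

```python
import numpy as np
import xarray as xr


def sel_with_auto_index(obj, **indexers):
    """Hypothetical helper: index any unindexed 1-D coordinate, then call .sel."""
    for name in indexers:
        is_unindexed_coord = name in obj.coords and name not in obj.xindexes
        if is_unindexed_coord and obj.coords[name].ndim == 1:
            # set_xindex returns a new object; the caller's object is untouched.
            obj = obj.set_xindex(name)
    return obj.sel(**indexers)


# Small synthetic stand-in for the tutorial dataset (assumed shapes and values).
ds = xr.Dataset(
    {"air": (("time", "lat"), np.zeros((2, 3)))},
    coords={
        "time": [0, 1],
        "lat": [75.0, 72.5, 70.0],
        "lat2": ("lat", [75.0, 72.5, 70.0]),  # coordinate without an index
    },
)

result = sel_with_auto_index(ds, lat2=75.0)
```

Because nothing is updated in place, `ds` still has no index for `lat2` afterwards, so repeated calls pay the index-creation cost each time.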
Describe alternatives you've considered
A set_xindex(...) param (i.e. literally an ellipsis ...) that just creates all the indexes that it can, and folks could call after creating an object?
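As a sketch of what this alternative might do, the helper below loops over the coordinates and indexes every 1-D one that lacks an index. The name `set_all_xindexes` is hypothetical (it is not an existing xarray function), and restricting to 1-D coordinates is an assumption about what "all the indexes that it can" would mean.

```python
import numpy as np
import xarray as xr


def set_all_xindexes(obj):
    """Hypothetical stand-in for set_xindex(...): index every unindexed 1-D coordinate."""
    for name in list(obj.coords):
        if name not in obj.xindexes and obj.coords[name].ndim == 1:
            obj = obj.set_xindex(name)
    return obj


# Small synthetic dataset with one unindexed coordinate (assumed values).
ds = xr.Dataset(
    {"air": (("time", "lat"), np.zeros((2, 3)))},
    coords={
        "time": [0, 1],
        "lat": [75.0, 72.5, 70.0],
        "lat2": ("lat", [75.0, 72.5, 70.0]),
    },
)

indexed = set_all_xindexes(ds)
selected = indexed.sel(lat2=72.5)
```

Called once after object creation, this pays the indexing cost up front, and later `.sel` calls on `indexed` work for either `lat` or `lat2`.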
Additional context
No response