-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding vectorized indexing docs #4711
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks and welcome to xarray, @EricKeenan, this is definitely a valuable addition.
Could you also add these examples to the docstrings of DataArray.isel
, Dataset.sel
, and Dataset.isel
?
@max-sixty, I think some time ago we were wondering if it was possible to link to the narrative documentation from docstrings. The documentation of the indexers
parameter of sel
contains such a link:
xarray/xarray/core/dataarray.py
Line 1162 in 7355c35
carried out. See :ref:`indexing` for the details. |
but I guess this shouldn't be used too much because it looks strange with
pydoc
.
xarray/core/dataarray.py
Outdated
@@ -1195,6 +1195,55 @@ def sel( | |||
Dataset.sel | |||
DataArray.isel | |||
|
|||
Examples | |||
-------- | |||
>>> ds = xr.tutorial.open_dataset("air_temperature") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we usually prefer having example data generated by np.arange
or np.linspace
(or by hand) because that makes it much easier to figure out what the example does. Also, this is the DataArray
docstring so explicitly creating a DataArray
might be better?
>>> ds = xr.tutorial.open_dataset("air_temperature") | |
>>> da = xr.DataArray( | |
... np.arange(20).reshape(5, 10), | |
... coords={"x": np.arange(5), "y": np.arange(10)}, | |
... dims=("x", "y"), | |
... ) |
The doctests / documentation fail because of the repr
of the tutorial dataset (the "description" attribute contains a \n
(newline) character), so with synthetic data we would avoid that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the tip. I've implemented your suggestion in DataArray.sel
. For DataArray.isel
method='nearest
doesn't appear to be an option? So I tried this, but it doesn't seem to work. I'm not sure why?
da = xr.DataArray(
np.arange(25).reshape(5, 5),
coords={"x": np.arange(5), "y": np.arange(5)},
dims=("x", "y")
)
tgt_x = xr.DataArray(np.linspace(0, 2, num=3), dims="points")
tgt_y = xr.DataArray(np.linspace(0, 2, num=3), dims="points")
da.isel(x=tgt_x, y=tgt_y)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isel
expects int
, but linspace
returns float
. Since you're using it to generate indices, I would use arange
instead (you could also use the dtype
parameter of linspace
).
Edit: yes, it seems isel
does not support method
(I guess that's to stay close to numpy
indexing). Also, since this is isel
, you don't need to pass coords
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Working on building an example for Dataset.sel
using method='nearest'
import numpy as np
import pandas as pd
import xarray as xr
ds = xr.Dataset(
data_vars=dict(
temperature=(["x", "y", "time"], np.arange(125).reshape(5, 5, 5))
),
coords=dict(
lon=(["x", "y"], np.arange(25).reshape(5, 5)),
lat=(["x", "y"], np.arange(25).reshape(5, 5)),
time=pd.date_range("2014-09-06", periods=5),
),
attrs=dict(description="Weather related data.")
)
# Define target latitude and longitude
tgt_x = xr.DataArray(np.arange(0, 5), dims="points")
tgt_y = xr.DataArray(np.arange(0, 5), dims="points")
# Dataset.sel
ds_sel = ds.sel(x=tgt_x, y=tgt_y, method='nearest')
However I get the following error, which I don't understand. I assume my coordinate definitions are inconsistent?
Traceback (most recent call last):
File "/Users/erke2265/Documents/xarray/xarray/core/indexing.py", line 257, in remap_label_indexers
index = data_obj.indexes[dim]
File "/Users/erke2265/Documents/xarray/xarray/core/indexes.py", line 64, in __getitem__
return self._indexes[key]
KeyError: 'x'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "tmp.py", line 24, in <module>
ds_sel = ds.sel(x=tgt_x, y=tgt_y, method='nearest')
File "/Users/erke2265/Documents/xarray/xarray/core/dataset.py", line 2275, in sel
self, indexers=indexers, method=method, tolerance=tolerance
File "/Users/erke2265/Documents/xarray/xarray/core/coordinates.py", line 398, in remap_label_indexers
obj, v_indexers, method=method, tolerance=tolerance
File "/Users/erke2265/Documents/xarray/xarray/core/indexing.py", line 262, in remap_label_indexers
"cannot supply ``method`` or ``tolerance`` "
ValueError: cannot supply ``method`` or ``tolerance`` when the indexed dimension does not have an associated coordinate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the reason is that you define lat
and lon
as 2D non-dimension coordinates and x
and y
as dimensions without coordinates. As the error says, sel
with method
or tolerance
only works with dimensions with coordinates, so you need to add coordinates to x
and y
(actually, I would have expected sel
to fail on dimensions without coordinates even without method
or tolerance
, so I'm a bit surprised the error message implies that could work).
This also means that you don't have to define lat
and lon
since you won't be using them in this example (it's fine to keep it if you want to use them in later examples, though).
doc/indexing.rst
Outdated
Vectorized indexing may be used to extract information from the nearest | ||
grid cells of interest, for example, the nearest climate model grid | ||
cells to a collection specified weather station latitudes and longitudes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did tell you to modify the vectorized indexing section, but while reading #3768 I noticed that we already document that in the More advanced indexing
section. Given that it uses pointwise indexing as its only example, maybe we should rename it to Pointwise indexing
to make this easier to find?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like that section should be merged with "Vectorized Indexing" which could then be renamed to "Pointwise (or vectorized) Indexing)"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed. Also, the term vectorized indexing
is not really explained (unless I'm missing something, there's only a comment hinting at it's meaning), so it might be good to explicitly do that somewhere. This sounds like a lot of work, though, so it might be better to do that in a different PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting discussion. At this point, I don't think I have much to add here. So I concur that this should be addressed in a different PR.
Hello @EricKeenan! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2021-02-16 22:28:41 UTC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would update the example in |
Looks great. Thanks @EricKeenan . I see this is your first PR, welcome to xarray! |
* upstream/master: FIX: h5py>=3 string decoding (pydata#4893) Update matplotlib's canonical (pydata#4919) Adding vectorized indexing docs (pydata#4711) Allow fsspec URLs in open_(mf)dataset (pydata#4823) Fix typos in example notebooks (pydata#4908) pre-commit autoupdate CI (pydata#4906) replace the ci-trigger action with a external one (pydata#4905) Update area_weighted_temperature.ipynb (pydata#4903) hide the decorator from the test traceback (pydata#4900) Sort backends (pydata#4886) Compatibility with dask 2021.02.0 (pydata#4884)
#4630: Adds a new vectorized indexing example to
sel
docstring and narrative docs.Thanks to @dcherian for introducing me to vectorized indexing and @keewis for providing some information to get started. Also thanks to the community for the excellent contribution guide. http://xarray.pydata.org/en/stable/contributing.html
Am I missing anything here? Or is there anything that can be improved? I'm happy to see this through - thanks in advance for any feedback/tips!