-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Adding vectorized indexing docs #4711
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
keewis
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks and welcome to xarray, @EricKeenan, this is definitely a valuable addition.
Could you also add these examples to the docstrings of DataArray.isel, Dataset.sel, and Dataset.isel?
@max-sixty, I think some time ago we were wondering if it was possible to link to the narrative documentation from docstrings. The documentation of the indexers parameter of sel contains such a link:
xarray/xarray/core/dataarray.py
Line 1162 in 7355c35
| carried out. See :ref:`indexing` for the details. |
but I guess this shouldn't be used too much because it looks strange with
pydoc.
xarray/core/dataarray.py
Outdated
| Examples | ||
| -------- | ||
| >>> ds = xr.tutorial.open_dataset("air_temperature") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we usually prefer having example data generated by np.arange or np.linspace (or by hand) because that makes it much easier to figure out what the example does. Also, this is the DataArray docstring so explicitly creating a DataArray might be better?
| >>> ds = xr.tutorial.open_dataset("air_temperature") | |
| >>> da = xr.DataArray( | |
| ... np.arange(20).reshape(5, 10), | |
| ... coords={"x": np.arange(5), "y": np.arange(10)}, | |
| ... dims=("x", "y"), | |
| ... ) |
The doctests / documentation fail because of the repr of the tutorial dataset (the "description" attribute contains a \n (newline) character), so with synthetic data we would avoid that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the tip. I've implemented your suggestion in DataArray.sel. For DataArray.isel method='nearest doesn't appear to be an option? So I tried this, but it doesn't seem to work. I'm not sure why?
da = xr.DataArray(
np.arange(25).reshape(5, 5),
coords={"x": np.arange(5), "y": np.arange(5)},
dims=("x", "y")
)
tgt_x = xr.DataArray(np.linspace(0, 2, num=3), dims="points")
tgt_y = xr.DataArray(np.linspace(0, 2, num=3), dims="points")
da.isel(x=tgt_x, y=tgt_y) There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isel expects int, but linspace returns float. Since you're using it to generate indices, I would use arange instead (you could also use the dtype parameter of linspace).
Edit: yes, it seems isel does not support method (I guess that's to stay close to numpy indexing). Also, since this is isel, you don't need to pass coords.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Working on building an example for Dataset.sel using method='nearest'
import numpy as np
import pandas as pd
import xarray as xr
ds = xr.Dataset(
data_vars=dict(
temperature=(["x", "y", "time"], np.arange(125).reshape(5, 5, 5))
),
coords=dict(
lon=(["x", "y"], np.arange(25).reshape(5, 5)),
lat=(["x", "y"], np.arange(25).reshape(5, 5)),
time=pd.date_range("2014-09-06", periods=5),
),
attrs=dict(description="Weather related data.")
)
# Define target latitude and longitude
tgt_x = xr.DataArray(np.arange(0, 5), dims="points")
tgt_y = xr.DataArray(np.arange(0, 5), dims="points")
# Dataset.sel
ds_sel = ds.sel(x=tgt_x, y=tgt_y, method='nearest')However I get the following error, which I don't understand. I assume my coordinate definitions are inconsistent?
Traceback (most recent call last):
File "/Users/erke2265/Documents/xarray/xarray/core/indexing.py", line 257, in remap_label_indexers
index = data_obj.indexes[dim]
File "/Users/erke2265/Documents/xarray/xarray/core/indexes.py", line 64, in __getitem__
return self._indexes[key]
KeyError: 'x'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "tmp.py", line 24, in <module>
ds_sel = ds.sel(x=tgt_x, y=tgt_y, method='nearest')
File "/Users/erke2265/Documents/xarray/xarray/core/dataset.py", line 2275, in sel
self, indexers=indexers, method=method, tolerance=tolerance
File "/Users/erke2265/Documents/xarray/xarray/core/coordinates.py", line 398, in remap_label_indexers
obj, v_indexers, method=method, tolerance=tolerance
File "/Users/erke2265/Documents/xarray/xarray/core/indexing.py", line 262, in remap_label_indexers
"cannot supply ``method`` or ``tolerance`` "
ValueError: cannot supply ``method`` or ``tolerance`` when the indexed dimension does not have an associated coordinate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the reason is that you define lat and lon as 2D non-dimension coordinates and x and y as dimensions without coordinates. As the error says, sel with method or tolerance only works with dimensions with coordinates, so you need to add coordinates to x and y (actually, I would have expected sel to fail on dimensions without coordinates even without method or tolerance, so I'm a bit surprised the error message implies that could work).
This also means that you don't have to define lat and lon since you won't be using them in this example (it's fine to keep it if you want to use them in later examples, though).
doc/indexing.rst
Outdated
| Vectorized indexing may be used to extract information from the nearest | ||
| grid cells of interest, for example, the nearest climate model grid | ||
| cells to a collection specified weather station latitudes and longitudes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did tell you to modify the vectorized indexing section, but while reading #3768 I noticed that we already document that in the More advanced indexing section. Given that it uses pointwise indexing as its only example, maybe we should rename it to Pointwise indexing to make this easier to find?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like that section should be merged with "Vectorized Indexing" which could then be renamed to "Pointwise (or vectorized) Indexing)"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed. Also, the term vectorized indexing is not really explained (unless I'm missing something, there's only a comment hinting at it's meaning), so it might be good to explicitly do that somewhere. This sounds like a lot of work, though, so it might be better to do that in a different PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting discussion. At this point, I don't think I have much to add here. So I concur that this should be addressed in a different PR.
|
Hello @EricKeenan! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2021-02-16 22:28:41 UTC |
mathause
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
I would update the example in |
|
Looks great. Thanks @EricKeenan . I see this is your first PR, welcome to xarray! |
* upstream/master: FIX: h5py>=3 string decoding (pydata#4893) Update matplotlib's canonical (pydata#4919) Adding vectorized indexing docs (pydata#4711) Allow fsspec URLs in open_(mf)dataset (pydata#4823) Fix typos in example notebooks (pydata#4908) pre-commit autoupdate CI (pydata#4906) replace the ci-trigger action with a external one (pydata#4905) Update area_weighted_temperature.ipynb (pydata#4903) hide the decorator from the test traceback (pydata#4900) Sort backends (pydata#4886) Compatibility with dask 2021.02.0 (pydata#4884)
#4630: Adds a new vectorized indexing example to
seldocstring and narrative docs.Thanks to @dcherian for introducing me to vectorized indexing and @keewis for providing some information to get started. Also thanks to the community for the excellent contribution guide. http://xarray.pydata.org/en/stable/contributing.html
Am I missing anything here? Or is there anything that can be improved? I'm happy to see this through - thanks in advance for any feedback/tips!