-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update docs on view / copies #8744
Conversation
- Add reference to numpy docs on view / copies in the corresponding section of the xarray docs, to help clarify pydata#8728 . - Add note that `da.values()` returns a view in the header for `da.values()`.
Thank you for opening this pull request! It may take us a few days to respond here, so thank you for being patient. |
xarray/core/dataarray.py
Outdated
@@ -761,7 +761,7 @@ def data(self, value: Any) -> None: | |||
@property | |||
def values(self) -> np.ndarray: | |||
""" | |||
The array's data as a numpy.ndarray. | |||
A view onto the DataArray's underlying data as a numpy.ndarray. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I'm no expert here.)
Is this correct? I would have thought that it's returning the ndarray. And then that can get used as a view in many situations, consistent with numpy's rules.
If my sense is correct, we could just comment that it's not copied, and so mutations to the array will also affect the dataarray?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the clarification! How's this?
"The array's underlying numpy.ndarray
. Note that this array is not copied; operations on it follow numpy
's rules of what generates a view vs. a copy, and changes to this array will be reflected in the DataArray as well."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect I think!
(though to confirm, I'm not confident that the initial thing isn't so accurate, fine to wait for others to comment if we prefer...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what we're doing is roughly
return np.asarray(self._data)
(with slightly different behavior for 0d arrays with datetime64
or timedelta64
dtype)
This means that "The array's data converted to numpy.ndarray." would be a more accurate summary, but we could definitely emphasize the forwarding towards numpy.asarray
/ numpy.array
a bit more (the current "If the array's data is not a numpy.ndarray this will attempt to convert it naively using np.array()" is wrong, numpy.asarray
is always called).
We already have a note about non-castable arrays, so I'd probably append a warning about this not copying the data for numpy
(I don't think we can or should cover all the edge cases of numpy.asarray
here).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks.
So let's merge after this is done...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How's this?
"The array's data converted to numpy.ndarray. Note that this array is not copied; operations on it follow numpy's rules of what generates a view vs. a copy, and changes to this array may be reflected in the DataArray as well."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if there's two line breaks between the first and second sentences I don't have any objections (the first line is supposed to be a brief summary). Just make sure to coordinate the stuff that's already there:
If the array's data is not a numpy.ndarray this will attempt to convert it naively using np.array(), which will raise an error if the array type does not support coercion like this (e.g. cupy).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good - from your comment above, should "convert it naively using np.array()" just be changed to "convert it naively using np.asarray()" then? Or should that clause just be struck
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about the difference between asarray
and array
(I think it's different defaults), but in any case the important part is that the function is always applied to the data, not just when it's not already a numpy
array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ohhh okay - makes sense. I think the tweaks in the latest push should address that
I think the failing check is something that came from the latest main branch update, so once that gets resolved, I'll merge this barring any other comments? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, this looks good to me now.
The failing tests are flaky (not sure why only on python=3.11
, though), a rerun should fix that. I'll try once the MacOS run completed.
does anything else need to happen before it's cleared for merging btw? |
* main: (26 commits) [pre-commit.ci] pre-commit autoupdate (pydata#8900) Bump the actions group with 1 update (pydata#8896) New empty whatsnew entry (pydata#8899) Update reference to 'Weighted quantile estimators' (pydata#8898) 2024.03.0: Add whats-new (pydata#8891) Add typing to test_groupby.py (pydata#8890) Avoid in-place multiplication of a large value to an array with small integer dtype (pydata#8867) Check for aligned chunks when writing to existing variables (pydata#8459) Add dt.date to plottable types (pydata#8873) Optimize writes to existing Zarr stores. (pydata#8875) Allow multidimensional variable with same name as dim when constructing dataset via coords (pydata#8886) Don't allow overwriting indexes with region writes (pydata#8877) Migrate datatree.py module into xarray.core. (pydata#8789) warn and return bytes undecoded in case of UnicodeDecodeError in h5netcdf-backend (pydata#8874) groupby: Dispatch quantile to flox. (pydata#8720) Opt out of auto creating index variables (pydata#8711) Update docs on view / copies (pydata#8744) Handle .oindex and .vindex for the PandasMultiIndexingAdapter and PandasIndexingAdapter (pydata#8869) numpy 2.0 copy-keyword and trapz vs trapezoid (pydata#8865) upstream-dev CI: Fix interp and cumtrapz (pydata#8861) ...
np.arrays
from datasets #8728 .da.values()
returns a view in the header forda.values()
.