Lingering memory connections when extracting underlying `np.array`s from datasets #8728
Comments
In general, you're expected to deep-copy explicitly to break these "links". This is the numpy paradigm.
If you want to read up on this, look for "view vs copy"!
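For readers unfamiliar with the distinction, here is a minimal numpy-only sketch of "view vs copy". The array names are illustrative and not taken from the thread; it only assumes that basic slicing returns a view, which is standard numpy behavior.

```python
import numpy as np

base = np.arange(6, dtype=float)   # [0., 1., 2., 3., 4., 5.]

view = base[2:5]                   # basic slicing returns a view on base's memory
independent = base[2:5].copy()     # an explicit copy owns its own memory

view[0] = -1.0                     # writes through to base[2]
independent[1] = -2.0              # does not touch base[3]

shares_view = np.shares_memory(base, view)
shares_copy = np.shares_memory(base, independent)
```

`np.shares_memory` is a handy way to check whether two arrays are still linked before mutating either of them.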
Yeah, I guess in this case from a legibility standpoint, the fact that Like I wouldn't expect the following two operations:
and
to behave the same. But I do understand that from the backend perspective, (relatedly, would it be worth it to link to the relevant numpy docs in this part of the xarray docs?)
A related issue is that this allows you to (possibly inadvertently) circumvent certain xarray safeguards, like the
Yes! That would be a welcome contribution.
Yes. But I'm not sure there's much we can do about this. Our focus should be "if you use xarray operations, you won't get surprises"...
Sounds good, I'll prep a PR
- Add reference to numpy docs on view / copies in the corresponding section of the xarray docs, to help clarify pydata#8728.
- Add note that `da.values()` returns a view in the header for `da.values()`.
* Clarify #8728 in docs
  - Add reference to numpy docs on view / copies in the corresponding section of the xarray docs, to help clarify #8728.
  - Add note that `da.values()` returns a view in the header for `da.values()`.
* tweaks to the header
* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci
* flip order of new .to_values() doc header paragraphs

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Resolved by #8744.
What is your issue?
I know that generally, `ds2 = ds` connects the two objects in memory, and changes in one will also cause changes in the other.

However, I generally assume that certain operations should break this connection, for example:

- extracting an `np.array` from a dataset (changing its type and destroying a lot of the xarray-specific information: index, dimensions, etc.)
- putting that `np.array` into a new dataset

In other words, I would expect that using `ds['var'].values` would be similar to `copy.deepcopy(ds['var'].values)`.

Here's an example that illustrates how in these cases, the objects are still linked in memory:
(apologies for the somewhat hokey example)
The question is - am I right (from a UX perspective) to expect these kinds of operations to disconnect the objects in memory? If so, I might try to update the docs to be a bit clearer on this. (Alternatively, if these kinds of operations should disconnect the objects in memory, maybe it's better to have `.values` also call `.copy(deep=True).values`.)

Appreciate y'all's thoughts on this!
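A minimal sketch of the behavior described above, assuming an in-memory, numpy-backed dataset with a float variable. The dataset contents and variable names are illustrative and not taken from the original report.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"var": ("x", np.array([1.0, 2.0, 3.0]))})

# .values hands back the underlying numpy array rather than a copy
arr = ds["var"].values
arr[0] = 99.0
linked_value = ds["var"].values[0]      # the dataset sees the change

# Wrapping the same array in a new dataset keeps the link as well
ds2 = xr.Dataset({"var": ("x", arr)})
ds2["var"].values[1] = -1.0
also_linked = ds["var"].values[1]       # the original dataset changed too

# An explicit copy is what breaks the link
independent = ds["var"].values.copy()
independent[2] = 0.0
untouched = ds["var"].values[2]         # still the original value
```

Calling `ds["var"].copy(deep=True).values` should have the same decoupling effect as `.values.copy()`, at the cost of also copying the xarray wrapper.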