Description
What is your issue?
I know that, generally, `ds2 = ds` just makes both names point to the same object in memory, so changes to one will also show up in the other. However, I'd assume that certain operations should break this connection, for example:
- extracting the underlying `np.array` from a dataset (which changes its type and discards most of the xarray-specific information: indexes, dimensions, etc.)
- feeding that underlying `np.array` into a new dataset
In other words, I would expect `ds['var'].values` to behave like `copy.deepcopy(ds['var'].values)`.
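(A quick way to check this kind of thing: `np.shares_memory` reports whether two arrays view the same buffer. This is just a minimal sketch with a throwaway array, separate from the example below.)

```python
import copy

import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(3.0), dims='x')

# Two calls to .values return the same underlying buffer:
print(np.shares_memory(da.values, da.values))                 # True
# A deep copy, by contrast, owns independent memory:
print(np.shares_memory(da.values, copy.deepcopy(da).values))  # False
```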
Here's an example that illustrates how, in these cases, the objects are still linked in memory (apologies for the somewhat hokey example):
```python
import xarray as xr
import numpy as np

# Create a dataset
ds = xr.Dataset(coords={'lon': (['lon'], np.array([178.2, 179.2, -179.8, -178.8, -177.8, -176.8]))})
print('\nds: ')
print(ds)

# Create a new dataset that uses the values of the first dataset
ds2 = xr.Dataset({'lon1': (['lon'], ds.lon.values)},
                 coords={'lon': (['lon'], ds.lon.values)})
print('\nds2: ')
print(ds2)

# Change ds2's 'lon1' variable in place
ds2['lon1'][ds2['lon1'] < 0] = 360 + ds2['lon1'][ds2['lon1'] < 0]

# `ds2` is changed, as expected
print('\nds2 (should be modified): ')
print(ds2)

# `ds` is changed too, which is *not* expected
print('\nds (should not be modified): ')
print(ds)
```
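For what it's worth, copying the extracted array explicitly does sever the link. Here's a minimal variant of the example above (same `ds`, rebuilt from scratch so the original values are intact):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(coords={'lon': (['lon'], np.array([178.2, 179.2, -179.8, -178.8, -177.8, -176.8]))})

# Copy the extracted arrays up front so ds2 owns its own data
ds2 = xr.Dataset({'lon1': (['lon'], ds.lon.values.copy())},
                 coords={'lon': (['lon'], ds.lon.values.copy())})
ds2['lon1'][ds2['lon1'] < 0] = 360 + ds2['lon1'][ds2['lon1'] < 0]

print(ds)  # unchanged this time, since ds2 shares no memory with ds
```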
The question is: am I right (from a UX perspective) to expect these kinds of operations to disconnect the objects in memory? If the current behavior is intended, I might try to update the docs to be a bit clearer about it. Alternatively, if these operations really should disconnect the objects, maybe it's better to have `.values` behave like `.copy(deep=True).values`.
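To illustrate that second option: an array obtained through a deep copy no longer shares memory with the dataset, so it's safe to mutate (a hypothetical sketch of the proposed semantics; `lon_copy` is just an illustrative name):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(coords={'lon': (['lon'], np.array([178.2, 179.2, -179.8]))})

# What the proposed .values semantics would amount to:
lon_copy = ds.lon.copy(deep=True).values
print(np.shares_memory(lon_copy, ds.lon.values))  # False: lon_copy is safe to mutate
```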
Appreciate y'all's thoughts on this!