Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document deep parameter in ExtensionArray.copy #22314

Closed
TomAugspurger opened this issue Aug 13, 2018 · 11 comments
Closed

Document deep parameter in ExtensionArray.copy #22314

TomAugspurger opened this issue Aug 13, 2018 · 11 comments
Labels
ExtensionArray Extending pandas with custom dtypes or arrays.

Comments

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Aug 13, 2018

And pass it through via Series.copy.

diff --git a/pandas/tests/extension/decimal/array.py b/pandas/tests/extension/decimal/array.py
index 108b8874b..7085a231b 100644
--- a/pandas/tests/extension/decimal/array.py
+++ b/pandas/tests/extension/decimal/array.py
@@ -78,6 +78,7 @@ class DecimalArray(ExtensionArray, ExtensionScalarOpsMixin):
         return self._from_sequence(result)
 
     def copy(self, deep=False):
+        print(deep)
         if deep:
             return type(self)(self._data.copy())
         return type(self)(self)
In [1]: import pandas as pd; import numpy as np;

In [2]: from pandas.tests.extension.decimal.array import *

In [3]: s = pd.Series(DecimalArray([decimal.Decimal(1)]))

In [4]: s.copy(deep=False)
Out[4]:
0    1
dtype: decimal

In [5]: s.copy(deep=True)
False
Out[5]:
0    1
dtype: decimal
@TomAugspurger TomAugspurger added the ExtensionArray Extending pandas with custom dtypes or arrays. label Aug 13, 2018
@TomAugspurger
Copy link
Contributor Author

We should also clarify what exactly a deep vs. shallow copy should mean...

My use-case is SparseArray.copy. Should the underlying data sp_index and sp_values be copied only if deep=True (that's what we do on master)?

@jorisvandenbossche
Copy link
Member

Should the underlying data sp_index and sp_values be copied only if deep=True (that's what we do on master)?

I suppose so.

We should also clarify what exactly a deep vs. shallow copy should mean...

The docstring of pd.Series.copy has a quite detailed explanation:

When deep=True (default), a new object will be created with a
copy of the calling object's data and indices. Modifications to
the data or indices of the copy will not be reflected in the
original object (see notes below).

When deep=False, a new object will be created without copying
the calling object's data or index (only references to the data
and index are copied). Any changes to the data of the original
will be reflected in the shallow copy (and vice versa).

(with of course the additional caveats regarding python objects not being copied recursively and index data not being copied).

For ExtensionArray it is of course not necessarily that straightforward, since there is no fixed "data" for all EAs. But I think typically an EA will have some kind of _data (or multiple objects), i.e. the actual stored data, so we can expect that to be copied with deep=True.
In practice, with deep=True we expect that doing array[0] = other_value does not change a deep copy of array, but does change a shallow copy of array.

@h-vetinari
Copy link
Contributor

Like in #19862, I would argue that:

IMO, copy(deep=True) should completely sever all connections between the original and the copied object - compare the official python docs (https://docs.python.org/3/library/copy.html):

A deep copy constructs a new compound object and then, recursively, inserts copies into it of the objects found in the original.

@jorisvandenbossche
Copy link
Member

@h-vetinari I think the other discussion is not fully relevant here, as an ExtensionArray has (in general) no notion of 'index' that is immutable and not copied like a Series/DataFrame has.

@TomAugspurger
Copy link
Contributor Author

This is also not tested directly in the base extension tests. Testing the semantics will be a tad, since we don't have a no-copy way of getting some kind of ExtensionArray._data.

@jorisvandenbossche
Copy link
Member

We could test the expected "user semantics" ?

a = EA(..)
a2 = a.copy(deep=True)
a[0] = a[1]
assert a2[0] != a[1]

a = EA(..)
a2 = a.copy(deep=False)
a[0] = a[1]
assert a2[0] == a[1]

@TomAugspurger
Copy link
Contributor Author

Ah, yeah that will work for mutable EAs (my head is in sparse-land right now, which is not mutable).

@jorisvandenbossche
Copy link
Member

Ah, OK, I now understand why you asked the initial question. In that case (SparseArray being immutable), then it is again more similar to the "copy index of Series or not" discussion.

Is there a reason they are immutable? (it's just more work to implement it? Scipy sparse matrices are mutable, but raise a performance warning if the mutation changes the sparsity structure)

@TomAugspurger
Copy link
Contributor Author

TomAugspurger commented Aug 13, 2018 via email

@TomAugspurger TomAugspurger changed the title Series.copy should pass through deep to ExtensionArray.copy Document deep parameter in ExtensionArray.copy Nov 13, 2018
@jorisvandenbossche
Copy link
Member

Coming back to my comment above #22314 (comment) about expected user semantics.
I am not sure about that anymore, because I would now think to expect that any copy() call would make sure that modifying an array does not change a copy of it as well.

Another option is to simply remove the deep keyword for EAs (just like np.copy has no deep keyword)

@jbrockmendel
Copy link
Member

Closed by #27083.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ExtensionArray Extending pandas with custom dtypes or arrays.
Projects
None yet
Development

No branches or pull requests

4 participants