Documentation: copy documentation is lacking detail for when deep=False #19505

Closed
charlie0389 opened this issue Feb 2, 2018 · 10 comments

@charlie0389
Contributor

charlie0389 commented Feb 2, 2018

The DataFrame.copy documentation (for pandas 0.22.0) states:

deep : boolean or string, default True
Make a deep copy, including a copy of the data and the indices. 
With deep=False neither the indices or the data are copied.
Note that when deep=True data is copied, actual python objects 
will not be copied recursively, only the reference to the object. 
This is in contrast to copy.deepcopy in the Standard Library,
which recursively copies object data.
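As a rough illustration of the last two sentences (an analogy, not pandas code), plain Python lists can stand in for a column of Python objects. `list(column)` mimics `deep=True` by making a new container whose elements are still the original objects, while `copy.deepcopy` copies the elements as well.

```python
import copy

# A "column" holding mutable Python objects (here, small lists).
column = [[1, 2], [3, 4]]

# Analogue of deep=True: the container is new, but its elements are
# references to the very same list objects as in `column`.
container_copy = list(column)

# copy.deepcopy recursively copies the elements too.
recursive_copy = copy.deepcopy(column)

column[0].append(99)  # mutate one element in place

print(container_copy is column)  # False: the container itself was copied
print(container_copy[0])         # [1, 2, 99]: element shared with the original
print(recursive_copy[0])         # [1, 2]: element was copied recursively
```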

Problem description

The documentation does not state what is copied when deep=False (it copies neither the data nor the indices, so what does it copy?).

Searching for 'copy' suggests this has not been reported as an issue.

@ZhuBaohe
Contributor

ZhuBaohe commented Feb 2, 2018

With deep=False, only a reference to the data is copied:

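import pandas as pd

df = pd.DataFrame([1, 2, 3])
df1 = df.copy(deep=False)
df1.iat[0, 0] = 100       # write a value through the shallow copy
df1.index = [10, 20, 30]  # give the copy a new index
df1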
Out[188]: 
      0
10  100
20    2
30    3

df
Out[189]: 
     0
0  100
1    2
2    3
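The behaviour above can be mimicked with a hypothetical `ToyFrame` class (not part of pandas) whose shallow copy is a new object pointing at the same data and index containers. Writing into the shared data is visible through both names, while assigning a new index only rebinds the copy's attribute. The in-place write propagating to `df` reflects the pandas 0.22 behaviour shown here; pandas versions with Copy-on-Write enabled treat such writes differently.

```python
class ToyFrame:
    """Hypothetical DataFrame stand-in: a data list plus an index list."""

    def __init__(self, data, index=None):
        self.data = data
        self.index = index if index is not None else list(range(len(data)))

    def copy(self, deep=True):
        if deep:
            # New object with its own copies of the data and index containers.
            return ToyFrame(list(self.data), list(self.index))
        # Shallow: new object whose attributes point at the same containers.
        return ToyFrame(self.data, self.index)


df = ToyFrame([1, 2, 3])
df1 = df.copy(deep=False)

df1.data[0] = 100         # writes into the container shared with df
df1.index = [10, 20, 30]  # rebinds df1's own attribute only

print(df.data, df.index)    # [100, 2, 3] [0, 1, 2]
print(df1.data, df1.index)  # [100, 2, 3] [10, 20, 30]
```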

@chris-b1
Contributor

chris-b1 commented Feb 2, 2018

The DataFrame object itself is what is being copied, but the backing arrays aren't, as @ZhuBaohe shows. We would certainly take a doc PR clarifying this if you're interested.

In [213]: df_1 = df

In [214]: df_1 is df
Out[214]: True

In [215]: df_2 = df.copy(deep=False)

In [216]: df_2 is df
Out[216]: False

In [217]: df_2.values.data == df.values.data
Out[217]: True
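The distinction drawn here (a new object, but shared arrays) can be checked with identity tests. The sketch below uses a hypothetical `ToyFrame` class rather than pandas. Note that `==` only compares contents, so it cannot tell a shared buffer from an equal copy, whereas `is` can.

```python
class ToyFrame:
    """Hypothetical DataFrame stand-in (not the pandas implementation)."""

    def __init__(self, data):
        self.data = data

    def copy(self, deep=True):
        # deep=True copies the data container; deep=False shares it.
        return ToyFrame(list(self.data) if deep else self.data)


df = ToyFrame([1, 2, 3])

df_1 = df                   # plain assignment: a second name, same object
df_2 = df.copy(deep=False)  # new object, shared data container
df_3 = df.copy(deep=True)   # new object, new data container

print(df_1 is df)            # True
print(df_2 is df)            # False
print(df_2.data is df.data)  # True: backing data shared
print(df_3.data is df.data)  # False: backing data copied
print(df_3.data == df.data)  # True: equal contents, so == alone can't tell
```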

@chris-b1 chris-b1 added this to the Next Major Release milestone Feb 2, 2018
@MIsmailKhan

Hi, I'd be happy to take this up!

Would this suffice?

"
deep : boolean or string, default True
Make a deep copy, including a copy of the data and the indices. With deep=False (i.e. a shallow copy), only the reference to the data is copied; neither the indices nor the data themselves are copied.
Note that when deep=True data is copied, actual python objects
will not be copied recursively, only the reference to the object.
This is in contrast to copy.deepcopy in the Standard Library,
which recursively copies object data.
"

@charlie0389
Contributor Author

OK, I'm a little confused. What does a copy of the reference to the object mean? Isn't this the same operation as binding an object to another variable? That is, isn't

df2 = df1

the same as

df2 = df1.copy(deep=False)

?
Obviously, if I were right this would be a little redundant, so I assume I'm not. But I think the documentation could include an example of when deep=False is useful.
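One way to see the difference from `df2 = df1` is to relabel each result. In this sketch, a hypothetical `ToyFrame` class stands in for a DataFrame: relabelling an alias also relabels the original, while relabelling a shallow copy leaves the original alone and still avoids duplicating the data.

```python
class ToyFrame:
    """Hypothetical DataFrame stand-in: shared data, per-object index."""

    def __init__(self, data, index):
        self.data = data
        self.index = index

    def copy(self, deep=True):
        if deep:
            return ToyFrame(list(self.data), list(self.index))
        # Shallow: a new object that reuses the existing containers.
        return ToyFrame(self.data, self.index)


df1 = ToyFrame([1, 2, 3], [0, 1, 2])

alias = df1                   # df2 = df1: just another name
alias.index = ["a", "b", "c"]
print(df1.index)              # ['a', 'b', 'c']: the original was relabelled

df1.index = [0, 1, 2]         # restore the original labels
relabelled = df1.copy(deep=False)
relabelled.index = ["x", "y", "z"]
print(df1.index)                    # [0, 1, 2]: original labels untouched
print(relabelled.data is df1.data)  # True: no data was duplicated
```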

@MIsmailKhan

I think this should cover the question:
https://stackoverflow.com/questions/46327494/python-pandas-dataframe-copydeep-false-vs-copydeep-true-vs

As for whether df2 = df1 and df2 = df1.copy(deep=False) are the same: I wrote a couple of lines and checked what happened to each of the above when I modified the original DataFrame (i.e. df2). Both yielded the same output, so I'm guessing that they are the same.

http://nbviewer.jupyter.org/gist/MIsmailKhan/b5e2534636e526d840c231cebcca5761

@charlie0389
Contributor Author

Thank you, MIsmailKhan, for the link and the example; it's a bit clearer to me now. Unfortunately, neither of the StackOverflow answers articulates the difference between copy() and copy(deep=False) as clearly as your notebook example does.

However, as you acknowledge, it's still not clear why a copy(deep=False) operation is useful. If you are still operating on the same object, it seems rather pointless to make a second reference to it: why not just use the same reference if the end result is the same?

@MIsmailKhan

MIsmailKhan commented Feb 6, 2018 via email

@charlie0389
Contributor Author

Thanks, MIsmailKhan, for clearing that up.

@eamag

eamag commented Mar 20, 2018

So can we close this issue?

@jorisvandenbossche
Member

This will be closed by #20261
