-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DOC: Update pandas.Series.copy docstring #20261
DOC: Update pandas.Series.copy docstring #20261
Conversation
- Add extended summary - Add examples
pandas/core/generic.py
Outdated
b 2 | ||
dtype: int32 | ||
|
||
When copying object containing python objects, deep copy will |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"an object"
pandas/core/generic.py
Outdated
@@ -4506,7 +4506,15 @@ def astype(self, dtype, copy=True, errors='raise', **kwargs): | |||
|
|||
def copy(self, deep=True): | |||
""" | |||
Make a copy of this objects data. | |||
Make a copy of this objects indices and data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the sake of explicitness can you change "this objects" to "the calling object's" or "the caller's"
pandas/core/generic.py
Outdated
Make a copy of this objects data. | ||
Make a copy of this objects indices and data. | ||
|
||
Deep copy (default) will create a new object with a copy of the objects |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given these start to dive into the parameters and implementations I think they are better served in a dedicated Notes section
pandas/core/generic.py
Outdated
data and indices. Modifications to the data or indices of the copy | ||
will not be reflected in the original object. | ||
|
||
Shallow copy (``deep=False``) will create a new object without copying |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Single back ticks
pandas/core/generic.py
Outdated
Make a copy of this objects data. | ||
Make a copy of this objects indices and data. | ||
|
||
Deep copy (default) will create a new object with a copy of the objects |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
object's
pandas/core/generic.py
Outdated
b 2 | ||
dtype: int32 | ||
|
||
Shallow copy versus default (deep) copy: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is a "section header" can you get it to render in bold (i.e. between asterisks)?
pandas/core/generic.py
Outdated
|
||
Shallow copy versus default (deep) copy: | ||
|
||
In a shallow copy the index and data is a reference to the original |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just "the index and data are shared with the original object"
pandas/core/generic.py
Outdated
|
||
Examples | ||
-------- | ||
>>> s = pd.Series([1,2], dtype="int32", index=["a", "b"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the dtype
argument provided because you want it to be clear that type is copied over and not automatically upcast to int64? I think that's what you are going for but it wasn't immediately obvious to me, so I'd try and make it more explicit if that is your intention
pandas/core/generic.py
Outdated
>>> s = pd.Series([1,2], dtype="int32", index=["a", "b"]) | ||
>>> deep_copy = s.copy() | ||
>>> shallow_copy = s.copy(deep=False) | ||
>>> id(s) == id(shallow_copy) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These examples are certainly comprehensive but perhaps too verbose. Can you refactor to make the index / values comparisons fit on one line? Similar comment with the value assignment - any consolidation will make it more readable
pandas/core/generic.py
Outdated
When copying object containing python objects, deep copy will | ||
copy the indices and data but will not do so recursively. | ||
|
||
>>> class Point(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it required to put a class in this docstring? Again for the sake of being concise wonder if you can just use lists to illustrate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I made this change plus the other requested changes.
I removed the wording/examples about the index not being copied when deep=False because while it's not copied when .copy() is called, if the index is changed on the shallow copy it does not affect the index of the original (in contrast with the data values, which are also not copied, but changes to values do affect the original). |
pandas/core/generic.py
Outdated
When `deep=False`, a new object will be created without copying | ||
the calling object's data (only a reference to the data is | ||
copied). Any changes to the data of the original will be reflected | ||
in the shallow copy (and vice versa). | ||
|
||
Parameters | ||
---------- | ||
deep : boolean or string, default True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
`True` instead of just True
pandas/core/generic.py
Outdated
When `deep=False`, a new object will be created without copying | ||
the calling object's data (only a reference to the data is | ||
copied). Any changes to the data of the original will be reflected | ||
in the shallow copy (and vice versa). | ||
|
||
Parameters | ||
---------- | ||
deep : boolean or string, default True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
str instead of string
pandas/core/generic.py
Outdated
will not be copied recursively, only the reference to the object. | ||
This is in contrast to ``copy.deepcopy`` in the Standard Library, | ||
which recursively copies object data. | ||
With `deep=False` neither the indices or the data are copied. | ||
|
||
Returns | ||
------- | ||
copy : type of caller |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move "type of caller" down as the description and say something like "Object type matches caller."
pandas/core/generic.py
Outdated
|
||
>>> s = pd.Series([1,2], dtype="int32", index=["a", "b"]) | ||
>>> s = pd.Series([1,2], index=["a", "b"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Space after first comma
pandas/core/generic.py
Outdated
|
||
When copying object containing python objects, deep copy will | ||
copy the indices and data but will not do so recursively. | ||
When copying an object containing python objects, deep copy will |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Capitalize Python and maybe say "a deep copy"
pandas/core/generic.py
Outdated
>>> deep_copy = s.copy() | ||
>>> shallow_copy = s.copy(deep=False) | ||
>>> id(s) == id(shallow_copy) | ||
False | ||
>>> id(s.index) == id(shallow_copy.index) | ||
True | ||
>>> id(s.values) == id(shallow_copy.values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just again wondering if these can be more concise. Perhaps choosing a different variable name will allow you to do id(s.values) == id(shlw.values) and id(s.index) == id(shlw.values)
pandas/core/generic.py
Outdated
When `deep=True`, data is copied but actual python objects | ||
will not be copied recursively, only the reference to the object. | ||
This is in contrast to `copy.deepcopy` in the Standard Library, | ||
which recursively copies object data. (See examples below). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"...copies object data (see examples below)." - reduces some punctuation and saves space
pandas/core/generic.py
Outdated
will not be copied recursively, only the reference to the object. | ||
This is in contrast to ``copy.deepcopy`` in the Standard Library, | ||
which recursively copies object data. | ||
With `deep=False` neither the indices or the data are copied. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"neither...nor" instead of "or"
pandas/core/generic.py
Outdated
When `deep=True` (default), a new object will be created with a | ||
copy of the calling object's data and indices. Modifications to | ||
the data or indices of the copy will not be reflected in the | ||
original object (see notes below). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless you want to have a Notes section I would say "see Examples below" as a reference to the section actually showing that
pandas/core/generic.py
Outdated
>>> deep_copy | ||
a 1 | ||
b 2 | ||
dtype: int32 | ||
dtype: int64 | ||
>>> shallow_copy[0] = 4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of having two different assignments I think you can have s[0] = 3
and then this line both before printing any of the objects. Gets you the same result and covers the same concepts in less lines
@thismakessand not sure I fully understand why you removed the mention of indices. Perhaps its related to the conversation in #19862? Referencing that I think it's preferable to use Lot's of edits but they are mostly minor - thanks and keep up the great work! |
pandas/core/generic.py
Outdated
copy of the calling object's data and indices. Modifications to | ||
the data or indices of the copy will not be reflected in the | ||
original object (see notes below). | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indices are immutable, mention this
pandas/core/generic.py
Outdated
a 1 | ||
b 2 | ||
dtype: int64 | ||
>>> s_copy = s.copy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blank line between cases (if they are separate / distinct)
pandas/core/generic.py
Outdated
False | ||
>>> id(s.values) == id(deep_copy.values) | ||
False | ||
>>> s[0] = 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
break this up with some text describing what you are showing
pandas/core/generic.py
Outdated
copy the data, but will not do so recursively. | ||
|
||
>>> s = pd.Series([[1, 2], [3, 4]]) | ||
>>> s_copy = s.copy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
show this with a deep copy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't sure how to address this comment - do you mean to show the behavior of
.copy(deep=True)
versus the std python copy.deepcopy()
? I was trying to show an example of how a (deep=True)
copy isn't necessarily a deep copy in the standard python sense, but maybe I should remove this example entirely since it's already mentioned as a caveat in the Notes section?
pandas/core/generic.py
Outdated
1 [3, 4] | ||
dtype: object | ||
|
||
For deep-copying python objects, the following can be used: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this last example is not necessary
Thanks - I added more clarification regarding the index and made the examples simpler and hopefully clearer. |
pandas/core/generic.py
Outdated
will not be copied recursively, only the reference to the object. | ||
This is in contrast to ``copy.deepcopy`` in the Standard Library, | ||
which recursively copies object data. | ||
With `deep=False` neither the indices nor the data are copied. | ||
|
||
Returns | ||
------- | ||
copy : type of caller |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still need to clean up the line that says copy : type of caller
. I would explicitly list the objects that could be a caller of this function, which I assume would make this Series or DataFrame
but if there's anything else be sure to add that.
Otherwise nice job on all the edits - changes I requested all lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I would make this "Series or DataFrame"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice docstring!
Added few more comments
pandas/core/generic.py
Outdated
|
||
Notes | ||
----- | ||
When `deep=True`, data is copied but actual Python objects |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
double backticks around deep=True
pandas/core/generic.py
Outdated
will not be copied recursively, only the reference to the object. | ||
This is in contrast to ``copy.deepcopy`` in the Standard Library, | ||
which recursively copies object data. | ||
With `deep=False` neither the indices nor the data are copied. | ||
|
||
Returns | ||
------- | ||
copy : type of caller |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I would make this "Series or DataFrame"
pandas/core/generic.py
Outdated
the calling object's data or index (only references to the data | ||
and index are copied). Any changes to the data of the original | ||
will be reflected in the shallow copy (and vice versa). Changes | ||
to the index will result in a new index as indices are immutable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this line "Changes to the index will result in a new index as indices are immutable" is a bit out of place here, as this is not related to deep=False
For the fact that indices are immutable, I would add a short explanation about that in the Notes section (it is a bit too detailed for the extended summary at the beginning of the docstring).
The main thing is that, even for deep=True
, the index data (the underlying numpy array) is not copied, only the object (the Index class holding the array data). This is done for performance reasons: because Index is immutable, it is not needed to copy the data.
(see #19862 for discussion on it)
Codecov Report
@@ Coverage Diff @@
## master #20261 +/- ##
==========================================
+ Coverage 91.77% 91.84% +0.06%
==========================================
Files 152 152
Lines 49205 49231 +26
==========================================
+ Hits 45159 45215 +56
+ Misses 4046 4016 -30
Continue to review full report at Codecov.
|
@thismakessand Do you have time to update this PR based on the remaining feedback ? |
@jorisvandenbossche Yes, it's definitely still on my radar, hoping to find some time later today or tomorrow. |
I read through the linked issues and added the details you mentioned. Looking at the linked issues and other stuff on stack overflow, etc it seems like most of the confusion is around
I think both of these are covered now. Originally I had more details but I was worried going too into the caveats would just be unnecessarily confusing so I cut a bit out. I can add more if it's still unclear. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updates are looking good!
@thismakessand Thanks a lot! |
There is one remaining thing: the type description for |
Closes #19505
Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):
scripts/validate_docstrings.py <your-function-or-method>
git diff upstream/master -u -- "*.py" | flake8 --diff
python doc/make.py --single <your-function-or-method>
Please include the output of the validation script below between the "```" ticks:
If the validation script still gives errors, but you think there is a good reason
to deviate in this case (and there are certainly such cases), please state this
explicitly.
Not sure if anything should be added to the See Also section