Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: Update pandas.Series.copy docstring #20261

Merged
merged 6 commits into from
Mar 23, 2018

Conversation

thismakessand
Copy link
Contributor

@thismakessand thismakessand commented Mar 10, 2018

Closes #19505

  • Add extended summary
  • Add examples

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

  • PR title is "DOC: update the docstring"
  • The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
  • The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
  • The html version looks good: python doc/make.py --single <your-function-or-method>
  • It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:


################################################################################
######################## Docstring (pandas.Series.copy) ########################
################################################################################

Make a copy of this objects indices and data.

Deep copy (default) will create a new object with a copy of the objects
data and indices. Modifications to the data or indices of the copy
will not be reflected in the original object.

Shallow copy (``deep=False``) will create a new object without copying
the data or indices. Any changes to the index or data of the original
will be reflected in the shallow copy (and vice versa).

Parameters
----------
deep : boolean or string, default True
    Make a deep copy, including a copy of the data and the indices.
    With ``deep=False`` neither the indices or the data are copied.

    Note that when ``deep=True`` data is copied, actual python objects
    will not be copied recursively, only the reference to the object.
    This is in contrast to ``copy.deepcopy`` in the Standard Library,
    which recursively copies object data.

Returns
-------
copy : type of caller

Examples
--------
>>> s = pd.Series([1,2], dtype="int32", index=["a", "b"])
>>> s
a    1
b    2
dtype: int32
>>> s_copy = s.copy()
>>> s_copy
a    1
b    2
dtype: int32

Shallow copy versus default (deep) copy:

In a shallow copy the index and data is a reference to the original
objects'.

>>> s = pd.Series([1,2], dtype="int32", index=["a", "b"])
>>> deep_copy = s.copy()
>>> shallow_copy = s.copy(deep=False)
>>> id(s) == id(shallow_copy)
False
>>> id(s.index) == id(shallow_copy.index)
True
>>> id(s.values) == id(shallow_copy.values)
True
>>> id(s) == id(deep_copy)
False
>>> id(s.index) == id(deep_copy.index)
False
>>> id(s.values) == id(deep_copy.values)
False
>>> s[0] = 3
>>> s
a    3
b    2
dtype: int32
>>> shallow_copy
a    3
b    2
dtype: int32
>>> deep_copy
a    1
b    2
dtype: int32
>>> shallow_copy[0] = 4
>>> s
a    4
b    2
dtype: int32
>>> shallow_copy
a    4
b    2
dtype: int32
>>> deep_copy
a    1
b    2
dtype: int32

When copying object containing python objects, deep copy will
copy the indices and data but will not do so recursively.

>>> class Point(object):
...     def __init__(self, x, y):
...         self.x = x
...         self.y = y
...     def __repr__(self):
...         return "Point(%d,%d)" % (self.x, self.y)
...
>>> p1 = Point(0, 1)
>>> s = pd.Series([p1], dtype="object")
>>> s
0    Point(0,1)
dtype: object
>>> s_copy = s.copy()
>>> s_copy
0    Point(0,1)
dtype: object
>>> id(s[0]) == id(s_copy[0])
True
>>> p1.x = 1
>>> s
0    Point(1,1)
dtype: object
>>> s_copy
0    Point(1,1)
dtype: object
>>>

For deep-copying python objects, the following can be used:

>>> import copy
>>> deep_deep_copy = pd.Series(copy.deepcopy(s.values),
...                            index=copy.deepcopy(s.index))

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
        See Also section not found

If the validation script still gives errors, but you think there is a good reason
to deviate in this case (and there are certainly such cases), please state this
explicitly.

Not sure if anything should be added to the See Also section

- Add extended summary
- Add examples
b 2
dtype: int32

When copying object containing python objects, deep copy will
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"an object"

@@ -4506,7 +4506,15 @@ def astype(self, dtype, copy=True, errors='raise', **kwargs):

def copy(self, deep=True):
"""
Make a copy of this objects data.
Make a copy of this objects indices and data.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the sake of explicitness can you change "this objects" to "the calling object's" or "the caller's"

Make a copy of this objects data.
Make a copy of this objects indices and data.

Deep copy (default) will create a new object with a copy of the objects
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given these start to dive into the parameters and implementations I think they are better served in a dedicated Notes section

data and indices. Modifications to the data or indices of the copy
will not be reflected in the original object.

Shallow copy (``deep=False``) will create a new object without copying
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Single back ticks

Make a copy of this objects data.
Make a copy of this objects indices and data.

Deep copy (default) will create a new object with a copy of the objects
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

object's

b 2
dtype: int32

Shallow copy versus default (deep) copy:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a "section header" can you get it to render in bold (i.e. between asterisks)?


Shallow copy versus default (deep) copy:

In a shallow copy the index and data is a reference to the original
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just "the index and data are shared with the original object"


Examples
--------
>>> s = pd.Series([1,2], dtype="int32", index=["a", "b"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the dtype argument provided because you want it to be clear that type is copied over and not automatically upcast to int64? I think that's what you are going for but it wasn't immediately obvious to me, so I'd try and make it more explicit if that is your intention

>>> s = pd.Series([1,2], dtype="int32", index=["a", "b"])
>>> deep_copy = s.copy()
>>> shallow_copy = s.copy(deep=False)
>>> id(s) == id(shallow_copy)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These examples are certainly comprehensive but perhaps too verbose. Can you refactor to make the index / values comparisons fit on one line? Similar comment with the value assignment - any consolidation will make it more readable

When copying object containing python objects, deep copy will
copy the indices and data but will not do so recursively.

>>> class Point(object):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it required to put a class in this docstring? Again for the sake of being concise wonder if you can just use lists to illustrate

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I made this change plus the other requested changes.

@thismakessand
Copy link
Contributor Author

I removed the wording/examples about the index not being copied when deep=False because while it's not copied when .copy() is called, if the index is changed on the shallow copy it does not affect the index of the original (in contrast with the data values, which are also not copied, but changes to values do affect the original).

When `deep=False`, a new object will be created without copying
the calling object's data (only a reference to the data is
copied). Any changes to the data of the original will be reflected
in the shallow copy (and vice versa).

Parameters
----------
deep : boolean or string, default True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

`True` instead of just True

When `deep=False`, a new object will be created without copying
the calling object's data (only a reference to the data is
copied). Any changes to the data of the original will be reflected
in the shallow copy (and vice versa).

Parameters
----------
deep : boolean or string, default True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

str instead of string

will not be copied recursively, only the reference to the object.
This is in contrast to ``copy.deepcopy`` in the Standard Library,
which recursively copies object data.
With `deep=False` neither the indices or the data are copied.

Returns
-------
copy : type of caller
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move "type of caller" down as the description and say something like "Object type matches caller."


>>> s = pd.Series([1,2], dtype="int32", index=["a", "b"])
>>> s = pd.Series([1,2], index=["a", "b"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Space after first comma


When copying object containing python objects, deep copy will
copy the indices and data but will not do so recursively.
When copying an object containing python objects, deep copy will
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Capitalize Python and maybe say "a deep copy"

>>> deep_copy = s.copy()
>>> shallow_copy = s.copy(deep=False)
>>> id(s) == id(shallow_copy)
False
>>> id(s.index) == id(shallow_copy.index)
True
>>> id(s.values) == id(shallow_copy.values)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just again wondering if these can be more concise. Perhaps choosing a different variable name will allow you to do id(s.values) == id(shlw.values) and id(s.index) == id(shlw.values)

When `deep=True`, data is copied but actual python objects
will not be copied recursively, only the reference to the object.
This is in contrast to `copy.deepcopy` in the Standard Library,
which recursively copies object data. (See examples below).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"...copies object data (see examples below)." - reduces some punctuation and saves space

will not be copied recursively, only the reference to the object.
This is in contrast to ``copy.deepcopy`` in the Standard Library,
which recursively copies object data.
With `deep=False` neither the indices or the data are copied.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"neither...nor" instead of "or"

When `deep=True` (default), a new object will be created with a
copy of the calling object's data and indices. Modifications to
the data or indices of the copy will not be reflected in the
original object (see notes below).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless you want to have a Notes section I would say "see Examples below" as a reference to the section actually showing that

>>> deep_copy
a 1
b 2
dtype: int32
dtype: int64
>>> shallow_copy[0] = 4
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of having two different assignments I think you can have s[0] = 3 and then this line both before printing any of the objects. Gets you the same result and covers the same concepts in less lines

@WillAyd
Copy link
Member

WillAyd commented Mar 11, 2018

@thismakessand not sure I fully understand why you removed the mention of indices. Perhaps its related to the conversation in #19862? Referencing that I think it's preferable to use is for comparing the indices instead of using the equality operator. Perhaps that will help clarify how the copy is actually working behind the scenes.

Lot's of edits but they are mostly minor - thanks and keep up the great work!

copy of the calling object's data and indices. Modifications to
the data or indices of the copy will not be reflected in the
original object (see notes below).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indices are immutable, mention this

a 1
b 2
dtype: int64
>>> s_copy = s.copy()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blank line between cases (if they are separate / distinct)

False
>>> id(s.values) == id(deep_copy.values)
False
>>> s[0] = 3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

break this up with some text describing what you are showing

copy the data, but will not do so recursively.

>>> s = pd.Series([[1, 2], [3, 4]])
>>> s_copy = s.copy()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

show this with a deep copy

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure how to address this comment - do you mean to show the behavior of
.copy(deep=True) versus the std python copy.deepcopy()? I was trying to show an example of how a (deep=True) copy isn't necessarily a deep copy in the standard python sense, but maybe I should remove this example entirely since it's already mentioned as a caveat in the Notes section?

1 [3, 4]
dtype: object

For deep-copying python objects, the following can be used:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this last example is not necessary

@thismakessand
Copy link
Contributor Author

Thanks - I added more clarification regarding the index and made the examples simpler and hopefully clearer.

will not be copied recursively, only the reference to the object.
This is in contrast to ``copy.deepcopy`` in the Standard Library,
which recursively copies object data.
With `deep=False` neither the indices nor the data are copied.

Returns
-------
copy : type of caller
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still need to clean up the line that says copy : type of caller. I would explicitly list the objects that could be a caller of this function, which I assume would make this Series or DataFrame but if there's anything else be sure to add that.

Otherwise nice job on all the edits - changes I requested all lgtm

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I would make this "Series or DataFrame"

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice docstring!
Added few more comments


Notes
-----
When `deep=True`, data is copied but actual Python objects
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double backticks around deep=True

will not be copied recursively, only the reference to the object.
This is in contrast to ``copy.deepcopy`` in the Standard Library,
which recursively copies object data.
With `deep=False` neither the indices nor the data are copied.

Returns
-------
copy : type of caller
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I would make this "Series or DataFrame"

the calling object's data or index (only references to the data
and index are copied). Any changes to the data of the original
will be reflected in the shallow copy (and vice versa). Changes
to the index will result in a new index as indices are immutable.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this line "Changes to the index will result in a new index as indices are immutable" is a bit out of place here, as this is not related to deep=False

For the fact that indices are immutable, I would add a short explanation about that in the Notes section (it is a bit too detailed for the extended summary at the beginning of the docstring).
The main thing is that, even for deep=True, the index data (the underlying numpy array) is not copied, only the object (the Index class holding the array data). This is done for performance reasons: because Index is immutable, it is not needed to copy the data.
(see #19862 for discussion on it)

@codecov
Copy link

codecov bot commented Mar 12, 2018

Codecov Report

Merging #20261 into master will increase coverage by 0.06%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #20261      +/-   ##
==========================================
+ Coverage   91.77%   91.84%   +0.06%     
==========================================
  Files         152      152              
  Lines       49205    49231      +26     
==========================================
+ Hits        45159    45215      +56     
+ Misses       4046     4016      -30
Flag Coverage Δ
#multiple 90.22% <100%> (+0.06%) ⬆️
#single 41.83% <33.33%> (-0.02%) ⬇️
Impacted Files Coverage Δ
pandas/core/generic.py 95.85% <100%> (ø) ⬆️
pandas/core/arrays/categorical.py 96.2% <0%> (-0.02%) ⬇️
pandas/core/window.py 96.26% <0%> (-0.01%) ⬇️
pandas/core/indexes/datetimelike.py 96.72% <0%> (ø) ⬆️
pandas/core/panel.py 97.29% <0%> (ø) ⬆️
pandas/core/base.py 96.78% <0%> (ø) ⬆️
pandas/core/indexes/category.py 97.3% <0%> (ø) ⬆️
pandas/core/series.py 93.84% <0%> (ø) ⬆️
pandas/core/frame.py 97.18% <0%> (ø) ⬆️
... and 9 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7273ea0...18c7be3. Read the comment docs.

@jorisvandenbossche
Copy link
Member

@thismakessand Do you have time to update this PR based on the remaining feedback ?

@thismakessand
Copy link
Contributor Author

@jorisvandenbossche Yes, it's definitely still on my radar, hoping to find some time later today or tomorrow.

@thismakessand
Copy link
Contributor Author

I read through the linked issues and added the details you mentioned. Looking at the linked issues and other stuff on stack overflow, etc it seems like most of the confusion is around

  1. deep copy not being a "true" deep copy when the series/dataframe contains python objects and
  2. the underlying data of the Index isn't copied even when deep=True.

I think both of these are covered now. Originally I had more details but I was worried going too into the caveats would just be unnecessarily confusing so I cut a bit out. I can add more if it's still unclear.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updates are looking good!

@jorisvandenbossche jorisvandenbossche added this to the 0.23.0 milestone Mar 23, 2018
@jorisvandenbossche jorisvandenbossche merged commit 8004eff into pandas-dev:master Mar 23, 2018
@jorisvandenbossche
Copy link
Member

@thismakessand Thanks a lot!

@jorisvandenbossche
Copy link
Member

There is one remaining thing: the type description for deep also said 'str'. I removed this for now, as it is not explained, but apparently 'all' is accepted to also copy the Index underlying data.
But will open a new issue for that to see if we actually want to expose this publicly.

javadnoorb pushed a commit to javadnoorb/pandas that referenced this pull request Mar 29, 2018
dworvos pushed a commit to dworvos/pandas that referenced this pull request Apr 2, 2018
kornilova203 pushed a commit to kornilova203/pandas that referenced this pull request Apr 23, 2018
@thismakessand thismakessand deleted the doc-series-copy branch September 29, 2018 02:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants