QST: Misleading SettingWithCopyWarning when assigning on result of DataFrame.dropna? #39448

Closed
2 tasks done
astrojuanlu opened this issue Jan 28, 2021 · 3 comments · Fixed by #56614
Labels
Bug Copy / view semantics Warnings Warnings that appear or should be added to pandas

Comments

@astrojuanlu

astrojuanlu commented Jan 28, 2021

  • I have searched the [pandas] tag on StackOverflow for similar questions.

  • I have asked my usage related question on StackOverflow.


In certain cases, doing an assignment on a DataFrame created with .dropna() emits the infamous SettingWithCopyWarning. So I followed what this SO answer suggests and inspected the ._is_view and ._is_copy attributes:

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: df1 = pd.DataFrame([[1.0, np.nan, 2, np.nan],
   ...:                    [2.0, 3.0, 5, np.nan],
   ...:                    [np.nan, 4.0, 6, np.nan]], columns=list("ABCD")); df1
Out[3]:
     A    B  C   D
0  1.0  NaN  2 NaN
1  2.0  3.0  5 NaN
2  NaN  4.0  6 NaN

In [4]: df2 = df1.dropna(axis="columns", how="all")

In [5]: df2._is_view, df2._is_copy
Out[5]: (False, <weakref at 0x7f56e8ecfd60; to 'DataFrame' at 0x7f56e916f9d0>)

In [6]: df2["A"] = df2["A"].fillna(df2["B"])
<ipython-input-6-9c99ac454f90>:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df2["A"] = df2["A"].fillna(df2["B"])

However, _is_copy is set to None if you explicitly call .copy() (a misleading attribute name?), and doing so removes the warning:

In [17]: df3 = df1.dropna(axis="columns", how="all").copy()

In [18]: df3._is_view, df3._is_copy
Out[18]: (False, None)

In [19]: df3["A"] = df3["A"].fillna(df3["B"])

In [20]:
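The check in In [17] to In [19] can also be written as a standalone script. This is a minimal sketch assuming pandas and NumPy are installed; it records warnings raised during the assignment and filters them by class name instead of importing the warning class, so the check does not depend on where (or whether) the installed pandas version defines SettingWithCopyWarning.

```python
import warnings

import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    [[1.0, np.nan, 2, np.nan],
     [2.0, 3.0, 5, np.nan],
     [np.nan, 4.0, 6, np.nan]],
    columns=list("ABCD"),
)

# The explicit .copy() makes df3 an independent frame, so _is_copy is None.
df3 = df1.dropna(axis="columns", how="all").copy()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning, even repeated ones
    df3["A"] = df3["A"].fillna(df3["B"])

# Match on the class name so this runs even if the class is not importable.
setting_with_copy = [w for w in caught
                     if w.category.__name__ == "SettingWithCopyWarning"]
print(len(setting_with_copy))  # 0
print(df3["A"].tolist())       # [1.0, 2.0, 4.0]
```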

Finally, in this smaller example the .dropna result does have _is_copy set to None (no column is all-NaN here, so nothing is dropped):

In [20]: df1 = pd.DataFrame([[1.0, np.nan], [2.0, 3.0]], columns=["A", "B"]); df1
Out[20]:
     A    B
0  1.0  NaN
1  2.0  3.0

In [21]: df2 = df1.dropna(axis="columns", how="all")

In [22]: df2._is_view, df2._is_copy
Out[22]: (False, None)

In [23]: df2["A"] = df2["A"].fillna(df2["B"])

In [24]:

Can anybody explain what the SettingWithCopyWarning means in this context, and why I still get it despite _is_view being False in all cases?

I am particularly worried about this, in light of what the pandas docs say:

Sometimes a SettingWithCopy warning will arise at times when there’s no obvious chained indexing going on. These are the bugs that SettingWithCopy is designed to catch!

Originally asked at https://stackoverflow.com/q/65926892/554319

@astrojuanlu astrojuanlu added Needs Triage Issue that has not been reviewed by a pandas team member Usage Question labels Jan 28, 2021
@mroeschke mroeschke added Bug Warnings Warnings that appear or should be added to pandas and removed Usage Question Needs Triage Issue that has not been reviewed by a pandas team member labels Aug 15, 2021
@tnwei
Contributor

tnwei commented Dec 31, 2021

I posted a reply to the original post on Stack Overflow. This issue does raise a good point, however; let me summarize it below so that it is easier for the devs to weigh in.

Based on the definition of dropna, if no rows (or columns, for axis="columns") are to be dropped, the original dataframe is returned as self.copy(). Otherwise, the function returns a .loc selection of the original dataframe:

...
        if np.all(mask):
            result = self.copy()
        else:
            result = self.loc(axis=axis)[mask]
...

If anything was removed by dropna, any further assignment on the result is treated as chained assignment and raises a SettingWithCopyWarning. Is this intended behaviour?
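The value-dependent branch can be illustrated without pandas. The sketch below is a hypothetical toy model, not pandas code: ToyFrame, _select and dropna_columns_all are invented names, None stands in for NaN, and _is_copy only mimics the idea behind pandas' private attribute (a weak reference to the parent when the object came from a selection, else None).

```python
import weakref


class ToyFrame:
    """Hypothetical stand-in for a DataFrame: column name -> list of values."""

    def __init__(self, columns):
        # Copy every column list so frames never share storage.
        self.columns = {name: list(values) for name, values in columns.items()}
        # Mimics the idea behind pandas' private _is_copy: a weak reference to
        # the parent frame if this one came from a selection, otherwise None.
        self._is_copy = None

    def copy(self):
        # A full copy is independent, so _is_copy stays None.
        return ToyFrame(self.columns)

    def _select(self, names):
        # Stand-in for self.loc(axis=axis)[mask]: the result remembers its parent.
        result = ToyFrame({name: self.columns[name] for name in names})
        result._is_copy = weakref.ref(self)
        return result

    def dropna_columns_all(self):
        # Like dropna(axis="columns", how="all"): keep columns not entirely missing.
        keep = [name for name, values in self.columns.items()
                if not all(v is None for v in values)]
        if len(keep) == len(self.columns):
            return self.copy()       # nothing dropped: independent copy
        return self._select(keep)    # something dropped: tracked "slice"


wide = ToyFrame({"A": [1.0, 2.0, None], "B": [None, 3.0, 4.0],
                 "C": [2, 5, 6], "D": [None, None, None]})
narrow = ToyFrame({"A": [1.0, 2.0], "B": [None, 3.0]})

dropped = wide.dropna_columns_all()
kept = narrow.dropna_columns_all()
print(list(dropped.columns), dropped._is_copy is not None)  # ['A', 'B', 'C'] True
print(list(kept.columns), kept._is_copy is None)            # ['A', 'B'] True
```

With data shaped like the two sessions in the original report, the four-column frame loses column D and ends up holding a parent reference, while the two-column frame comes back as a plain copy, mirroring Out[5] and Out[22].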

@astrojuanlu astrojuanlu changed the title QST: Misleading SettingWithCopyWarning when assigning on result of DataFrame.dropna? QST: Misleading SettingWithCopyWarning when assigning on result of DataFrame.dropna? Dec 31, 2021
@rhshadrach
Member

I don't believe so; we should not have value-dependent behavior determining whether a copy is returned. I'm uncertain whether in this situation we should always make a copy, or only do so if _is_copy is not None.

@rhshadrach rhshadrach added this to the Contributions Welcome milestone Dec 31, 2021
@jreback
Contributor

jreback commented Dec 31, 2021

this should probably copy
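For contrast, here is a sketch of the value-independent behaviour suggested in the two comments above, using the same kind of hypothetical toy model (invented names, None standing in for NaN; this is not necessarily how the eventual pandas fix works): the result is always a fresh, independent object, so _is_copy is None whether or not anything was dropped.

```python
class ToyFrame:
    """Hypothetical stand-in for a DataFrame: column name -> list of values."""

    def __init__(self, columns):
        # Copy every column list so frames never share storage.
        self.columns = {name: list(values) for name, values in columns.items()}
        # None means "independent object": nothing for a copy warning to track.
        self._is_copy = None

    def dropna_columns_all(self):
        # Value-independent: always build a fresh frame, whether or not any
        # column is dropped, so the result never records a parent.
        return ToyFrame({name: values for name, values in self.columns.items()
                         if not all(v is None for v in values)})


wide = ToyFrame({"A": [1.0, 2.0, None], "B": [None, 3.0, 4.0],
                 "C": [2, 5, 6], "D": [None, None, None]})
narrow = ToyFrame({"A": [1.0, 2.0], "B": [None, 3.0]})

dropped = wide.dropna_columns_all()
kept = narrow.dropna_columns_all()
print(dropped._is_copy, kept._is_copy)  # None None

dropped.columns["A"][2] = 4.0  # writing into the result...
print(wide.columns["A"][2])    # None: ...leaves the parent untouched
```

The trade-off is an extra copy even when nothing is dropped, in exchange for behaviour that no longer depends on the data.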

@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
6 participants