DataFrame.loc multiple columns replace #30439
You're using label-based indexing with .loc, so pandas does not infer that you want to replace column b with column d and column c with column e; that would be positional logic. You can either use .iloc for this use case, or rename the columns before setting:

In [6]: df.loc[df['a']=='a2', ['b','c']] = df[['d','e']].rename(columns={'d':'b', 'e':'c'})

In [7]: df
Out[7]:
    a   b   c   d   e
0  a1  b1  c1  d1  e1
1  a2  d2  e2  d2  e2

Alternatively, you can set using .to_numpy(); see the warning box at https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#basics. In this case you need to do the alignment on the right-hand side yourself:

In [9]: df.loc[df['a']=='a2', ['b','c']] = df.loc[df['a'] == 'a2', ['d','e']].to_numpy()

In [10]: df
Out[10]:
    a   b   c   d   e
0  a1  b1  c1  d1  e1
1  a2  d2  e2  d2  e2
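The two workarounds above can be sketched as a self-contained script. It assumes a five-column frame like the one shown in the output; `make_frame` is a hypothetical helper that rebuilds it so each approach starts from the same data. Renaming makes the right-hand column labels match the target labels, so `.loc` alignment pairs d with b and e with c, while `.to_numpy()` drops the labels, so the row mask has to be repeated on the right-hand side.

```python
import pandas as pd


def make_frame() -> pd.DataFrame:
    # Hypothetical helper: rebuilds the example frame used in the thread.
    return pd.DataFrame({
        "a": ["a1", "a2"],
        "b": ["b1", "b2"],
        "c": ["c1", "c2"],
        "d": ["d1", "d2"],
        "e": ["e1", "e2"],
    })


# Workaround 1: rename d -> b and e -> c so label alignment matches them up.
# The right-hand side still has rows 0 and 1; .loc keeps only the masked row.
renamed = make_frame()
mask = renamed["a"] == "a2"
renamed.loc[mask, ["b", "c"]] = renamed[["d", "e"]].rename(columns={"d": "b", "e": "c"})

# Workaround 2: strip the labels with .to_numpy(). No alignment happens, so the
# same mask is applied on the right-hand side to get a matching (1, 2) shape.
positional = make_frame()
mask = positional["a"] == "a2"
positional.loc[mask, ["b", "c"]] = positional.loc[mask, ["d", "e"]].to_numpy()

print(renamed)
print(positional)
```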
I agree with @Liam3851; the weird thing is that
@phofl, can you elaborate on what behavior you expect? I think I'd expect
I would have expected that
Why would you expect a shape mismatch?
Well,
Ah, I see. My understanding of
It's entirely plausible that I'm wrong on this. If instead of
In case of …, I quite like the feature that you do not have to repeat the same .loc condition on the right-hand side, yet the right-hand side still gets filtered. You could probably construct examples where this is a disadvantage rather than an advantage. But if we decide to change this, I think we should not break this code without a warning. I don't know whether this was intended in the implementation or is only a side effect.
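A minimal sketch of the filtering behavior described here, assuming the same five-column frame. A DataFrame on the right-hand side is reindexed to the rows selected by the mask, so the full-length frame can be passed without repeating the condition. A NumPy array carries no index, so nothing is filtered and a (2, 2) array does not fit the (1, 2) selection. The exact error message differs between pandas versions, so the sketch only catches `ValueError`.

```python
import pandas as pd

df = pd.DataFrame({
    "a": ["a1", "a2"],
    "b": ["b1", "b2"],
    "c": ["c1", "c2"],
    "d": ["d1", "d2"],
    "e": ["e1", "e2"],
})
mask = df["a"] == "a2"  # selects only the row labelled 1

# Label-aligned: the right-hand side has rows 0 and 1, but .loc reindexes it
# to the selected row, so only row 1 ("d2", "e2") is written.
rhs = df[["d", "e"]].rename(columns={"d": "b", "e": "c"})
df.loc[mask, ["b", "c"]] = rhs

# Positional: the array has no index to align on, so a (2, 2) array cannot
# fill a (1, 2) selection and pandas raises ValueError.
raised = False
try:
    df.loc[mask, ["b", "c"]] = df[["d", "e"]].to_numpy()
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

print(df)
```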
@jbrockmendel Isn't the difference that in
@phofl I think we need to differentiate … from …: in the first one, you are assigning a Series to a DataFrame. The Series is realigned to the DataFrame's index and then broadcast across all of the selected columns. For example, this is totally fine and assigns the values of column e to columns a and b:
In the second, you are assigning a DataFrame to a DataFrame. Because the column labels don't match, the selected cells are assigned NaN.
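The DataFrame-to-DataFrame case can be reproduced with a sketch like the following, assuming the same five-column frame and the `df['a']=='a2'` mask used in the thread. Because `.loc` aligns the right-hand frame on both row and column labels, and the target columns b and c do not exist in it, the selected cells become NaN while row 0 is left untouched.

```python
import pandas as pd

df = pd.DataFrame({
    "a": ["a1", "a2"],
    "b": ["b1", "b2"],
    "c": ["c1", "c2"],
    "d": ["d1", "d2"],
    "e": ["e1", "e2"],
})
mask = df["a"] == "a2"

# The right-hand side has columns d and e; the target selection is b and c.
# .loc reindexes the right-hand frame to columns b and c, finds no values
# for them, and therefore writes NaN into the selected cells.
df.loc[mask, ["b", "c"]] = df[["d", "e"]]
print(df)
```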
Yep, you are right. I would expect
to work.
seems weird.
I agree
looks a bit weird (you'd probably spell it the first way you gave above), but do you agree that
makes sense as a broadcast (broadcasting the Series across the DataFrame)? If so then
more or less has to be supported (or at least it would be weird not to support it).
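A sketch of the Series broadcast being discussed, assuming the same five-column frame and a boolean row mask. The Series is reindexed to the selected rows (here only row 1, giving d2) and that value is repeated across each selected column. Treat it as an illustration of the behavior under discussion, not a documented guarantee.

```python
import pandas as pd

df = pd.DataFrame({
    "a": ["a1", "a2"],
    "b": ["b1", "b2"],
    "c": ["c1", "c2"],
    "d": ["d1", "d2"],
    "e": ["e1", "e2"],
})
mask = df["a"] == "a2"

# A Series on the right is aligned on the row index only (row 1 -> "d2");
# the aligned values are then broadcast to every selected column (b and c).
df.loc[mask, ["b", "c"]] = df["d"]
print(df)
```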
I still find it counter-intuitive that both
Yep, I agree it's counter-intuitive. I just thought this was a nice feature, if it is not a bug :)
@jorisvandenbossche, can you weigh in on what the intended behavior is here? (No hurry.)
Related to #10440.
Python 3.6.8
pandas==0.25.3