-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: iloc setting columns not taking effect #33198
Comments
@MomIsBestFriend this is non-trivial, might be worth taking a shot at |
@jbrockmendel I Will definitely give it a shot, thank you for giving me things I might be able to work on! |
@jbrockmendel I have tried to solve this, but I am no where near close to do this, I guess this one is non-trivial 😞 But I did learn a lot about |
how about let's walk through the process [i would use] to address this? The line that isn't working as expected is |
This takes a slice of Edit: |
yes. more specifically, this is equivalent to
You can check that |
I have looked mainly at pandas/pandas/core/indexing.py Lines 657 to 666 in 70086c5
|
Tried to have things like: key = com.apply_if_callable(key, self.obj, dtype=self.obj.dtypes.dtype) |
(stop me if you dont want to engage in this exercise. i think it may be useful) thats the right place to look. So setting Looking at |
It's very useful!
I don't know, can you elaborate? |
So looking through _setitem_with_indexer, from 1549-1574 its not doing any setting, just figuring out take_split_path. What is take_split_path going to end up being? |
False |
@jbrockmendel removing the |
I think you're right on both counts. My best guess as to the underlying issue is that the Block checks if it can hold the values being set, finds that it can, and so sets them inplace. How should we proceed? |
I think I got it take_split_path = blk._can_hold_element(val) or self.obj._is_mixed_type seems to not break anything (passed
|
I'll be pretty surprised if that passes the full test suite, but give it a try |
@jbrockmendel I don't know, how to proceed from here. I am trying to learn how to fix bugs in general and what is a good way of thinking, when trying to solve bugs, how would you proceed from here? |
So in _setitem_with_indexer we found that take_split_path is False, then L1575-1632 leaves our item_labels gets defined on L1642, then since take_split_path is False all we have left to look at is L1781-L1816. What part of that chunk of code is causing the problem? |
AFAICT L1814-L1816, but I am really not sure about that. EDIT: Maybe L1979 that is returning before the set is happening on L1815? |
Until you get an intuition for this part of the code, trial and error.
Do we go through that path? I expected the condition on L1786-1795 wouldnt hold |
Thank you for teaching me, this is really not taken for granted.
For some reason I thought we do, I just debugged it, the |
OK, so we're back at L1814-1816. Spoiler: 1815 is the relevant line. It ends up doing From this we deduce that we'd be better off going down the So as you surmised early on, the _can_hold_element check is where we need to focus our attention. In #33381 you found that flipping that check breaks a bunch of tests, so the _can_hold_element check is correct in those cases. This leaves three possibilties to consider:
|
I want to say check for or maybe change L1815, to something else? @jbrockmendel As you can tell I am not sure about anything here. |
No worries, we were all in this position once.
That is the right answer, but for the sake of learning the methodology id like you to think of what the 3rd possibility to consider would be. |
@jbrockmendel I have looked at Am I spouse to say what method of @jbrockmendel I need a hint (another) if I'm in the right direction. |
The third possibility I was looking for is that this may not be possible; its conceivable that the current behavior is intentional, and there is no good way to get the desired behavior without breaking an existing test. So the next thing I would do is look at the failed tests in #33381 and ask: what is the difference between all of those cases and this case? |
@jbrockmendel AFICT the difference is with the multi-index and categorical-index. |
Let's look at the key/indexer being used in each case. Recall in this issue we're looking at Browsing through the cases in #33381 (lets just look at setting with
Since there are fewer useful examples in there than I expected, I'll make up some more and we'll pretend that they showed up in the logs:
So what is the difference between the case we care about and all of those cases? Why would it that in our case we want to get a new dtype (backed by a new ndarray), whereas in those cases we want to set values into the existing ndarray? |
Because in our example we look for a range of indexes? (RangeIndex) EDIT: or that we have a null slice? I feel like I am missing something very basic here. |
bingo! In this case we are saying "set these entire columns" while in the other cases we are saying "set a subset of rows in these columns". We're almost there. How to we change the behavior in the new cases without breaking the old cases? |
@jbrockmendel I have tried in the past hour some things, this is the solution that have the less amount of errors (from what I tried) # if there is only one block/type, still have to take split path
# unless the block is one-dimensional or it can hold the value
if not take_split_path and self.obj._mgr.blocks:
(blk,) = self.obj._mgr.blocks
if 1 < blk.ndim: # in case of dict, keys are indices
val = list(value.values()) if isinstance(value, dict) else value
if isinstance(indexer, tuple) and any(
com.is_null_slice(idx) for idx in indexer
):
take_split_path = blk._can_hold_element(val)
else:
take_split_path = not blk._can_hold_element(val) |
good! do all the tests pass with this? I haven't tried it yet, but my guess is going to be
So the main difference is check only for the first element of the tuple being a null slice, not any element. Can you tell me why? |
@jbrockmendel My guess is that the |
Important background: to a very rough approximation, we can think of a DataFrame as a dict of Series with the keys being So when we do |
I would expect 1000 float64 columns and 1000 float32 columns. I get the expected behavior if I set a single column before trying to set all 1000:
The text was updated successfully, but these errors were encountered: