ENH: Enable using a boolean loc in a non-boolean index #52102
Comments
take |
@jbrockmendel thoughts here? I am not really sure if this is a good idea since normally Boolean elements are a mask |
I'd expect this to work. The desired behavior seems pretty unambiguous. |
If you provide a list of booleans it gets a bit sketchy? |
Yah. I think in principle we wouldn't allow a mask with |
So in the future, a list of bool elements could be considered as valid for Do I understand it right? |
Not allowing a mask would force chained indexing though? How would you filter the columns with a mask then? |
Good point. That's a tough one.
wouldn't iloc work? |
Fair but not ideal I guess for row selection |
Any news on this? |
I have implemented the feature and am now writing and fixing the corresponding unit tests. As others have pointed out, the input is ambiguous when it is a boolean array. One possible solution is to first treat the boolean array as labels to look up. If the index contains no boolean elements, we then check whether the array's length matches that of the DataFrame. If it does, the array is treated as a mask; otherwise an error is raised. I'll implement this if the maintainers agree with the solution, and I'm open to a better idea. |
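The resolution order proposed above can be sketched without pandas. This is only an illustration under stated assumptions, not the PR's implementation: `resolve_bool_list_key` is a hypothetical helper, the index is modelled as a plain list of labels, label matching is restricted to genuine bool labels, and positions are returned in index order.

```python
from typing import List, Sequence


def resolve_bool_list_key(index_labels: Sequence[object], key: Sequence[bool]) -> List[int]:
    """Return the positions selected by a list of booleans (hypothetical helper)."""
    # Step 1: if the index holds boolean labels, treat the key as a list of labels.
    if any(isinstance(label, bool) for label in index_labels):
        wanted = set(key)
        return [
            pos
            for pos, label in enumerate(index_labels)
            if isinstance(label, bool) and label in wanted
        ]
    # Step 2: no boolean labels, so fall back to a mask if the lengths match.
    if len(key) == len(index_labels):
        return [pos for pos, keep in enumerate(key) if keep]
    # Step 3: neither interpretation applies.
    raise KeyError("boolean key matches no labels and has the wrong length for a mask")


# Index with boolean labels: the key is looked up as labels.
print(resolve_bool_list_key([0, 1, True, False, "Missing"], [True]))
# Index without boolean labels and matching length: the key acts as a mask.
print(resolve_bool_list_key(["a", "b", "c"], [True, False, True]))
```

The sketch also shows where the ambiguity remains: a key whose length happens to match an index that contains boolean labels is always read as labels, never as a mask.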
If we eventually get laziness in the filtering (xref #52980) then the chained indexing wouldn't be so much of a problem (though i guess it still would for setitem) |
Chained indexing doesn't work with CoW (copy-on-write), so this would be tricky, but it could be worked around with laziness, I guess |
Hi @Rylie-W, as you mentioned in #53267, the following should work (though you haven't tried it due to installation issues).
if isinstance(key, bool) and not (
    ax.dtype.kind in ["b", "O"]
    or isinstance(ax, MultiIndex) and ax._levels[0].dtype.kind in ["b", "O"]
):
    raise KeyError(
        f"{key}: boolean label can not be used without a boolean index"
    )
I have confirmed that with the change above we get the following (which IMO is the desired behavior).
>>> import pandas as pd
>>> pd.__version__
'2.1.0.dev0+796.gb2bb68a88c.dirty'
>>> df = pd.DataFrame(
... {"a": [0, 1, True, False, "Missing"], "b": [1, 2, 3, 4, 5]}
... ).set_index("a")
>>> df
         b
a
0        1
1        2
True     3
False    4
Missing  5
>>> df.loc[1]
      b
a
1     2
True  3
>>> df.loc[True]
      b
a
1     2
True  3
Are you making a PR for this? Otherwise I can help as well. |
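Both lookups in the output above return the rows labelled 1 and True. That is consistent with plain Python semantics, where bool is a subclass of int, so True and 1 compare equal and hash identically. A small pandas-free illustration follows; `positions_equal_to` is a hypothetical helper that models label lookup as simple equality, which is an assumption about how the object index matches labels.

```python
# bool is a subclass of int, so True and 1 are equal and share a hash.
assert isinstance(True, int)
assert True == 1 and hash(True) == hash(1)

labels = [0, 1, True, False, "Missing"]


def positions_equal_to(labels, key):
    """Return the positions of every label that compares equal to key."""
    return [pos for pos, label in enumerate(labels) if label == key]


print(positions_equal_to(labels, 1))     # [1, 2]
print(positions_equal_to(labels, True))  # [1, 2]
```

Under the same equality rule, a lookup of False would also match the label 0, which may be worth covering in the tests.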
Yes, I'm working on that and adding some tests for it. Thanks for the confirmation! |
Feature Type
Adding new functionality to pandas
Changing existing functionality in pandas
Removing existing functionality in pandas
Problem Description
When using loc to access a boolean label on a DataFrame whose index contains both booleans and non-booleans (so the dtype of the index is not bool), an error is raised:
KeyError: 'True: boolean label can not be used without a boolean index'
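A rough, pandas-free model of the guard behind this error, assuming it only lets a scalar boolean key through when the index dtype is boolean. `validate_bool_key` is a hypothetical stand-in, not the pandas function; `index_kind` is a NumPy-style dtype kind character ("b" for bool, "O" for object, "i" for integer), and the `allow_object` flag models the relaxed condition suggested in the comments. MultiIndex handling is left out.

```python
def validate_bool_key(key, index_kind, allow_object=False):
    """Raise KeyError for a boolean key on an index that cannot hold booleans."""
    # Strict mode admits only boolean indexes; relaxed mode also admits object indexes.
    allowed_kinds = {"b", "O"} if allow_object else {"b"}
    if isinstance(key, bool) and index_kind not in allowed_kinds:
        raise KeyError(
            f"{key}: boolean label can not be used without a boolean index"
        )


# Strict mode rejects a boolean key on an object-dtype index.
try:
    validate_bool_key(True, "O")
except KeyError as err:
    print("KeyError:", err)

# Relaxed mode accepts it, since an object index may contain booleans.
validate_bool_key(True, "O", allow_object=True)
```

Non-boolean keys such as 1 or "Missing" pass through either mode untouched, which is why mixed int and string labels already work.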
I am unsure why this check is performed in _validate_key, but this behavior was unexpected to me, especially as the same lookup works with other combinations of types (int and string, for example).
Feature Description
I am not sure why this is currently not supported, so I am unsure whether a new API is needed or whether we can use the current one for .loc
Alternative Solutions
This is my current solution, but I don't believe it is a good one
Additional Context
No response