Unary ~ operator broken for BooleanArray #32271

Closed
birnbera opened this issue Feb 26, 2020 · 1 comment

birnbera commented Feb 26, 2020

This may be related to https://github.com/pandas-dev/pandas/pull/31484.

Applying the unary ~ operator to a Series with the "boolean" dtype seems to invert the bits of an underlying integer representation rather than perform the expected logical negation:

>>> s = pd.Series([True, False], dtype="boolean")
>>> ~s
0    -2
1    -1
dtype: object
>>> s = pd.Series([True, False], dtype="bool")
>>> ~s
0    False
1     True
dtype: bool
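
For anyone wondering where -2 and -1 come from: the values match what Python's integer inversion gives for 1 and 0. The snippet below is only an illustration in plain Python, not a claim about what pandas does internally; the variable names are made up for the example.

```python
# bool is a subclass of int, so True and False behave as 1 and 0
# under integer operators.
assert isinstance(True, int)

values = [True, False]

# Bitwise inversion of the integer values: ~x == -x - 1.
bitwise = [~int(v) for v in values]

# Logical negation, which is what a boolean mask needs.
logical = [not v for v in values]

print(bitwise)  # [-2, -1]
print(logical)  # [False, True]
```

The bitwise results are the same numbers that show up in the object-dtype output above.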

This breaks boolean indexing, because the inverted mask is treated as a list of integer labels:

>>> a = pd.Series(list("abcd"))
>>> s = pd.Series([True, False, True, False], dtype="boolean")
>>> a[s]
0    a
2    c
dtype: object
>>> a[~s]
...
KeyError: "None of [Int64Index([-2, -1, -2, -1], dtype='int64')] are in the [index]"
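
A possible workaround on an affected version, sketched under the assumption that the mask contains no missing values: convert the mask to NumPy's bool dtype before inverting it, so that ~ acts as a logical negation. I have only reasoned about this, not tested it on 1.0.0.

```python
import pandas as pd

a = pd.Series(list("abcd"))
s = pd.Series([True, False, True, False], dtype="boolean")

# Assumption: the mask holds no pd.NA values, so it converts cleanly
# to NumPy's bool dtype, where ~ is a logical negation.
mask = ~s.astype(bool)

# Select the rows where the original mask was False.
selected = a[mask]
print(selected)
```

If the mask can contain pd.NA, the conversion to bool needs an explicit decision about how missing values should be treated first.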

The problem goes away if both Series use ExtensionArray-backed dtypes:

>>> a = pd.Series(list('abcd'), dtype="string")
>>> s = pd.Series([True, False, True, False], dtype="boolean")
>>> a[s]
0    a
2    c
dtype: object
>>> a[~s]
1    b
3    d
dtype: string

Edit

I updated to version 1.0.1 and the initial issue goes away (i.e. the unary ~ gives the expected result). However, the dtype reverts to object after assigning a value to a new label, and ~ on the resulting Series is bitwise again:

>>> s = pd.Series([True, False, pd.NA], dtype="boolean") # same behavior whether or not you include an `NA`
>>> s
0     True
1    False
2     <NA>
dtype: boolean
>>> ~s
0    False
1     True
2     <NA>
dtype: boolean
>>> s[3] = True
>>> s
0     True
1    False
2     <NA>
3     True
dtype: object
>>> ~s
0      -2
1      -1
2    <NA>
3      -2
dtype: object
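
Until the assignment behavior is sorted out, one way to get the expected negation back is to cast the object-dtype Series to "boolean" again before inverting. This is a minimal sketch: the object-dtype Series is built directly here to stand in for the state shown above, and the names `s_obj` and `s_bool` are only for illustration.

```python
import pandas as pd

# An object-dtype Series like the one shown above after `s[3] = True`.
s_obj = pd.Series([True, False, pd.NA, True], dtype="object")

# Casting back to the nullable boolean dtype restores the dtype
# that handles ~ as a logical negation and keeps <NA> as missing.
s_bool = s_obj.astype("boolean")
inverted = ~s_bool

print(inverted.dtype)
print(inverted)
```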

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.8.1.final.0
python-bits : 64
OS : Darwin
OS-release : 18.7.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.0.0
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 45.1.0.post20200119
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 7.12.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.3
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.13
tables : None
tabulate : None
xarray : 0.15.0
xlrd : 1.2.0
xlwt : None
xlsxwriter : None
numba : None

jorisvandenbossche (Member) commented

The unary operator is indeed fixed in 1.0.1. For the assignment problem, I created a dedicated issue: #32346

Thanks for the report!

jorisvandenbossche added this to the No action milestone Feb 29, 2020