Series.isin fails (errors) for categoricals  #16639

Closed
@aviolov

Code Sample, a copy-pastable example if possible

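import numpy as np
import pandas as pd

print(pd.__version__)

# Integer codes index into the category labels: 0 -> 'a', 1 -> 'b', 2 -> 'c'
vals = np.array([0, 1, 2, 0])
cats = ['a', 'b', 'c']

# Single-column frames whose 'id' column is categorical
DFtrades = pd.DataFrame({'id': pd.Series(pd.Categorical.from_codes(vals, cats))})
DFscores = pd.DataFrame({'id': pd.Series(pd.Categorical.from_codes(np.array([0, 1]), cats))})

print(DFtrades)
print(DFscores)

# Passing a categorical Series to isin raises ValueError in pandas 0.20.1
select_ids = DFtrades['id'].isin(DFscores['id'])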

Problem description

In pandas 0.20.1, the isin call raises an error:

File "", line 12, in
select_ids = DFtrades['id'].isin(DFscores['id']);

File "C:\Users\alexandre\Anaconda3\lib\site-packages\pandas\core\series.py", line 2555, in isin
result = algorithms.isin(_values_from_object(self), values)

File "C:\Users\alexandre\Anaconda3\lib\site-packages\pandas\core\algorithms.py", line 421, in isin
return f(comps, values)

File "C:\Users\alexandre\Anaconda3\lib\site-packages\pandas\core\algorithms.py", line 399, in
f = lambda x, y: htable.ismember_object(x, values)

File "pandas_libs\hashtable_func_helper.pxi", line 428, in pandas._libs.hashtable.ismember_object (pandas_libs\hashtable.c:29677)

ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long long'

Expected Output

A boolean array (or Series?) indicating that the third row of DFtrades is not in DFscores while the other three are, i.e. True, True, False, True.

For reference, this worked (I did not get an error) in 0.19.(something).

Also, passing the underlying values instead of the Series works as expected:

select_ids = DFtrades['id'].isin(DFscores['id'].values)
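
For anyone hitting the same error, here is a minimal sketch of a variation on that workaround. It is not taken from the report itself: it assumes that converting the lookup values to a plain NumPy array of labels with np.asarray is acceptable, and the names trades, scores and mask are illustrative only. It also shows the result the report expects.

```python
import numpy as np
import pandas as pd

cats = ["a", "b", "c"]

# Same data as in the report: codes 0, 1, 2, 0 map to a, b, c, a
trades = pd.Series(pd.Categorical.from_codes(np.array([0, 1, 2, 0]), categories=cats))
scores = pd.Series(pd.Categorical.from_codes(np.array([0, 1]), categories=cats))

# Assumption: handing isin a plain array of labels (not a categorical Series)
# sidesteps the code path that fails in 0.20.1
lookup = np.asarray(scores)  # array of labels: 'a', 'b'
mask = trades.isin(lookup)

print(mask.tolist())  # [True, True, False, True]
```

Only the third row (label 'c') is absent from the lookup values, so the mask is False there and True elsewhere.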

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: None.None

pandas: 0.20.1
pytest: 3.1.1
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.11.3
scipy: 0.19.0
xarray: 0.9.5
IPython: 6.1.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.8.0
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.10
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

Labels: Bug; Categorical (Categorical Data Type); Regression (Functionality that used to work in a prior pandas version)