Description
Code Sample, a copy-pastable example if possible
import numpy as np
import pandas as pd
import pandas.testing as pdt
s1 = pd.Series([np.nan, np.nan, 'text'])
s2 = pd.Series([np.float64(np.nan), np.float64(np.nan), 'text'])
# This doesn't blow up, thinks s1 and s2 are the same
pdt.assert_series_equal(s1, s2)
s1_unique = s1.drop_duplicates()
s2_unique = s2.drop_duplicates()
# This blows up
pdt.assert_series_equal(s1_unique, s2_unique)
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-415-6908d74400cd> in <module>()
----> 1 pdt.assert_series_equal(s1_unique, s2_unique)
~/local/ts3/lib/python3.6/site-packages/pandas/util/testing.py in assert_series_equal(left, right, check_dtype, check_index_type, check_series_type, check_less_precise, check_names, check_exact, check_datetimelike_compat, check_categorical, obj)
1276 raise_assert_detail(obj, 'Series length are different',
1277 '{0}, {1}'.format(len(left), left.index),
-> 1278 '{0}, {1}'.format(len(right), right.index))
1279
1280 # index comparison
~/local/ts3/lib/python3.6/site-packages/pandas/util/testing.py in raise_assert_detail(obj, message, left, right, diff)
1147 msg = msg + "\n[diff]: {diff}".format(diff=diff)
1148
-> 1149 raise AssertionError(msg)
1150
1151
AssertionError: Series are different
Series length are different
[left]: 2, Int64Index([0, 2], dtype='int64')
[right]: 3, Int64Index([0, 1, 2], dtype='int64')
Problem description
When working with mixed-dtype (object) Series, which can arise from transposing a DataFrame with .T and then slicing, drop_duplicates() behaves surprisingly: it does not treat repeated np.float64(np.nan) values as duplicates. I would expect the htable.duplicated_object(values) call to also handle mixed dtypes containing np.float64 NaN values.
drop_duplicates() does work for np.nan itself (as in s1 above), which is a plain Python float.
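One plausible explanation for the difference, offered as a sketch and not as a confirmed diagnosis of the pandas internals: np.nan is a single shared Python float object, while every np.float64(np.nan) call builds a new scalar. A hash-based container that checks identity before equality therefore merges repeated np.nan references but not separately constructed NaN scalars. The Python set below is only an analogy for that behaviour, and the variable names are illustrative.

```python
import numpy as np

# np.nan is one module-level Python float, so both names point at the same object.
builtin_a, builtin_b = np.nan, np.nan

# Each np.float64(...) call creates a distinct scalar object.
numpy_a, numpy_b = np.float64(np.nan), np.float64(np.nan)

same_builtin_object = builtin_a is builtin_b
same_numpy_object = numpy_a is numpy_b

# NaN never compares equal to anything, including another NaN.
numpy_equal = bool(numpy_a == numpy_b)

# A set treats two items as the same key if they are the same object or compare equal.
# Identity merges the np.nan references; nothing merges the two np.float64 scalars.
unique_builtin = len({builtin_a, builtin_b})
unique_numpy = len({numpy_a, numpy_b})

print(same_builtin_object, same_numpy_object, numpy_equal)
print(unique_builtin, unique_numpy)
```

If the object hashtable used by drop_duplicates() compares keys the same way, that would account for s1 collapsing to two rows while s2 keeps three.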
Expected Output
import numpy as np
import pandas as pd
import pandas.testing as pdt
s1 = pd.Series([np.nan, np.nan, 'text'])
s2 = pd.Series([np.float64(np.nan), np.float64(np.nan), 'text'])
# This doesn't blow up, thinks s1 and s2 are the same
pdt.assert_series_equal(s1, s2)
s1_unique = s1.drop_duplicates()
s2_unique = s2.drop_duplicates()
# The following assertions should not blow up
assert len(s1_unique) == 2
assert len(s2_unique) == 2
pdt.assert_series_equal(s1_unique, s2_unique)
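To make the expected semantics concrete, here is a minimal pure-Python sketch of NaN-aware de-duplication. The helper drop_duplicates_nan_aware is hypothetical and is not part of pandas. It assumes the values are hashable and relies on np.float64 being a subclass of Python's float.

```python
import math

import numpy as np


def drop_duplicates_nan_aware(values):
    """Hypothetical helper: keep first occurrences, treating all float NaNs as one value."""
    seen = set()
    seen_nan = False
    result = []
    for value in values:
        # np.float64 subclasses float, so this branch covers both kinds of NaN.
        if isinstance(value, float) and math.isnan(value):
            if seen_nan:
                continue
            seen_nan = True
        elif value in seen:
            continue
        else:
            seen.add(value)
        result.append(value)
    return result


mixed = [np.float64(np.nan), np.float64(np.nan), 'text']
deduped = drop_duplicates_nan_aware(mixed)
print(len(deduped))
```

Checking for NaN explicitly, instead of relying on object identity, is what makes the result independent of how each NaN was constructed.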
Output of pd.show_versions()
pandas: 0.20.2
pytest: 3.1.1
pip: 9.0.1
setuptools: 36.0.1
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: 0.9.5
IPython: 6.1.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: 1.5.1
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.8
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.8.0
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.1.10
pymysql: None
psycopg2: 2.7.1 (dt dec pq3 ext lo64)
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: 0.4.0