BUG: assert_frame_equal() with check_like=True errors with non-comparable types #39168

Closed
2 of 3 tasks
khaeru opened this issue Jan 14, 2021 · 3 comments · Fixed by #39204
Labels
Regression Functionality that used to work in a prior pandas version Testing pandas testing functions or related to the test suite

Comments

@khaeru

khaeru commented Jan 14, 2021

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandas.
  • (optional) I have confirmed this bug exists on the master branch of pandas.

Code sample

import pandas as pd
import pandas.testing as pdt

# Note that df.columns contains both str and int
df = pd.DataFrame([[0, 1, 2]], columns=["foo", "bar", 42])

pdt.assert_frame_equal(df, df, check_like=True)

Problem description

This code raises:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-10-05cc1ba40d40> in <module>
----> 1 pdt.assert_frame_equal(df, df, check_like=True)

    [... skipping hidden 2 frame]

~/.local/lib/python3.8/site-packages/pandas/core/indexes/base.py in sort_values(self, return_indexer, ascending, na_position, key)
   4664         # ignore na_position for MultiIndex
   4665         if not isinstance(self, ABCMultiIndex):
-> 4666             _as = nargsort(
   4667                 items=idx, ascending=ascending, na_position=na_position, key=key
   4668             )

~/.local/lib/python3.8/site-packages/pandas/core/sorting.py in nargsort(items, kind, ascending, na_position, key, mask)
    365
    366     if is_extension_array_dtype(items):
--> 367         return items.argsort(ascending=ascending, kind=kind, na_position=na_position)
    368     else:
    369         items = np.asanyarray(items)

~/.local/lib/python3.8/site-packages/pandas/core/arrays/base.py in argsort(self, ascending, kind, na_position, *args, **kwargs)
    584
    585         values = self._values_for_argsort()
--> 586         return nargsort(
    587             values,
    588             kind=kind,

~/.local/lib/python3.8/site-packages/pandas/core/sorting.py in nargsort(items, kind, ascending, na_position, key, mask)
    377         non_nans = non_nans[::-1]
    378         non_nan_idx = non_nan_idx[::-1]
--> 379     indexer = non_nan_idx[non_nans.argsort(kind=kind)]
    380     if not ascending:
    381         indexer = indexer[::-1]

TypeError: '<' not supported between instances of 'int' and 'str'

The cause is PR #37479, which added the following to assert_index_equal():

    # If order doesn't matter then sort the index entries
    if not check_order:
        left = left.sort_values()
        right = right.sort_values()

This code is triggered by assert_frame_equal(…, check_like=True). .sort_values() does not work when an index contains non-comparable types, like str and int.
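
The failure can be isolated from assert_frame_equal() entirely. The following is a minimal sketch, not code from pandas itself: can_sort() is a hypothetical helper introduced only for illustration, and it simply reports whether Index.sort_values() succeeds on a given index. On the pandas 1.2.0 install reported here, the mixed str/int index is expected to hit the same TypeError as in the traceback above.

```python
import pandas as pd


def can_sort(index: pd.Index) -> bool:
    """Hypothetical helper: report whether Index.sort_values() succeeds."""
    try:
        index.sort_values()
    except TypeError:
        # Raised when the labels cannot be ordered against each other
        return False
    return True


homogeneous = pd.Index(["foo", "bar"])
mixed = pd.Index(["foo", "bar", 42])  # same labels as df.columns above

print(can_sort(homogeneous))
print(can_sort(mixed))
```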

Detected via iiasa/ixmp#390.

Expected output

In pandas < 1.2.0, the last line above completed without raising an exception (assert_frame_equal() returns None on success).

The description of the check_like argument is:

check_like : bool, default False
If True, ignore the order of index & columns.
Note: index labels must match their respective rows
(same as in columns) - same labels must be with the same data.

…i.e. this does not indicate that the columns index may only contain comparable types, so the function should not raise an exception.
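
Until a fixed release is available, an order-insensitive comparison can be done without sorting at all. The following is only a sketch of one possible workaround, not the fix adopted in pandas: assert_frame_equal_unordered() is a hypothetical name, and the approach assumes that the labels on both axes are unique and hashable. It aligns one frame to the other's label order with reindex() and then runs a strict comparison.

```python
import pandas as pd
import pandas.testing as pdt


def assert_frame_equal_unordered(left: pd.DataFrame, right: pd.DataFrame) -> None:
    """Hypothetical helper: compare frames while ignoring row/column order.

    Assumes unique, hashable labels on both axes.
    """
    # Both frames must carry the same set of labels; no ordering is needed
    assert set(left.index) == set(right.index), "index labels differ"
    assert set(left.columns) == set(right.columns), "column labels differ"

    # Align right to left's label order, which relies on hashing, not sorting
    aligned = right.reindex(index=left.index, columns=left.columns)

    # Strict comparison; check_like is left at its default of False
    pdt.assert_frame_equal(left, aligned)


df = pd.DataFrame([[0, 1, 2]], columns=["foo", "bar", 42])
shuffled = df[[42, "foo", "bar"]]  # same data, different column order

assert_frame_equal_unordered(df, shuffled)
```

Because the alignment step uses label lookup rather than ordering, it does not require str and int labels to be comparable with each other.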

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 3e89b4c
python : 3.8.6.final.0
python-bits : 64
OS : Linux
OS-release : 5.8.0-36-generic
Version : #40-Ubuntu SMP Tue Jan 5 21:54:35 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_CA.UTF-8
LOCALE : en_CA.UTF-8

pandas : 1.2.0
numpy : 1.19.4
pytz : 2020.1
dateutil : 2.8.1
pip : 20.3.3
setuptools : 50.3.2
Cython : 0.29.21
pytest : 6.1.2
hypothesis : None
sphinx : 3.3.0
blosc : 1.8.1
feather : None
xlsxwriter : 1.3.7
lxml.etree : 4.5.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.19.0
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.3.2
fsspec : 0.6.1
fastparquet : None
gcsfs : None
matplotlib : 3.3.3
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.5
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.5.4
sqlalchemy : 1.3.19
tables : 3.6.1
tabulate : 0.8.6
xarray : 0.16.1
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.51.2

@khaeru khaeru added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Jan 14, 2021
@jreback
Contributor

jreback commented Jan 14, 2021

try on master

i think we patched this

@khaeru
Author

khaeru commented Jan 14, 2021

The code is still there:

    # If order doesn't matter then sort the index entries
    if not check_order:
        left = left.sort_values()
        right = right.sort_values()

but I'll see if I can manage to install from master.

@phofl phofl added Regression Functionality that used to work in a prior pandas version Testing pandas testing functions or related to the test suite and removed Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Jan 16, 2021
@phofl phofl added this to the 1.2.1 milestone Jan 16, 2021
@khaeru
Author

khaeru commented Jan 17, 2021

Thanks all!
