BUG: Regression in assert_frame_equal when using check_like #39739

Closed
2 of 3 tasks
galipremsagar opened this issue Feb 11, 2021 · 10 comments · Fixed by #40872
Labels
Testing pandas testing functions or related to the test suite

Comments

@galipremsagar

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

In 1.2.1:

>>> import pandas as pd
>>> one = pd.Index([], dtype='object')
>>> two = pd.RangeIndex(start=0, stop=0, step=1)
>>> df_one = pd.DataFrame(index=one)
>>> df_two = pd.DataFrame(index=two)
>>> pd.testing.assert_frame_equal(df_one, df_two, check_like=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/nvme/0/pgali/envs/cudfdev/lib/python3.7/site-packages/pandas/_testing.py", line 1655, in assert_frame_equal
    obj=f"{obj}.index",
  File "/nvme/0/pgali/envs/cudfdev/lib/python3.7/site-packages/pandas/_testing.py", line 773, in assert_index_equal
    _check_types(left, right, obj=obj)
  File "/nvme/0/pgali/envs/cudfdev/lib/python3.7/site-packages/pandas/_testing.py", line 740, in _check_types
    assert_class_equal(left, right, exact=exact, obj=obj)
  File "/nvme/0/pgali/envs/cudfdev/lib/python3.7/site-packages/pandas/_testing.py", line 868, in assert_class_equal
    raise_assert_detail(obj, msg, repr_class(left), repr_class(right))
  File "/nvme/0/pgali/envs/cudfdev/lib/python3.7/site-packages/pandas/_testing.py", line 1073, in raise_assert_detail
    raise AssertionError(msg)
AssertionError: DataFrame.index are different

DataFrame.index classes are not equivalent
[left]:  Index([], dtype='object')
[right]: RangeIndex(start=0, stop=0, step=1)

In 1.1.5:

>>> import pandas as pd
>>> one = pd.Index([], dtype='object')
>>> two = pd.RangeIndex(start=0, stop=0, step=1)
>>> df_one = pd.DataFrame(index=one)
>>> df_two = pd.DataFrame(index=two)
>>> pd.testing.assert_frame_equal(df_one, df_two, check_like=True)

Problem description

Up to 1.1.5, df_one and df_two were considered equal when check_like=True, but in the latest pandas (1.2.1) the same call raises an AssertionError.
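To make the difference between the two frames concrete, here is a minimal sketch that inspects the two indexes from the report. It only uses the objects already defined above; the names `same_class` and `same_dtype` are introduced here for illustration. Both indexes are empty, yet they differ in class and dtype, which is what the error message in 1.2.1 points at.

```python
import pandas as pd

one = pd.Index([], dtype="object")
two = pd.RangeIndex(start=0, stop=0, step=1)

# Both indexes are empty ...
print(len(one), len(two))

# ... but they are different classes with different dtypes.
print(type(one).__name__, one.dtype)
print(type(two).__name__, two.dtype)

# Illustrative helper names, not part of the original report.
same_class = type(one) is type(two)
same_dtype = one.dtype == two.dtype
print(same_class, same_dtype)
```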

Expected Output

>>> pd.testing.assert_frame_equal(df_one, df_two, check_like=True)

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 9d598a5
python : 3.7.9.final.0
python-bits : 64
OS : Linux
OS-release : 4.15.0-76-generic
Version : #86-Ubuntu SMP Fri Jan 17 17:24:28 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.2.1
numpy : 1.19.5
pytz : 2021.1
dateutil : 2.8.1
pip : 21.0.1
setuptools : 49.6.0.post20210108
Cython : 0.29.21
pytest : 6.2.2
hypothesis : 6.1.1
sphinx : 3.4.3
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.3
IPython : 7.20.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : 0.8.5
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 1.0.1
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : 0.52.0

@galipremsagar added the Bug and Needs Triage labels on Feb 11, 2021
@phofl
Member

phofl commented Feb 13, 2021

This was caused by b867e21

@phofl added the Regression and Testing labels and removed the Bug and Needs Triage labels on Feb 13, 2021
@phofl added this to the 1.2.3 milestone on Feb 13, 2021
@simonjayhawkins
Member

This was caused by b867e21

#37479 cc @amilbourne

simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this issue Feb 14, 2021
@simonjayhawkins
Member

can confirm

first bad commit: [b867e21] ENH: Fix output of assert_frame_equal if indexes differ and check_like=True (#37479)

@amilbourne
Contributor

Apologies if I have broken something here.
I can see that the behaviour described above was changed by my PR, but I didn't think that ignoring index types was supposed to be part of what check_like=True did. The docs say:

If True, ignore the order of index & columns.
Note: index labels must match their respective rows
(same as in columns) - same labels must be with the same data.

I haven't actually checked this to confirm that it works yet, but if you want to ignore index type, could you use check_index_type=False?
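As a rough illustration of the docstring quoted above, the sketch below builds two frames that hold the same data under the same labels but in a different row and column order. The frames and the variable `default_raised` are made up for this example. With check_like=True the comparison passes; with the default settings it raises because the order differs.

```python
import pandas as pd

left = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=["x", "y", "z"])

# Same labels and same data per label, only the order of rows and columns changes.
right = left.loc[["z", "x", "y"], ["b", "a"]]

# Order of index and columns is ignored, so this passes silently.
pd.testing.assert_frame_equal(left, right, check_like=True)

# Without check_like the differing order is reported as a mismatch.
default_raised = False
try:
    pd.testing.assert_frame_equal(left, right)
except AssertionError:
    default_raised = True
print(default_raised)
```

Note that nothing in this example changes the index class or dtype; check_like only concerns ordering.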

@simonjayhawkins changed the milestone from 1.2.3 to 1.2.4 on Mar 2, 2021
@amilbourne
Contributor

Sorry it has taken me so long to come back on this, but I have just verified that my suggestion above (check_index_type=False) does indeed fix the problem:
In 1.2.1:

>>> import pandas as pd
>>> one = pd.Index([], dtype='object')
>>> two = pd.RangeIndex(start=0, stop=0, step=1)
>>> df_one = pd.DataFrame(index=one)
>>> df_two = pd.DataFrame(index=two)
>>> pd.testing.assert_frame_equal(df_one, df_two, check_like=True, check_index_type=False)

Obviously others may differ, but it seems to me that the current behaviour in 1.2.1 is correct and the previous behaviour in 1.1.5 was accidental. Therefore this is not a bug.
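The two behaviours can be put side by side with a small wrapper. This is only a sketch: `frames_equal` is a hypothetical helper written for this comparison, not a pandas function, and the printed result assumes pandas 1.2 or later (on 1.1.5 the thread reports that the first call passes as well).

```python
import pandas as pd


def frames_equal(left, right, **kwargs):
    """Return True if assert_frame_equal passes, False if it raises AssertionError.

    Hypothetical convenience wrapper, not part of pandas.
    """
    try:
        pd.testing.assert_frame_equal(left, right, **kwargs)
    except AssertionError:
        return False
    return True


df_one = pd.DataFrame(index=pd.Index([], dtype="object"))
df_two = pd.DataFrame(index=pd.RangeIndex(start=0, stop=0, step=1))

# check_like alone still compares the index types (pandas >= 1.2).
strict = frames_equal(df_one, df_two, check_like=True)

# Explicitly opting out of the index type check makes the frames compare equal.
relaxed = frames_equal(df_one, df_two, check_like=True, check_index_type=False)

print(strict, relaxed)
```

Passing check_index_type=False states the intent directly, so the comparison does not depend on any side effect of check_like.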

@rhshadrach
Member

Thanks for digging into this @amilbourne, I agree with your assessment.

Do @galipremsagar @phofl @simonjayhawkins agree?

@phofl
Member

phofl commented Apr 10, 2021

Agree

@simonjayhawkins
Member

to ensure the behavior does not change again, we should have a test to catch this.
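A test along these lines could pin the behaviour down. The test that was actually merged is not shown in this thread, so the function below is an illustrative sketch only: the name `test_check_like_still_checks_index_type` is made up, and it uses plain try/except so that it runs without pytest installed (pytest would still collect it by name). It assumes pandas 1.2 or later.

```python
import pandas as pd


def test_check_like_still_checks_index_type():
    # Frames from the original report: empty object Index vs. empty RangeIndex.
    df_one = pd.DataFrame(index=pd.Index([], dtype="object"))
    df_two = pd.DataFrame(index=pd.RangeIndex(start=0, stop=0, step=1))

    # check_like=True must not silently ignore the index type.
    try:
        pd.testing.assert_frame_equal(df_one, df_two, check_like=True)
    except AssertionError as err:
        assert "DataFrame.index are different" in str(err)
    else:
        raise AssertionError("expected assert_frame_equal to raise")

    # Opting out of the index type check is the supported way to compare them.
    pd.testing.assert_frame_equal(
        df_one, df_two, check_like=True, check_index_type=False
    )


test_check_like_still_checks_index_type()
```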

@phofl
Member

phofl commented Apr 11, 2021

put something up

@phofl changed the milestone from 1.2.4 to 1.3 on Apr 11, 2021
@phofl removed the Regression label on Apr 11, 2021
@amilbourne
Contributor

Phew!
Glad I didn't break anything.
@phofl - Thanks for the test


5 participants