Skip to content

BUG: Index.equals is not commutative for string and category dtypes #60194

Closed
@DhruvBShetty

Description

@DhruvBShetty

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

index1=pd.Index(["apple","mango","orange","pear"],dtype="string")
index2=pd.Index(["apple","mango","orange","pear"],dtype="category")

assert index1.equals(index2)==index2.equals(index1)
Traceback (most recent call last):
  File "/home/pandas/Draft/rough1.py", line 63, in <module>
    assert index1.equals(index2)==index2.equals(index1)
AssertionError

Issue Description

pandas.Index.equals method is giving different results for string and category dtypes based on order

assert index2.equals(index1) gives True whereas
assert index1.equals(index2) gives False which makes these operations non-commutative

The first statement returns True because index 2 is pandas.core.indexes.category.CategoricalIndex which calls the .equals method in the sub class that overrides the Base Index class and implements the correct logic however

The second statement returns False because it enters this block

if isinstance(self._values, ExtensionArray):
# Dispatch to the ExtensionArray's .equals method.
if not isinstance(other, type(self)):
return False
earr = cast(ExtensionArray, self._data)
return earr.equals(other._data)

which converts the the string and categorical indexes into an ExtensionArray subclass and calls .equals again on those inputs but now it calls .equals method from the ExtensionArray class which does not ignore data types.

I can make a pull request and supplement with a few tests to resolve this issue.

Expected Behavior

assert index1.equals(index2)==index2.equals(index1) should pass
assert index1.equals(index2) should pass

Installed Versions

commit : 0691c5c
python : 3.10.8
python-bits : 64
OS : Linux
OS-release : 5.15.153.1-microsoft-standard-WSL2
Version : #1 SMP Fri Mar 29 23:14:13 UTC 2024
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.2.3
numpy : 1.26.4
pytz : 2024.2
dateutil : 2.9.0.post0
pip : 24.2
Cython : 3.0.11
sphinx : 8.0.2
IPython : 8.27.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
blosc : None
bottleneck : 1.4.0
dataframe-api-compat : None
fastparquet : 2024.5.0
fsspec : 2024.9.0
html5lib : 1.1
hypothesis : 6.112.1
gcsfs : 2024.9.0post1
jinja2 : 3.1.4
lxml.etree : 5.3.0
matplotlib : 3.9.2
numba : 0.60.0
numexpr : 2.10.1
odfpy : None
openpyxl : 3.1.5
pandas_gbq : None
psycopg2 : 2.9.9
pymysql : 1.4.6
pyarrow : 17.0.0
pyreadstat : 1.2.7
pytest : 8.3.3
python-calamine : None
pyxlsb : 1.0.10
s3fs : 2024.9.0
scipy : 1.14.1
sqlalchemy : 2.0.35
tables : 3.10.1
tabulate : 0.9.0
xarray : 2024.9.0
xlrd : 2.0.1
xlsxwriter : 3.2.0
zstandard : 0.23.0
tzdata : 2024.1
qtpy : None
pyqt5 : None

Metadata

Metadata

Assignees

No one assigned

    Labels

    BugExtensionArrayExtending pandas with custom dtypes or arrays.Needs DiscussionRequires discussion from core team before further action

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions