Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd

index1 = pd.Index(["apple", "mango", "orange", "pear"], dtype="string")
index2 = pd.Index(["apple", "mango", "orange", "pear"], dtype="category")
assert index1.equals(index2) == index2.equals(index1)
Traceback (most recent call last):
  File "/home/pandas/Draft/rough1.py", line 63, in <module>
    assert index1.equals(index2)==index2.equals(index1)
AssertionError
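A variant of the reproducible example that prints both directions instead of asserting makes the asymmetry easier to see. This is only a diagnostic sketch. Per this report, pandas 2.2.3 gives False for the first call and True for the second; other versions may behave differently.

```python
import pandas as pd

# Same labels, two different dtypes (as in the reproducible example).
index1 = pd.Index(["apple", "mango", "orange", "pear"], dtype="string")
index2 = pd.Index(["apple", "mango", "orange", "pear"], dtype="category")

# Evaluate both directions instead of asserting, so each result is visible.
forward = index1.equals(index2)   # goes through the base Index.equals
backward = index2.equals(index1)  # goes through the CategoricalIndex.equals override

print(type(index1).__name__, index1.dtype)
print(type(index2).__name__, index2.dtype)
print("index1.equals(index2):", forward)
print("index2.equals(index1):", backward)
```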
Issue Description
The pandas.Index.equals method returns different results for string and category dtypes depending on the order of the operands:
index2.equals(index1) returns True, whereas
index1.equals(index2) returns False, so the comparison is not commutative (the result depends on which index is the caller).
The first call returns True because index2 is a pandas.core.indexes.category.CategoricalIndex, whose .equals method overrides the base Index implementation and implements the correct logic.
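The override works because CategoricalIndex first re-encodes the other index using its own categories and then compares category codes. The sketch below approximates that idea with public API only (pd.Categorical with an explicit dtype, then Categorical.equals). It is an illustration under that assumption, not the exact code path inside CategoricalIndex.equals.

```python
import pandas as pd

index1 = pd.Index(["apple", "mango", "orange", "pear"], dtype="string")
index2 = pd.Index(["apple", "mango", "orange", "pear"], dtype="category")

# Re-encode the string labels with index2's CategoricalDtype (same categories).
coerced = pd.Categorical(index1, dtype=index2.dtype)

# With identical dtypes, Categorical.equals compares the category codes.
same_codes = coerced.equals(index2.array)

print("dtype matches:", coerced.dtype == index2.dtype)
print("codes equal:", same_codes)
```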
The second call returns False because it enters the block at pandas/pandas/core/indexes/base.py, lines 5637 to 5643 (commit bdc79c1), which treats the string and categorical indexes as their underlying ExtensionArray subclasses and calls .equals again on those arrays. That call now resolves to the ExtensionArray-level .equals method, which does not ignore data types: arrays of different types compare as unequal.
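The dispatch can be observed by comparing the arrays behind each index through the public .array attribute. At this level neither direction coerces anything: the string-backed array and the Categorical are different types, so both .equals calls return False, while an object-dtype comparison of the labels shows the values themselves match. This is a sketch for inspection; the concrete array class behind dtype="string" (Python- or pyarrow-backed storage) depends on the pandas version and configuration.

```python
import pandas as pd

index1 = pd.Index(["apple", "mango", "orange", "pear"], dtype="string")
index2 = pd.Index(["apple", "mango", "orange", "pear"], dtype="category")

arr1 = index1.array  # string-backed ExtensionArray
arr2 = index2.array  # Categorical
print(type(arr1).__name__, type(arr2).__name__)

# ExtensionArray-level equality: arrays of different types compare as unequal.
array_forward = arr1.equals(arr2)
array_backward = arr2.equals(arr1)

# Element-wise comparison after casting both sides to object dtype.
labels_match = bool((index1.astype(object) == index2.astype(object)).all())
print(array_forward, array_backward, labels_match)
```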
I can open a pull request with a fix and a few tests to resolve this issue.
Expected Behavior
assert index1.equals(index2) == index2.equals(index1) should pass.
assert index1.equals(index2) should also pass, since both indexes contain the same labels in the same order.
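Until Index.equals is symmetric for this case, one possible user-side workaround is to route any mixed comparison through the CategoricalIndex override. The helper below, symmetric_index_equals, is hypothetical (it is not a pandas API) and only covers the case where exactly one side is a CategoricalIndex; a proper fix would belong in Index.equals itself.

```python
import pandas as pd


def symmetric_index_equals(left: pd.Index, right: pd.Index) -> bool:
    """Hypothetical helper: make operand order irrelevant when exactly one
    side is a CategoricalIndex, by always delegating to that side."""
    left_is_cat = isinstance(left, pd.CategoricalIndex)
    right_is_cat = isinstance(right, pd.CategoricalIndex)
    if right_is_cat and not left_is_cat:
        # CategoricalIndex.equals re-encodes the other index before comparing.
        return bool(right.equals(left))
    return bool(left.equals(right))


index1 = pd.Index(["apple", "mango", "orange", "pear"], dtype="string")
index2 = pd.Index(["apple", "mango", "orange", "pear"], dtype="category")
# Same categories as index2, but the labels are in a different order.
index3 = pd.Index(["apple", "mango", "pear", "orange"], dtype="category")

print(symmetric_index_equals(index1, index2), symmetric_index_equals(index2, index1))
print(symmetric_index_equals(index1, index3), symmetric_index_equals(index3, index1))
```

Delegating to the categorical side keeps the existing coercion rules in one place instead of reimplementing them in the helper.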
Installed Versions
commit : 0691c5c
python : 3.10.8
python-bits : 64
OS : Linux
OS-release : 5.15.153.1-microsoft-standard-WSL2
Version : #1 SMP Fri Mar 29 23:14:13 UTC 2024
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8
pandas : 2.2.3
numpy : 1.26.4
pytz : 2024.2
dateutil : 2.9.0.post0
pip : 24.2
Cython : 3.0.11
sphinx : 8.0.2
IPython : 8.27.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
blosc : None
bottleneck : 1.4.0
dataframe-api-compat : None
fastparquet : 2024.5.0
fsspec : 2024.9.0
html5lib : 1.1
hypothesis : 6.112.1
gcsfs : 2024.9.0post1
jinja2 : 3.1.4
lxml.etree : 5.3.0
matplotlib : 3.9.2
numba : 0.60.0
numexpr : 2.10.1
odfpy : None
openpyxl : 3.1.5
pandas_gbq : None
psycopg2 : 2.9.9
pymysql : 1.4.6
pyarrow : 17.0.0
pyreadstat : 1.2.7
pytest : 8.3.3
python-calamine : None
pyxlsb : 1.0.10
s3fs : 2024.9.0
scipy : 1.14.1
sqlalchemy : 2.0.35
tables : 3.10.1
tabulate : 0.9.0
xarray : 2024.9.0
xlrd : 2.0.1
xlsxwriter : 3.2.0
zstandard : 0.23.0
tzdata : 2024.1
qtpy : None
pyqt5 : None