Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
ivl1 = pd.IntervalIndex.from_breaks([0.0, 1.0, 2.0])
ivl2 = pd.IntervalIndex.from_breaks([0.5, 1.5, 2.5])
mi1 = pd.MultiIndex.from_product([ivl1, ivl1])
mi2 = pd.MultiIndex.from_product([ivl2, ivl2])
s1 = pd.Series(1, index=mi1)
s2 = pd.Series(2, index=mi2)
expected_idx = pd.MultiIndex.from_tuples(
    [
        (pd.Interval(0.0, 1.0), pd.Interval(0.0, 1.0)),
        (pd.Interval(0.0, 1.0), pd.Interval(1.0, 2.0)),
        (pd.Interval(1.0, 2.0), pd.Interval(0.0, 1.0)),
        (pd.Interval(1.0, 2.0), pd.Interval(1.0, 2.0)),
        (pd.Interval(0.5, 1.5), pd.Interval(0.5, 1.5)),
        (pd.Interval(0.5, 1.5), pd.Interval(1.5, 2.5)),
        (pd.Interval(1.5, 2.5), pd.Interval(0.5, 1.5)),
        (pd.Interval(1.5, 2.5), pd.Interval(1.5, 2.5)),
    ]
)
expected = pd.Series([1, 1, 1, 1, 2, 2, 2, 2], index=expected_idx)
result = pd.concat([s1, s2])
pd.testing.assert_series_equal(result, expected)
Issue Description
The code crashes with the following traceback:
Traceback (most recent call last):
  File "/tmp/concattest.py", line 26, in <module>
    result = pd.concat([s1, s2])
  File "/home/jmu3si/Devel/pandas/pandas/core/reshape/concat.py", line 393, in concat
    return op.get_result()
  File "/home/jmu3si/Devel/pandas/pandas/core/reshape/concat.py", line 640, in get_result
    new_index = self.new_axes[0]
  File "pandas/_libs/properties.pyx", line 36, in pandas._libs.properties.CachedProperty.__get__
    val = self.fget(obj)
  File "/home/jmu3si/Devel/pandas/pandas/core/reshape/concat.py", line 698, in new_axes
    return [
  File "/home/jmu3si/Devel/pandas/pandas/core/reshape/concat.py", line 699, in <listcomp>
    self._get_concat_axis if i == self.bm_axis else self._get_comb_axis(i)
  File "pandas/_libs/properties.pyx", line 36, in pandas._libs.properties.CachedProperty.__get__
    val = self.fget(obj)
  File "/home/jmu3si/Devel/pandas/pandas/core/reshape/concat.py", line 756, in _get_concat_axis
    concat_axis = _concat_indexes(indexes)
  File "/home/jmu3si/Devel/pandas/pandas/core/reshape/concat.py", line 774, in _concat_indexes
    return indexes[0].append(indexes[1:])
  File "/home/jmu3si/Devel/pandas/pandas/core/indexes/multi.py", line 2184, in append
    level_codes = [
  File "/home/jmu3si/Devel/pandas/pandas/core/indexes/multi.py", line 2185, in <listcomp>
    recode_for_categories(
  File "/home/jmu3si/Devel/pandas/pandas/core/arrays/categorical.py", line 2951, in recode_for_categories
    new_categories.get_indexer(old_categories), new_categories
  File "/home/jmu3si/Devel/pandas/pandas/core/indexes/base.py", line 3845, in get_indexer
    raise InvalidIndexError(self._requires_unique_msg)
pandas.errors.InvalidIndexError: cannot handle overlapping indices; use IntervalIndex.get_indexer_non_unique
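The traceback shows `pd.concat` delegating to `MultiIndex.append` via `indexes[0].append(indexes[1:])`, so the failure can probably be reproduced without `concat` at all. This is a minimal sketch, assuming the regression lives entirely in `MultiIndex.append` (inferred from the traceback, not verified further). On releases where the bug is fixed, the append simply succeeds:

```python
import pandas as pd
from pandas.errors import InvalidIndexError

ivl1 = pd.IntervalIndex.from_breaks([0.0, 1.0, 2.0])
ivl2 = pd.IntervalIndex.from_breaks([0.5, 1.5, 2.5])
mi1 = pd.MultiIndex.from_product([ivl1, ivl1])
mi2 = pd.MultiIndex.from_product([ivl2, ivl2])

# pd.concat([s1, s2]) calls indexes[0].append(indexes[1:]), i.e. mi1.append([mi2]).
try:
    combined = mi1.append([mi2])
    print("append succeeded:", len(combined), "entries")
except InvalidIndexError as err:
    # Affected versions raise the same error as pd.concat does.
    combined = None
    print("append failed:", err)
```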
This worked in pandas 2.0.3. Bisecting shows that the regression was introduced by the performance optimization in commit f989e1b. @lukemanley: any ideas on how to fix this reasonably?
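The last frames of the traceback show where it breaks: `recode_for_categories` calls `get_indexer` on the new level categories. Once the intervals from both inputs end up in one level, `(0.0, 1.0]` and `(0.5, 1.5]` overlap, and `get_indexer` refuses overlapping interval indexes. The sketch below isolates that call. The merged categories are built by hand here, which is an assumption about what `MultiIndex.append` produces internally. The non-unique variant suggested by the error message handles the overlapping case:

```python
import pandas as pd
from pandas.errors import InvalidIndexError

# Hypothetical stand-in for the merged level: the distinct intervals of both
# inputs. (0.0, 1.0] and (0.5, 1.5] overlap, so this index is overlapping.
new_categories = pd.IntervalIndex.from_tuples(
    [(0.0, 1.0), (0.5, 1.5), (1.0, 2.0), (1.5, 2.5)]
)
old_categories = pd.IntervalIndex.from_breaks([0.0, 1.0, 2.0])

print(new_categories.is_overlapping)  # True

# get_indexer requires a non-overlapping IntervalIndex and raises otherwise.
try:
    new_categories.get_indexer(old_categories)
    raised = False
except InvalidIndexError as err:
    raised = True
    print("get_indexer:", err)

# get_indexer_non_unique returns the matching positions plus the positions
# of targets that were not found at all.
indexer, missing = new_categories.get_indexer_non_unique(old_categories)
print(indexer, missing)
```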
Expected Behavior
The code should finish without error, and the final assert_series_equal check should pass.
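Until a fix is available, one possible workaround is to rebuild the combined index from plain tuples with `MultiIndex.from_tuples`. The reproducer already uses that constructor to build `expected_idx`, which runs before the crashing `concat` call. This is a sketch that assumes only `MultiIndex.append` is affected. `concat_interval_series` is a hypothetical helper, not a pandas API, and it handles only the simple row-wise Series case:

```python
import pandas as pd


def concat_interval_series(series_list):
    """Hypothetical helper: concatenate Series row-wise by rebuilding the
    MultiIndex from tuples instead of going through MultiIndex.append."""
    # Iterating a MultiIndex yields one tuple of Interval objects per row.
    tuples = [key for s in series_list for key in s.index]
    values = [value for s in series_list for value in s.to_numpy()]
    index = pd.MultiIndex.from_tuples(tuples, names=series_list[0].index.names)
    return pd.Series(values, index=index)


ivl1 = pd.IntervalIndex.from_breaks([0.0, 1.0, 2.0])
ivl2 = pd.IntervalIndex.from_breaks([0.5, 1.5, 2.5])
s1 = pd.Series(1, index=pd.MultiIndex.from_product([ivl1, ivl1]))
s2 = pd.Series(2, index=pd.MultiIndex.from_product([ivl2, ivl2]))

result = concat_interval_series([s1, s2])
print(result)
```

This trades speed for robustness, since it materializes every key as a Python tuple. It is only meant as a stopgap for small indexes.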
Installed Versions
pandas : 2.2.0dev0+155.gc7325d7e7e
numpy : 1.24.4
pytz : 2023.3
dateutil : 2.8.2
setuptools : 68.0.0
pip : 23.2.1
Cython : 0.29.33
pytest : 7.4.0
hypothesis : 6.83.0
sphinx : 6.2.1
blosc : 1.11.1
feather : None
xlsxwriter : 3.1.2
lxml.etree : 4.9.3
html5lib : 1.1
pymysql : 1.4.6
psycopg2 : 2.9.7
jinja2 : 3.1.2
IPython : 8.15.0
pandas_datareader : None
bs4 : 4.12.2
bottleneck : 1.3.7
dataframe-api-compat: None
fastparquet : 2023.8.0
fsspec : 2023.6.0
gcsfs : 2023.6.0
matplotlib : 3.7.2
numba : 0.57.1
numexpr : 2.8.5
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : 13.0.0
pyreadstat : 1.2.3
pyxlsb : 1.0.10
s3fs : 2023.6.0
scipy : 1.11.2
sqlalchemy : 2.0.20
tables : 3.8.0
tabulate : 0.9.0
xarray : 2023.8.0
xlrd : 2.0.1
zstandard : 0.21.0
tzdata : 2023.3
qtpy : None
pyqt5 : None