Description
When indexing a pandas axis that has an index of dtype=object with label-based indexing, passing a slice that contains an integer bound results in a TypeError, even if the integer is indeed a label in the index of the series.
Details
Code Sample
import pandas as pd
series = pd.Series(range(4), index=[1, 'spam', 2, 'eggs'])
series
## 1 0
## spam 1
## 2 2
## eggs 3
## dtype: int64
series.index
## Index([1, 'spam', 2, 'eggs'], dtype='object')
series.loc['spam':'eggs']
## spam 1
## 2 2
## eggs 3
## dtype: int64
series.loc[1:'eggs']
# raises TypeError
Problem description
When indexing a pd.Series (or an equivalent pandas object) that has an index of dtype=object (upcast, for example, from a list of ints and strs) with .loc's label-based indexing, passing a slice that contains an integer bound (as in 1:'eggs') results in a TypeError, even if the integer is indeed a label in the index of the series:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\Paolo\Code\PycharmProjects\pandas\pandas\core\indexing.py", line 1504, in __getitem__
return self._getitem_axis(maybe_callable, axis=axis)
File "C:\Users\Paolo\Code\PycharmProjects\pandas\pandas\core\indexing.py", line 1871, in _getitem_axis
return self._get_slice_axis(key, axis=axis)
File "C:\Users\Paolo\Code\PycharmProjects\pandas\pandas\core\indexing.py", line 1537, in _get_slice_axis
slice_obj.step, kind=self.name)
File "C:\Users\Paolo\Code\PycharmProjects\pandas\pandas\core\indexes\base.py", line 4784, in slice_indexer
kind=kind)
File "C:\Users\Paolo\Code\PycharmProjects\pandas\pandas\core\indexes\base.py", line 5002, in slice_locs
start_slice = self.get_slice_bound(start, 'left', kind)
File "C:\Users\Paolo\Code\PycharmProjects\pandas\pandas\core\indexes\base.py", line 4914, in get_slice_bound
label = self._maybe_cast_slice_bound(label, side, kind)
File "C:\Users\Paolo\Code\PycharmProjects\pandas\pandas\core\indexes\base.py", line 4861, in _maybe_cast_slice_bound
self._invalid_indexer('slice', label)
File "C:\Users\Paolo\Code\PycharmProjects\pandas\pandas\core\indexes\base.py", line 3154, in _invalid_indexer
kind=type(key)))
TypeError: cannot do slice indexing on <class 'pandas.core.indexes.base.Index'> with these indexers [1] of <class 'int'>
Other non-integer (and non-float) slice bounds are accepted, as shown above.
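To make the comparison between bound types easier to rerun, here is a minimal sketch that tries each combination of string and integer bounds from the example and reports whether .loc raised. The helper name try_loc_slice is hypothetical and not part of pandas; the outcome for the integer bounds may depend on the installed pandas version, so no output is shown.

```python
import pandas as pd


def try_loc_slice(series, start, stop):
    """Return series.loc[start:stop], or the TypeError it raised.

    Hypothetical helper for this report, not a pandas API.
    """
    try:
        return series.loc[start:stop]
    except TypeError as exc:
        # Return the exception instead of re-raising so outcomes can be compared.
        return exc


series = pd.Series(range(4), index=[1, 'spam', 2, 'eggs'])

# Try every combination of string and integer bounds from the example.
for start, stop in [('spam', 'eggs'), (1, 'eggs'), ('spam', 2), (1, 2)]:
    outcome = try_loc_slice(series, start, stop)
    status = 'TypeError' if isinstance(outcome, TypeError) else 'ok'
    print(f"{start!r}:{stop!r} -> {status}")
```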
Is this the expected behaviour? Possibly it's a documentation issue. According to the section Selection by Label in the indexing doc page, the second warning says:
.loc is strict when you present slicers that are not compatible (or convertible) with the index type. For example using integers in a DatetimeIndex. These will raise a TypeError.
What exactly is meant by "compatible or convertible"? An int is definitely an object, so it should be compatible with the index type.
Furthermore, the first paragraph in the section reads (emphasis mine):
pandas provides a suite of methods in order to have purely label based indexing. This is a strict inclusion based protocol. Every label asked for must be in the index, or a KeyError will be raised. When slicing, both the start bound AND the stop bound are included, if present in the index. Integers are valid labels, but they refer to the label and not the position.
And, finally, the subsection Slicing with labels says:
When using .loc with slices, if both the start and the stop labels are present in the index, then elements located between the two (including them) are returned
Reading this, it seems to me that using a slice such as 1:'eggs' or 1:2 with loc in the above example should be perfectly valid. The indexers are present in the index and, as stated above, integers are valid labels.
The exception originates at the _maybe_cast_slice_bound check, which rejects all integers regardless of whether the Index contains any integers:
pandas/pandas/core/indexes/base.py
Lines 4846 to 4863 in 6d2398a
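To illustrate the difference between the behaviour described above and what I would expect, here is a rough standalone sketch. It is not the pandas source and does not reproduce the referenced lines; both function names are hypothetical, and the labels are modelled as a plain Python list rather than an Index. The first function rejects every int or float bound, the second rejects one only when it is absent from the labels.

```python
def strict_bound_check(labels, label):
    """Mimic the behaviour described in this report: reject any int/float bound.

    Hypothetical stand-in, not the actual pandas implementation.
    """
    if isinstance(label, (int, float)):
        raise TypeError(
            f"cannot do slice indexing with these indexers [{label}] "
            f"of {type(label)}"
        )
    return label


def relaxed_bound_check(labels, label):
    """Hypothetical alternative: reject an int/float bound only if it is
    not one of the labels."""
    if isinstance(label, (int, float)) and label not in labels:
        raise TypeError(
            f"cannot do slice indexing with these indexers [{label}] "
            f"of {type(label)}"
        )
    return label


labels = [1, 'spam', 2, 'eggs']

# The integer 1 is a label, so the relaxed check lets it through unchanged.
print(relaxed_bound_check(labels, 1))

# The strict check refuses the same bound even though it is a label.
try:
    strict_bound_check(labels, 1)
except TypeError as exc:
    print(f"strict check raised: {exc}")
```

With a check along these lines, an integer bound that is missing from the index would still be rejected, which seems consistent with the warning quoted from the documentation.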
Expected Output
I expected series.loc[1:'eggs'] not to raise a TypeError because of the integer label. I expected that expression to return the following slice view of the Series:
1 0
spam 1
2 2
eggs 3
dtype: int64
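For comparison, the same inclusive slice can be obtained by resolving the labels to positions by hand. This is only a sketch of a workaround, assuming the labels are unique so that Index.get_loc returns a single integer position; the helper name loc_slice_by_position is hypothetical.

```python
import pandas as pd


def loc_slice_by_position(series, start, stop):
    """Inclusive label slice resolved through positions (unique labels only).

    Hypothetical helper for this report, not a pandas API.
    """
    # Index.get_loc returns an integer position when the label is unique.
    first = series.index.get_loc(start)
    last = series.index.get_loc(stop)
    # iloc excludes the stop position, so add 1 to keep the stop label.
    return series.iloc[first:last + 1]


series = pd.Series(range(4), index=[1, 'spam', 2, 'eggs'])
print(loc_slice_by_position(series, 1, 'eggs'))
```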
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: 6d2398a58fda68e40f116f199439504558c7774c
python: 3.7.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.25.0.dev0+598.g6d2398a58
pytest: 4.5.0
pip: 19.1.1
setuptools: 41.0.1
Cython: 0.29.7
numpy: 1.16.3
scipy: 1.3.0
pyarrow: 0.13.0
xarray: 0.12.1
IPython: 7.5.0
sphinx: 2.0.1
patsy: 0.5.1
dateutil: 2.8.0
pytz: 2019.1
blosc: 1.8.1
bottleneck: 1.2.1
tables: 3.5.1
numexpr: 2.6.9
feather: None
matplotlib: 3.1.0
openpyxl: 2.6.2
xlrd: 1.2.0
xlwt: 1.3.0
xlsxwriter: 1.1.8
lxml.etree: 4.3.3
bs4: 4.7.1
html5lib: 1.0.1
sqlalchemy: 1.3.3
pymysql: None
psycopg2: None
jinja2: 2.10.1
s3fs: 0.2.1
fastparquet: 0.3.1
pandas_gbq: None
pandas_datareader: None
gcsfs: None