Label-based indexing on a Series with an index of dtype=object raises TypeError when using slices with integer bound #26491

Closed
@plammens

Description

When indexing a pandas axis that has an index of dtype=object, with label-based indexing, passing a slice that contains an integer bound results in a TypeError, even if the integer is indeed a label in the index of the series.

Code Sample

import pandas as pd

series = pd.Series(range(4), index=[1, 'spam', 2, 'eggs'])
series
## 1       0
## spam    1
## 2       2
## eggs    3
## dtype: int64

series.index
## Index([1, 'spam', 2, 'eggs'], dtype='object')

series.loc['spam':'eggs']
## spam    1
## 2       2
## eggs    3
## dtype: int64

series.loc[1:'eggs']
# raises TypeError

Problem description

When indexing a pd.Series (or an equivalent pandas object) whose index has dtype=object (upcast, for example, from a list of ints and strs) with .loc's label-based indexing, passing a slice that contains an integer bound (as in 1:'eggs') raises a TypeError, even if the integer is a label in the index of the series:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Paolo\Code\PycharmProjects\pandas\pandas\core\indexing.py", line 1504, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "C:\Users\Paolo\Code\PycharmProjects\pandas\pandas\core\indexing.py", line 1871, in _getitem_axis
    return self._get_slice_axis(key, axis=axis)
  File "C:\Users\Paolo\Code\PycharmProjects\pandas\pandas\core\indexing.py", line 1537, in _get_slice_axis
    slice_obj.step, kind=self.name)
  File "C:\Users\Paolo\Code\PycharmProjects\pandas\pandas\core\indexes\base.py", line 4784, in slice_indexer
    kind=kind)
  File "C:\Users\Paolo\Code\PycharmProjects\pandas\pandas\core\indexes\base.py", line 5002, in slice_locs
    start_slice = self.get_slice_bound(start, 'left', kind)
  File "C:\Users\Paolo\Code\PycharmProjects\pandas\pandas\core\indexes\base.py", line 4914, in get_slice_bound
    label = self._maybe_cast_slice_bound(label, side, kind)
  File "C:\Users\Paolo\Code\PycharmProjects\pandas\pandas\core\indexes\base.py", line 4861, in _maybe_cast_slice_bound
    self._invalid_indexer('slice', label)
  File "C:\Users\Paolo\Code\PycharmProjects\pandas\pandas\core\indexes\base.py", line 3154, in _invalid_indexer
    kind=type(key)))
TypeError: cannot do slice indexing on <class 'pandas.core.indexes.base.Index'> with these indexers [1] of <class 'int'>

Other non-integer (and non-float) slice bounds are accepted, as shown above.

Is this the expected behaviour? Possibly it's a documentation issue. According to the section Selection by Label in the indexing doc page, the second warning says:

.loc is strict when you present slicers that are not compatible (or convertible) with the index type. For example using integers in a DatetimeIndex. These will raise a TypeError.

What exactly is meant by "compatible or convertible"? An int is definitely an object, so it should be compatible with the index type.

Furthermore, the first paragraph in the section reads (emphasis mine):

pandas provides a suite of methods in order to have purely label based indexing. This is a strict inclusion based protocol. Every label asked for must be in the index, or a KeyError will be raised. When slicing, both the start bound AND the stop bound are included, if present in the index. Integers are valid labels, but they refer to the label and not the position.

And, finally, the subsection Slicing with labels says:

When using .loc with slices, if both the start and the stop labels are present in the index, then elements located between the two (including them) are returned

From this, it seems to me that using a slice such as 1:'eggs' or 1:2 with .loc in the above example should be perfectly valid. The slice bounds are present in the index and, as stated above, integers are valid labels.
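To make the reading of the docs above concrete, here is a minimal pure-Python sketch of inclusive label-based slicing over a mixed list of labels. The helper name label_slice and its signature are hypothetical and are not part of pandas; it only models the documented semantics (both bounds included, labels compared by value, a missing label is an error).

```python
def label_slice(labels, values, start=None, stop=None):
    """Hypothetical helper: inclusive label-based slice over parallel lists.

    This is an illustration of the documented semantics, not pandas code.
    """
    def position(label):
        # Look the label up by value; a label that is absent is an error,
        # mirroring the KeyError described in the quoted documentation.
        try:
            return labels.index(label)
        except ValueError:
            raise KeyError(label) from None

    lo = 0 if start is None else position(start)
    hi = len(labels) - 1 if stop is None else position(stop)
    # Both bounds are included, hence hi + 1.
    return list(zip(labels[lo:hi + 1], values[lo:hi + 1]))


labels = [1, 'spam', 2, 'eggs']
values = [0, 1, 2, 3]

print(label_slice(labels, values, 1, 'eggs'))
print(label_slice(labels, values, 1, 2))
```

Under this model, an integer bound is treated like any other label, which is the behaviour I would expect from .loc here.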


The exception originates at the _maybe_cast_slice_bound check, which rejects all integers regardless of whether the Index contains any integers:

def _maybe_cast_slice_bound(self, label, side, kind):
    assert kind in ['ix', 'loc', 'getitem', None]

    # We are a plain index here (sub-class override this method if they
    # wish to have special treatment for floats/ints, e.g. Float64Index and
    # datetimelike Indexes
    # reject them
    if is_float(label):
        if not (kind in ['ix'] and (self.holds_integer() or
                                    self.is_floating())):
            self._invalid_indexer('slice', label)

    # we are trying to find integer bounds on a non-integer based index
    # this is rejected (generally .loc gets you here)
    elif is_integer(label):
        self._invalid_indexer('slice', label)

    return label
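
As a rough illustration of why the check fails for the example above, here is a simplified, standalone model of the .loc path through that method. The function check_slice_bound is hypothetical and only approximates the logic quoted above (it ignores the 'ix' special case and NumPy scalar types); the index_labels parameter is deliberately unused to show that the contents of the index never enter the decision.

```python
def check_slice_bound(label, index_labels):
    """Hypothetical, simplified model of the quoted check for kind='loc'.

    index_labels is accepted but never consulted, which is the point:
    the decision depends only on the type of the bound.
    """
    is_float = isinstance(label, float)
    # bool is a subclass of int in Python, so exclude it explicitly.
    is_integer = isinstance(label, int) and not isinstance(label, bool)
    if is_float or is_integer:
        raise TypeError(
            f"cannot do slice indexing with these indexers "
            f"[{label}] of {type(label)}"
        )
    return label


labels = [1, 'spam', 2, 'eggs']

print(check_slice_bound('spam', labels))  # string bounds pass through

try:
    check_slice_bound(1, labels)  # 1 is a label in the index, yet rejected
except TypeError as exc:
    print("TypeError:", exc)
```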

Expected Output

I expected series.loc[1:'eggs'] not to raise TypeError because of the integer label. I expected that expression to return the following slice view of the Series:

1       0
spam    1
2       2
eggs    3
dtype: int64

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: 6d2398a58fda68e40f116f199439504558c7774c
python: 3.7.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.25.0.dev0+598.g6d2398a58
pytest: 4.5.0
pip: 19.1.1
setuptools: 41.0.1
Cython: 0.29.7
numpy: 1.16.3
scipy: 1.3.0
pyarrow: 0.13.0
xarray: 0.12.1
IPython: 7.5.0
sphinx: 2.0.1
patsy: 0.5.1
dateutil: 2.8.0
pytz: 2019.1
blosc: 1.8.1
bottleneck: 1.2.1
tables: 3.5.1
numexpr: 2.6.9
feather: None
matplotlib: 3.1.0
openpyxl: 2.6.2
xlrd: 1.2.0
xlwt: 1.3.0
xlsxwriter: 1.1.8
lxml.etree: 4.3.3
bs4: 4.7.1
html5lib: 1.0.1
sqlalchemy: 1.3.3
pymysql: None
psycopg2: None
jinja2: 2.10.1
s3fs: 0.2.1
fastparquet: 0.3.1
pandas_gbq: None
pandas_datareader: None
gcsfs: None
