
API/BUG: allow .str-accessor for 1-level MultiIndex? Return what? #23679

Closed
@h-vetinari


Following from #23167...

The current check in the .str-accessor constructor regarding MultiIndex is (roughly):

if isinstance(data, MultiIndex) and data.nlevels > 1:
    raise ...
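The hole is easy to see if you evaluate just that condition on its own. Here is a minimal sketch, assuming nothing beyond the quoted condition: passes_quoted_check is a hypothetical helper, not pandas' actual validation code, which does more than this.

```python
import pandas as pd


def passes_quoted_check(data):
    # Hypothetical helper: re-implements ONLY the condition quoted above,
    # not pandas' full .str validation. A MultiIndex is rejected only
    # when it has more than one level.
    return not (isinstance(data, pd.MultiIndex) and data.nlevels > 1)


idx = pd.Index(['aaa', 'bb', 'c'])
mi1 = pd.MultiIndex.from_arrays([idx])        # 1-level MultiIndex
mi2 = pd.MultiIndex.from_arrays([idx, idx])   # 2-level MultiIndex

for name, obj in [('Index', idx), ('1-level MultiIndex', mi1), ('2-level MultiIndex', mi2)]:
    print(name, obj.nlevels, passes_quoted_check(obj))
```

A plain Index and a 1-level MultiIndex both report nlevels == 1, so the condition cannot tell them apart.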

This means the check passes for a MultiIndex with a single level, but essentially all methods then fail or produce garbage:

>>> import pandas as pd
>>> idx = pd.Index(['aaa', 'bb', 'c'])
>>> mi = pd.MultiIndex.from_arrays([idx])
>>> mi.str.len()
Int64Index([1, 1, 1], dtype='int64')  # compare idx.str.len() == Int64Index([3, 2, 1], dtype='int64')
>>> mi.str.cat()
[...]
NotImplementedError: initializing a Series from a MultiIndex is not supported
>>> mi.str.startswith('a')
Float64Index([nan, nan, nan], dtype='float64')
>>> mi.str.upper()
Float64Index([nan, nan, nan], dtype='float64')
>>> mi.str.islower()
Float64Index([nan, nan, nan], dtype='float64')
>>> mi.str.split()
Float64Index([nan, nan, nan], dtype='float64')
>>> mi.str.find('a')
Float64Index([nan, nan, nan], dtype='float64')
>>> mi.str.ljust(10)
Float64Index([nan, nan, nan], dtype='float64')
>>> mi.str.repeat(3)
Float64Index([nan, nan, nan], dtype='float64')
>>> mi.str.slice(1, 2)
Index([(), (), ()], dtype='object')  # compare idx.str.slice(1, 2) == Index(['a', 'b', ''], dtype='object')
>>> mi.str.zfill(10)
Float64Index([nan, nan, nan], dtype='float64')
>>> mi.str.wrap(2)
Float64Index([nan, nan, nan], dtype='float64')
>>> mi.str.normalize('NFC')
Float64Index([nan, nan, nan], dtype='float64')
>>> mi.str.index('')
[...]
ValueError: tuple.index(x): x not in tuple
>>> mi.str.get(1)
Float64Index([nan, nan, nan], dtype='float64')
>>> mi.str.contains('a')
Float64Index([nan, nan, nan], dtype='float64')
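At least part of this is consistent with each element of a 1-level MultiIndex being a 1-tuple rather than a string (note the tuple.index error above). Here is a minimal sketch that uses plain iteration instead of the .str-accessor and reproduces the len and slice results by hand. It only shows the tuple elements. It does not claim to trace pandas' internal code path for the NaN results.

```python
import pandas as pd

idx = pd.Index(['aaa', 'bb', 'c'])
mi = pd.MultiIndex.from_arrays([idx])

# Iterating a MultiIndex yields tuples, even when there is only one level.
elements = list(mi)
print(elements)  # [('aaa',), ('bb',), ('c',)]

# len() of a 1-tuple is always 1, matching mi.str.len() above.
lengths = [len(t) for t in elements]
print(lengths)  # [1, 1, 1]

# Slicing a 1-tuple past position 0 gives an empty tuple, matching mi.str.slice(1, 2).
slices = [t[1:2] for t in elements]
print(slices)  # [(), (), ()]
```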

My original plan in #23167 was just to disable MultiIndex.str regardless of the number of levels, but @toobaz brought up the point (in a side discussion in #23670) that:

Sorry, naive question, but what is the problem with just running .str on the result of self.get_level_values(0)?

This would, of course, work without problems. The main questions arising from this issue are:

  • should the .str-accessor be enabled for MultiIndex at all?
  • if yes, should it return an Index or a 1-level MultiIndex?
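Both return options can be sketched on top of the get_level_values(0) suggestion. In this sketch, str_method_single_level and its keep_multiindex flag are hypothetical names used only for illustration. They are not an existing or proposed pandas API. The sketch also ignores methods that expand into several columns.

```python
import pandas as pd


def str_method_single_level(mi, method, *args, keep_multiindex=False, **kwargs):
    # Hypothetical helper: run a .str method on the only level of a 1-level MultiIndex.
    if mi.nlevels != 1:
        raise ValueError("only defined for a MultiIndex with exactly one level")
    level = mi.get_level_values(0)               # plain Index of strings
    result = getattr(level.str, method)(*args, **kwargs)
    if keep_multiindex:
        # Option 2: wrap the result back into a 1-level MultiIndex.
        return pd.MultiIndex.from_arrays([result], names=mi.names)
    # Option 1: return the flat Index produced by .str.
    return result


idx = pd.Index(['aaa', 'bb', 'c'])
mi = pd.MultiIndex.from_arrays([idx])

print(list(str_method_single_level(mi, 'len')))                            # [3, 2, 1]
print(list(str_method_single_level(mi, 'slice', 1, 2)))                    # ['a', 'b', '']
print(list(str_method_single_level(mi, 'upper', keep_multiindex=True)))    # [('AAA',), ('BB',), ('C',)]
```

The first option matches what idx.str already returns. The second keeps the container type of the caller, but then every element is a 1-tuple again.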

PS. As another link to #23670, one could consider enabling .str for all MultiIndexes by operating on MultiIndex.to_flat_index() in those cases. This might be useful, for example, for easily joining the MultiIndex levels with .str.join.
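The flat-index idea can already be tried by hand. Here is a minimal sketch, assuming a pandas version that has MultiIndex.to_flat_index and that accepts .str on an object Index of tuples. The level values are made up for illustration.

```python
import pandas as pd

mi = pd.MultiIndex.from_arrays([['a', 'b'], ['x', 'y']])

# to_flat_index() turns the MultiIndex into a plain Index whose elements are tuples.
flat = mi.to_flat_index()

# str.join joins the elements of each tuple with the separator, i.e. the levels.
joined = flat.str.join('_')
print(list(joined))  # ['a_x', 'b_y']
```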
