str.extractall with no matches has incorrect index. #19034

Closed
@TomAugspurger

Description

Code Sample, a copy-pastable example if possible

In [26]: s = pd.Series(['a', 'ab', 'abc'])

In [27]: s.str.extractall("(A)")
Out[27]:
Empty DataFrame
Columns: [0]
Index: []

In [28]: s.iloc[:0].str.extractall("(a)")
Out[28]:
Empty DataFrame
Columns: [0]
Index: []

Problem description

From the docs:

Its rows have a MultiIndex with first levels that come from
the subject Series. The last level is named 'match' and indicates
the order in the subject.

So I think the correct index should be something like:

In [51]: pd.MultiIndex([[], []], [[], []], names=[None, 'match'])
Out[51]:
MultiIndex(levels=[[], []],
           labels=[[], []],
           names=[None, 'match'])

Expected Output

In [52]: pd.Series(index=pd.MultiIndex([[], []], [[], []], names=[None, 'match']))
Out[52]: Series([], dtype: float64)
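For anyone reproducing this on a newer pandas: the `labels` argument shown in the constructor output above was renamed to `codes` in pandas 0.24, so the following sketch builds the same empty two-level index with `MultiIndex.from_arrays`, which avoids that keyword. The helper `has_documented_index` is a hypothetical name introduced here for illustration and is not part of pandas. It only checks the shape the docs describe: one more level than the subject's index, with the last level named 'match'. What the final line prints depends on the installed pandas version.

```python
import pandas as pd

# Empty two-level index with the level names the extractall docs describe.
# from_arrays avoids the positional levels/labels constructor arguments.
expected_index = pd.MultiIndex.from_arrays([[], []], names=[None, "match"])


def has_documented_index(result, subject):
    """Return True if `result` is indexed the way the extractall docs describe.

    Hypothetical helper for illustration, not a pandas API.
    """
    index = result.index
    return (
        isinstance(index, pd.MultiIndex)
        and index.nlevels == subject.index.nlevels + 1
        and index.names[-1] == "match"
    )


s = pd.Series(["a", "ab", "abc"])

# A hand-built empty frame carrying the expected index satisfies the check.
empty_expected = pd.DataFrame(index=expected_index, columns=[0])
print(has_documented_index(empty_expected, s))

# The no-match case from this report; the result depends on the pandas version.
print(has_documented_index(s.str.extractall("(A)"), s))
```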

xref dask/dask#3038

cc @tdhock

Labels: Bug, MultiIndex, Strings