ENH: Ability to name columns/index levels when using .str.split(..., expand=True) on Index/Series #61515

Open
@nachomaiz

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

When using .str.split(..., expand=True):

  • On a Series, the columns of the resulting DataFrame are labeled with integers (0, 1, ...) by default.
  • On an Index, the levels of the resulting MultiIndex are unnamed.

It would be great if we could specify the names that the new columns or levels will take once the split is performed.
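For reference, the current behaviour can be reproduced with a short snippet. This is only a minimal sketch using the same one-element inputs as the examples in this issue; the variable names are illustrative.

```python
import pandas as pd

series = pd.Series(["a_b"])
index = pd.Index(["a_b"])

# Series -> DataFrame whose columns are integer positions
expanded_frame = series.str.split("_", expand=True)

# Index -> MultiIndex whose levels carry no names
expanded_index = index.str.split("_", expand=True)

print(list(expanded_frame.columns))  # integer column labels
print(list(expanded_index.names))    # one None per level
```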

Feature Description

I think it would be helpful if the method had a names parameter that would at a minimum accept a sequence of labels for the newly created columns/levels, similarly to how MultiIndex is initialized.

It could work like so:

>>> index = pd.Index(["a_b"])
>>> index.str.split("_", expand=True, names=["A", "B"])
MultiIndex([('a', 'b')],
           names=['A', 'B'])
>>> series = pd.Series(["a_b"])
>>> series.str.split("_", expand=True, names=["A", "B"])
   A  B
0  a  b

The length of the names sequence should match the number of expanded columns/levels; otherwise the method should raise a ValueError.
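To make the intended semantics concrete, here is a rough sketch of the proposed behaviour written as a standalone helper. The function name `split_expand_named` and its signature are hypothetical and not part of pandas; it only wraps the existing `.str.split(..., expand=True)` call and applies the length check described above.

```python
import pandas as pd


def split_expand_named(obj, sep, names):
    """Hypothetical helper emulating the proposed ``names`` parameter.

    ``obj`` is assumed to be a Series or an Index of strings.
    """
    expanded = obj.str.split(sep, expand=True)
    names = list(names)

    # Series expand to a DataFrame; Index objects expand to a (Multi)Index.
    if isinstance(expanded, pd.DataFrame):
        width = expanded.shape[1]
    else:
        width = expanded.nlevels

    if len(names) != width:
        raise ValueError(
            f"Length of names ({len(names)}) must match the number of "
            f"expanded columns/levels ({width})."
        )

    if isinstance(expanded, pd.DataFrame):
        # Relabel the integer columns with the requested names.
        return expanded.set_axis(names, axis=1)
    # Name the levels of the resulting index.
    return expanded.rename(names)


named_index = split_expand_named(pd.Index(["a_b"]), "_", names=["A", "B"])
named_frame = split_expand_named(pd.Series(["a_b"]), "_", names=["A", "B"])
```

A real implementation inside the str accessor would probably validate `names` differently; the sketch is only meant to pin down the expected inputs, outputs, and error case.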

Alternative Solutions

For Index, chaining rename after the split gives almost exactly the same result:

>>> index.str.split("_", expand=True).rename(["A", "B"])

So I think it's not as impactful for Index.

But for Series, this becomes more cumbersome, and the need to specify the renaming via a dictionary makes it feel disjointed vs the easier index renaming and MultiIndex instantiation:

>>> series.str.split("_", expand=True).rename(columns={0: "A", 1: "B"})
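For comparison, the existing `DataFrame.set_axis` method accepts a plain list of labels, which avoids the dictionary but still requires a separate chained call. The snippet below is a small sketch showing that both workarounds produce the same frame; the variable names are illustrative.

```python
import pandas as pd

series = pd.Series(["a_b"])
expanded = series.str.split("_", expand=True)

# Dictionary-based renaming, as in the example above
via_dict = expanded.rename(columns={0: "A", 1: "B"})

# List-based relabeling of the column axis
via_list = expanded.set_axis(["A", "B"], axis=1)

print(via_dict.equals(via_list))
```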

So my proposal would provide a consistent interface for the split method of the str accessor across both Series and Index.

Additional Context

No response

Labels: Enhancement, Needs Triage (issue that has not been reviewed by a pandas team member)
