Description
Feature Type
- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas
Problem Description
When using .str.split(..., expand=True):
- On a Series, the resulting DataFrame columns are labeled with integers (0, 1, ...) by default.
- On an Index, the resulting MultiIndex levels are not labeled.
It would be great if we could specify the names that the new columns or levels will take once the split is performed.
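To make the current behavior concrete, here is a minimal sketch that uses only the existing API. The variable names expanded_frame and expanded_index are mine, chosen for illustration.

```python
import pandas as pd

series = pd.Series(["a_b"])
index = pd.Index(["a_b"])

# Series: expand=True returns a DataFrame whose columns are integer positions.
expanded_frame = series.str.split("_", expand=True)

# Index: expand=True returns a MultiIndex whose levels have no names.
expanded_index = index.str.split("_", expand=True)

print(list(expanded_frame.columns))
print(list(expanded_index.names))
```

In both cases the labels have to be set in a separate step after the split.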
Feature Description
I think it would be helpful if the method had a names parameter that would, at a minimum, accept a sequence of labels for the newly created columns/levels, similar to the names argument used when constructing a MultiIndex.
It could work like so:
>>> index = pd.Index(["a_b"])
>>> index.str.split("_", expand=True, names=["A", "B"])
MultiIndex([('a', 'b')],
           names=['A', 'B'])
>>> series = pd.Series(["a_b"])
>>> series.str.split("_", expand=True, names=["A", "B"])
| | A | B |
|---|---|---|
| 0 | a | b |
The length of the names sequence should match the number of expanded columns/levels; otherwise, a ValueError should be raised.
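The intended semantics can be approximated today with a small wrapper. This is only a sketch: split_with_names is a hypothetical helper, not part of pandas, and the error message wording is my own assumption. It relies only on the existing str.split, DataFrame.set_axis and Index.set_names methods.

```python
import pandas as pd


def split_with_names(obj, pat, names):
    """Hypothetical helper that emulates the proposed names= parameter."""
    expanded = obj.str.split(pat, expand=True)

    # Work out how many columns/levels the split produced.
    if isinstance(expanded, pd.DataFrame):
        width = expanded.shape[1]
    elif isinstance(expanded, pd.MultiIndex):
        width = expanded.nlevels
    else:
        # An Index that splits into a single piece stays a flat Index.
        width = 1

    names = list(names)
    if len(names) != width:
        raise ValueError(
            f"Length of names ({len(names)}) must match the number of "
            f"expanded columns/levels ({width})."
        )

    if isinstance(expanded, pd.DataFrame):
        return expanded.set_axis(names, axis=1)
    return expanded.set_names(names)


frame = split_with_names(pd.Series(["a_b"]), "_", ["A", "B"])
levels = split_with_names(pd.Index(["a_b"]), "_", ["A", "B"])
```

A built-in names parameter would make this wrapper unnecessary and would let both Series and Index share the same call.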
Alternative Solutions
For Index, this works almost exactly the same:
>>> index.str.split("_", expand=True).rename(["A", "B"])
So I think it's not as impactful for Index.
But for Series, this becomes more cumbersome, and having to specify the renaming via a dictionary makes it feel disjointed compared with the simpler Index renaming and MultiIndex instantiation:
>>> series.str.split("_", expand=True).rename(columns={0: "A", 1: "B"})
So my proposal would provide a consistent interface for the split method of the str accessor across both Series and Index.
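For completeness, the dictionary in the Series workaround can be avoided with DataFrame.set_axis, which accepts a plain list of labels. A short sketch comparing the two follows; the variable names via_rename and via_set_axis are illustrative only.

```python
import pandas as pd

series = pd.Series(["a_b"])

# Dictionary-based renaming of the default integer column labels.
via_rename = series.str.split("_", expand=True).rename(columns={0: "A", 1: "B"})

# List-based relabeling of the columns with set_axis.
via_set_axis = series.str.split("_", expand=True).set_axis(["A", "B"], axis=1)

print(list(via_rename.columns))
print(list(via_set_axis.columns))
```

This still takes a second method call and differs from the Index spelling, so it does not remove the inconsistency described above.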
Additional Context
No response