Allow for dict-like argument to Categorical.rename_categories #17336

Closed
@linusmarco

Proposal

I think it would be great if pandas.core.categorical.Categorical.rename_categories could take a dict-like argument as an alternative to the index-like argument that it currently requires. This would make the syntax more similar to that of DataFrame.rename. The proposed change would allow for something like this:

Code Sample

>>> df['categorical_var']
0    apples
1    oranges
2    apples
3    apples

>>> # rename categories from 'apples' and 'oranges' to 'red fruits' and 'orange fruits'
>>> df['categorical_var'].cat.rename_categories({'apples': 'red fruits', 'oranges': 'orange fruits'})
0    red fruits
1    orange fruits
2    red fruits
3    red fruits

Why?

1. Allows the developer to execute the rename without any prior knowledge of the order of the categories.

This:

>>> df['categorical_var']
0    apples
1    oranges
2    apples
3    apples

>>> df['categorical_var'].cat.rename_categories({'apples': 'red fruits', 'oranges': 'orange fruits'})
0    red fruits
1    orange fruits
2    red fruits
3    red fruits

instead of this:

>>> df['categorical_var']
0    apples
1    oranges
2    apples
3    apples

>>> # to figure out current category order
>>> print(df['categorical_var'].cat.categories)
['apples', 'oranges']

>>> # rename based on order
>>> df['categorical_var'].cat.rename_categories(['red fruits', 'orange fruits'])
0    red fruits
1    orange fruits
2    red fruits
3    red fruits
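
To illustrate why the order matters: as far as I can tell, categories inferred from string data are stored in sorted order rather than order of appearance, so a positional rename can silently attach labels to the wrong values. A minimal sketch with made-up data (the series `s` is only for illustration):

```python
import pandas as pd

# 'oranges' appears first in the data, but the inferred categories are sorted
s = pd.Series(["oranges", "apples", "oranges"], dtype="category")
categories = list(s.cat.categories)  # ['apples', 'oranges']

# A positional rename written in order of appearance swaps the labels:
# 'apples' becomes 'orange fruits' and 'oranges' becomes 'red fruits'
wrong = s.cat.rename_categories(["orange fruits", "red fruits"])
print(list(wrong))  # ['red fruits', 'orange fruits', 'red fruits']
```

A dict-like argument would avoid this because each old name is paired explicitly with its new name.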

2. Allows the developer to much more easily rename one or a small subset of categories without having to worry about the rest of the category names.

This:

>>> df['categorical_var']
0    apples
1    oranges
2    apples
3    apples
4    green fruits
5    yellow fruits
6    blue fruits
7    purple fruits

>>> df['categorical_var'].cat.rename_categories({'apples': 'red fruits', 'oranges': 'orange fruits'})
0    red fruits
1    orange fruits
2    red fruits
3    red fruits
4    green fruits
5    yellow fruits
6    blue fruits
7    purple fruits

instead of this:

>>> df['categorical_var']
0    apples
1    oranges
2    apples
3    apples
4    green fruits
5    yellow fruits
6    blue fruits
7    purple fruits

>>> # to find full list of categories
>>> print(df['categorical_var'].cat.categories)
['apples', 'oranges', 'green fruits', 'yellow fruits', 'blue fruits', 'purple fruits']

>>> # rename must contain full category list
>>> df['categorical_var'].cat.rename_categories(['red fruits', 'orange fruits', 'green fruits', 'yellow fruits', 'blue fruits', 'purple fruits'])
0    red fruits
1    orange fruits
2    red fruits
3    red fruits
4    green fruits
5    yellow fruits
6    blue fruits
7    purple fruits
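
For comparison, the partial rename can be approximated today with a small wrapper around the existing list-like argument. This is only a rough sketch: `rename_categories_with_mapping` is a hypothetical helper name, not part of pandas, and the sample data is made up.

```python
import pandas as pd


def rename_categories_with_mapping(cat_series, mapping):
    """Hypothetical helper: rename only the categories named in `mapping`."""
    current = cat_series.cat.categories
    # Categories that are not keys of the mapping keep their current name
    new_names = [mapping.get(name, name) for name in current]
    return cat_series.cat.rename_categories(new_names)


s = pd.Series(
    ["apples", "oranges", "apples", "apples", "green fruits"],
    dtype="category",
)
renamed = rename_categories_with_mapping(
    s, {"apples": "red fruits", "oranges": "orange fruits"}
)
print(list(renamed))
```

The proposed dict-like argument would make this boilerplate unnecessary.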

3. As I mentioned above, this also makes the syntax more similar to the DataFrame.rename syntax, which I think increases overall usability.
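
For reference, this is the DataFrame.rename pattern I have in mind, where a dict maps old labels to new ones and unmentioned labels are left alone. The frame and column names here are made up for illustration:

```python
import pandas as pd

df_example = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Only 'a' is renamed; 'b' is not in the mapping and keeps its name
renamed_df = df_example.rename(columns={"a": "x"})
print(list(renamed_df.columns))  # ['x', 'b']
```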

I would be happy to start working on this if others agree that adding the feature makes sense.
