Description
Proposal
It would be great if pandas.core.categorical.Categorical.rename_categories could accept a dict-like argument as an alternative to the index-like argument that it currently requires. This would bring its syntax closer to that of DataFrame.rename. The proposed change would allow something like this:
Code Sample
>>> df['categorical_var']
0 apples
1 oranges
2 apples
3 apples
>>> # rename categories from 'apples' and 'oranges' to 'red fruits' and 'orange fruits'
>>> df['categorical_var'].cat.rename_categories({'apples': 'red fruits', 'oranges': 'orange fruits'})
0 red fruits
1 orange fruits
2 red fruits
3 red fruits
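Until a dict-like argument is supported, the proposed behaviour can be approximated on top of the existing list-based call. The following is only a minimal sketch: the helper name `rename_categories_from_mapping` is hypothetical and not part of pandas, and it assumes the column is a categorical Series reachable through the `.cat` accessor.

```python
import pandas as pd


def rename_categories_from_mapping(series, mapping):
    """Hypothetical helper (not a pandas API): rename only the categories in `mapping`."""
    current = series.cat.categories
    # Build the full, ordered list the existing API expects,
    # leaving any category that is absent from `mapping` untouched.
    new_categories = [mapping.get(cat, cat) for cat in current]
    return series.cat.rename_categories(new_categories)


df = pd.DataFrame(
    {'categorical_var': pd.Categorical(['apples', 'oranges', 'apples', 'apples'])}
)
renamed = rename_categories_from_mapping(
    df['categorical_var'],
    {'apples': 'red fruits', 'oranges': 'orange fruits'},
)
print(renamed.tolist())
```

The helper reads the current category order itself, so the caller never needs to know it, which is the convenience the proposal asks for.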
Why?
1. Allows the developer to execute the rename without any prior knowledge of the order of the categories.
This:
>>> df['categorical_var']
0 apples
1 oranges
2 apples
3 apples
>>> df['categorical_var'].cat.rename_categories({'apples': 'red fruits', 'oranges': 'orange fruits'})
0 red fruits
1 orange fruits
2 red fruits
3 red fruits
instead of this:
>>> df['categorical_var']
0 apples
1 oranges
2 apples
3 apples
>>> # to figure out current category order
>>> print(df['categorical_var'].cat.categories)
['apples', 'oranges']
>>> # rename based on order
>>> df['categorical_var'].cat.rename_categories(['red fruits', 'orange fruits'])
0 red fruits
1 orange fruits
2 red fruits
3 red fruits
2. Makes it much easier for the developer to rename one category, or a small subset of categories, without having to restate the rest of the category names.
This:
>>> df['categorical_var']
0 apples
1 oranges
2 apples
3 apples
4 green fruits
5 yellow fruits
6 blue fruits
7 purple fruits
>>> df['categorical_var'].cat.rename_categories({'apples': 'red fruits', 'oranges': 'orange fruits'})
0 red fruits
1 orange fruits
2 red fruits
3 red fruits
4 green fruits
5 yellow fruits
6 blue fruits
7 purple fruits
instead of this:
>>> df['categorical_var']
0 apples
1 oranges
2 apples
3 apples
4 green fruits
5 yellow fruits
6 blue fruits
7 purple fruits
>>> # to find full list of categories
>>> print(df['categorical_var'].cat.categories)
['apples', 'oranges', 'green fruits', 'yellow fruits', 'blue fruits', 'purple fruits']
>>> # rename must contain full category list
>>> df['categorical_var'].cat.rename_categories(['red fruits', 'orange fruits', 'green fruits', 'yellow fruits', 'blue fruits', 'purple fruits'])
0 red fruits
1 orange fruits
2 red fruits
3 red fruits
4 green fruits
5 yellow fruits
6 blue fruits
7 purple fruits
3. As mentioned above, this also brings the syntax closer to that of DataFrame.rename, which I think improves overall usability.
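For comparison, this is the dict-based DataFrame.rename pattern that the proposal would mirror. The frame and its column names here are purely illustrative; the point is that labels missing from the mapping are left unchanged.

```python
import pandas as pd

# Illustrative frame; the column names are made up for this example.
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# Only the labels named in the mapping are renamed; 'b' is left as-is.
renamed = df.rename(columns={'a': 'alpha'})
print(list(renamed.columns))
```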
I would be happy to start working on this if others agree that adding the feature makes sense.