API/ENH: Add mutate like method to DataFrames #9229

Closed
@TomAugspurger

In my notebook comparing dplyr and pandas, I gained a new level of appreciation for the ability to chain operations together. In my own code, the biggest impediment to this is adding columns that are calculations on existing columns. For example:

# R / dplyr
mutate(flights,
   gain = arr_delay - dep_delay,
   speed = distance / air_time * 60)

# ... calculation involving these

vs.

flights['gain'] = flights.arr_delay - flights.dep_delay
flights['speed'] = flights.distance / flights.air_time * 60

# ... calculation involving these later

just doesn't flow as nicely, especially if this mutate is in the middle of a chain.

I'd propose a new method (perhaps stealing mutate) that's similar to dplyr's.
The function signature could be kwarg only, where the keywords are the new column names. e.g.

flights.mutate(gain=flights.arr_delay - flights.dep_delay)

This would return a DataFrame with the new column gain in addition to the original columns.
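To make the proposed semantics concrete, here is a minimal sketch written as a standalone helper rather than a DataFrame method. The function name `mutate`, its signature, and the toy `flights` data are all hypothetical and only illustrate the idea: keyword names become column names, and the input frame is left untouched.

```python
import pandas as pd


def mutate(df, **kwargs):
    """Hypothetical helper: return a copy of df with one new column per keyword."""
    out = df.copy()  # never modify the caller's frame
    for name, values in kwargs.items():
        out[name] = values  # Series values align on the index
    return out


# Toy stand-in for the flights data (made-up numbers).
flights = pd.DataFrame({
    "arr_delay": [10.0, -5.0, 30.0],
    "dep_delay": [2.0, 0.0, 25.0],
    "distance": [1400.0, 800.0, 2500.0],
    "air_time": [200.0, 120.0, 330.0],
})

result = mutate(
    flights,
    gain=flights.arr_delay - flights.dep_delay,
    speed=flights.distance / flights.air_time * 60,
)
```

Because the helper returns a new frame, it can sit in the middle of a chain without side effects on `flights`.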

Worked out example

import pandas as pd
import seaborn as sns

iris = sns.load_dataset('iris')

(iris.query('sepal_length > 4.5')
     .mutate(ratio=iris.sepal_length / iris.sepal_width)  # new part
     .groupby(pd.cut(iris.ratio)).mean()
)
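A runnable approximation of the chain above, under a few assumptions: `mutate` is again a hypothetical standalone helper (passed through `DataFrame.pipe`), a small hand-made frame stands in for the seaborn iris data, and `pd.cut` is given an explicit `bins=2` since it needs a bin specification. The grouping step uses a lambda so it can see the `ratio` column created earlier in the chain.

```python
import pandas as pd


def mutate(df, **kwargs):
    """Hypothetical helper: return a copy of df with one new column per keyword."""
    out = df.copy()
    for name, values in kwargs.items():
        out[name] = values  # aligns on the index, so filtered rows still match
    return out


# Small stand-in for the iris dataset (made-up rows).
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.4, 7.0, 6.3],
    "sepal_width": [3.5, 3.0, 2.9, 3.2, 3.3],
})

summary = (
    iris.query("sepal_length > 4.5")
        .pipe(mutate, ratio=iris.sepal_length / iris.sepal_width)  # new part
        .pipe(lambda df: df.groupby(pd.cut(df.ratio, bins=2), observed=False)
                           ["sepal_length"].mean())
)
```

The lambda in the last step is the price of a kwarg-only signature: later steps can only reach the new column through the intermediate frame, not through the original `iris` variable.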

Thoughts?
