ENH: Apply function on a column easily, maintaining fluent interface

I really like the pandas fluent (method-chaining) interface, but it is not always convenient to use. I often end up writing code like this:

```python
import pandas as pd
import datetime as dt

df = pd.DataFrame([["05SEP2014", "a"]], columns=["date", "other_col"])

def fix_date_a(df):
    # Element-wise parsing; note that this mutates the caller's DataFrame in place
    df["date"] = df["date"].apply(lambda x: dt.datetime.strptime(x, "%d%b%Y"))
    return df


# Of course, this example could be vectorized much better, like so:
def fix_date_b(df):
    df["date"] = pd.to_datetime(df["date"], format="%d%b%Y")
    return df
```

This is OK, but I would much rather write it in a fluent style. As far as I am aware, this is only possible with `.assign` and `.pipe`, which gives something like this:

```python
from functools import partial


def element_wise_date_parser(x):
    return dt.datetime.strptime(x, "%d%b%Y")


vectorized_date_parser = partial(pd.to_datetime, format="%d%b%Y")


def fix_date_c(df):
    return df.assign(date=lambda x: x["date"].apply(element_wise_date_parser))


def fix_date_d(df):
    return df.assign(date=lambda x: vectorized_date_parser(x["date"]))
```
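For comparison, the `.pipe` route would look roughly like the sketch below (`parse_date_column` is just a hypothetical helper name). It keeps the chain intact, but it still needs a named, single-use function for every column fix:

```python
import pandas as pd

df = pd.DataFrame([["05SEP2014", "a"]], columns=["date", "other_col"])


def parse_date_column(frame: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical helper: returns a new DataFrame with "date" parsed.
    # assign() builds a copy, so the caller's `frame` is left untouched.
    return frame.assign(date=pd.to_datetime(frame["date"], format="%d%b%Y"))


# .pipe(f) calls f(df), so the chain continues on the returned DataFrame.
result = df.pipe(parse_date_column).rename(columns={"other_col": "label"})
```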

I find that syntax hard to read and write. Since I quite often need to perform an operation on a single column of a DataFrame, I would like a better way to do that. I propose adding an `apply_to` method for element-wise functions, plus a column-wise counterpart. Here is an example monkeypatch implementation:

```python
def apply_to(self, column_name, function):
    # Element-wise: `function` is called once per value via Series.apply
    return self.assign(**{column_name: lambda x: x[column_name].apply(function)})


pd.DataFrame.apply_to = apply_to


def fix_date_e(df):
    return df.apply_to("date", element_wise_date_parser)


# Not sure what a column-wise version should be called; let's say operate_on
def operate_on(self, column_name, function):
    # Column-wise: `function` receives the whole Series at once
    return self.assign(**{column_name: lambda x: function(x[column_name])})


pd.DataFrame.operate_on = operate_on


def fix_date_f(df):
    return df.operate_on("date", vectorized_date_parser)
```
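To illustrate how the two proposed methods would read in a longer chain, here is a self-contained sketch that repeats the monkeypatches from the previous block. The second row of data and the `str.upper` step are made up purely for the example:

```python
from functools import partial

import pandas as pd


def apply_to(self, column_name, function):
    # Element-wise: Series.apply calls `function` once per value.
    return self.assign(**{column_name: lambda x: x[column_name].apply(function)})


def operate_on(self, column_name, function):
    # Column-wise: `function` receives the whole Series at once.
    return self.assign(**{column_name: lambda x: function(x[column_name])})


pd.DataFrame.apply_to = apply_to
pd.DataFrame.operate_on = operate_on

df = pd.DataFrame(
    [["05SEP2014", "a"], ["12OCT2015", "b"]], columns=["date", "other_col"]
)

# Each step returns a new DataFrame, so `df` itself is never modified.
result = (
    df.operate_on("date", partial(pd.to_datetime, format="%d%b%Y"))
    .apply_to("other_col", str.upper)
    .sort_values("date", ascending=False)
)
```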
With these methods I find it much easier to replace a column's values without breaking the fluent interface (`assign` still returns a new DataFrame rather than modifying the original). Of course, adding even more methods to the already broad DataFrame API is not free, so I am not 100% sure this is a good idea. But I wanted to put it up here anyway, as I often see myself and others cluttering code with unnecessary intermediate DataFrames, or repeatedly reassigning `df`, because they don't know how to keep the fluent interface. And sprinkling "`assign` with lambdas" everywhere is not that appealing either.
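One way to try this out without touching the core DataFrame namespace is pandas' accessor registration API, `pd.api.extensions.register_dataframe_accessor`. A minimal sketch, assuming a hypothetical accessor name `colops` that exposes the same two operations:

```python
from typing import Any, Callable

import pandas as pd


@pd.api.extensions.register_dataframe_accessor("colops")  # hypothetical name
class ColumnOps:
    def __init__(self, pandas_obj: pd.DataFrame) -> None:
        self._obj = pandas_obj

    def apply_to(self, column_name: str, function: Callable[[Any], Any]) -> pd.DataFrame:
        # Element-wise: `function` is applied to each value of the column.
        return self._obj.assign(**{column_name: self._obj[column_name].apply(function)})

    def operate_on(
        self, column_name: str, function: Callable[[pd.Series], Any]
    ) -> pd.DataFrame:
        # Column-wise: `function` receives the whole Series.
        return self._obj.assign(**{column_name: function(self._obj[column_name])})


df = pd.DataFrame([["05SEP2014", "a"]], columns=["date", "other_col"])
out = (
    df.colops.operate_on("date", lambda s: pd.to_datetime(s, format="%d%b%Y"))
    .colops.apply_to("other_col", str.upper)
)
```

The extra `.colops` hop is a bit noisier than a top-level method, but it keeps the chain fluent and can live in user code or a third-party package.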

ENH: Apply function on a column easily, maintaining fluent interface #38229