ENH: Apply function on a column easily, maintaining fluent interface #38229

Closed
tdamsma opened this issue Dec 2, 2020 · 5 comments
Labels
Enhancement Needs Triage Issue that has not been reviewed by a pandas team member

@tdamsma
Contributor

tdamsma commented Dec 2, 2020

I really like the pandas fluent (method-chaining) interface, but it is not always convenient to use. I often end up writing code like this:

import pandas as pd
import datetime as dt

df = pd.DataFrame([["05SEP2014", "a"]], columns=["date", "other_col"])

def fix_date_a(df):
    # Element-wise: strptime is called once per value; note this mutates the caller's df
    df['date'] = df["date"].apply(lambda x: dt.datetime.strptime(x, "%d%b%Y"))
    return df

# Of course, this example could have been vectorized much better, like so:

def fix_date_b(df):
    df['date'] = pd.to_datetime(df['date'], format='%d%b%Y')
    return df
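A quick way to check that the element-wise and vectorized parsers above agree is to run both on the same Series. This is a minimal sketch: `parse_elementwise` and `parse_vectorized` are hypothetical names, and the second sample date is made up for illustration.

```python
import datetime as dt

import pandas as pd


def parse_elementwise(ser):
    # Calls strptime once per value via Series.apply (a Python-level loop)
    return ser.apply(lambda x: dt.datetime.strptime(x, "%d%b%Y"))


def parse_vectorized(ser):
    # Hands the whole column to pandas in a single call
    return pd.to_datetime(ser, format="%d%b%Y")


raw = pd.Series(["05SEP2014", "17JAN2021"])
elementwise = parse_elementwise(raw)
vectorized = parse_vectorized(raw)

# Compare values rather than dtypes, since the inferred dtype may differ across pandas versions
print(elementwise.tolist() == vectorized.tolist())
```

Both return a new Series; it is only the assignment back into `df` inside `fix_date_a`/`fix_date_b` that modifies the caller's DataFrame.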

This is OK, but I would much rather write this in a fluent style. As far as I am aware, this is only possible with .assign and .pipe, which would give something like this:

from functools import partial

def element_wise_date_parser(x):
    return dt.datetime.strptime(x, "%d%b%Y")

vectorized_date_parser = partial(pd.to_datetime, format='%d%b%Y')


def fix_date_c(df):
    return df.assign(date=lambda x: x["date"].apply(element_wise_date_parser))

def fix_date_d(df):
    return df.assign(date=lambda x: vectorized_date_parser(x["date"]))
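Since .pipe is mentioned alongside .assign, here is a minimal sketch of that route too. It assumes a hypothetical helper `parse_date_column` that takes the DataFrame as its first argument and returns a modified copy, which is what lets it slot into a chain.

```python
import pandas as pd


def parse_date_column(frame, column, fmt):
    # Hypothetical helper: returns a new DataFrame with `column` parsed,
    # leaving the input frame untouched so it composes inside a chain
    return frame.assign(**{column: pd.to_datetime(frame[column], format=fmt)})


df = pd.DataFrame([["05SEP2014", "a"]], columns=["date", "other_col"])

# DataFrame.pipe(func, *args) calls func(df, *args)
result = df.pipe(parse_date_column, "date", "%d%b%Y")
print(result["date"].iloc[0])
```

The column name and format become plain arguments, but each such helper still has to be written once per operation.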

I find that syntax hard to read and write. Since I often need to perform an operation on a single column of a DataFrame, I would like a better way to do it. I propose adding an apply_to method for element-wise functions; here is an example monkeypatch implementation:

def apply_to(self, column_name, function):
    # Element-wise: apply `function` to each value of one column, returning a new DataFrame
    return self.assign(**{column_name: lambda x: x[column_name].apply(function)})

pd.DataFrame.apply_to = apply_to

def fix_date_d(df):
    return df.apply_to("date", element_wise_date_parser)

# Not sure what a column-wise variant should be named; let's say operate_on
def operate_on(self, column_name, function):
    # Column-wise: call `function` once on the whole column (a Series)
    return self.assign(**{column_name: lambda x: function(x[column_name])})

pd.DataFrame.operate_on = operate_on

def fix_date_e(df):
    return df.operate_on("date", vectorized_date_parser)

With these helpers, I find it much easier to modify a column without breaking the fluent interface. Of course, adding even more methods to the already broad DataFrame API is not free, so I am not 100% sure this is a good idea. But I wanted to raise it anyway, as I often see myself and others cluttering code with unnecessary intermediate DataFrames and/or repeatedly reassigning df because they don't know how to keep the fluent interface. And sprinkling "assign with lambdas" everywhere is not that appealing either.
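One way to get similar ergonomics without monkeypatching DataFrame itself is pandas' documented accessor mechanism, `pd.api.extensions.register_dataframe_accessor`, which groups custom methods under a single attribute. This is a swapped-in technique, not what the issue proposes; the accessor name `colops` and the class `ColumnOps` are hypothetical.

```python
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("colops")
class ColumnOps:
    """Hypothetical accessor exposing the proposed helpers as df.colops.*"""

    def __init__(self, pandas_obj):
        self._obj = pandas_obj

    def apply_to(self, column_name, function):
        # Element-wise: call `function` on each value of one column
        return self._obj.assign(**{column_name: self._obj[column_name].apply(function)})

    def operate_on(self, column_name, function):
        # Column-wise: call `function` once on the whole Series
        return self._obj.assign(**{column_name: function(self._obj[column_name])})


df = pd.DataFrame([["05SEP2014", 3]], columns=["date", "other_col"])
out = (
    df.colops.operate_on("date", lambda s: pd.to_datetime(s, format="%d%b%Y"))
    .colops.apply_to("other_col", lambda v: v * 10)
)
print(out)
```

Each call returns a new DataFrame, so the chain continues and `df` itself is left unchanged. Keeping the helpers behind one namespace may also partly address the concern about growing the main DataFrame API.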

@tdamsma tdamsma added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Dec 2, 2020
@jreback
Contributor

jreback commented Dec 2, 2020

-1 on yet another form of apply.

The .assign interface is very flexible, concise, and readable, and it works well.
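To illustrate that flexibility: a single .assign call can take several keyword arguments, and they are computed in order against the frame built so far, so a later callable can use an earlier result. A minimal sketch; the derived `year` column is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame([["05SEP2014", 3]], columns=["date", "other_col"])

out = df.assign(
    # Replaces "date" with parsed timestamps
    date=lambda x: pd.to_datetime(x["date"], format="%d%b%Y"),
    # Evaluated after the line above, so x["date"] is already datetime-like
    year=lambda x: x["date"].dt.year,
)
print(out)
```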

@rhshadrach
Member

rhshadrach commented Dec 3, 2020

Have you considered:

def fix_date(ser):
    return pd.to_datetime(ser, format='%d%b%Y')

df["date"] = fix_date(df["date"])

@tdamsma
Contributor Author

tdamsma commented Dec 3, 2020

I meant the example as an illustration of how operating on a single column of a DataFrame breaks method chaining (unless one uses .assign and a lambda). I would like to be able to write, e.g.:

...
return (
    pd.DataFrame([["05SEP2014", 3]], columns=["date", "other_col"])
    .apply_to("date", element_wise_date_parser)
    .query("2 < other_col < 4")
    .to_json()
)

And not:

...
df = pd.DataFrame([["05SEP2014", 3]], columns=["date", "other_col"])
df["date"] = fix_date(df["date"])
return df.query("2 < other_col < 4").to_json()

But given @jreback's objection, I guess I'll have to settle for .assign and a lambda, like so:

...
return (
    pd.DataFrame([["05SEP2014", 3]], columns=["date", "other_col"])
    .assign(date=lambda x: x["date"].apply(element_wise_date_parser))
    .query("2 < other_col < 4")
    .to_json()
)
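For reference, a self-contained version of the .assign-and-lambda chain might look like the sketch below. It assumes the filter is meant to keep rows where other_col lies strictly between 2 and 4; `build_payload` is a hypothetical wrapper standing in for the elided enclosing function.

```python
import datetime as dt
import json

import pandas as pd


def element_wise_date_parser(x):
    # Same element-wise parser as earlier in the issue
    return dt.datetime.strptime(x, "%d%b%Y")


def build_payload():
    # Hypothetical wrapper so the chain can run end to end
    return (
        pd.DataFrame([["05SEP2014", 3]], columns=["date", "other_col"])
        .assign(date=lambda x: x["date"].apply(element_wise_date_parser))
        .query("2 < other_col < 4")  # keeps the single row, since other_col == 3
        .to_json()
    )


payload = json.loads(build_payload())
print(payload["other_col"])
```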

@jbrockmendel
Member

similar to #40322

@mroeschke
Member

Thanks for the suggestion, but it appears that there isn't much appetite for supporting this feature in pandas. Closing, but happy to reopen if views on this feature change.
