Description
There have been multiple issues regarding:
- Confusion about the respective roles of agg and transform (and apply), in both DataFrame and GroupBy: Why do I get strange aggregation result from DataFrame groupby()? #26960, Behavior of new df.agg, df.transform and df.apply is very inconsistent #18103.
- transform('rank') and others returning the wrong answer (first issue from 2016): bug when filling missing values with transform? #14274, Shortcut functions in transform are not grouped #19354, Wrong output of GroupBy transform with string input (e.g., transform('rank')) #22509.
I'd like to prepare a PR to fix this, but I need to know what the consensus is first.
1. Should Groupby/DataFrame/Series.agg disallow transformations?
#14741 (comment), about Groupby.agg, says: "you need to use .transform. .agg is by definition a reducer."
However, agg currently accepts transformations as well.
DataFrame.agg was merged in #27389 despite repeated objections to mixing transformations and aggregations, #14668 (comment) and #14668 (comment).
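To make the concern in point 1 concrete, here is a minimal sketch of agg accepting a transformation by name. The data and variable names are made up for illustration, and the behavior shown is what I understand string dispatch in agg to do; it may differ between pandas versions.

```python
import pandas as pd

# Toy data, invented for this illustration.
s = pd.Series([1, 2, 3])
df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})

# A genuine reduction: one scalar for the whole Series.
reduced = s.agg("sum")

# A transformation passed to agg: the result is as long as the input,
# so nothing was actually aggregated.
like_indexed = s.agg("cumsum")

# The same happens at the DataFrame level.
frame_like = df.agg("cumsum")

print(reduced)
print(like_indexed)
print(frame_like)
```

If agg were restricted to reducers, the last two calls would have to raise or warn, and users would be pointed to transform or to the method itself.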
2. What should transform('rank') return?
updated
g.min() # one value per group. ok.
g.transform('min') # one value per group, broadcasted. ok.
g.rank() # like-index result. ok.
g.transform('rank') # Currently, a nonsense result.
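The four calls above can be tried on a small frame. This is only a sketch: the column names and data are invented, and whether the last call agrees with g.rank() depends on the pandas version installed, so the comparison is computed rather than assumed.

```python
import pandas as pd

# Toy data, invented for this illustration.
df = pd.DataFrame({"key": ["a", "a", "b", "b", "b"], "val": [3, 1, 5, 2, 4]})
g = df.groupby("key")["val"]

reduced = g.min()               # one row per group, indexed by key
broadcast = g.transform("min")  # group minimum repeated on the original index
ranked = g.rank()               # within-group rank on the original index

# The call under discussion. On the versions the linked issues were filed
# against, this reportedly did not match g.rank(); other versions may differ.
via_transform = g.transform("rank")
matches = via_transform.equals(ranked)

print(reduced)
print(broadcast)
print(ranked)
print("transform('rank') matches g.rank():", matches)
```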
On the one hand, users have been told that transformations don't belong in transform (!): #22509 (comment), #14274 (comment). This makes sense if you think of the transform('name') form as solely for broadcasting aggregations.
On the other hand, the documentation for transform, as well as Wes McKinney's excellent pandas book, portrays transform as the dedicated tool for shape-preserving operations, so excluding them from the transform('name') case would be a little surprising.
Personally, I'm +0 on deprecating transform('rank') with a warning to use g.rank() instead, and on doing the same for the other transformation ops.
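A rough sketch of what that deprecation could look like from the user's side. Everything here is hypothetical: transform_by_name and TRANSFORMATION_NAMES are names made up for this example and are not pandas API, and the set of names is illustrative rather than exhaustive.

```python
import warnings

import pandas as pd

# Illustrative, non-exhaustive set of GroupBy methods that already return
# a like-indexed result. This list is an assumption for the sketch.
TRANSFORMATION_NAMES = {"rank", "cumsum", "cumprod", "cummin", "cummax", "shift"}


def transform_by_name(grouped, name):
    """Hypothetical wrapper (not pandas API) showing the proposed behavior."""
    if name in TRANSFORMATION_NAMES:
        # Warn, then dispatch to the method itself instead of transform.
        warnings.warn(
            f"transform('{name}') is deprecated in this sketch; "
            f"call g.{name}() directly.",
            FutureWarning,
            stacklevel=2,
        )
        return getattr(grouped, name)()
    # Reducers keep the current broadcasting behavior.
    return grouped.transform(name)


df = pd.DataFrame({"key": ["a", "a", "b"], "val": [2, 1, 3]})
g = df.groupby("key")["val"]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ranked = transform_by_name(g, "rank")    # warns, returns g.rank()
    group_min = transform_by_name(g, "min")  # no sketch warning, broadcasts

sketch_warnings = [w for w in caught if "deprecated in this sketch" in str(w.message)]
print(len(sketch_warnings), ranked.tolist(), group_min.tolist())
```

Under this sketch, transform('name') stays a tool for broadcasting aggregations, while the transformation ops are reached only through their own methods.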