Description
Location of the documentation
https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.transform.html
Documentation problem
In my eyes, the documentation makes it very unclear how func (which is also mislabeled as f in the parameter descriptions) is actually applied to each group.
General description:
Call function producing a like-indexed DataFrame on each group and return a DataFrame having the same indexes as the original object filled with the transformed values
Description of parameter func:
Function to apply to each group. [...]
I'm not sure whether I'm just confused or whether the documentation is actually misleading, because to me this implies that transform takes a function which is called once for each group and thus accepts a DataFrame as its argument.
This, however, is not the case: transform actually applies func to each column within each group.
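To make this concrete, here is a minimal sketch that records what the function is handed when it goes through transform. The frame, the column names, and the seen_types helper are made up for illustration. The function is written so that it works on either a Series or a DataFrame, so the result does not depend on which one pandas passes in; the exact set of types printed may vary between pandas versions, so I am not stating it here.

```python
import pandas as pd

# Hypothetical example data, not taken from the pandas docs.
df = pd.DataFrame(
    {
        "key": ["a", "a", "b", "b"],
        "x": [1, 2, 3, 4],
        "y": [10.0, 20.0, 30.0, 40.0],
    }
)

seen_types = set()  # hypothetical helper: records the type of each argument


def center(values):
    # Subtracting the mean works for a Series and for a DataFrame alike.
    seen_types.add(type(values).__name__)
    return values - values.mean()


result = df.groupby("key").transform(center)

print(sorted(seen_types))  # which object types func actually received
print(result)              # same index as df, one column per non-key column
```

The result keeps the original index and contains the centered x and y columns, which matches the "like-indexed" wording in the docs even though the function was never guaranteed a whole-group DataFrame.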
Only this phrase in the 'Notes' section (and the examples) somewhat hints at this behavior, in my opinion:
if this is a DataFrame, f must support application column-by-column in the subframe
(which I find kind of confusing as well, to be honest, as I expected that the "this" in this case would always be a DataFrame, because it is a method of 'groupby.DataFrameGroupBy')
On the other hand, the shorthand explanation of transform in the 'See also' section on other pages like GroupBy.apply is much clearer and more concise, in my opinion:
transform: Apply function column-by-column to the GroupBy object.
To me, as an inexperienced pandas user, this makes it crystal clear what the function is supposed to do, as opposed to "Call function producing a like-indexed DataFrame on each group", which I find rather cryptic. It took me a while to figure out that GroupBy.apply is what I actually needed.
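For comparison, this is roughly the kind of thing I needed GroupBy.apply for: a function that has to see several columns of a group at once. The data and the weighted_mean function are hypothetical and only meant as a sketch.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame(
    {
        "key": ["a", "a", "b", "b"],
        "x": [1, 2, 3, 4],
        "y": [10.0, 20.0, 30.0, 40.0],
    }
)


def weighted_mean(group):
    # Needs two columns of the same group, so it must receive a DataFrame.
    return (group["x"] * group["y"]).sum() / group["y"].sum()


# Selecting the value columns explicitly keeps the grouping column out of
# the frames that are passed to the function.
per_group = df.groupby("key")[["x", "y"]].apply(weighted_mean)

print(per_group)  # one value per group, indexed by key
```

A function like this cannot be expressed column by column, which is why the wording of the transform description matters.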
Suggested fix for documentation
Change the general description and the func parameter description to include something along the lines of "Apply function column-by-column to the GroupBy object", which is already used as a short description as mentioned above, and fix the mislabeled parameter. The "if this is a DataFrame" phrase in the notes should perhaps be changed as well.