DOC: DataFrameGroupBy.transform #42907

Open
@panda-byte

Location of the documentation

https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.transform.html

Documentation problem

In my view, the documentation leaves it unclear how func (which is also mislabeled as f in the parameter descriptions) is actually applied to each group.

General description:

Call function producing a like-indexed DataFrame on each group and return a DataFrame having the same indexes as the original object filled with the transformed values

Description of parameter func:

Function to apply to each group. [...]

I'm not sure if I'm just confused by this or if the documentation is actually misleading, because to me this implies that transform takes a function which is called once for each group and thus accepts a DataFrame as an argument.
This, however, is not the case, as transform actually applies func to each column within each group.
In my opinion, only this phrase in the 'Notes' section (and the examples) hints at this behavior:

if this is a DataFrame, f must support application column-by-column in the subframe

(which I find kind of confusing as well, to be honest, as I expected that the "this" in this case would always be a DataFrame, because it is a method of 'groupby.DataFrameGroupBy')
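To make the column-by-column reading concrete, here is a minimal sketch. The frame, its column names, and the demean helper are made up for illustration and are not taken from the pandas docs. It compares the result of transform with a manual loop that applies the function to each column of each group. Note that this only checks the result; pandas may internally also try calling the function on the whole sub-frame, so it should not be read as a description of the implementation.

```python
import pandas as pd

# Hypothetical example data: one grouping column and two value columns.
df = pd.DataFrame(
    {
        "key": ["a", "a", "b", "b"],
        "x": [1.0, 3.0, 10.0, 20.0],
        "y": [2.0, 4.0, 6.0, 10.0],
    }
)


def demean(values):
    """Subtract the mean; written so that it works on a single column."""
    return values - values.mean()


# What transform returns: same index as df, one column per non-key column.
result = df.groupby("key").transform(demean)

# Manual equivalent: apply demean to every column within every group.
expected = df[["x", "y"]].copy()
for _, group in df.groupby("key"):
    for col in ["x", "y"]:
        expected.loc[group.index, col] = demean(group[col])

print(result)
print(result.equals(expected))
```

The result keeps the original row order and index, which is what "like-indexed" in the general description seems to refer to.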

On the other hand, the shorthand explanation of transform in the 'See also' section on other pages, such as GroupBy.apply, is much clearer and more concise in my opinion:

transform: Apply function column-by-column to the GroupBy object.

To me, as an inexperienced pandas user, this makes it crystal clear what the function is supposed to do, as opposed to "Call function producing a like-indexed DataFrame on each group", which I find rather cryptic. It took me a while to figure out that GroupBy.apply is what I actually needed.
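For contrast, here is a small sketch of the case where GroupBy.apply is the better fit: a function that needs several columns of the group at once. The data and the weighted_total helper are again hypothetical. The value columns are selected explicitly so that the function only sees x and y, regardless of how a given pandas version treats the grouping column.

```python
import pandas as pd

# Hypothetical example data, same shape as one might pass to transform.
df = pd.DataFrame(
    {
        "key": ["a", "a", "b", "b"],
        "x": [1.0, 3.0, 10.0, 20.0],
        "y": [2.0, 4.0, 6.0, 10.0],
    }
)


def weighted_total(group):
    """Needs the whole sub-frame because it combines two columns."""
    return (group["x"] * group["y"]).sum()


# apply calls the function once per group with that group's DataFrame.
totals = df.groupby("key")[["x", "y"]].apply(weighted_total)
print(totals)
```

Here the function returns one scalar per group, so the result is indexed by the group keys rather than by the original rows.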

Suggested fix for documentation

Change the general description and func parameter description to include something along the lines of "Apply function column-by-column to the GroupBy object", which is already used as a short description as mentioned before, and fix the mislabeled parameter. And maybe the "if this is a DataFrame" phrase in the notes should be changed as well.

Labels: Apply (Apply, Aggregate, Transform, Map), Docs, Groupby