With an external grouper, there is no way to access the grouped value in a DataFrame(...).groupby(...).apply(...) workflow #9545

Open
@brianthelion

Description

groupby-apply workflows are important pandas idioms. Here's a brief example grouping on a named DataFrame column:

>>> import pandas as pd
>>> df = pd.DataFrame({'key': [1, 1, 1, 2, 2, 2, 3, 3, 3], 'value': range(9)})
>>> result = df.groupby('key').apply(lambda x: x['key'])
>>> result
key   
1    0    1
     1    1
     2    1
2    3    2
     4    2
     5    2
3    6    3
     7    3
     8    3
Name: key, dtype: int64

The key point of this example is that the applied function can reference the grouped value (e.g., x['key']) directly.

pandas also supports grouping on arbitrary mapping functions, iterables, and many other objects. In these cases, the grouped value is not represented as a named column in the DataFrame, so when using apply(...) there is no apparent way to access the group key value. The only alternative is a (slow) for-loop solution, as in:

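foo = lambda _k, _g: ...  # placeholder for the per-group computation
grouped = df.groupby(grouper)  # grouper is external: not a column of df
# Materialize both sequences so the index does not depend on a one-shot generator
results = [foo(key, group) for key, group in grouped]
keys = [key for key, group in grouped]
pd.DataFrame.from_records(results, index=keys)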
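
For concreteness, here is a minimal runnable sketch of the situation described above. The label array, the `summarize` function, and its output column names are all made up for illustration; they are assumptions, not part of the original report. It groups on a NumPy array that is not a column of the frame, shows that the group frames do not contain the key, and then falls back to the explicit loop.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"value": range(9)})

# External grouper: an array of labels that is NOT a column of df (illustrative data).
labels = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3])


def summarize(key, group):
    """Hypothetical per-group function that needs the group key."""
    return {"key_times_ten": int(key) * 10, "total": int(group["value"].sum())}


grouped = df.groupby(labels)

# Each group frame carries only the original columns, so the key is absent.
first_key, first_group = next(iter(grouped))
print(list(first_group.columns))

# Workaround: iterate over (key, group) pairs and rebuild the index by hand.
keys, records = [], []
for key, group in grouped:
    keys.append(key)
    records.append(summarize(key, group))

result = pd.DataFrame(records, index=keys)
print(result)
```

The loop keeps each key next to its result in a single pass, but it gives up the apply(...) idiom entirely, which is the ergonomic gap this issue is about.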

IMHO, the ability to access the grouped value in an idiomatic way from within the applied function is ergonomically important; the groupby-apply idiom is at best partially realized without it.
