Inconsistent output in GroupBy.apply returning a DataFrame #34809

Closed
@TomAugspurger

This is a continuation of #13056 and #14927, which were closed by #31613. I think that PR ensured that we consistently take one of two code paths. This issue is to verify that we actually want the behavior on master.

Focusing on a specific pair of examples that differ only in whether the returned index is the same or not:

# master

In [10]: def f(x):
    ...:     return x.copy()  # same index


In [11]: def g(x):
    ...:     return x.copy().rename(lambda x: x + 1)  # different index

In [12]: df = pd.DataFrame({"A": ['a', 'b'], "B": [1, 2]})

In [13]: df.groupby("A").apply(f)
Out[13]:
   A  B
0  a  1
1  b  2

In [14]: df.groupby("A").apply(g)
Out[14]:
     A  B
A
a 1  a  1
b 2  b  2

# 1.0.4

In [8]: df.groupby("A").apply(f)
Out[8]:
     A  B
A
a 0  a  1
b 1  b  2

In [9]: df.groupby("A").apply(g)
Out[9]:
     A  B
A
a 1  a  1
b 2  b  2

So the 1.0.4 behavior is to always prepend the group keys to the result as an index level.
In pandas 1.1.0, whether the group keys are prepended depends on whether the UDF returns a DataFrame whose index is identical to the group's index. Do we want that kind of value-dependent behavior?
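
For anyone who wants to check this on their own install, here is a minimal sketch that reports whether `apply` added the group keys as an index level. The helper name `keys_prepended` is made up for illustration and is not part of pandas. The result for `f` depends on the pandas version, so the sketch prints it rather than assuming a value. The last two calls use the existing `group_keys=False` argument of `groupby`, which as far as I can tell suppresses the extra level regardless of what the UDF returns.

```python
import warnings

import pandas as pd


def f(x):
    return x.copy()  # same index as the group


def g(x):
    return x.copy().rename(lambda i: i + 1)  # different index


def keys_prepended(grouped, func):
    """Hypothetical helper: True if apply() added an extra index level."""
    with warnings.catch_warnings():
        # Some newer pandas releases warn about apply() and grouping columns.
        warnings.simplefilter("ignore")
        result = grouped.apply(func)
    # The input frame has a single-level index, so more than one level
    # means the group keys were prepended.
    return result.index.nlevels > 1


df = pd.DataFrame({"A": ["a", "b"], "B": [1, 2]})

# Version-dependent: the sessions above show no keys on master, keys on 1.0.4.
print("f:", keys_prepended(df.groupby("A"), f))
# g changes the index; both sessions above show the keys being prepended.
print("g:", keys_prepended(df.groupby("A"), g))
# Opting out explicitly: no extra level expected for either function.
print("f, group_keys=False:", keys_prepended(df.groupby("A", group_keys=False), f))
print("g, group_keys=False:", keys_prepended(df.groupby("A", group_keys=False), g))
```

If the first two lines print different values, the installed version has the value-dependent behavior this issue is asking about.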

@jorisvandenbossche's notebook from #13056 (comment) might be helpful, though it might be out of date.

Labels: Apply, Bug, Groupby, Needs Discussion
