Combination of groupby with dropna set to True and apply function not returning expected output #43205

Closed
@deponovo

Description

(Using pandas 1.3.1 and numpy 1.20.3)

This repeats a question already asked on Stack Overflow. (I did not have this issue when working with version 1.1.3.)

In the following snippet:

import pandas as pd
import numpy as np

df = pd.DataFrame(
    {
        "a": [1, 2, 3, 4, 5, 6, 7, 8, 9], 
        "b": [1, np.nan, 1, np.nan, 2, 1, 2, np.nan, 1]
    }
)
df_again = df.groupby("b", dropna=False).apply(lambda x: x)

I was expecting df and df_again to be identical. But they are not:

df
   a    b
0  1  1.0
1  2  NaN
2  3  1.0
3  4  NaN
4  5  2.0
5  6  1.0
6  7  2.0
7  8  NaN
8  9  1.0

df_again
   a    b
0  1  1.0
2  3  1.0
4  5  2.0
5  6  1.0
6  7  2.0
8  9  1.0
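
One way to check whether the grouping itself keeps the NaN rows, independently of apply, is to iterate over the groups and concatenate them back together. This is only a diagnostic sketch, not a fix: the names pieces and rebuilt are made up for illustration, and it assumes that iterating a groupby with dropna=False yields the NaN group as well.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "a": [1, 2, 3, 4, 5, 6, 7, 8, 9],
        "b": [1, np.nan, 1, np.nan, 2, 1, 2, np.nan, 1],
    }
)

# Iterate over the groups directly instead of going through apply().
# Each item is a (key, sub-DataFrame) pair; only the sub-DataFrame is kept.
pieces = [group for _, group in df.groupby("b", dropna=False)]

# Concatenate the pieces and restore the original row order.
rebuilt = pd.concat(pieces).sort_index()

# DataFrame.equals treats NaN values in the same position as equal.
print(rebuilt.equals(df))
```

If this comparison holds while the apply round trip loses rows, the rows are being dropped when apply assembles its result rather than when the groups are formed.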

If I change the lambda to print each group, i.e. df.groupby("b", dropna=False).apply(lambda x: print(x)), I can see that the portion of df where b is NaN is also processed.
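
A less noisy alternative to printing every group is to look at the group sizes. The sketch below assumes that size() reports the NaN group when dropna=False; the variable names sizes and nan_rows are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "a": [1, 2, 3, 4, 5, 6, 7, 8, 9],
        "b": [1, np.nan, 1, np.nan, 2, 1, 2, np.nan, 1],
    }
)

# Number of rows per group, with NaN kept as its own group key.
sizes = df.groupby("b", dropna=False).size()
print(sizes)

# Rows that belong to the NaN group.
nan_rows = int(sizes[sizes.index.isna()].sum())
print(nan_rows, "of", len(df), "rows have b = NaN")
```

The sizes should add up to len(df); if they do, every row was assigned to a group, including the three rows where b is NaN.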

Metadata

    Labels

    Apply (Apply, Aggregate, Transform, Map), Bug, Groupby, Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate), Regression (Functionality that used to work in a prior pandas version)
