
BUG: groupby(..., dropna=False).indices with single group key does not include nan group #35646

Closed
@mroeschke

Description

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

In [9]: data = {'group':['g1', 'g1', 'g1', np.nan, 'g1', 'g1', 'g2', 'g2', 'g2', 'g2', np.nan],
   ...:                     'A':[3, 1, 8, 2, 6, -1, 0, 13, -4, 0, 1],
   ...:                     'B':[5, 2, 3, 7, 11, -1, 4,-1, 1, 0, 2]}
   ...: df = pd.DataFrame(data)
   ...: df.groupby('group',dropna=True).indices
Out[9]: {'g1': array([0, 1, 2, 4, 5]), 'g2': array([6, 7, 8, 9])}

In [10]: df.groupby('group',dropna=False).indices
Out[10]: {'g1': array([0, 1, 2, 4, 5]), 'g2': array([6, 7, 8, 9])}

In [11]: pd.__version__
Out[11]: '1.2.0.dev0+67.gaefae55e1'

Problem description

For a single group-by key, the grouping codes and indices are determined by this line:

values = Categorical(self.grouper)

Categorical does not support nan as a category label; missing values are represented only by the code -1, so the nan group never appears among the categories.
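A minimal sketch of that Categorical behavior, using a shortened, made-up key list rather than the full frame above:

```python
import numpy as np
import pandas as pd

# Shortened, illustrative group keys (not the data from the report).
groups = ["g1", np.nan, "g2", np.nan]
cat = pd.Categorical(groups)

# nan is not among the categories; it only appears as the sentinel code -1.
categories = list(cat.categories)
codes = cat.codes.tolist()

print(categories)  # ['g1', 'g2']
print(codes)       # [0, -1, 1, -1]
```

Anything keyed off `cat.categories` therefore has no entry for the nan rows, which is consistent with the missing group in `.indices`.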

This works correctly when multiple group keys are passed.

Once this issue is addressed, #35542 will be fixed.

Expected Output

In [10]: df.groupby('group',dropna=False).indices
Out[10]: {'g1': array([0, 1, 2, 4, 5]), 'g2': array([6, 7, 8, 9]), np.nan: array([3, 10])}
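For comparison, the expected mapping can be built by hand. This is only an illustrative sketch: `indices_with_nan` is a hypothetical helper written for this example, not part of pandas and not the proposed fix.

```python
import numpy as np
import pandas as pd

def indices_with_nan(keys):
    """Hypothetical helper: map each key, including nan, to its row positions."""
    keys = pd.Series(keys)
    result = {}
    # Non-missing labels, in order of first appearance.
    for label in pd.unique(keys.dropna()):
        result[label] = np.flatnonzero((keys == label).to_numpy())
    # Collect the missing rows under a single np.nan key.
    missing = np.flatnonzero(keys.isna().to_numpy())
    if missing.size:
        result[np.nan] = missing
    return result

group = ['g1', 'g1', 'g1', np.nan, 'g1', 'g1', 'g2', 'g2', 'g2', 'g2', np.nan]
expected = indices_with_nan(group)
print(expected)
```

Looking up `expected[np.nan]` works here only because the same `np.nan` object is used as the key; a different nan object would not match.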
