Closed
Description
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
In [9]: data = {'group':['g1', 'g1', 'g1', np.nan, 'g1', 'g1', 'g2', 'g2', 'g2', 'g2', np.nan],
...: 'A':[3, 1, 8, 2, 6, -1, 0, 13, -4, 0, 1],
...: 'B':[5, 2, 3, 7, 11, -1, 4,-1, 1, 0, 2]}
...: df = pd.DataFrame(data)
...: df.groupby('group',dropna=True).indices
Out[9]: {'g1': array([0, 1, 2, 4, 5]), 'g2': array([6, 7, 8, 9])}
In [10]: df.groupby('group',dropna=False).indices
Out[10]: {'g1': array([0, 1, 2, 4, 5]), 'g2': array([6, 7, 8, 9])}
In [11]: pd.__version__
Out[11]: '1.2.0.dev0+67.gaefae55e1'
Problem description
The grouping codes and indices for a single group-by key are determined here:
pandas/pandas/core/groupby/grouper.py
Line 559 in aefae55
Categorical does not support nan as a label (missing values only get the code -1), so the nan key never appears in indices.
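To illustrate the Categorical limitation described above, here is a minimal sketch. The variable names are only for illustration, and this is not the code at the referenced line in grouper.py; it only shows how Categorical represents missing values.

```python
import numpy as np
import pandas as pd

# A small key column with missing values (illustrative data only).
keys = ['g1', np.nan, 'g2', np.nan]
cat = pd.Categorical(keys)

# nan is not stored as a category; it only shows up as the sentinel code -1.
categories = list(cat.categories)
codes = cat.codes.tolist()

print(categories)  # ['g1', 'g2']
print(codes)       # [0, -1, 1, -1]
```

Because no category corresponds to the -1 code, any mapping built from the categories alone has no entry for the missing rows.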
This works correctly when multiple group keys are passed.
Once this issue is addressed, #35542 will be fixed.
Expected Output
In [10]: df.groupby('group',dropna=False).indices
Out[10]: {'g1': array([0, 1, 2, 4, 5]), 'g2': array([6, 7, 8, 9]), np.nan: array([3, 10])}
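For reference, the expected mapping can be built by hand without going through groupby at all. This is only a sketch for checking the result: the names group and expected are hypothetical, and it is not a proposed fix.

```python
import numpy as np
import pandas as pd

# Same 'group' column as in the code sample above.
group = pd.Series(['g1', 'g1', 'g1', np.nan, 'g1', 'g1',
                   'g2', 'g2', 'g2', 'g2', np.nan])

# Positions of each non-missing key.
expected = {
    key: np.flatnonzero((group == key).to_numpy())
    for key in group.dropna().unique()
}

# Positions of the missing keys, stored under np.nan as in the expected output.
expected[np.nan] = np.flatnonzero(group.isna().to_numpy())
```

Comparing this dictionary with df.groupby('group', dropna=False).indices gives a quick check of whether the nan group is present.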