MultiIndex normally uses -1 entries in .codes to mark missing values, which isna checks correctly, but .groupby(..., dropna=False) adds NaN to the end of .levels and uses its code instead.
Minimal demonstration:
>>> from numpy import nan
>>> from pandas import Series, MultiIndex
>>> s = (
...     Series(
...         3,
...         MultiIndex.from_tuples(
...             [(1, nan), (1, nan), (1, 2)], names=["a", "b"]
...         ),
...     )
...     .groupby(["a", "b"], dropna=False)
...     .sum()
...     .index
... )
>>> s
MultiIndex([(1, 2.0),
            (1, nan)],
           names=['a', 'b'])
>>> s.levels
FrozenList([[1], [2.0, nan]])
>>> s.codes
FrozenList([[0, 0], [0, 1]])
In contrast, all the regular MultiIndex constructors consolidate the NaN values to -1:
>>> s2 = MultiIndex(s.levels, s.codes)
>>> s2.codes
FrozenList([[0, 0], [0, -1]])
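To make the two encodings easier to compare, here is a rough diagnostic sketch. The helper names `levels_containing_nan` and `missing_mask` are hypothetical and not part of pandas; the code only assumes the public `.levels`, `.codes` and `.names` attributes. It treats an entry as missing either when its code is -1 or when its code points at a NaN stored in the level.

```python
import numpy as np
import pandas as pd


def levels_containing_nan(mi: pd.MultiIndex) -> list:
    """Names of the levels that store NaN as an actual level value.

    Hypothetical helper: for indexes built by the regular constructors
    this list is expected to be empty, since NaN is encoded as code -1.
    """
    return [
        name
        for name, level in zip(mi.names, mi.levels)
        if level.isna().any()
    ]


def missing_mask(mi: pd.MultiIndex) -> np.ndarray:
    """Boolean array of shape (n_rows, n_levels) marking missing entries.

    Hypothetical helper: an entry counts as missing if its code is -1
    or if its code refers to a NaN value inside the level.
    """
    columns = []
    for level, codes in zip(mi.levels, mi.codes):
        codes = np.asarray(codes)
        level_is_na = np.asarray(level.isna())
        by_code = codes == -1
        # Only look up non-negative codes, so that -1 is never
        # interpreted as "last element of the level".
        by_level = np.zeros(len(codes), dtype=bool)
        valid = ~by_code
        by_level[valid] = level_is_na[codes[valid]]
        columns.append(by_code | by_level)
    return np.column_stack(columns)


# A regularly constructed index: NaN is stored as code -1, not in levels.
regular = pd.MultiIndex.from_tuples(
    [(1, np.nan), (1, 2.0)], names=["a", "b"]
)
print(levels_containing_nan(regular))
print(missing_mask(regular))
```

Calling `levels_containing_nan(s)` on the groupby result above should report level `b` on affected pandas versions, while `missing_mask` is meant to give the same answer for `s` and `s2` regardless of which encoding is used.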
It looks like this leads to all sorts of subtle bugs in pandas itself: pandas-dev/pandas#29111, pandas-dev/pandas#36060, pandas-dev/pandas#30750, pandas-dev/pandas#43814.