isna does not work with explicit MultiIndex nan-representation #25

Open
@coroa

A MultiIndex normally marks missing values with -1 entries in .codes, which isna checks correctly. However, .groupby(..., dropna=False) appends NaN to the end of .levels and uses the code of that NaN entry instead.

Minimal demonstration:

>>> from numpy import nan
>>> from pandas import Series, MultiIndex
>>> s = (
...     Series(
...         3,
...         MultiIndex.from_tuples(
...             [(1, nan), (1, nan), (1, 2)], names=["a", "b"]
...         ),
...     )
...     .groupby(["a", "b"], dropna=False)
...     .sum()
...     .index
... )
>>> s
MultiIndex([(1, 2.0),
            (1, nan)],
           names=['a', 'b'])
>>> s.levels
FrozenList([[1], [2.0, nan]])
>>> s.codes
FrozenList([[0, 0], [0, 1]])

In contrast, all the regular MultiIndex constructors consolidate the NaN values to a code of -1:

>>> s2 = MultiIndex(s.levels, s.codes)
>>> s2.codes
FrozenList([[0, 0], [0, -1]])

It looks like this leads to all sorts of subtle bugs in pandas itself: pandas-dev/pandas#29111, pandas-dev/pandas#36060, pandas-dev/pandas#30750, pandas-dev/pandas#43814.
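
One possible way to make a missing-value check robust to both representations is to treat a code as missing when it is -1 or when it points at a NaN entry in the corresponding level. The following is only a sketch: `level_isna` is a hypothetical helper name, not an existing function in pandas or in this project, and it assumes that `Index.isna()` on the level values identifies the explicit NaN entry.

```python
import numpy as np
from numpy import nan
from pandas import MultiIndex, Series


def level_isna(index: MultiIndex, level: int) -> np.ndarray:
    """Hypothetical helper: boolean mask of missing entries in one level.

    Handles both the usual -1 code and an explicit NaN stored in .levels.
    """
    codes = np.asarray(index.codes[level])
    # Positions inside the level values that hold an explicit NaN (may be empty)
    nan_positions = np.flatnonzero(index.levels[level].isna())
    return (codes == -1) | np.isin(codes, nan_positions)


# Index produced by groupby(..., dropna=False), as in the demonstration above
grouped = (
    Series(
        3,
        MultiIndex.from_tuples([(1, nan), (1, nan), (1, 2)], names=["a", "b"]),
    )
    .groupby(["a", "b"], dropna=False)
    .sum()
    .index
)

# Index built by a regular constructor, which uses -1 codes for NaN
regular = MultiIndex.from_tuples([(1, 2.0), (1, nan)], names=["a", "b"])

mask_grouped = level_isna(grouped, 1)
mask_regular = level_isna(regular, 1)
```

Because the mask is derived from `.codes` and `.levels` directly, it should give the same answer for `grouped` and `regular` regardless of which representation the index carries.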
