API: Remove nan-likes from MultiIndex levels #29111

Open
@topper-123

Description

While working on #27138 I found that MultiIndex keeps nan-likes in the levels, but encodes them all to -1:

>>> import numpy as np
>>> import pandas as pd
>>> levels, codes = [[np.nan, None, pd.NaT, 128, 2]], [[0, -1, 1, 2, 3, 4]]
>>> mi = pd.MultiIndex(levels, codes)
>>> mi.codes[0]
[-1, -1, -1, -1, 3, 4]
>>> mi.levels[0]
Index([nan, None, NaT, 128, 2], dtype='object')

Because all nan-likes in the MultiIndex are encoded to -1, they cannot be decoded back to their constituent values. At most one kind of nan-like value can come out of the MultiIndex, so in this case None and NaT disappear when converting:

>>> mi.to_frame()[0].array
<PandasArray>
[nan, nan, nan, nan, 128, 2]
Length: 6, dtype: object

I think that if nan-likes are all encoded to -1, it would be more consistent not to have them in the levels, similar to how Categorical already does it:

>>> c = pd.Categorical(levels[0])
>>> c.codes
array([-1, -1, -1,  1,  0], dtype=int8)
>>> c.categories
Int64Index([2, 128], dtype='int64')

Is there acceptance to change the MultiIndex API so that nan-likes are removed from the levels? That would give MultiIndex an API more similar to Categorical.
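To make the proposal concrete, here is a rough sketch of the remapping for a single level, written against plain lists and NumPy arrays rather than MultiIndex internals. The helper name `strip_nan_likes` is hypothetical and not part of pandas; it only illustrates dropping nan-likes from a level and pointing their codes at -1. Unlike Categorical, this sketch keeps the remaining level values in their original order instead of sorting them.

```python
import numpy as np
import pandas as pd


def strip_nan_likes(level_values, codes):
    """Hypothetical helper (not a pandas API): drop nan-likes from one
    level and remap that level's codes so nan-likes become -1."""
    level_values = list(level_values)
    codes = np.asarray(codes, dtype=np.intp)
    n = len(level_values)

    # True for every level entry pandas treats as missing (nan, None, NaT).
    is_missing = np.array([pd.isna(v) for v in level_values], dtype=bool)
    keep = np.flatnonzero(~is_missing)

    # Old position -> new position. The extra trailing slot stays -1, so an
    # incoming code of -1 (which indexes the last slot) is preserved as -1.
    remap = np.full(n + 1, -1, dtype=np.intp)
    remap[keep] = np.arange(len(keep))

    new_level = [level_values[i] for i in keep]
    new_codes = remap[codes]
    return new_level, new_codes


new_level, new_codes = strip_nan_likes(
    [np.nan, None, pd.NaT, 128, 2], [0, -1, 1, 2, 3, 4]
)
print(new_level)           # [128, 2]
print(new_codes.tolist())  # [-1, -1, -1, -1, 0, 1]
```

With the example from above, the level shrinks to the two real values and every nan-like position ends up with code -1, which is the same shape of result that Categorical produces.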

@pandas-dev/pandas-core.
