API: Remove nan-likes from MultiIndex levels

While working on #27138, I found that ``MultiIndex`` keeps nan-likes in its levels but encodes them all as -1:

```python
>>> import numpy as np
>>> import pandas as pd
>>> levels, codes = [[np.nan, None, pd.NaT, 128, 2]], [[0, -1, 1, 2, 3, 4]]
>>> mi = pd.MultiIndex(levels, codes)
>>> mi.codes[0]
[-1, -1, -1, -1, 3, 4]
>>> mi.levels[0]
Index([nan, None, NaT, 128, 2], dtype='object')
```

Because every nan-like in a ``MultiIndex`` is encoded as -1, the codes can't be decoded back to the original values. As a result, at most one kind of nan-like value can come out of the ``MultiIndex``, so in this case ``None`` and ``NaT`` disappear when converting:

```python
>>> mi.to_frame()[0].array
<PandasArray>
[nan, nan, nan, nan, 128, 2]
Length: 6, dtype: object
```
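To make the loss concrete, here is a small pure-Python sketch of decoding one level (it models the idea, not pandas internals; ``decode`` and ``unreachable_positions`` are hypothetical helpers). Every -1 code maps to the same NA sentinel, and the level entries holding the nan-likes are never referenced by any code, so their identity can't be recovered:

```python
def decode(level, codes, na_value=float("nan")):
    """Rebuild the values of one level; every -1 maps to the same NA sentinel."""
    return [na_value if code == -1 else level[code] for code in codes]


def unreachable_positions(level, codes):
    """Positions in `level` that no code refers to (-1 means missing)."""
    referenced = {code for code in codes if code != -1}
    return sorted(set(range(len(level))) - referenced)


# Simplified version of the example above (pd.NaT left out so this runs without pandas)
nan = float("nan")
level = [nan, None, 128, 2]
codes = [-1, -1, -1, 2, 3]  # every nan-like, including the None entry, is already -1

print(decode(level, codes))                 # [nan, nan, nan, 128, 2]
print(unreachable_positions(level, codes))  # [0, 1]
```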

Since nan-likes are all encoded as -1 anyway, I think it would be more consistent to keep them out of the levels altogether, as ``Categorical`` already does:

```python
>>> c = pd.Categorical(levels[0])
>>> c.codes
array([-1, -1, -1,  1,  0], dtype=int8)
>>> c.categories
Int64Index([2, 128], dtype='int64')
```
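A rough sketch of what the proposed normalization could look like for a single level, written in plain Python under the assumption that nan-likes are simply dropped and their codes become -1 (``drop_nan_likes`` and ``is_nan_like`` are hypothetical names, not pandas API). Unlike ``Categorical``, it keeps the original order of the remaining values instead of sorting them, and a real implementation would more likely use ``pandas.isna`` on the whole level:

```python
def is_nan_like(value):
    # None and float NaN qualify; pd.NaT would too, since NaN and NaT
    # both compare unequal to themselves
    return value is None or value != value


def drop_nan_likes(level, codes):
    """Hypothetical helper: drop nan-likes from one level and remap its codes.

    Codes that pointed at a nan-like (or were already -1) become -1; the
    remaining codes are shifted to the new positions.
    """
    new_level = []
    old_to_new = {}
    for old_pos, value in enumerate(level):
        if is_nan_like(value):
            continue  # no mapping, so codes pointing here fall back to -1
        old_to_new[old_pos] = len(new_level)
        new_level.append(value)
    new_codes = [old_to_new.get(code, -1) for code in codes]
    return new_level, new_codes


# Level and codes modeled on the example above (pd.NaT omitted so this runs without pandas)
nan = float("nan")
level, codes = [nan, None, 128, 2], [0, -1, 1, 2, 3]
print(drop_nan_likes(level, codes))  # ([128, 2], [-1, -1, -1, 0, 1])
```

With this layout the levels contain only real values and -1 is the single, unambiguous marker for a missing entry, mirroring ``c.categories`` and ``c.codes``.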

Would there be support for changing the ``MultiIndex`` API so that nan-likes are removed from the levels? That would make its API more similar to ``Categorical``.

@pandas-dev/pandas-core.