
Add an option not to sort levels in MultiIndex.from_product? #14672

Open
@shoyer

Description


Currently, from_product always sorts levels in the resulting MultiIndex. This means that the result does not necessarily have lexsorted labels/codes.
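
To see why sorting the levels can break lexsortedness, here is a minimal, stdlib-only sketch. It factorizes each input into a level plus integer codes and then takes the Cartesian product of the codes. This is not pandas' implementation. `factorize` and `is_lexsorted` are hypothetical helpers written for illustration, and the inputs are assumed to contain unique, mutually comparable values.

```python
from itertools import product


def factorize(values, sort):
    """Return (level, codes) for a sequence of unique values (hypothetical helper).

    With sort=True the level is sorted (mirroring what from_product does
    today); with sort=False it keeps the order the values were passed in.
    """
    level = sorted(values) if sort else list(values)
    position = {value: i for i, value in enumerate(level)}
    codes = [position[value] for value in values]
    return level, codes


def is_lexsorted(code_rows):
    """True if the tuples formed by zipping the code rows are non-decreasing."""
    tuples = list(zip(*code_rows))
    return all(a <= b for a, b in zip(tuples, tuples[1:]))


# Factorize each input with sorted levels, then take the product of the codes.
_, outer = factorize(['a', 'b'], sort=True)
_, inner = factorize([2, 1, 0], sort=True)
pairs = list(product(outer, inner))
code_rows = [[p[0] for p in pairs], [p[1] for p in pairs]]

print(code_rows)                # [[0, 0, 0, 1, 1, 1], [2, 1, 0, 2, 1, 0]]
print(is_lexsorted(code_rows))  # False
```

Because the inner level is reordered to `[0, 1, 2]`, the input order `2, 1, 0` becomes the descending codes `2, 1, 0`, so the code tuples are no longer in increasing order.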

PR #14062 adds an option not to sort the levels when calling from_product. Compare:

```
In [4]: pd.MultiIndex.from_product([['a', 'b'], [2, 1, 0]], sort_levels=False)
Out[4]:
MultiIndex(levels=[['a', 'b'], [2, 1, 0]],
           labels=[[0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2]],
           sortorder=0)

In [5]: pd.MultiIndex.from_product([['a', 'b'], [2, 1, 0]], sort_levels=True)
Out[5]:
MultiIndex(levels=[['a', 'b'], [0, 1, 2]],
           labels=[[0, 0, 0, 1, 1, 1], [2, 1, 0, 2, 1, 0]])
```
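
The two outputs above can be reproduced with a small pure-Python sketch of what the option changes. `from_product_codes` is a hypothetical stand-in that returns plain `(levels, codes)` lists rather than a real `MultiIndex`; only the `sort_levels` keyword comes from the PR under discussion. It assumes each iterable holds unique, mutually comparable values.

```python
from itertools import product


def from_product_codes(iterables, sort_levels=True):
    """Hypothetical stand-in for MultiIndex.from_product: returns (levels, codes).

    Assumes each iterable holds unique, mutually comparable values.
    """
    levels, per_level_codes = [], []
    for values in iterables:
        values = list(values)
        # Either sort the level (current behaviour) or keep the input order.
        level = sorted(values) if sort_levels else values
        position = {value: i for i, value in enumerate(level)}
        levels.append(level)
        per_level_codes.append([position[value] for value in values])
    # Cartesian product of the per-level codes, transposed into one row per level.
    rows = list(product(*per_level_codes))
    codes = [list(column) for column in zip(*rows)]
    return levels, codes


levels, codes = from_product_codes([['a', 'b'], [2, 1, 0]], sort_levels=False)
print(levels)  # [['a', 'b'], [2, 1, 0]]
print(codes)   # [[0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2]]

levels, codes = from_product_codes([['a', 'b'], [2, 1, 0]], sort_levels=True)
print(levels)  # [['a', 'b'], [0, 1, 2]]
print(codes)   # [[0, 0, 0, 1, 1, 1], [2, 1, 0, 2, 1, 0]]
```

With `sort_levels=False` each input's codes are simply `0, 1, 2, ...` in the order given, so the product of those codes is lexsorted by construction.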

Using this option yields a few benefits:

  1. It's simpler -- resulting levels on the MultiIndex are exactly those you passed in.
  2. It's marginally faster -- you don't need to sort the levels.
  3. The resulting MultiIndex is always lex-sorted. This is handy if you want to be able to index it efficiently.
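
Benefit 3 can be illustrated with a hedged, stdlib-only sketch: because lexsorted code tuples are already in sorted order, a lookup can use binary search instead of a linear scan. `get_loc` and `slice_outer` are hypothetical helpers, not pandas internals, though they rely on the same property that makes lookups on a lexsorted MultiIndex cheaper.

```python
from bisect import bisect_left, bisect_right


def get_loc(keys, key):
    """Binary-search a lexsorted list of code tuples for an exact key.

    Returns the row position or raises KeyError. This is O(log n) instead of
    a scan, but it is only correct because the keys are lexsorted.
    """
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return i
    raise KeyError(key)


def slice_outer(outer_codes, code):
    """Rows sharing a first-level code are contiguous; find them by bisection."""
    return slice(bisect_left(outer_codes, code), bisect_right(outer_codes, code))


# Codes from the sort_levels=False output (levels ['a', 'b'] and [2, 1, 0]).
outer = [0, 0, 0, 1, 1, 1]
inner = [0, 1, 2, 0, 1, 2]
keys = list(zip(outer, inner))

print(get_loc(keys, (1, 2)))  # 5 -> the row labelled ('b', 0)
print(slice_outer(outer, 0))  # slice(0, 3, None) -> all rows under 'a'
```

The same bisection on the sorted-levels codes `[2, 1, 0, 2, 1, 0]` would not be valid, since `bisect` assumes its input is sorted.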

The downside is that the result can be a little less intuitive, because levels and labels do not have the same sort order (#14015).
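
The downside can be seen in a short stdlib sketch using the levels and codes from the `sort_levels=False` output: the rows are in increasing code order (lexsorted), yet once decoded back to actual values they are not in increasing value order, because the second level itself is `[2, 1, 0]`.

```python
# Levels and codes from the sort_levels=False output above.
levels = [['a', 'b'], [2, 1, 0]]
codes = [[0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2]]

# Decode each row of codes back into its actual values.
rows = [tuple(level[c] for level, c in zip(levels, row)) for row in zip(*codes)]
print(rows)
# [('a', 2), ('a', 1), ('a', 0), ('b', 2), ('b', 1), ('b', 0)]

# Rows are in increasing *code* order, yet the decoded values are not sorted.
print(rows == sorted(rows))  # False
```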

I'm suggesting this option because it was useful for xarray (to fix pydata/xarray#980) and might also be relevant for other advanced users.
