Description
The 'level' argument in MultiIndex methods can be ambiguous in at least 2 ways.
- If an index's level names are [1, 0], then level=0 may refer to either level, depending on whether the API treats it as a position or as a label. See #21677 ("API: unclear what integer level name references: name or position?").
- If an index's level names are ["foo", "bar", ("foo", "bar")], then level=("foo", "bar") may refer either to the first two levels or to the last level. See #21120 ("Index.droplevel when names are tuples").
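To make the two readings concrete, here is a small pure-Python sketch that lists every level position a given level argument could plausibly mean. It does not call pandas: the helper candidate_levels is hypothetical, and the plain list names stands in for MultiIndex.names.

```python
def candidate_levels(names, level):
    """Return each plausible reading of `level` as level position(s).

    Hypothetical helper; `names` plays the role of MultiIndex.names.
    """
    readings = {}
    # Positional reading: an int is a level position.
    if isinstance(level, int) and 0 <= level < len(names):
        readings["position"] = level
    # Sequence reading: a tuple is several level names.
    if isinstance(level, tuple) and all(k in names for k in level):
        readings["sequence"] = [names.index(k) for k in level]
    # Label reading: the whole value is a single level name.
    if level in names:
        readings["label"] = names.index(level)
    return readings


# Case 1: level names [1, 0]
print(candidate_levels([1, 0], 0))
# {'position': 0, 'label': 1}

# Case 2: level names ["foo", "bar", ("foo", "bar")]
print(candidate_levels(["foo", "bar", ("foo", "bar")], ("foo", "bar")))
# {'sequence': [0, 1], 'label': 2}
```

In both cases more than one reading survives, which is exactly the ambiguity described above.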
One option for addressing the first case is to add keywords that state the intent explicitly, e.g. level_num or level_name (#10461).
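A rough sketch of what the keyword approach could look like, assuming a hypothetical get_level_number function that accepts either level_num or level_name (this is not an existing pandas signature). A private sentinel is used for the defaults instead of None, since None is itself a valid level name.

```python
_NOTSET = object()  # sentinel: None cannot be used because it is a valid level name


def get_level_number(names, *, level_num=_NOTSET, level_name=_NOTSET):
    """Resolve a level from an explicit keyword (hypothetical signature)."""
    if (level_num is _NOTSET) == (level_name is _NOTSET):
        raise TypeError("pass exactly one of level_num or level_name")
    if level_num is not _NOTSET:
        # Positional: negative positions count from the end, as with lists.
        if not -len(names) <= level_num < len(names):
            raise IndexError(f"level_num {level_num} is out of range")
        return level_num % len(names)
    # Label: the value is matched against names as a whole, tuples included.
    if names.count(level_name) != 1:
        raise KeyError(f"level_name {level_name!r} is missing or not unique")
    return names.index(level_name)


names = [1, 0]
print(get_level_number(names, level_num=0))   # 0
print(get_level_number(names, level_name=0))  # 1
```

The cost of this design is an extra keyword on every method that takes level, which is part of the motivation for the alternative below.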
The new idea I'd like to discuss is a pd.Label class for marking an ambiguous value as a label. In the first case, passing level=pd.Label(0) would indicate that you want the second level (the one named 0). In the second case, level=pd.Label(("foo", "bar")) would indicate that the tuple is a single label and not a sequence of labels.
After an appropriate deprecation cycle, ambiguous values would by default be interpreted as positions/sequences unless pd.Label is used explicitly.
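A minimal sketch of the proposed end state (after the deprecation cycle), assuming a hypothetical Label dataclass as a stand-in for pd.Label, which does not exist in pandas today. resolve_level is likewise illustrative: bare ints are positions, bare tuples and lists are sequences, and only a wrapped value is matched against the level names as one label.

```python
from collections.abc import Hashable
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    """Hypothetical stand-in for the proposed pd.Label wrapper."""
    value: Hashable


def resolve_level(names, level):
    """Map `level` to a list of level positions under the proposed default."""
    if isinstance(level, Label):
        # Explicit label: the wrapped value is one level name, even if a tuple.
        return [names.index(level.value)]
    if isinstance(level, int):
        return [level]  # positional reading
    if isinstance(level, (list, tuple)):
        # Sequence reading: resolve each element and flatten the results.
        return [pos for item in level for pos in resolve_level(names, item)]
    # A non-integer scalar (e.g. a string) can only be a label.
    return [names.index(level)]


print(resolve_level([1, 0], 0))         # [0]
print(resolve_level([1, 0], Label(0)))  # [1]

names = ["foo", "bar", ("foo", "bar")]
print(resolve_level(names, ("foo", "bar")))         # [0, 1]
print(resolve_level(names, Label(("foo", "bar"))))  # [2]
```

During the deprecation cycle itself, the bare-value branches would presumably warn when the label reading also matches; that transition step is left out of this sketch.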
pd.Label has a couple of other potential uses:
A) If df.index (or df.index.levels[0]) contains tuples, then df.loc[a, b] may either select the single key (a, b) from the index, or select a from the index and b from the columns. df.loc[pd.Label((a, b))] would disambiguate.
B) In __getitem__/__setitem__, pd.Label could indicate that .loc should be used, with no fallback to .iloc (not that useful, since the user could just use .loc directly).
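For item A, a wrapper is needed because Python passes the same tuple to __getitem__ for df.loc[a, b] and df.loc[(a, b)], so .loc cannot tell them apart from the key alone. The toy sketch below models a frame as a dict of rows; toy_loc and Label are hypothetical and only illustrate the proposed rule, not how pandas implements .loc.

```python
from collections.abc import Hashable
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    """Hypothetical stand-in for the proposed pd.Label wrapper."""
    value: Hashable


def toy_loc(rows, key):
    """Mimic df.loc[key] on a dict mapping row label -> {column: value}.

    A bare 2-tuple is read as (row label, column label); a Label marks
    the whole wrapped value as a single row label.
    """
    if isinstance(key, Label):
        return rows[key.value]  # one row, all of its columns
    if isinstance(key, tuple) and len(key) == 2:
        row, col = key
        return rows[row][col]   # row from the index, column from the columns
    return rows[key]            # a single row label


# The index holds both the tuple ("a", "b") and the scalar "a".
rows = {("a", "b"): {"x": 1, "b": 10}, "a": {"x": 2, "b": 20}}
print(toy_loc(rows, ("a", "b")))         # 20
print(toy_loc(rows, Label(("a", "b"))))  # {'x': 1, 'b': 10}
```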