
BUG: different behaviors of sort_index() and sort_index(level=0) #13431

Closed
@zym1010

Inspired by some bug reports around MultiIndex sortedness (http://stackoverflow.com/questions/31427466/ensuring-lexicographical-sort-in-pandas-multiindex, #10651, #9212), I found that sort_index() sometimes fails to make a MultiIndex ready for slicing, whereas sort_index(level=0) (and likewise sortlevel()) succeeds.

In [119]: pd.__version__
Out[119]: u'0.18.1'

In [120]: df = pd.DataFrame({'col1': ['b','d','b','a'], 'col2': [3,1,1,2], 'data':['one','two','three','four']})

In [121]: df2 = df.set_index(['col1','col2'])

In [122]: df2.index.set_levels(['b','d','a'], level='col1', inplace=True)

In [123]: df2.index.set_labels([0,1,0,2], level='col1', inplace=True)

In [124]: df2.sortlevel()
Out[124]:
            data
col1 col2
b    1     three
     3       one
d    1       two
a    2      four

In [125]: df2.sort_index()
Out[125]:
            data
col1 col2
a    2      four
b    1     three
     3       one
d    1       two

In [126]: df2.sort_index(level=0)
Out[126]:
            data
col1 col2
b    1     three
     3       one
d    1       two
a    2      four

Although the output of df2.sort_index() looks lexicographically sorted, the result DOES NOT support slicing:

df2.sort_index().loc['b':'d']
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)

# a lot of lines omitted here.

/Users/yimengzh/miniconda/envs/cafferc3/lib/python2.7/site-packages/pandas/indexes/multi.py in _partial_tup_index(self, tup, side)
   1488             raise KeyError('Key length (%d) was greater than MultiIndex'
   1489                            ' lexsort depth (%d)' %
-> 1490                            (len(tup), self.lexsort_depth))
   1491
   1492         n = len(tup)

KeyError: 'Key length (1) was greater than MultiIndex lexsort depth (0)'
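
The "lexsort depth (0)" in the message can be reproduced with a small pure-Python model. This is only a sketch, not the pandas implementation: it assumes that the depth is computed from the integer labels (positions into each level) rather than from the displayed values, and that col2 has levels [1, 2, 3]. The helper `lexsort_depth` and the variable names are hypothetical.

```python
def lexsort_depth(label_columns):
    """Count the leading levels whose integer-label tuples never decrease."""
    for k in range(len(label_columns), 0, -1):
        keys = list(zip(*label_columns[:k]))
        if all(a <= b for a, b in zip(keys, keys[1:])):
            return k
    return 0

# Levels as set in In [122]; col2 levels are an assumption.
col1_levels = ['b', 'd', 'a']
col2_levels = [1, 2, 3]

def to_labels(values, levels):
    # Map each displayed value to its position in the level list.
    return [levels.index(v) for v in values]

# Row order shown in Out[125] (sort_index()).
value_sorted = [to_labels(['a', 'b', 'b', 'd'], col1_levels),
                to_labels([2, 1, 3, 1], col2_levels)]

# Row order shown in Out[126] (sort_index(level=0)).
label_sorted = [to_labels(['b', 'b', 'd', 'a'], col1_levels),
                to_labels([1, 3, 1, 2], col2_levels)]

depth_value_sorted = lexsort_depth(value_sorted)
depth_label_sorted = lexsort_depth(label_sorted)
print(depth_value_sorted, depth_label_sorted)
```

Under these assumptions the col1 labels after sort_index() are [2, 0, 0, 1], which are not monotonic, so the modeled depth is 0, while the sort_index(level=0) order gives a depth of 2.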

So I have two questions.

  1. Is this the intended behavior? I expected level=0 and level=None to be synonyms, but they are not. Looking at the code, https://github.com/pydata/pandas/blob/4de83d25d751d8ca102867b2d46a5547c01d7248/pandas/core/frame.py#L3245-L3247 there is indeed special handling when level is not None.
  2. What does "lexicographically sorted" mean? I think it should mean sorted in terms of levels, not labels. Is this what "lexicographically sorted" means in the doc for Advanced Indexing? If so, then I think sort_index(level=0) is correct, yet sort_index() is not.
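
To make the two candidate meanings concrete, here is a minimal pure-Python sketch that sorts the example rows in both ways. It is an illustration under assumptions, not what pandas does internally: `rows`, `by_label`, `by_value`, and `show` are hypothetical names, and each row stores the col1 integer label from In [123] next to the col2 value.

```python
col1_levels = ['b', 'd', 'a']      # level values, in stored order (In [122])
col1_labels = [0, 1, 0, 2]         # per-row positions into col1_levels (In [123])
col2 = [3, 1, 1, 2]
data = ['one', 'two', 'three', 'four']

rows = list(zip(col1_labels, col2, data))

# Ordering 1: compare the integer labels, i.e. the stored order of the levels.
by_label = sorted(rows, key=lambda r: (r[0], r[1]))

# Ordering 2: compare the displayed values that the labels point to.
by_value = sorted(rows, key=lambda r: (col1_levels[r[0]], r[1]))

def show(sorted_rows):
    # Replace each integer label with its level value for display.
    return [(col1_levels[label], c2, d) for label, c2, d in sorted_rows]

for name, result in (('by label', by_label), ('by value', by_value)):
    print(name)
    for row in show(result):
        print('  ', row)
```

The label ordering reproduces the row order of Out[124] and Out[126] (b, b, d, a), and the value ordering reproduces Out[125] (a, b, b, d).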

Thanks.

output of pd.show_versions()

In [130]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Darwin
OS-release: 14.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 23.0.0
Cython: 0.23.4
numpy: 1.11.0
scipy: 0.17.0
statsmodels: None
xarray: None
IPython: 4.2.0
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None
