Closed
Description
Inspired by some bug reports around MultiIndex sortedness (http://stackoverflow.com/questions/31427466/ensuring-lexicographical-sort-in-pandas-multiindex, #10651, #9212), I found that sort_index() sometimes can't make a MultiIndex ready for slicing, but sort_index(level=0) (and likewise sortlevel()) can.
In [119]: pd.__version__
Out[119]: u'0.18.1'
In [120]: df = pd.DataFrame({'col1': ['b','d','b','a'], 'col2': [3,1,1,2], 'data':['one','two','three','four']})
In [121]: df2 = df.set_index(['col1','col2'])
In [122]: df2.index.set_levels(['b','d','a'], level='col1', inplace=True)
In [123]: df2.index.set_labels([0,1,0,2], level='col1', inplace=True)
In [124]: df2.sortlevel()
Out[124]:
            data
col1 col2
b    1     three
     3       one
d    1       two
a    2      four
In [125]: df2.sort_index()
Out[125]:
            data
col1 col2
a    2      four
b    1     three
     3       one
d    1       two
In [126]: df2.sort_index(level=0)
Out[126]:
            data
col1 col2
b    1     three
     3       one
d    1       two
a    2      four
While df2.sort_index() does give a visually lexicographically sorted output, it DOES NOT support slicing.
df2.sort_index().loc['b':'d']
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
# a lot of lines omitted here.
/Users/yimengzh/miniconda/envs/cafferc3/lib/python2.7/site-packages/pandas/indexes/multi.py in _partial_tup_index(self, tup, side)
1488 raise KeyError('Key length (%d) was greater than MultiIndex'
1489 ' lexsort depth (%d)' %
-> 1490 (len(tup), self.lexsort_depth))
1491
1492 n = len(tup)
KeyError: 'Key length (1) was greater than MultiIndex lexsort depth (0)'
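To illustrate what I suspect is going on, here is a minimal pure-Python sketch. It is not pandas' actual code: it models only the first index level (col1) as a list of unique values (levels) plus integer positions into that list (labels), and it assumes, as the error message above suggests, that the lexsort depth is determined by whether the integer labels are non-decreasing. The helper name is_non_decreasing and the variable names are made up for this illustration.

```python
# Simplified model of one MultiIndex level: unique values ("levels") plus
# integer positions into them ("labels"). Not pandas' implementation.
levels = ['b', 'd', 'a']   # same unsorted levels as set on df2 above
labels = [0, 1, 0, 2]      # row values: b, d, b, a


def is_non_decreasing(seq):
    """Return True if every element is >= the one before it."""
    return all(x <= y for x, y in zip(seq, seq[1:]))


# Assumed model of sort_index(level=0): order rows by their integer labels.
labels_by_label = sorted(labels)
values_by_label = [levels[i] for i in labels_by_label]

# Assumed model of sort_index(): order rows by the values the labels point to.
labels_by_value = sorted(labels, key=lambda i: levels[i])
values_by_value = [levels[i] for i in labels_by_value]

print(values_by_label, labels_by_label, is_non_decreasing(labels_by_label))
print(values_by_value, labels_by_value, is_non_decreasing(labels_by_value))
```

Under this model, ordering by labels gives the row order b, b, d, a with non-decreasing labels, while ordering by values gives a, b, b, d with labels that are no longer non-decreasing. That would match the outputs above and a lexsort depth of 0 after sort_index().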
So I have two questions.
- Is this the intended behavior? I thought level=0 and level=None were synonyms, but they are not. Looking at the code, https://github.com/pydata/pandas/blob/4de83d25d751d8ca102867b2d46a5547c01d7248/pandas/core/frame.py#L3245-L3247, there is indeed special processing when level is not None.
- What does "lexicographically sorted" mean? I think it should mean sorted in terms of levels, not labels. Is this what "lexicographically sorted" means in the doc for Advanced Indexing? If this is true, then I think sort_index(level=0) is correct, yet sort_index() is not.
Thanks.
output of pd.show_versions()
In [130]: pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Darwin
OS-release: 14.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 23.0.0
Cython: 0.23.4
numpy: 1.11.0
scipy: 0.17.0
statsmodels: None
xarray: None
IPython: 4.2.0
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None