Partial Selection on MultiIndex: Control what index depth is returned. #3057

Closed
dragoljub opened this issue Mar 15, 2013 · 0 comments · Fixed by #6301
Labels: Enhancement, Indexing (related to indexing on series/frames, not to indexes themselves), MultiIndex

Partial selection with .xs() or .ix[] on a subset of index levels returns a DataFrame with the fixed (selected) levels dropped, which is a very nice feature. However, when you select with a tuple that covers all levels, the returned DataFrame keeps every index level. It would be nice to have an option that controls which index levels are returned when sub-selecting, so the behavior stays consistent once the selection reaches the lowest index level.

Perhaps there could be a "drop_fixed_index" parameter option when sub-selecting.
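
To make the request concrete, here is a minimal sketch of what such an option could look like. It assumes a later pandas release in which `DataFrame.xs` accepts a `drop_level` argument; the helper name `select_prefix` is hypothetical, and `drop_fixed_index` is simply the parameter name proposed above, not an existing pandas parameter.

```python
import numpy as np
import pandas as pd


def select_prefix(frame, key, drop_fixed_index=True):
    """Hypothetical helper: select rows whose leading index levels equal `key`.

    `drop_fixed_index` is the option name proposed in this issue; here it is
    forwarded to the `drop_level` argument of `DataFrame.xs`.
    """
    key = key if isinstance(key, tuple) else (key,)
    levels = list(range(len(key)))  # fix the first len(key) levels
    return frame.xs(key, level=levels, drop_level=drop_fixed_index)


# Small 3-level example frame (illustrative data, not the frame from this issue).
index = pd.MultiIndex.from_product([range(2)] * 3, names=["A0", "A1", "A2"])
df = pd.DataFrame({"A5": np.arange(len(index))}, index=index)

dropped = select_prefix(df, (0, 1))                          # fixed levels removed
kept = select_prefix(df, (0, 1), drop_fixed_index=False)     # all levels retained
full = select_prefix(df, (0, 1, 1), drop_fixed_index=False)  # key covers every level

print(dropped.index.names, kept.index.nlevels, full.index.nlevels)
```

With the flag set to `False`, the result has the same index depth whether the key covers two levels or all three, which is the consistency asked for here.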

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: print pd.__version__
0.11.0.dev-80945b6

In [4]: # Generate Test DataFrame
   ...: NUM_ROWS = 100000
   ...:

In [5]: NUM_COLS = 10

In [6]: col_names = ['A'+num for num in map(str,np.arange(NUM_COLS).tolist())]

In [7]: index_cols = col_names[:5]

In [8]: # Set DataFrame to have 5 level Hierarchical Index & Sort Index!
   ...: # The dtype does not matter try str or np.int64 same results.
   ...: df = pd.DataFrame(np.random.randint(5, size=(NUM_ROWS,NUM_COLS)), dtype=np.int64, columns=col_names)
   ...:

In [9]: df = df.set_index(index_cols).sort_index()

...

In [79]: df
Out[79]: <class 'pandas.core.frame.DataFrame'>
MultiIndex: 100000 entries, (0, 0, 0, 0, 0) to (4, 4, 4, 4, 4)
Data columns:
A5    100000  non-null values
A6    100000  non-null values
A7    100000  non-null values
A8    100000  non-null values
A9    100000  non-null values
dtypes: int64(5)

In [80]: df.ix[(0)]
Out[80]: <class 'pandas.core.frame.DataFrame'>
MultiIndex: 20011 entries, (0, 0, 0, 0) to (4, 4, 4, 4)
Data columns:
A5    20011  non-null values
A6    20011  non-null values
A7    20011  non-null values
A8    20011  non-null values
A9    20011  non-null values
dtypes: int64(5)

In [81]: df.ix[(0,1)]
Out[81]: <class 'pandas.core.frame.DataFrame'>
MultiIndex: 4007 entries, (0, 0, 0) to (4, 4, 4)
Data columns:
A5    4007  non-null values
A6    4007  non-null values
A7    4007  non-null values
A8    4007  non-null values
A9    4007  non-null values
dtypes: int64(5)

In [82]: df.ix[(0,1,2)]
Out[82]: <class 'pandas.core.frame.DataFrame'>
MultiIndex: 817 entries, (0, 0) to (4, 4)
Data columns:
A5    817  non-null values
A6    817  non-null values
A7    817  non-null values
A8    817  non-null values
A9    817  non-null values
dtypes: int64(5)

In [83]: df.ix[(0,1,2,3)]
Out[83]: <class 'pandas.core.frame.DataFrame'>
Int64Index: 162 entries, 0 to 4
Data columns:
A5    162  non-null values
A6    162  non-null values
A7    162  non-null values
A8    162  non-null values
A9    162  non-null values
dtypes: int64(5)

In [84]: df.ix[(0,1,2,3,4)]
Out[84]:
                A5  A6  A7  A8  A9
A0 A1 A2 A3 A4                    
0  1  2  3  4    1   2   2   4   2
            4    1   4   4   1   0
            4    2   1   4   1   3
            4    2   4   2   1   1
            4    1   1   2   1   4
            4    0   0   2   1   1
            4    2   0   0   3   1
            4    2   2   3   3   1
            4    3   0   3   4   1
            4    1   1   0   0   1
            4    2   1   0   2   4
            4    3   4   1   2   3
            4    0   4   3   1   0
            4    4   1   4   1   2
            4    1   3   4   3   3
            4    0   1   1   3   1
            4    2   2   2   0   3
            4    0   0   1   4   0
            4    1   0   1   4   2
            4    1   4   2   2   0
            4    4   2   0   3   1
            4    2   1   2   3   2
            4    4   2   0   1   4
            4    1   4   1   1   4
            4    1   0   1   2   4
            4    2   3   0   1   3
            4    2   1   3   3   3
            4    1   2   0   4   2
            4    3   0   4   4   0
            4    4   4   2   3   0
            4    0   0   1   3   2
            4    4   0   0   0   3
            4    2   0   3   4   2
            4    3   3   3   0   2
            4    4   2   2   0   1
            4    2   1   3   4   0

In [86]: df.index.lexsort_depth
Out[86]: 5