groupby with multiindex behaves differently for series and single-column dataframe #9703

Closed
@mcwitt

This behavior seems strange to me. Starting with a MultiIndex DataFrame with unbalanced levels (not every combination of A and B occurs):

In [1]: import pandas as pd

In [2]: from io import StringIO

In [3]: pd.__version__
Out[3]: '0.15.2'

In [4]: df = pd.read_csv(StringIO('''
A,B,x
three,a,1
three,b,2
three,c,3
two,a,4
two,b,5
one,a,6'''), index_col=list('AB'))

In [5]: df
Out[5]: 
         x
A     B   
three a  1
      b  2
      c  3
two   a  4
      b  5
one   a  6

Groupby aggregations on this DataFrame seem to expand the index to the full cross product of the levels, leaving a NaN row for every (A, B) combination that is absent from the data:

In [6]: df.groupby(level=[0,1]).mean()
Out[6]: 
          x
A     B    
one   a   6
      b NaN
      c NaN
three a   1
      b   2
      c   3
two   a   4
      b   5
      c NaN

But with the Series this doesn't occur; only the observed combinations appear, with no NaNs in the result:

In [7]: df.x.groupby(level=[0,1]).mean()
Out[7]: 
A      B
one    a    6
three  a    1
       b    2
       c    3
two    a    4
       b    5
Name: x, dtype: int64

Is this a bug or intended behavior? I haven't been able to find any mention in the docs of groupby behaving differently for a Series and a single-column DataFrame.
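
One possible workaround, sketched here under the assumption that the extra rows are purely padding, is to drop the all-NaN rows from the DataFrame result and check that it then agrees with the Series result. The names `frame_means` and `series_means` are illustrative. Note that `dropna(how="all")` would also discard a genuinely observed group whose values are all NaN, so this is not a general fix.

```python
from io import StringIO

import pandas as pd

csv = """A,B,x
three,a,1
three,b,2
three,c,3
two,a,4
two,b,5
one,a,6
"""
df = pd.read_csv(StringIO(csv), index_col=["A", "B"])

# DataFrame path: drop any rows that are NaN in every column.
# On versions that do not pad the index this is a no-op.
frame_means = df.groupby(level=[0, 1]).mean().dropna(how="all")

# Series path: only the observed (A, B) pairs appear.
series_means = df["x"].groupby(level=[0, 1]).mean()

# The two paths should now agree; dtypes are not compared because the
# padded DataFrame result is float while the Series result may be int.
pd.testing.assert_series_equal(frame_means["x"], series_means, check_dtype=False)
print(frame_means)
```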
