Skip to content

Pivot table margins brittleness #3334

Closed
@wesm

Description

I'm trying to find the most efficient way to tabulate responses to a survey using pandas. Here's a sample survey:

df = DataFrame({'Response' : ['Y', 'N' ,'N', 'Y', 'Y', 'N'],
                'Type' : ['A', 'A', 'B', 'B', 'B', 'C']})

I want to count the number of responses of each value (Y or N), with the different types across the top.

  1. A simple way to do this is
df.groupby('Type')['Response'].value_counts().unstack(level=0)

Type     A      B        C
N       1       1       1
Y       1       2       NaN

This is almost what I want, except I'd like row and column totals as well. Of course, I could calculate these manually, but that seems ugly. What I'd really like to do is create a pivot table with margins=True, but this isn't quite as easy as I thought it would be.

  1. Again without totals, I can use
    pivot_table(df, rows='Response',cols='Type',aggfunc=len)

This produces almost the same output as above, though now the index has a name:

Type            A       B       C
Response
N                       1       1        1
Y                       1       2       NaN

So far so good, but adding margins=True results in an error.

  1. To get exactly what I want, the following works, but it seems there should be a nicer way:
df['Count'] = 1
pivot_table(df, rows='Response',cols='Type',values='Count',aggfunc=len,margins=True)     # Can also use sum

I have three questions:

  1. Why does adding 'margins=True' in reindex_like function #2 above result in an error, while it works in Binary operations on int DataMatrix #3? I'm not sure I understand what's happening here.

  2. If this behavior is intentional, is there a better way to get the output I'm looking for, without introducing an extra, unwanted column?

  3. How do I remove the name of the index from a dataframe (to get the output from reindex_like function #2 to look the same as in Enable element-wise comparison operations in DataMatrix objects #1)?

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions