DOC/BUG: pivot_table returns Series in specific circumstance #4386

Closed
@davidshinn

The docstrings and other documentation say that the pivot_table function returns a DataFrame. However, this can lead to confusion like #4371, because in one narrow case the function returns a Series instead. This happens when all three of the following conditions on the argument types hold (see the IPython examples at the end):

  1. values is a single string (not a list, not even a single-element list)
  2. cols=None
  3. aggfunc is a single string/function (not a list, not even a single-element list)

Unfortunately, this is not clear from the docs or from normal use (except for condition 1).
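
As a stopgap on the caller's side, the return value can be normalized so that downstream code always sees a DataFrame. The snippet below is only a sketch under stated assumptions: `pivot_table_as_frame` is a hypothetical helper name, not part of pandas, and it uses the `index` keyword of later pandas releases rather than the `rows`/`cols` keywords shown in the session at the end of this report.

```python
import pandas as pd


def pivot_table_as_frame(data, **kwargs):
    """Hypothetical helper: call DataFrame.pivot_table and always return a DataFrame."""
    result = data.pivot_table(**kwargs)
    if isinstance(result, pd.Series):
        # to_frame() keeps the index and uses the Series name as the column label.
        result = result.to_frame()
    return result


df = pd.DataFrame({"col1": [3, 4, 5], "col2": ["C", "D", "E"], "col3": [1, 3, 9]})

# Assumption: a pandas release that accepts `index=` (older releases used `rows=`).
pivoted = pivot_table_as_frame(df, values="col1", index=["col3", "col2"], aggfunc="sum")
print(type(pivoted))
```

The `isinstance` check makes the helper a no-op whenever pivot_table already returns a DataFrame, so it is safe to leave in place across versions.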

Should this:

  1. eventually be fixed so that the function returns a DataFrame in all circumstances, which would be less confusing, or
  2. be documented correctly (this seems difficult to convey in the docstring and other docs without a lot of bulk)?

I think changing the function to return only a DataFrame in future versions (> 0.13), with a deprecation warning in the meantime, is better than trying to explain this behavior in the docs.

I would be happy to provide the deprecation warning and the documentation notes as a pull request.
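
To make the deprecation idea concrete, here is a rough, library-independent sketch of warning when a function returns a type that is scheduled to change. Everything in it is illustrative: `warn_on_return_type` and `toy_pivot` are hypothetical names, and `list` and `dict` merely stand in for Series and DataFrame. It is not how pandas implements or would implement this.

```python
import functools
import warnings


def warn_on_return_type(deprecated_type, message):
    """Hypothetical decorator: emit a FutureWarning when the wrapped
    function returns an instance of `deprecated_type`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            if isinstance(result, deprecated_type):
                # stacklevel=2 attributes the warning to the caller's line.
                warnings.warn(message, FutureWarning, stacklevel=2)
            return result
        return wrapper
    return decorator


# Stand-ins only: `list` plays the role of Series, `dict` the role of DataFrame.
@warn_on_return_type(list, "this call will return a DataFrame in a future version")
def toy_pivot(narrow_case):
    return [3, 4, 5] if narrow_case else {"col1": [3, 4, 5]}
```

The return value itself is left unchanged, so existing callers keep working during the deprecation period and only see the warning.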

Thanks.

Python 2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, 15:22:34) 
Type "copyright", "credits" or "license" for more information.

IPython 1.0.dev -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.
%guiref   -> A brief reference about the graphical user interface.

In [1]: import pandas as pd
   ...: import numpy as np
   ...: pd.__version__
   ...: 
Out[1]: '0.12.0-57-g7bf2a7d'

In [2]: df = pd.DataFrame({'col1': [3, 4, 5], 'col2': ['C', 'D', 'E'], 'col3': [1, 3, 9]})

In [3]: df
Out[3]: 
   col1 col2  col3
0     3    C     1
1     4    D     3
2     5    E     9

In [4]: # Case 1: (a) values is single string label (b) cols is unspecified
   ...: #         (c) aggfunc is single label/function (not a list)
   ...: # Expect: Series type
   ...: pivoted_1 = df.pivot_table('col1', rows=['col3', 'col2'], aggfunc=np.sum)
   ...: print pivoted_1
   ...: print type(pivoted_1)
col3  col2
1     C       3
3     D       4
9     E       5
Name: col1, dtype: int64
<class 'pandas.core.series.Series'>

In [5]: # Case 2: (a) values is single string label (b) cols is single string label
   ...: # Expected: DataFrame
   ...: pivoted_2 = df.pivot_table('col1', rows='col3', cols='col2', aggfunc=np.sum)
   ...: print pivoted_2
   ...: print type(pivoted_2)
col2   C   D   E
col3            
1      3 NaN NaN
3    NaN   4 NaN
9    NaN NaN   5
<class 'pandas.core.frame.DataFrame'>

In [6]: # Case 3: (a) values is single string label (b) cols is unspecified
   ...: #         (c) aggfunc is a list
   ...: # Expect: DataFrame
   ...: pivoted_3 = df.pivot_table('col1', rows='col3', aggfunc=[np.sum])
   ...: print pivoted_3
   ...: print type(pivoted_3)
      sum
col3     
1       3
3       4
9       5
<class 'pandas.core.frame.DataFrame'>
