Description
The docstrings and other documentation say that the pivot_table function returns a DataFrame. However, this likely leads to confusion like #4371, because under narrow circumstances the function returns a Series instead. This happens when all of the following argument conditions hold (see the IPython examples at the end):
- values is a single string (not a list, not even a single-element list)
- cols=None
- aggfunc is a single string/function (not a list, not even a single-element list)
Unfortunately, this is not clear from the docs or from normal use (except for the first condition).
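Until the return type is settled, calling code can guard against either result. The following is a minimal caller-side sketch, not part of pandas: `as_frame` is a hypothetical helper name, and the call uses the `index` keyword from later pandas releases rather than the `rows` keyword used in the 0.12 session at the end of this report.

```python
import pandas as pd


def as_frame(result):
    """Hypothetical helper: return a DataFrame whether given a Series or a DataFrame."""
    if isinstance(result, pd.Series):
        # Series.to_frame() keeps the index and uses the Series name as the column label.
        return result.to_frame()
    return result


df = pd.DataFrame({'col1': [3, 4, 5], 'col2': ['C', 'D', 'E'], 'col3': [1, 3, 9]})

# Assumption: newer pandas spells the grouping argument `index` (0.12 used `rows`).
pivoted = pd.pivot_table(df, values='col1', index=['col3', 'col2'], aggfunc='sum')
frame = as_frame(pivoted)
```

With this guard, downstream code can rely on DataFrame-only attributes such as `.columns` regardless of which type the installed version returns.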
Should this:
- eventually be fixed to always return a DataFrame, whatever the arguments, to be less confusing, or
- be documented correctly (which seems difficult to convey in the docstring and other docs without a lot of bulk)?
My view is that changing the function to return only a DataFrame in future versions (> 0.13), with a deprecation warning in the meantime, is better than trying to explain this in the docs.
I would be happy to provide the deprecation warning and documentation notes as a pull request.
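To make the proposal concrete, here is a rough sketch of how such a warning could be emitted. It is illustrative only and is not pandas code: `warn_on_return_type` and `toy_pivot` are hypothetical names, and a tuple/list pair stands in for the Series/DataFrame pair so that the example needs only the standard library. `FutureWarning` is used because Python displays it to end users by default.

```python
import functools
import warnings


def warn_on_return_type(deprecated_type, message):
    """Hypothetical decorator: warn when the wrapped function returns `deprecated_type`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            if isinstance(result, deprecated_type):
                # stacklevel=2 attributes the warning to the caller's line.
                warnings.warn(message, FutureWarning, stacklevel=2)
            return result
        return wrapper
    return decorator


# Stand-in for a function whose return type depends on its argument types:
# the tuple plays the role of the Series, the list the role of the DataFrame.
@warn_on_return_type(tuple, "returning a tuple is deprecated; a list will be returned in a future version")
def toy_pivot(values):
    if isinstance(values, str):
        return (values,)
    return list(values)
```

Calling `toy_pivot('col1')` triggers the warning, while `toy_pivot(['col1'])` does not, which mirrors the single-string versus list distinction described above.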
Thanks.
Python 2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, 15:22:34)
Type "copyright", "credits" or "license" for more information.
IPython 1.0.dev -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.
%guiref -> A brief reference about the graphical user interface.
In [1]: import pandas as pd
...: import numpy as np
...: pd.__version__
...:
Out[1]: '0.12.0-57-g7bf2a7d'
In [2]: df = pd.DataFrame({'col1': [3, 4, 5], 'col2': ['C', 'D', 'E'], 'col3': [1, 3, 9]})
In [3]: df
Out[3]:
   col1 col2  col3
0     3    C     1
1     4    D     3
2     5    E     9
In [4]: # Case 1: (a) values is single string label (b) cols is unspecified
...: # (c) aggfunc is single label/function (not a list)
...: # Expect: Series type
...: pivoted_1 = df.pivot_table('col1', rows=['col3', 'col2'], aggfunc=np.sum)
...: print pivoted_1
...: print type(pivoted_1)
col3  col2
1     C       3
3     D       4
9     E       5
Name: col1, dtype: int64
<class 'pandas.core.series.Series'>
In [5]: # Case 2: (a) values is single string label (b) cols is single string label
...: # Expected: DataFrame
...: pivoted_2 = df.pivot_table('col1', rows='col3', cols='col2', aggfunc=np.sum)
...: print pivoted_2
...: print type(pivoted_2)
col2    C    D    E
col3
1       3  NaN  NaN
3     NaN    4  NaN
9     NaN  NaN    5
<class 'pandas.core.frame.DataFrame'>
In [6]: # Case 3: (a) values is single string label (b) cols is unspecified
...: # (c) aggfunc is a list
...: # Expect: DataFrame
...: pivoted_3 = df.pivot_table('col1', rows='col3', aggfunc=[np.sum])
...: print pivoted_3
...: print type(pivoted_3)
      sum
col3
1       3
3       4
9       5
<class 'pandas.core.frame.DataFrame'>