ENH: pivot/groupby index with nan #3729

Closed
@jreback

ENH: maybe for now just provide a warning if dropping the nan rows when pivoting...

from ml

http://stackoverflow.com/questions/16860172/python-pandas-pivot-table-silently-drops-indices-with-nans

This is effectively trying to group by a NaN key, which is currently not allowed, so the row with the NaN key is silently dropped:

In [13]: a = [['a', 'b', 12, 12, 12], ['a', np.nan, 12.3, 233., 12], ['b', 'a', 123.23, 123, 1], ['a', 'b', 1, 1, 1.]]

In [14]: df = DataFrame(a, columns=['a', 'b', 'c', 'd', 'e'])

In [15]: df.groupby(['a','b']).sum()
Out[15]: 
          c    d   e
a b                 
a b   13.00   13  13
b a  123.23  123   1
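As a rough sketch of the warning proposed above: check the grouping keys for NaN before grouping and warn about the rows that are about to disappear. The helper name `groupby_sum_warn_on_nan` and the message text are hypothetical and not part of pandas; this only illustrates the idea from user code.

```python
import warnings

import numpy as np
import pandas as pd


def groupby_sum_warn_on_nan(df, keys):
    """Group by `keys` and sum, warning if rows with NaN keys will be dropped.

    Hypothetical helper for illustration; not a pandas API.
    """
    # A row is excluded from the default groupby if any of its keys is missing.
    nan_rows = df[keys].isna().any(axis=1)
    n_dropped = int(nan_rows.sum())
    if n_dropped:
        warnings.warn(
            f"{n_dropped} row(s) have NaN in {keys} and will be dropped",
            UserWarning,
            stacklevel=2,
        )
    return df.groupby(keys).sum()


# Same data as in the report above.
df = pd.DataFrame(
    [
        ["a", "b", 12, 12, 12],
        ["a", np.nan, 12.3, 233.0, 12],
        ["b", "a", 123.23, 123, 1],
        ["a", "b", 1, 1, 1.0],
    ],
    columns=["a", "b", "c", "d", "e"],
)

# Record warnings so the example can inspect them instead of printing.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = groupby_sum_warn_on_nan(df, ["a", "b"])
```

Here `result` holds only the two groups whose keys are fully present, and `caught` contains the warning about the one dropped row.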

Workaround: fill the NaN keys with a dummy value, pivot, then replace the dummy with NaN again


    In [31]: df2 = df.copy()

    In [32]: df2['dummy'] = np.nan

    In [33]: df2['b'] = df2['b'].fillna('dummy')

    In [34]: df2
    Out[34]: 
       a      b       c    d   e  dummy
    0  a      b   12.00   12  12    NaN
    1  a  dummy   12.30  233  12    NaN
    2  b      a  123.23  123   1    NaN
    3  a      b    1.00    1   1    NaN

    In [35]: df2.pivot_table(rows=['a', 'b'], values=['c', 'd', 'e'], aggfunc=sum)
    Out[35]: 
       a      b       c    d   e
    0  a      b   13.00   13  13
    1  a  dummy   12.30  233  12
    2  b      a  123.23  123   1

    In [36]: df2.pivot_table(rows=['a', 'b'], values=['c', 'd', 'e'], aggfunc=sum).replace('dummy',np.nan)
    Out[36]: 
       a    b       c    d   e
    0  a    b   13.00   13  13
    1  a  NaN   12.30  233  12
    2  b    a  123.23  123   1
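The same workaround as a self-contained script, as a sketch only. It assumes a pandas version where `pivot_table` takes `index=` (the newer spelling of the `rows=` argument used above), and it assumes the sentinel string `"dummy"` never occurs as a real value in column `b`. The names `SENTINEL`, `filled` and `summed` are made up for this example.

```python
import numpy as np
import pandas as pd

# Assumption: this placeholder does not collide with real values in column "b".
SENTINEL = "dummy"

df = pd.DataFrame(
    [
        ["a", "b", 12, 12, 12],
        ["a", np.nan, 12.3, 233.0, 12],
        ["b", "a", 123.23, 123, 1],
        ["a", "b", 1, 1, 1.0],
    ],
    columns=["a", "b", "c", "d", "e"],
)

# Step 1: replace missing keys with the sentinel so the row survives grouping.
filled = df.copy()
filled["b"] = filled["b"].fillna(SENTINEL)

# Step 2: pivot on the filled keys, then flatten the index back into columns.
summed = filled.pivot_table(
    index=["a", "b"], values=["c", "d", "e"], aggfunc="sum"
).reset_index()

# Step 3: turn the sentinel back into NaN in the key column.
summed["b"] = summed["b"].replace(SENTINEL, np.nan)

print(summed)
```

The obvious weak point is the sentinel: if `"dummy"` can appear as a genuine key, real rows would be merged with the NaN rows, so pick a value that cannot occur in the data.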

Labels

Enhancement, Groupby, Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate), Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)
