Closed
hi folks,
A Python for Data Analysis reader noted the following issue with recent versions of pandas (as of 1 year ago):
import pandas as pd
from pandas import DataFrame
import numpy as np
def demean(arr):
    return arr - arr.mean()
people = DataFrame(np.random.randn(5, 5),
                   columns=['a', 'b', 'c', 'd', 'e'],
                   index=['Joe', 'Steve', 'Wes', 'Jim', 'Travis'])
key = ['one', 'two', 'one', 'two', 'one']
on pandas 0.14.1:
In [14]: people.groupby(key).transform(demean).groupby(key).mean()
Out[14]:
            a         b         c         d         e
one -0.228006  0.246737  0.201117  0.250544  0.273858
two  0.342009 -0.370106 -0.301676 -0.375816 -0.410788
on the other hand:
In [15]: people.groupby(key).apply(demean).groupby(key).mean()
Out[15]:
                a             b             c             d             e
one -3.700743e-17  7.401487e-17 -7.401487e-17  7.401487e-17  0.000000e+00
two  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  5.551115e-17
Looks like transform has undergone some work in recent times; any ideas? I need to look at the book text and see if I can triage by replacing transform with apply. At this point transform feels a little bit anachronistic.
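
For anyone who wants to check whether their own install is affected, here is a minimal sketch of a self-check. It is not taken from the book text: the seeded generator and the names `demeaned` and `reference` are my own, and it assumes that `transform('mean')` broadcasts each group's mean back to the original rows. If `transform(demean)` behaves correctly, it should match that reference and the group means of the result should be zero up to floating-point noise.

```python
import numpy as np
import pandas as pd

def demean(arr):
    # Subtract the (per-group) mean from every value.
    return arr - arr.mean()

# Seeded generator so the check is repeatable (an assumption for this
# sketch; the original report used np.random.randn).
rng = np.random.default_rng(0)
people = pd.DataFrame(rng.standard_normal((5, 5)),
                      columns=['a', 'b', 'c', 'd', 'e'],
                      index=['Joe', 'Steve', 'Wes', 'Jim', 'Travis'])
key = ['one', 'two', 'one', 'two', 'one']

# The call under suspicion.
demeaned = people.groupby(key).transform(demean)

# Independent reference: subtract the broadcast group means by hand.
reference = people - people.groupby(key).transform('mean')

# Both checks should print True when transform demeans within groups.
print(np.allclose(demeaned.to_numpy(), reference.to_numpy()))
print(np.allclose(demeaned.groupby(key).mean().to_numpy(), 0.0))
```

If either line prints False, the output above (nonzero group means after transform) is being reproduced on that version.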