Description
Say there is an array of type int64.
For convenience, let me just fill it with some large number:
test1 = pd.Series(20150515061816532, index=list(range(500)), dtype='int64')
test1.describe()
Out[152]:
count 5.000000e+02
mean -1.674297e+16
std 0.000000e+00
min 2.015052e+16
25% 2.015052e+16
50% 2.015052e+16
75% 2.015052e+16
max 2.015052e+16
Look at the mean: it overflowed and became negative. Obviously the mean should be 20150515061816532.
In [153]: test1.sum()
Out[153]: -8371486542801285616
This is wrong: the sum has overflowed int64 as well.
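To show where that negative number comes from, here is a small pure-Python sketch of the arithmetic. It does not call pandas; `wrap_int64` is a hypothetical helper written only for this illustration, and it assumes the accumulator behaves like a two's-complement 64-bit signed integer that wraps silently on overflow.

```python
INT64_MIN = -2**63
INT64_MAX = 2**63 - 1


def wrap_int64(value):
    """Hypothetical helper: reduce a Python int to what a wrapping int64 would hold."""
    return (value - INT64_MIN) % 2**64 + INT64_MIN


value = 20150515061816532
count = 500

# Python ints have arbitrary precision, so this is the mathematically correct sum.
true_sum = value * count

# What a fixed-width int64 accumulator ends up with after wrapping around.
wrapped_sum = wrap_int64(true_sum)

print(true_sum)             # 10075257530908266000
print(true_sum > INT64_MAX) # True
print(wrapped_sum)          # -8371486542801285616
print(wrapped_sum / count)  # roughly -1.674297e+16
```

The wrapped sum matches `Out[153]`, and dividing it by the count reproduces the negative mean reported by `describe()`.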
The computation should instead sum the values as floats and then divide by the total count.
I think we need to examine other parts of the code that involve similar situations.
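A minimal sketch of that idea, again in plain Python rather than pandas internals: `float_mean` is a hypothetical name, and this is only one possible way to accumulate in floating point, not the library's actual implementation. Note that float64 carries about 16 significant decimal digits, so the result is approximate in the last digit or so, but it no longer overflows.

```python
import math


def float_mean(values):
    """Hypothetical sketch: accumulate as floats, then divide by the count."""
    # math.fsum adds floats while tracking rounding error, so the total
    # stays accurate even when it exceeds the int64 range.
    total = math.fsum(float(v) for v in values)
    return total / len(values)


values = [20150515061816532] * 500
mean = float_mean(values)

print(mean)      # positive, close to 2.015052e+16
print(mean > 0)  # True
```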