Mean overflows for integer dtypes #10155

Closed
@su9204

Say there is an array of type int64.
For convenience, let me just fill it with some large number:

test1 = pd.Series(20150515061816532, index=list(range(500)), dtype='int64')
test1.describe()
Out[152]:
count 5.000000e+02
mean -1.674297e+16
std 0.000000e+00
min 2.015052e+16
25% 2.015052e+16
50% 2.015052e+16
75% 2.015052e+16
max 2.015052e+16

Look at the mean: it overflows and becomes negative. Obviously the mean should be 20150515061816532.

In [153]: test1.sum()
Out[153]: -8371486542801285616

This is wrong.
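The wrong sum is consistent with signed 64-bit wraparound. The following is a minimal sketch in plain Python, not pandas code; the helper `wrap_int64` is a hypothetical name introduced here only to mimic two's-complement wrapping of the exact total.

```python
value = 20150515061816532
count = 500

# Python ints have arbitrary precision, so this is the exact total.
exact_sum = value * count


def wrap_int64(n):
    """Reduce n to the signed 64-bit range (hypothetical helper for illustration)."""
    return (n + 2**63) % 2**64 - 2**63


wrapped_sum = wrap_int64(exact_sum)

print(exact_sum > 2**63 - 1)   # the exact total does not fit in int64
print(wrapped_sum)             # matches the value reported by test1.sum()
print(wrapped_sum / count)     # roughly the negative mean shown by describe()
```

The exact total is larger than the int64 maximum, and wrapping it reproduces the negative sum reported above. Dividing that by the count gives the negative mean.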

The computation should have summed the values as floats and then divided by the total count.
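The suggestion above can be sketched with NumPy directly. This illustrates the idea only and is not the fix that pandas adopted; the array below stands in for the values of `test1`.

```python
import numpy as np

value = 20150515061816532
arr = np.full(500, value, dtype=np.int64)

# Accumulating in int64 wraps around once the total exceeds 2**63 - 1.
int_sum = arr.sum()

# Accumulating in float64 trades exactness for range, so the total does not wrap.
float_sum = arr.sum(dtype=np.float64)
float_mean = float_sum / arr.size

print(int_sum)     # negative, wrapped total
print(float_mean)  # close to the expected mean
```

The float64 total is rounded, so the resulting mean is only approximately equal to 20150515061816532, but it has the right sign and magnitude.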

I think we need to examine other parts of the code that involve similar situations.

Labels: Bug, Dtype Conversions, Numeric Operations
