Open
Labels
Bug; Groupby; Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate); Needs Tests (unit test(s) needed to prevent regressions); Resample (resample method)
Description
from datetime import datetime
from pandas import DataFrame
import numpy as np
max_int = np.iinfo(np.int64).max
min_int = np.iinfo(np.int64).min
df = DataFrame([max_int, min_int], index=[datetime(2013, 1, 1), datetime(2013, 1, 1)])
# max_int + min_int == -1, so the monthly sum is expected to be -1
assert df.resample("M").apply(np.sum)[0][0] == -1
...
AssertionError
The assertion fails because, during the aggregation, pandas checks in cython_operation in core/groupby.py, via _is_cython_func from core/base.py, whether the data (assumed to be integer) contains any "missing" values before and after the aggregation. A "missing" integer is defined as iNaT = -9223372036854775808, which is also the smallest value an int64 can hold. If any such value is present, the data is automatically cast to float.
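The effect of such a check can be sketched in plain Python. This is only an illustration of the behaviour described above, not pandas code: `sentinel_aware_sum`, `INAT` and `INT64_MAX` are hypothetical names, and the real logic lives in the Cython aggregation path.

```python
import math

# Assumption: same value as the iNaT sentinel quoted above,
# which coincides with the smallest int64.
INAT = -(2**63)
INT64_MAX = 2**63 - 1


def sentinel_aware_sum(values):
    """Hypothetical stand-in for the missing-value check (not pandas code)."""
    if any(v == INAT for v in values):
        # A sentinel was seen: cast to float and treat it as NaN,
        # then sum only the non-missing entries.
        floats = [math.nan if v == INAT else float(v) for v in values]
        return sum(f for f in floats if not math.isnan(f))
    # No sentinel: ordinary integer sum.
    return sum(values)


plain = sum([INT64_MAX, INAT])                   # exact integer arithmetic
guarded = sentinel_aware_sum([INT64_MAX, INAT])  # min_int treated as missing

print(plain)                   # -1
print(type(guarded).__name__)  # float
```

In this sketch the legitimate data point `INAT` is silently dropped and the result becomes a float, so it can no longer equal the expected `-1`.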
This logic is prevalent in the codebase, but it seems fraught with pitfalls. For example, what if the output of a computation happens to be -9223372036854775808? And what if the user intends -9223372036854775808 as a legitimate data point?
Unlikely, sure. But reasonable, absolutely.
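As a minimal sketch of the first pitfall, the following plain-Python snippet shows an ordinary sum of two valid int64 values landing exactly on the sentinel. The names `INAT` and `looks_missing` are hypothetical and only mimic a post-aggregation check; they are not part of pandas.

```python
# Assumption: same value as the iNaT sentinel discussed above.
INAT = -(2**63)


def looks_missing(value):
    """Hypothetical post-aggregation check: does the result equal the sentinel?"""
    return value == INAT


half = -(2**62)      # a perfectly valid int64 value
total = half + half  # still fits in int64, but equals INAT

print(looks_missing(half))   # False
print(looks_missing(total))  # True
```

Neither input is "missing", yet the result is indistinguishable from a missing value under a sentinel-based check.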