Description
`Block.where` has special downcasting logic that splits blocks differently from any other `Block` method. I would like to deprecate and eventually remove this bespoke logic.
AFAICT, the relevant logic is only reached when the block has an integer dtype other than int64, `other` is an integer too big for that dtype, AND the passed `cond` has one or more all-True columns.
(Identifying the affected behavior is difficult in part because it relies on `can_hold_element` incorrectly returning `True` in these cases.)
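To make the three triggering conditions concrete, here is a hypothetical predicate that simply restates them. It is not pandas code and does not reproduce how `Block.where` actually decides anything (in particular it does not model `can_hold_element`); the name `hits_where_downcast_path` is made up for illustration, and treating values below the dtype's minimum as "too big" is an assumption.

```python
import numpy as np


def hits_where_downcast_path(dtype, other, cond) -> bool:
    """Hypothetical predicate restating the conditions described in this issue.

    Not pandas internals: it only checks the three conditions for a 2D block
    whose columns correspond to the columns of `cond`.
    """
    dtype = np.dtype(dtype)
    # 1. An integer dtype other than int64.
    if not np.issubdtype(dtype, np.integer) or dtype == np.int64:
        return False
    # 2. An integer `other` that does not fit in that dtype
    #    (assumption: out of range in either direction counts).
    if not isinstance(other, (int, np.integer)):
        return False
    info = np.iinfo(dtype)
    if info.min <= other <= info.max:
        return False
    # 3. `cond` has at least one all-True column.
    return bool(np.asarray(cond, dtype=bool).all(axis=0).any())


# Usage with the same shapes as the example in this issue.
mask = np.zeros((3, 2), dtype=bool)
mask[:, 0] = True  # column 0 is all-True
print(hits_where_downcast_path(np.int16, 2**17, mask))
```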
```python
import numpy as np
import pandas as pd

arr = np.arange(6).astype(np.int16).reshape(3, 2)
df = pd.DataFrame(arr)

mask = np.zeros(arr.shape, dtype=bool)
mask[:, 0] = True  # column 0 is all-True, column 1 is all-False

res = df.where(mask, 2**17)  # 2**17 does not fit in int16

>>> res.dtypes
0    int16
1    int32
dtype: object
```
The simplest thing to do would be to not do any downcasting in these cases, in which case both columns would end up int32. The next simplest would be to downcast column-wise, which would give the same dtypes as the current behavior (int16 and int32 here) but with less consolidation.
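The two alternatives can be sketched with plain NumPy on the example's data. This is not pandas' implementation: the helper names are hypothetical, only signed integer dtypes are considered, and "downcast column-wise" is assumed to mean casting each result column back to the original dtype when every value in it fits (a value-based check).

```python
import numpy as np

# Candidate signed integer dtypes, smallest first (assumption: signed only).
_INT_DTYPES = [np.int8, np.int16, np.int32, np.int64]


def fits(value: int, dtype) -> bool:
    """True if the integer `value` is representable in integer `dtype`."""
    info = np.iinfo(dtype)
    return info.min <= value <= info.max


def where_no_downcast(arr: np.ndarray, mask: np.ndarray, other: int) -> np.ndarray:
    """Option 1 (hypothetical): upcast once so `other` fits, never downcast."""
    # Smallest dtype at least as wide as arr.dtype that can hold `other`.
    target = next(
        dt for dt in _INT_DTYPES
        if np.dtype(dt).itemsize >= arr.dtype.itemsize and fits(other, dt)
    )
    return np.where(mask, arr.astype(target), np.array(other, dtype=target))


def where_columnwise_downcast(arr: np.ndarray, mask: np.ndarray, other: int) -> list:
    """Option 2 (hypothetical): upcast, then downcast each column that fits."""
    upcast = where_no_downcast(arr, mask, other)
    info = np.iinfo(arr.dtype)
    columns = []
    for j in range(upcast.shape[1]):
        col = upcast[:, j]
        if col.min() >= info.min and col.max() <= info.max:
            col = col.astype(arr.dtype)  # safe: every value fits the original dtype
        columns.append(col)
    return columns


arr = np.arange(6).astype(np.int16).reshape(3, 2)
mask = np.zeros(arr.shape, dtype=bool)
mask[:, 0] = True
print(where_no_downcast(arr, mask, 2**17).dtype)
print([c.dtype for c in where_columnwise_downcast(arr, mask, 2**17)])
```

Option 1 yields a single int32 array; Option 2 yields per-column dtypes int16 and int32, matching the current result but as one column per block instead of one block per all-True/not-all-True group.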
We do not have any test cases that fail if I disable this downcasting (after fixing a problem with an `expressions.where` call that the downcasting somehow makes irrelevant). This makes me think the current behavior is not intentional, or at least not a priority.
Any objection to deprecating the integer downcasting entirely?