Description
Problem
In the PR for #15486, I found that type validation for the fill_value parameters strewn across a large number of pandas API methods is done ad hoc. This results in a wide variety of accepted inputs. I think it would be good to standardize this so that all of these methods share the same behavior: the one currently used by fillna.
Implementation Details
Part of the point of providing a fill_value is to avoid the slow type conversion that would otherwise be needed (using .fillna().astype()). However, accepting fill values of other types is nevertheless a useful convenience. Implementation would roughly be:
- Before executing the rest of the method body, check whether or not the fill_value is valid (using a centralized maybe_fill method). If it is not, raise a ValueError. If it is, check whether or not incorporating the fill_value would result in an upcast in the column dtype.
- If it would not, follow a code path where the column never gets type-converted. If it would, follow that same code path, then do something like a fillna operation at the end before returning.
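The flow described above could look roughly like the following sketch. It is pure Python and deliberately not pandas code: columns are modelled as plain lists, dtypes as the Python types bool, int and float, and take_with_fill is a hypothetical stand-in for any method that accepts a fill_value. Only the name maybe_fill comes from the proposal; its signature and body here are assumptions.

```python
# Promotion order for the scalar types this sketch understands (an assumption).
_RANK = {bool: 0, int: 1, float: 2}


def maybe_fill(column_type, fill_value):
    """Hypothetical body for the centralized check.

    Returns True if the fill would upcast the column, False if it would not,
    and raises ValueError for anything that is not a bool, int, or float.
    """
    fill_type = type(fill_value)
    if fill_type not in _RANK:
        raise ValueError(f"invalid fill_value of type {fill_type.__name__}")
    return _RANK[fill_type] > _RANK[column_type]


def take_with_fill(values, column_type, positions, fill_value):
    """Stand-in for a method that accepts fill_value; -1 marks a missing row."""
    # Step 1: validate before the rest of the method body runs.
    needs_upcast = maybe_fill(column_type, fill_value)

    # Step 2: shared code path, existing values are not converted here.
    taken = [None if p < 0 else values[p] for p in positions]

    # Step 3: fill the holes, converting only when an upcast is required.
    target = type(fill_value) if needs_upcast else column_type
    if needs_upcast:
        taken = [None if v is None else target(v) for v in taken]
    return [target(fill_value) if v is None else v for v in taken]
```

With an int column, an int fill leaves the existing values untouched, while a float fill converts them once at the end, which is the single conversion the proposal aims for.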
Target Implementation
The same as what fillna currently does, which is as follows.
Invalid:
- categorical fill for a category not in the categories will raise a ValueError.
- sparse matrices refuse upcasting.
- Passing an object or list or other non-coercible "thing" as a fill.
Valid, upcast:
- int fill will promote bool dtypes to int.
- float fill will promote int and bool dtypes to float (this is what happens with np.nan already).
- object (str) fill would promote lesser dtypes to object.
- int, float, and bool fill to a datetime dtype will be treated as a UNIX-like timestamp and promoted to datetime.
- object fill will promote datetime dtype to object.
Valid, no-cast:
- Everything else.
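As a rough illustration, the rules above can be written down as a single lookup. This is a sketch only, not how pandas implements fillna: dtypes are represented by plain strings, resulting_kind and its parameters are hypothetical names, and only bool, int, float and str fills are recognized, with everything else treated as a non-coercible "thing".

```python
# Promotion order for the "lesser" dtypes named in the list above.
_RANK = {"bool": 0, "int": 1, "float": 2, "object": 3}


def _fill_kind(fill_value):
    """Map a scalar fill to a dtype name, or None if this sketch rejects it."""
    # bool must be tested before int because bool is a subclass of int.
    if isinstance(fill_value, bool):
        return "bool"
    if isinstance(fill_value, int):
        return "int"
    if isinstance(fill_value, float):
        return "float"
    if isinstance(fill_value, str):
        return "object"
    return None  # lists, dicts, arbitrary objects, ...


def resulting_kind(column_kind, fill_value, categories=(), sparse=False):
    """Return the dtype name a column would have after filling.

    A result equal to column_kind means "valid, no-cast"; a different result
    means "valid, upcast". Invalid fills raise ValueError.
    """
    if column_kind == "category":
        if fill_value not in categories:
            raise ValueError("fill_value is not one of the categories")
        return "category"

    kind = _fill_kind(fill_value)
    if kind is None:
        raise ValueError(f"invalid fill_value of type {type(fill_value).__name__}")

    if column_kind == "datetime":
        # Numeric fills are read as timestamps; str fills fall back to object.
        result = "object" if kind == "object" else "datetime"
    else:
        result = max(column_kind, kind, key=_RANK.__getitem__)

    if sparse and result != column_kind:
        raise ValueError("sparse data refuses upcasting")
    return result
```

For example, resulting_kind("bool", 1) returns "int" and resulting_kind("float", 1) returns "float", so the first call is an upcast and the second is a no-cast.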
Current Implementation
...is ad-hoc. The following are the methods which currently provide a fill_value input, as well as where they deviate from the model above.
- Series.combine, DataFrame.combine, Series.to_sparse: These are unique usages of fill_value which aren't compatible with the rest of them.
- Series.unstack, DataFrame.unstack: Any fill_value is allowed. You can pass an object if you'd like, or even another DataFrame (yo dawg...).
- DataFrame.align: Any fill_value is allowed.
- DataFrame.reindex_axis: Lists and dicts are allowed, objects are not.
- DataFrame.asfreq, Series.asfreq: Any fill_value is allowed.
- pd.pivot_table: ...
- Series.add, DataFrame.add: ...
- Series.subtract, DataFrame.subtract: ...
- Probably others, there are a lot of these.