Description
Encountered this while working on #23167.
There are a few inconsistencies in pandas._libs.lib.infer_dtype, e.g.
>>> import pandas as pd
>>> import numpy as np
>>> import pandas._libs.lib as lib
>>>
>>> lib.infer_dtype(pd.Series([], dtype=object))
'empty'
>>> lib.infer_dtype(pd.Index([], dtype=object))
'empty'
>>> lib.infer_dtype(pd.Index([]))
'empty'
>>> lib.infer_dtype(pd.Series([]))
'floating' <--- why not empty?
and similarly for
>>> lib.infer_dtype(pd.Series([np.nan, np.nan], dtype=object), skipna=True)
'floating' <-- wrong
>>> lib.infer_dtype(pd.Index([np.nan, np.nan], dtype=object), skipna=True)
'floating' <-- wrong
>>> lib.infer_dtype(pd.Series([np.nan, np.nan]), skipna=True)
'floating' <-- debatable
>>> lib.infer_dtype(pd.Index([np.nan, np.nan]), skipna=True)
'floating' <-- debatable
In the context of object columns, an all-NA column with skipna=True should definitely not return 'floating' (imagine a column of strings where all values happen to be missing for a given selection / after a join / whatever). I'd argue that the result should be 'empty', and that all-NA data of float dtype should also infer to 'empty' when skipna=True.
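To make the proposal above concrete, here is a minimal sketch of the behaviour I have in mind, written as a wrapper around the public pd.api.types.infer_dtype. The helper name infer_dtype_na_aware is hypothetical and not part of pandas; this only illustrates the expected results and is not a suggestion for how the Cython code should be changed.

```python
import numpy as np
import pandas as pd


def infer_dtype_na_aware(values, skipna=True):
    """Hypothetical wrapper: report 'empty' when nothing is left after skipping NAs."""
    if skipna:
        # pd.isna returns a boolean mask; .all() is True for zero-length input too,
        # so truly empty and all-NA inputs are treated the same way.
        mask = np.asarray(pd.isna(values))
        if mask.all():
            return "empty"
    # Otherwise defer to the existing inference.
    return pd.api.types.infer_dtype(values, skipna=skipna)


all_na_object = pd.Series([np.nan, np.nan], dtype=object)
all_na_float = pd.Series([np.nan, np.nan])
print(infer_dtype_na_aware(all_na_object))  # 'empty' under this sketch
print(infer_dtype_na_aware(all_na_float))   # 'empty' under this sketch
```

Inputs that still contain values after skipping NAs fall through to the existing inference, so only the all-NA and zero-length cases change.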
The skipna parameter was introduced in #17066 in v0.21. As a side note, that PR also promised that the default would be changed from False to True. I wonder if this even needs a deprecation cycle, as the function is explicitly private by virtue of being in _libs.lib.
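If a deprecation cycle is considered necessary after all, it could look roughly like the sketch below: a sentinel default plus a FutureWarning when skipna is not passed explicitly. The wrapper name, the sentinel, and the warning text are all hypothetical and only illustrate the pattern.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical sentinel to detect "skipna was not passed by the caller".
_UNSET = object()


def infer_dtype_with_deprecation(values, skipna=_UNSET):
    """Hypothetical wrapper showing a default-change deprecation cycle."""
    if skipna is _UNSET:
        warnings.warn(
            "The default of skipna will change from False to True in a "
            "future version; pass skipna explicitly to silence this warning.",
            FutureWarning,
            stacklevel=2,
        )
        skipna = False  # keep the current default during the cycle
    return pd.api.types.infer_dtype(values, skipna=skipna)


# Passing skipna explicitly does not warn.
print(infer_dtype_with_deprecation(pd.Series(["a", np.nan], dtype=object), skipna=True))
```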