API: fix corner cases of lib.infer_dtype #23421

Closed
@h-vetinari

Description

I encountered this while working on #23167.

There are a few inconsistencies in pandas._libs.lib.infer_dtype, e.g.

>>> import pandas as pd
>>> import numpy as np
>>> import pandas._libs.lib as lib
>>>
>>> lib.infer_dtype(pd.Series([], dtype=object))
'empty'
>>> lib.infer_dtype(pd.Index([], dtype=object))
'empty'
>>> lib.infer_dtype(pd.Index([]))
'empty'
>>> lib.infer_dtype(pd.Series([]))
'floating'  <--- why not empty?

and similarly for

>>> lib.infer_dtype(pd.Series([np.nan, np.nan], dtype=object), skipna=True)
'floating'  <-- wrong
>>> lib.infer_dtype(pd.Index([np.nan, np.nan], dtype=object), skipna=True)
'floating'  <-- wrong
>>> lib.infer_dtype(pd.Series([np.nan, np.nan]), skipna=True)
'floating'  <-- debatable
>>> lib.infer_dtype(pd.Index([np.nan, np.nan]), skipna=True)
'floating'  <-- debatable

In the context of object columns, an all-NA column with skipna=True should definitely not return 'floating' (imagine a column of strings where all values happen to be missing for a given selection, after a join, or for whatever other reason). I'd argue that an all-NA input of float dtype should also be inferred as 'empty' when skipna=True.
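To make the proposed behaviour concrete, here is a minimal sketch of a wrapper that drops missing values first and reports 'empty' when nothing is left. The name infer_dtype_na_aware is hypothetical and not part of pandas; the sketch relies only on the public pandas.api.types.infer_dtype and pd.isna, and it is an illustration of the idea rather than how lib.infer_dtype is or should be implemented.

```python
import numpy as np
import pandas as pd
from pandas.api.types import infer_dtype


def infer_dtype_na_aware(values):
    """Hypothetical helper: infer the dtype after dropping missing values.

    Returns 'empty' when the input has no elements or only missing ones;
    otherwise delegates to the public infer_dtype on what remains.
    """
    # Work on an object array so every input kind is handled the same way.
    arr = np.asarray(values, dtype=object)
    remaining = arr[~pd.isna(arr)]
    if remaining.size == 0:
        return "empty"
    # Missing values are already removed, so skipna=False is sufficient.
    return infer_dtype(remaining, skipna=False)


all_na_object = pd.Series([np.nan, np.nan], dtype=object)
all_na_float = pd.Series([np.nan, np.nan])
strings_with_na = pd.Series(["a", np.nan], dtype=object)

print(infer_dtype_na_aware(all_na_object))
print(infer_dtype_na_aware(all_na_float))
print(infer_dtype_na_aware(strings_with_na))
```

With this wrapper, both all-NA inputs are treated the same way regardless of whether the container is float or object dtype, which is the consistency argued for above.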

The skipna parameter was introduced in #17066 (v0.21). As a side note, that change also promised that the default would be switched from False to True. I wonder whether this even needs a deprecation cycle, since the function is explicitly private by virtue of living in _libs.lib.

Labels: Dtype Conversions (unexpected or buggy dtype conversions), Internals (related to non-user accessible pandas implementation)
