API: fix corner cases of lib.infer_dtype #23421

Encountered this while working on #23167.

There are a few inconsistencies in `pandas._libs.lib.infer_dtype`, e.g.
```
>>> import pandas as pd
>>> import numpy as np
>>> import pandas._libs.lib as lib
>>>
>>> lib.infer_dtype(pd.Series([], dtype=object))
'empty'
>>> lib.infer_dtype(pd.Index([], dtype=object))
'empty'
>>> lib.infer_dtype(pd.Index([]))
'empty'
>>> lib.infer_dtype(pd.Series([]))
'floating'  <--- why not empty?
```
and similarly for
```
>>> lib.infer_dtype(pd.Series([np.nan, np.nan], dtype=object), skipna=True)
'floating'  <-- wrong
>>> lib.infer_dtype(pd.Index([np.nan, np.nan], dtype=object), skipna=True)
'floating'  <-- wrong
>>> lib.infer_dtype(pd.Series([np.nan, np.nan]), skipna=True)
'floating'  <-- debatable
>>> lib.infer_dtype(pd.Index([np.nan, np.nan]), skipna=True)
'floating'  <-- debatable
```

In the context of object columns, an all-NA column with `skipna=True` should definitely not return `'floating'` (imagine a column of strings where all values happen to be missing for a given selection, after a join, or similar). I'd argue that an all-NA column of float dtype should also infer to `'empty'` in case of `skipna=True`.
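To make the proposal concrete, here is a rough sketch of the behaviour I have in mind, written as a wrapper around the public `pandas.api.types.infer_dtype`. The name `infer_dtype_all_na_as_empty` is hypothetical and only for illustration; this is not how the fix would be implemented inside `lib.infer_dtype`.

```python
import numpy as np
import pandas as pd
from pandas.api.types import infer_dtype


def infer_dtype_all_na_as_empty(values, skipna=True):
    """Hypothetical wrapper illustrating the proposed semantics.

    With skipna=True, input with no non-missing values (including
    zero-length input) is reported as 'empty', regardless of dtype.
    Otherwise, defer to the regular inference.
    """
    if skipna:
        # Convert to an object ndarray so the check does not depend on
        # the container type (Series, Index, list, ndarray).
        arr = np.asarray(values, dtype=object)
        # .all() on a zero-length array is True, so empty input is covered too.
        if pd.isna(arr).all():
            return "empty"
    return infer_dtype(values, skipna=skipna)


print(infer_dtype_all_na_as_empty(pd.Series([np.nan, np.nan], dtype=object)))
print(infer_dtype_all_na_as_empty(pd.Index([np.nan, np.nan])))
print(infer_dtype_all_na_as_empty(pd.Series(["a", np.nan], dtype=object)))
```

Under these assumptions the first two calls would give `'empty'`, while a column with at least one non-missing value is still inferred from that value.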

The `skipna` parameter was introduced in #17066 in v0.21. As a side note, that PR also promised that the default would be changed from `False` to `True`. I wonder if this even needs a deprecation cycle, as the function is explicitly private by being in `_libs.lib`.
