Have pd.array infer new extension types #29791

Closed
@TomAugspurger

Currently, pd.array only infers an extension type for Period, Datetime, and Interval data. For everything else, an explicit dtype=... is required to get one of our extension arrays.

This proposal is to have it infer the extension type for

  • strings -> StringArray
  • boolean -> BooleanArray
  • integer -> IntegerArray

All of these currently return PandasArray.
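
For comparison, here is a minimal sketch of the explicit-dtype spelling that is needed today to get these arrays. It assumes a pandas version that ships the "string", "boolean", and "Int64" dtype aliases (pandas 1.0 or later); the variable names are only illustrative.

```python
import pandas as pd

# Passing dtype explicitly requests the nullable extension arrays
# directly, without relying on any inference inside pd.array.
# Assumption: pandas >= 1.0, where these dtype aliases exist.
strings = pd.array(["a", None], dtype="string")
booleans = pd.array([True, None], dtype="boolean")
integers = pd.array([1, None], dtype="Int64")

for arr in (strings, booleans, integers):
    # Show which array class and dtype each call produced.
    print(type(arr).__name__, arr.dtype)
```

Under this proposal, the same results would be obtained without the dtype argument.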

Concretely, we'll need to teach infer_dtype not to return 'mixed' for strings or booleans combined with NA values, similar to how it already returns 'integer-na' for integers combined with NA:

In [25]: import numpy as np

In [26]: from pandas._libs import lib

In [27]: lib.infer_dtype([True, None], skipna=False)
Out[27]: 'mixed'

In [28]: lib.infer_dtype(['a', None], skipna=False)
Out[28]: 'mixed'

In [29]: lib.infer_dtype([0, np.nan], skipna=False)
Out[29]: 'integer-na'

and then handle those inferred types in pd.array.
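
A rough pure-Python sketch of the idea, not the actual infer_dtype implementation: skip NA values while classifying, then map the result to an extension array. The names is_na, infer_kind, TARGET_ARRAY, and choose_array are hypothetical and exist only for illustration.

```python
import math

def is_na(value):
    """Hypothetical helper: treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def infer_kind(values):
    """Return 'string', 'boolean', 'integer', 'empty', or 'mixed', ignoring NA."""
    kinds = set()
    for value in values:
        if is_na(value):
            continue  # NA values do not affect the inferred kind
        # bool is a subclass of int, so it has to be checked first
        if isinstance(value, bool):
            kinds.add("boolean")
        elif isinstance(value, int):
            kinds.add("integer")
        elif isinstance(value, str):
            kinds.add("string")
        else:
            kinds.add("mixed")
    if not kinds:
        return "empty"
    return kinds.pop() if len(kinds) == 1 else "mixed"

# Hypothetical mapping from the inferred kind to the array named in this issue
TARGET_ARRAY = {
    "string": "StringArray",
    "boolean": "BooleanArray",
    "integer": "IntegerArray",
}

def choose_array(values):
    """Pick the extension array name, falling back to PandasArray."""
    return TARGET_ARRAY.get(infer_kind(values), "PandasArray")
```

With this sketch, choose_array([True, None]) and choose_array(['a', None]) pick BooleanArray and StringArray, while a genuinely mixed input such as [1, 'a'] still falls back to PandasArray.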

Labels: API Design; Constructors (Series/DataFrame/Index/pd.array Constructors); ExtensionArray (Extending pandas with custom dtypes or arrays.); Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate)