API: pd.StringDtype.value_counts should return pd.Int64Dtype #59346
Labels
API - Consistency (Internal Consistency of API/Behavior)
API Design
Arrow (pyarrow functionality)
Dtype Conversions (Unexpected or buggy dtype conversions)
ExtensionArray (Extending pandas with custom dtypes or arrays.)
Feature Type
Adding new functionality to pandas
Changing existing functionality in pandas
Removing existing functionality in pandas
Problem Description
From discussion in https://github.com/pandas-dev/pandas/pull/59330/files#r1695250192
The fact that value_counts on pd.StringDtype(storage="pyarrow") data returns an int64[pyarrow] result is a mistake in our API. While this is well intentioned for performance reasons, it introduces an inconsistency with our extension types.
If we continue to have pd.StringDtype.value_counts return a pd.Int64Dtype(), we can leave it as an implementation detail whether that Int64Dtype is backed by pyarrow or NumPy. The user interface would not need to change at all: users could simply install (or not install) pyarrow and things would continue to work the same. In the long term, this would also align better with PDEP-13 (#58455).
Feature Description
Make algorithms on StringDtype that use pd.NA as the missing-value sentinel consistently return other pandas extension types, and leave it as an implementation detail whether those types are backed by pyarrow or not.
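As a hedged, user-level illustration of the proposed contract (not the pandas implementation), the sketch below normalizes whatever value_counts returns for a StringDtype Series to pd.Int64Dtype(). The names string_value_counts and STORAGE are hypothetical and introduced only for this example; the try/except picks pyarrow storage when pyarrow is installed and falls back to Python storage otherwise, so the caller sees the same result dtype either way.

```python
import pandas as pd

# Hypothetical switch: prefer pyarrow-backed strings if pyarrow is installed,
# otherwise fall back to the Python-backed StringDtype.
try:
    import pyarrow  # noqa: F401
    STORAGE = "pyarrow"
except ImportError:
    STORAGE = "python"


def string_value_counts(ser: pd.Series) -> pd.Series:
    """Hypothetical helper sketching the proposal: counts for a StringDtype
    Series always come back as the pandas extension type pd.Int64Dtype(),
    regardless of the string storage backend."""
    if not isinstance(ser.dtype, pd.StringDtype):
        raise TypeError("expected a Series with pd.StringDtype")
    counts = ser.value_counts()
    # Depending on the pandas version and storage, this may be int64[pyarrow]
    # (the behaviour described in this issue) or already Int64; casting makes
    # the result dtype independent of the backend.
    return counts.astype(pd.Int64Dtype())


ser = pd.Series(["a", "b", "a", None], dtype=pd.StringDtype(storage=STORAGE))
counts = string_value_counts(ser)
print(counts.dtype)      # Int64
print(counts.to_dict())  # {'a': 2, 'b': 1}
```

Because value_counts drops missing values by default, the None entry is not counted. Whether the masked Int64 result is internally backed by NumPy or pyarrow is exactly the detail the proposal wants to hide from users.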
Alternative Solutions
status quo
Additional Context
No response