
API: IntegerArray reductions: data type of the scalar result? #23106

Open
@jorisvandenbossche


Follow-up on #22762 which added _reduce to the ExtensionArray interface and added an implementation for IntegerArray.

Currently, for IntegerArray we special-case 'sum', 'min', 'max' and 'prod' (see the snippet below) to make sure they return ints (because the underlying computation can be done on a float array, depending on whether there are missing values).

# if we have a preservable numeric op,
# provide coercion back to an integer type if possible
elif name in ['sum', 'min', 'max', 'prod'] and notna(result):
    int_result = int(result)
    if int_result == result:
        result = int_result
That basic check has the following "problem":

  • Some reductions will return Python integers, others will return NumPy scalars. Accessing single elements also gives NumPy scalars.
In [3]: pd.Series(pd.core.arrays.integer_array([1, None, 3])).mean()
Out[3]: 2.0

In [4]: type(_)
Out[4]: numpy.float64

In [5]: pd.Series(pd.core.arrays.integer_array([1, None, 3])).sum()
Out[5]: 4

In [6]: type(_)
Out[6]: int

In [7]: type(pd.Series(pd.core.arrays.integer_array([1, None, 3]))[0])
Out[7]: numpy.int64
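
To make the inconsistency easier to reproduce outside of pandas, here is a minimal sketch in plain NumPy. It is not the pandas implementation: `reduce_like_issue` and `reduce_numpy_scalar` are hypothetical helpers that only imitate the coercion step quoted above, assuming the reduction is computed on a float copy with NaN in the masked positions. The second helper shows one possible alternative, casting back to the array's own NumPy scalar type instead of a Python int.

```python
import numpy as np


def reduce_like_issue(values, mask, name):
    """Hypothetical stand-in for the reduction path quoted in the issue."""
    # Compute on a float copy, with NaN marking the missing entries.
    data = values.astype("float64")
    data[mask] = np.nan
    result = getattr(np, "nan" + name)(data)  # np.nansum, np.nanmean, ...

    # Same coercion as the quoted snippet: int() yields a Python int.
    if name in ["sum", "min", "max", "prod"] and not np.isnan(result):
        int_result = int(result)
        if int_result == result:
            result = int_result
    return result


def reduce_numpy_scalar(values, mask, name):
    """Assumed alternative: coerce back to the array's NumPy scalar type."""
    result = reduce_like_issue(values, mask, name)
    if isinstance(result, int):
        result = values.dtype.type(result)
    return result


values = np.array([1, 0, 3], dtype="int64")  # the 0 is a placeholder
mask = np.array([False, True, False])        # True marks a missing value

print(type(reduce_like_issue(values, mask, "sum")))    # Python int
print(type(reduce_like_issue(values, mask, "mean")))   # NumPy float scalar
print(type(reduce_numpy_scalar(values, mask, "sum")))  # NumPy int scalar
```

With the second helper, 'sum' and element access would both give a NumPy integer scalar, which is one way the result types could be made consistent. Whether that is the desired behaviour is the open question here.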

Labels: Bug, Dtype Conversions, ExtensionArray, NA - MaskedArrays
