BUG: Behaviour of sum/mean on sparse boolean arrays changed between 1.5.3 and pandas 2.2 #58015

Open
@CompRhys

### Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

### Reproducible Example

>>> # Session 1, pandas 2.2.1: the sparse boolean sum comes back as Sparse[bool, False]
>>> import pandas as pd
>>> import numpy as np
>>> pd.__version__
'2.2.1'
>>> a = pd.DataFrame(np.random.randint(2, size=(3,4))).astype(pd.SparseDtype(int, fill_value=0))
>>> a
   0  1  2  3
0  0  0  1  0
1  0  1  0  1
2  0  1  1  0
dtype: Sparse[int64, 0]
>>> (a>0).sum(axis=1)
0    True
1    True
2    True
dtype: Sparse[bool, False]
>>> b = pd.DataFrame(np.random.randint(2, size=(3,4)))
>>> (b>0).sum(axis=1)
0    3
1    4
2    2
dtype: int64

>>> # Session 2, pandas 1.5.3: the same operation returns int64
>>> import pandas as pd
>>> import numpy as np
>>> pd.__version__
'1.5.3'
>>> a = pd.DataFrame(np.random.randint(2, size=(3,4))).astype(pd.SparseDtype(int, fill_value=0))
>>> a
   0  1  2  3
0  1  1  0  0
1  0  0  1  0
2  0  0  1  1
>>> (a>0).sum(axis=1)
0    2
1    1
2    2
dtype: int64
>>> b = pd.DataFrame(np.random.randint(2, size=(3,4)))
>>> (b>0).sum(axis=1)
0    1
1    4
2    1
dtype: int64


### Issue Description

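On pandas 2.2.1, `(a > 0).sum(axis=1)` on a DataFrame with sparse boolean columns returns `Sparse[bool, False]` rather than `int64`, so each row count collapses to `True`/`False`. On 1.5.3 the same call returned `int64` counts.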
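
The reproduction above draws fresh random values for `a` and `b`, so the sparse and dense results cannot be compared row by row. The sketch below uses a fixed array instead (the values are an assumption for illustration, copied from the first printed frame) and prints the dtype of each reduction. The sparse result is deliberately left without an expected output here, since per this report it depends on the pandas version.

```python
import numpy as np
import pandas as pd

# Fixed values (chosen for illustration) so both paths reduce the same data.
values = np.array([[0, 0, 1, 0], [0, 1, 0, 1], [0, 1, 1, 0]])

dense = pd.DataFrame(values)
sparse = dense.astype(pd.SparseDtype(int, fill_value=0))

# Same comparison and reduction on the dense and the sparse frame.
dense_sum = (dense > 0).sum(axis=1)
sparse_sum = (sparse > 0).sum(axis=1)

print("pandas", pd.__version__)
print("dense :", dense_sum.dtype, dense_sum.tolist())
# np.asarray densifies the result whether or not it is sparse-backed.
print("sparse:", sparse_sum.dtype, np.asarray(sparse_sum).tolist())
```

If the two printed dtypes differ, the installed version shows the behaviour described in this issue.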

### Expected Behavior

I would expect the sum of a sparse boolean array to have an integer dtype, matching the behavior on a dense array and on pandas 1.5.3.
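
Until the behaviour is settled, one possible workaround is to densify the boolean mask before reducing it. This is a minimal sketch, not a fix proposed in this issue: `count_true_per_row` is a hypothetical helper name, it assumes that either all columns are sparse or none are, and it gives up the memory savings of the sparse representation for the duration of the reduction.

```python
import numpy as np
import pandas as pd


def count_true_per_row(mask: pd.DataFrame) -> pd.Series:
    """Count True values per row, densifying sparse columns first.

    Hypothetical helper for illustration; not part of the pandas API.
    """
    all_sparse = len(mask.columns) > 0 and all(
        isinstance(dtype, pd.SparseDtype) for dtype in mask.dtypes
    )
    if all_sparse:
        # DataFrame.sparse.to_dense() requires every column to be sparse.
        mask = mask.sparse.to_dense()
    # On a dense boolean frame, sum counts the True values per row.
    return mask.sum(axis=1)


# Fixed example values (an assumption for illustration).
values = np.array([[0, 0, 1, 0], [0, 1, 0, 1], [0, 1, 1, 0]])
sparse = pd.DataFrame(values).astype(pd.SparseDtype(int, fill_value=0))

counts = count_true_per_row(sparse > 0)
print(counts.tolist())  # [1, 2, 2]
```

The same helper can be called on a dense mask, in which case it reduces the frame unchanged.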

### Installed Versions

This issue is observed when upgrading from pandas 1.5.3 to 2.2.1.

Labels

  • Bug

  • Dtype Conversions: Unexpected or buggy dtype conversions

  • ExtensionArray: Extending pandas with custom dtypes or arrays.

  • Reduction Operations: sum, mean, min, max, etc.
