
pandas.DataFrame.sum() returns wrong type for subclassed pandas DataFrame #25596

Closed
@MaximilianHahn

Description

Code Sample (copy-pastable)

# the following code is obtained from the documentation
# https://pandas.pydata.org/pandas-docs/stable/development/extending.html

import pandas as pd

class SubclassedSeries(pd.Series):
    @property
    def _constructor(self):
        return SubclassedSeries
    @property
    def _constructor_expanddim(self):
        return SubclassedDataFrame


class SubclassedDataFrame(pd.DataFrame):
    @property
    def _constructor(self):
        return SubclassedDataFrame
    @property
    def _constructor_sliced(self):
        return SubclassedSeries

# create a class instance as in the example of the documentation

>>> df = SubclassedDataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
>>> df
   A  B  C
0  1  4  7
1  2  5  8
2  3  6  9

# this works just fine

>>> type(df)
<class '__main__.SubclassedDataFrame'>

# slicing also works fine

>>> sliced2 = df['A']
>>> sliced2
0    1
1    2
2    3
Name: A, dtype: int64

>>> type(sliced2)
<class '__main__.SubclassedSeries'>

# however, sum() returns a plain pandas.Series, not a SubclassedSeries

>>> sliced3 = df.sum()
>>> sliced3
A     6
B    15
C    24
dtype: int64

>>> type(sliced3)
<class 'pandas.core.series.Series'>
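To check whether other reductions drop the subclass in the same way, a small diagnostic like the following can be run against the subclasses above. This is a sketch, not part of pandas: the helper name `reductions_losing_subclass` and the list of reductions it tries are illustrative assumptions, and the printed list depends on the installed pandas version (on affected versions, `sum` would be expected to appear in it).

```python
import pandas as pd


class SubclassedSeries(pd.Series):
    @property
    def _constructor(self):
        return SubclassedSeries

    @property
    def _constructor_expanddim(self):
        return SubclassedDataFrame


class SubclassedDataFrame(pd.DataFrame):
    @property
    def _constructor(self):
        return SubclassedDataFrame

    @property
    def _constructor_sliced(self):
        return SubclassedSeries


def reductions_losing_subclass(df, names=("sum", "mean", "min", "max", "prod")):
    """Hypothetical helper: list the reductions whose result is not the
    frame's sliced subclass (for example, a plain pandas.Series)."""
    lost = []
    for name in names:
        result = getattr(df, name)()  # column-wise reduction (axis=0)
        if type(result) is not df._constructor_sliced:
            lost.append(name)
    return lost


df = SubclassedDataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
print(reductions_losing_subclass(df))
```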

Problem description

In our project, we extend pandas as described in the documentation and implement our own DataFrame and Series subclasses, similar to the geopandas project (applying sum to a geopandas GeoDataFrame shows the same problem). When reductions such as sum (which go through DataFrame._reduce) are used, the result must be the matching SubclassedSeries. Otherwise, every reduction silently falls back to a plain pandas.Series, losing the subclass and its custom methods, which makes subclassing pandas.DataFrame of little practical use.

Expected Output

>>> type(sliced3)
<class '__main__.SubclassedSeries'>
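Until `_reduce` itself is changed, one user-side workaround is to override the affected reduction in the subclass and re-wrap plain `Series` results with `_constructor_sliced`. This is a minimal sketch that assumes only `sum` needs patching; other reductions (`mean`, `min`, ...) would need the same override, and on pandas versions where the bug is fixed the override simply passes the already-correct result through.

```python
import pandas as pd


class SubclassedSeries(pd.Series):
    @property
    def _constructor(self):
        return SubclassedSeries

    @property
    def _constructor_expanddim(self):
        return SubclassedDataFrame


class SubclassedDataFrame(pd.DataFrame):
    @property
    def _constructor(self):
        return SubclassedDataFrame

    @property
    def _constructor_sliced(self):
        return SubclassedSeries

    def sum(self, *args, **kwargs):
        result = super().sum(*args, **kwargs)
        # Re-wrap a plain Series in the Series subclass; scalar results
        # and results that already have the right type pass through unchanged.
        if isinstance(result, pd.Series) and not isinstance(result, self._constructor_sliced):
            result = self._constructor_sliced(result)
        return result


df = SubclassedDataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
print(type(df.sum()))
```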

I think I can suggest a possible fix. The relevant code is in pandas/core/frame.py, just before the return statement of DataFrame._reduce:

# this is the code in core/frame.py:
def _reduce(...):
        # .... left out
        if constructor is not None:
            result = Series(result, index=labels)
        return result

# I suggest the following change. Keep the existing condition, but build the
# result with the sliced constructor: `constructor` is self._constructor and
# would create a SubclassedDataFrame instead of a SubclassedSeries.
def _reduce(...):
        # .... left out
        if constructor is not None:
            result = self._constructor_sliced(result, index=labels)
        return result
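The difference between the two candidate constructors can be checked outside pandas with a hypothetical helper that mimics only the final wrapping step of `_reduce`: given the reduced values and the axis labels, `_constructor_sliced` yields a one-dimensional `SubclassedSeries`, while `_constructor` builds a one-column `SubclassedDataFrame`. The helper name `wrap_reduction` is an illustration, not pandas API, and the column sums are computed with NumPy here rather than by pandas' internal reduction code.

```python
import pandas as pd


class SubclassedSeries(pd.Series):
    @property
    def _constructor(self):
        return SubclassedSeries

    @property
    def _constructor_expanddim(self):
        return SubclassedDataFrame


class SubclassedDataFrame(pd.DataFrame):
    @property
    def _constructor(self):
        return SubclassedDataFrame

    @property
    def _constructor_sliced(self):
        return SubclassedSeries


def wrap_reduction(frame, values, labels):
    """Hypothetical stand-in for the last step of DataFrame._reduce
    with the proposed change applied."""
    return frame._constructor_sliced(values, index=labels)


df = SubclassedDataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
values = df.values.sum(axis=0)  # column sums as a NumPy array
labels = df.columns             # labels of the reduced axis for axis=0

fixed = wrap_reduction(df, values, labels)
wrong = df._constructor(values, index=labels)  # 1-D data -> one-column frame

print(type(fixed).__name__, fixed.tolist())
print(type(wrong).__name__, wrong.shape)
```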

Output of pd.show_versions()


INSTALLED VERSIONS

commit: None
python: 3.7.2.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 10, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.24.1
pytest: None
pip: 19.0.3
setuptools: 40.8.0
Cython: None
numpy: 1.16.2
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.8.0
pytz: 2018.9
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

Metadata

Assignees

No one assigned

Labels

Numeric Operations (Arithmetic, Comparison, and Logical operations)