Description
Code sample
import numpy as np
import pandas as pd


class MyDataFrame(pd.DataFrame):
    @property
    def _constructor(self):
        # pandas calls this to build results, so most operations return MyDataFrame
        return MyDataFrame


dates = pd.date_range('2019', freq='H', periods=1000)
my_df = MyDataFrame(np.arange(len(dates)), index=dates)

print(type(my_df))
# __main__.MyDataFrame (✓)
print(type(my_df.diff()))
# __main__.MyDataFrame (✓)
print(type(my_df.sample(1)))
# __main__.MyDataFrame (✓)
print(type(my_df.rolling('5H').mean()))
# __main__.MyDataFrame (✓)
print(type(my_df.groupby(my_df.index.dayofweek).mean()))
# pandas.core.frame.DataFrame (✘)
print(type(my_df.resample('D').mean()))
# pandas.core.frame.DataFrame (✘)
Problem description
Originally posted on SO.
For chainable methods on subclassed data structures, the expected behaviour is that the operation returns an instance of the subclass (i.e. MyDataFrame) rather than the native type (i.e. DataFrame). This is already the case for most operations (e.g. slicing, sampling, sorting), but not for resample and groupby.
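A minimal check of operations that already preserve the subclass, using the same MyDataFrame definition as the code sample; the column name x and the data are illustrative:

```python
import numpy as np
import pandas as pd


class MyDataFrame(pd.DataFrame):
    @property
    def _constructor(self):
        # pandas calls this to build the result of most DataFrame operations.
        return MyDataFrame


my_df = MyDataFrame({"x": np.arange(10)[::-1]})  # x = 9, 8, ..., 0

sliced = my_df.iloc[2:5]                    # slicing
sorted_df = my_df.sort_values("x")          # sorting
sampled = my_df.sample(3, random_state=0)   # sampling
filtered = my_df[my_df["x"] > 4]            # boolean masking

for result in (sliced, sorted_df, sampled, filtered):
    print(type(result).__name__)  # MyDataFrame
```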
Currently, groupby and resample both return explicitly constructed pandas types, e.g. here: pandas/pandas/core/groupby/generic.py, line 338 in ac69333.
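Until the result types change, one user-side workaround, sketched here as an assumption rather than anything pandas provides, is to pass each groupby or resample result back through the caller's own _constructor. The helper name rewrap is hypothetical:

```python
import numpy as np
import pandas as pd


class MyDataFrame(pd.DataFrame):
    @property
    def _constructor(self):
        return MyDataFrame


def rewrap(caller, result):
    """Hypothetical helper: rebuild a frame-shaped `result` with the
    caller's own _constructor so the subclass survives groupby/resample."""
    return caller._constructor(result)


dates = pd.date_range("2019-01-01", periods=14, freq="D")
my_df = MyDataFrame({"x": np.arange(len(dates))}, index=dates)

by_dow = rewrap(my_df, my_df.groupby(my_df.index.dayofweek).mean())
weekly = rewrap(my_df, my_df.resample("W").mean())
print(type(by_dow).__name__, type(weekly).__name__)  # MyDataFrame MyDataFrame
```

This only restores the type after the fact; any subclass-specific state that the intermediate DataFrame dropped is not recovered.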
To get the expected behaviour, the intermediary classes (e.g. DataFrameGroupBy) would need to retain information about the calling class so that the appropriate constructor can be used (i.e. one of _constructor, _constructor_sliced or _constructor_expanddim).
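A heavily simplified sketch of that idea follows. SubclassAwareGroupBy is a hypothetical stand-in, not the real DataFrameGroupBy: it records the caller's _constructor and _constructor_sliced when it is created and uses them to wrap frame-shaped and series-shaped results. MySeries and the paired _constructor_sliced/_constructor_expanddim properties follow the usual pandas subclassing pattern; the names are illustrative.

```python
import numpy as np
import pandas as pd


class MySeries(pd.Series):
    @property
    def _constructor(self):
        return MySeries

    @property
    def _constructor_expanddim(self):
        return MyDataFrame


class MyDataFrame(pd.DataFrame):
    @property
    def _constructor(self):
        return MyDataFrame

    @property
    def _constructor_sliced(self):
        return MySeries


class SubclassAwareGroupBy:
    """Hypothetical, heavily simplified stand-in for DataFrameGroupBy.

    It remembers the caller's constructors and uses them to wrap results,
    instead of hard-coding pd.DataFrame / pd.Series.
    """

    def __init__(self, obj, by):
        self._grouped = obj.groupby(by)
        # Retain information about the calling class.
        self._constructor = obj._constructor
        self._constructor_sliced = obj._constructor_sliced

    def mean(self):
        # Frame-shaped result: use the caller's _constructor.
        return self._constructor(self._grouped.mean())

    def size(self):
        # Series-shaped result: use the caller's _constructor_sliced.
        return self._constructor_sliced(self._grouped.size())


dates = pd.date_range("2019-01-01", periods=14, freq="D")
my_df = MyDataFrame({"x": np.arange(len(dates))}, index=dates)
grouped = SubclassAwareGroupBy(my_df, my_df.index.dayofweek)

print(type(grouped.mean()).__name__)  # MyDataFrame
print(type(grouped.size()).__name__)  # MySeries
```

The real intermediaries expose many more methods with different result shapes, so an actual fix would have to choose the right constructor per method, not just for mean and size.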
Note that operations that use Window and Rolling already appear to have the expected behaviour, because they assemble their results via a call to concat, such as this one (line 325 in 171c716).
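A quick way to observe the concat behaviour, assuming a pandas version whose concat builds its result from the first input's constructor (the data here is illustrative):

```python
import numpy as np
import pandas as pd


class MyDataFrame(pd.DataFrame):
    @property
    def _constructor(self):
        return MyDataFrame


first = MyDataFrame({"x": np.arange(3)})
second = MyDataFrame({"x": np.arange(3, 6)})

# concat derives the output type from its inputs rather than hard-coding DataFrame.
combined = pd.concat([first, second], ignore_index=True)
print(type(combined).__name__)  # MyDataFrame
print(combined["x"].tolist())   # [0, 1, 2, 3, 4, 5]
```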
Output of pd.show_versions()
pandas : 0.25.1
numpy : 1.17.1
pytz : 2019.2
dateutil : 2.8.0
pip : 19.2.3
setuptools : 40.8.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10.1
IPython : 7.8.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.1
numexpr : 2.7.0
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.1
sqlalchemy : None
tables : 3.5.2
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None