Description
The `DataFrame.convert_dtypes` method returns the same dataframe (with potentially updated dtypes in certain columns), but because of the internal use of `concat`, it doesn't necessarily preserve the class type of the dataframe for subclasses.
Noticed this in GeoPandas: geopandas/geopandas#1870
This stems from the fact that the dataframe is basically decomposed into columns (Series), and then those are combined again into a DataFrame with `concat`. But at that point, you are concatenating Series objects, and `concat` no longer knows about the original dataframe class. Personally, I would say that the use of `concat` here is an implementation detail, and that `convert_dtypes` could easily preserve the original class of the calling dataframe.
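To illustrate the idea of restoring the calling class after the conversion, here is a minimal sketch done on the subclass side rather than inside pandas. The class name `PreservingDataFrame` is hypothetical, and this is only one possible workaround, not how pandas itself would necessarily implement the fix:

```python
import pandas as pd


class PreservingDataFrame(pd.DataFrame):
    """Hypothetical subclass that re-wraps the result of convert_dtypes."""

    @property
    def _constructor(self):
        return PreservingDataFrame

    def convert_dtypes(self, *args, **kwargs):
        # Let pandas do the actual dtype conversion.
        result = super().convert_dtypes(*args, **kwargs)
        # Re-wrap the result in the subclass in case the class was lost,
        # and carry over attrs and any _metadata from the caller.
        return self._constructor(result).__finalize__(self)


df = PreservingDataFrame({"a": [1, 2, 3]})
converted = df.convert_dtypes()
print(type(converted).__name__)
```

A fix inside pandas could follow the same pattern: build the result from the converted columns, then pass it through the caller's `_constructor` before returning.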
Small reproducer without geopandas:
from pandas import DataFrame

class SubclassedDataFrame(DataFrame):
    @property
    def _constructor(self):
        return SubclassedDataFrame

In [51]: df = SubclassedDataFrame({'a': [1, 2, 3]})
In [52]: type(df)
Out[52]: __main__.SubclassedDataFrame
In [53]: type(df.convert_dtypes())
Out[53]: pandas.core.frame.DataFrame
Note that I am not using `pd._testing.SubclassedDataFrame`, since for this subclass each column is also a `SubclassedSeries`, and then `concat` will actually preserve the subclass.
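The difference can be sketched with a pair of hypothetical classes, `MySeries` and `MyFrame`, that mirror the structure of the testing subclasses. The assumption here is that `concat` along `axis=1` builds its result through the `_constructor_expanddim` of the Series it receives, which is why a plain `Series` column loses the dataframe subclass while a subclassed one keeps it:

```python
import pandas as pd


class MySeries(pd.Series):
    """Hypothetical Series subclass that knows its DataFrame counterpart."""

    @property
    def _constructor(self):
        return MySeries

    @property
    def _constructor_expanddim(self):
        # Used when Series objects are combined into a 2D object.
        return MyFrame


class MyFrame(pd.DataFrame):
    """Hypothetical DataFrame subclass whose columns are MySeries."""

    @property
    def _constructor(self):
        return MyFrame

    @property
    def _constructor_sliced(self):
        return MySeries


# Combine subclassed columns the way convert_dtypes recombines them.
columns = [MySeries([1, 2, 3], name="a"), MySeries([4, 5, 6], name="b")]
recombined = pd.concat(columns, axis=1)
print(type(recombined).__name__)
```

The reproducer above defines only `_constructor` on the dataframe, so its columns are plain `Series` objects and `concat` has nothing to recover the subclass from.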