BUG: DataFrame.convert_dtypes doesn't preserve subclasses #43668

Closed
@jorisvandenbossche

The DataFrame.convert_dtypes method returns the same dataframe (with potentially updated dtypes in certain columns), but because of the internal use of concat, it doesn't necessarily preserve the class type of the dataframe for subclasses.

Noticed this in GeoPandas: geopandas/geopandas#1870

This happens because the dataframe is decomposed into its columns (Series), which are then combined back into a DataFrame with concat. At that point concat only sees Series objects and no longer knows the original dataframe class. Personally, I would say that the use of concat here is an implementation detail, and that convert_dtypes could easily preserve the original class of the calling dataframe.
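To make the mechanism concrete, here is a minimal sketch that mimics the decompose-and-concat pattern by hand instead of calling convert_dtypes itself. The class name `MinimalFrame` and the variable names are hypothetical, and the final rewrapping step is only one possible user-side workaround, not necessarily how pandas handles it internally.

```python
import pandas as pd


class MinimalFrame(pd.DataFrame):
    # Hypothetical subclass: only _constructor is overridden,
    # so column access still yields plain pandas Series.
    @property
    def _constructor(self):
        return MinimalFrame


df = MinimalFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# Step 1: decompose the frame into its columns (plain Series here).
columns = [df[name] for name in df.columns]

# Step 2: recombine with concat. The inputs are plain Series,
# so concat has no way to learn about MinimalFrame.
rebuilt = pd.concat(columns, axis=1)

# Possible workaround (an assumption, not the pandas fix):
# rewrap the result in the class of the calling frame.
restored = type(df)(rebuilt)

print(type(rebuilt).__name__, type(restored).__name__)
```

Rewrapping with the caller's class keeps the concat step untouched, which fits the view that concat is an implementation detail.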

Small reproducer without geopandas:

from pandas import DataFrame


class SubclassedDataFrame(DataFrame):
    # Only _constructor is overridden, so columns are plain Series.
    @property
    def _constructor(self):
        return SubclassedDataFrame


In [51]: df = SubclassedDataFrame({'a': [1, 2, 3]})

In [52]: type(df)
Out[52]: __main__.SubclassedDataFrame

In [53]: type(df.convert_dtypes())
Out[53]: pandas.core.frame.DataFrame

Note: I am not using pd._testing.SubclassedDataFrame, since for that subclass each column is also a SubclassedSeries, and in that case concat actually preserves the subclass.
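For contrast, the following sketch illustrates the case described in the note: when the Series subclass also declares which DataFrame class it expands to, concatenating the columns gives back the subclass. `FullSeries` and `FullFrame` are hypothetical names standing in for the pandas testing classes, which may differ in detail.

```python
import pandas as pd


class FullSeries(pd.Series):
    # Hypothetical Series subclass that knows its DataFrame counterpart.
    @property
    def _constructor(self):
        return FullSeries

    @property
    def _constructor_expanddim(self):
        return FullFrame


class FullFrame(pd.DataFrame):
    # Hypothetical DataFrame subclass whose columns are FullSeries.
    @property
    def _constructor(self):
        return FullFrame

    @property
    def _constructor_sliced(self):
        return FullSeries


df = FullFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

# Each column is now a FullSeries rather than a plain Series.
columns = [df[name] for name in df.columns]

# concat asks the first Series for its expanded type, so the subclass survives.
rebuilt = pd.concat(columns, axis=1)

print(type(columns[0]).__name__, type(rebuilt).__name__)
```

This is why a subclass that only overrides `_constructor` is needed to reproduce the bug.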
