Skip to content

BUG: stack_multi_columns extension array downcast #45740

Closed
@ladidadadum

Description

@ladidadadum

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

T = pd.core.arrays.integer.Int8Dtype()

df = pd.DataFrame({
    ('a', 'x'): pd.Series(range(5), dtype=T),
    ('b', 'x'): pd.Series(range(5)),
})

dtypes = df.stack().dtypes

assert dtypes['a'] == T, dtypes['a']

Issue Description

Extension dtypes (including categoricals) are down-cast to object if they are part of a multi-column data frame with more than one label on the non-stacked level(s).

The function _stack_multi_columns contains a branch for a homogenous extension dtype frame. (https://github.com/pandas-dev/pandas/blob/4f0e22c1c6d9cebed26f45a9a5b873ebbe67cd17/pandas/core/reshape/reshape.py#L744,L745)

The task of this branch is to ensure that extension dtypes are preserved by the stacking.

There’s a bug that causes this branch to not be executed.

Expected Behavior

Expectation: Extension Dtypes should be preserved under stacking.

Installed Versions

Pandas 1.4

Metadata

Metadata

Assignees

No one assigned

    Labels

    BugDtype ConversionsUnexpected or buggy dtype conversionsExtensionArrayExtending pandas with custom dtypes or arrays.ReshapingConcat, Merge/Join, Stack/Unstack, Explode

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions