BUG: fix convert_dtypes dropping values from sliced mixed-dtype DataFrames #64712

Open
moktamd wants to merge 2 commits into pandas-dev:main from moktamd:fix/convert-dtypes-mixed-slice

moktamd commented Mar 19, 2026

Summary

After slicing a mixed-dtype DataFrame and calling convert_dtypes(), columns converted to ExtensionArray-backed types (e.g. Arrow strings) could silently lose values.

The root cause is in Block.convert_dtypes. After self.convert() splits the original object block into type-specific blocks, the loop over those blocks still checks self.shape[0] and self.dtype instead of blk.shape[0] and blk.dtype. The consolidated object block has one row per column, so self.shape[0] can be greater than 1 even when the converted block holds a single column. In that case the converted single-row ExtensionBlock was passed through _split() unnecessarily, and _split() sliced its 1-D backing array as if selecting a row from a 2-D array, truncating the data.

Fix: use blk.shape[0] and blk.dtype to reference the post-conversion block rather than the pre-conversion self.
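
To make the mechanism concrete, here is a toy model of the split decision. It is only a sketch: plain Python lists stand in for block values, and split_rows, pieces_to_convert and the *_nrows variables are hypothetical names for illustration, not pandas internals.

```python
# Toy model: plain Python lists stand in for block values. All names here
# are hypothetical and are not pandas internals.

def split_rows(values):
    """Yield one-row slices, the way a row-wise split treats a 2-D block."""
    for i in range(len(values)):
        yield values[i : i + 1]


def pieces_to_convert(values, nrows_checked):
    """Return the pieces to convert, splitting only when nrows_checked != 1."""
    if nrows_checked == 1:
        return [values]
    return list(split_rows(values))


# Pre-conversion object block: one row per column, so its row count is 2.
object_block = [[1, 2], ["a", "b"]]
pre_conversion_nrows = len(object_block)  # plays the role of self.shape[0]

# Post-conversion block for col2: a single column backed by a 1-D array,
# so its block-level row count is 1.
col2_values = ["a", "b"]
post_conversion_nrows = 1  # plays the role of blk.shape[0]

# Buggy check: uses the pre-conversion row count, so the 1-D array is split
# and only its first element survives in the first piece.
buggy = pieces_to_convert(col2_values, pre_conversion_nrows)[0]

# Fixed check: uses the post-conversion row count, so no split happens.
fixed = pieces_to_convert(col2_values, post_conversion_nrows)[0]

print(buggy)  # ['a']
print(fixed)  # ['a', 'b']
```

In this model the only difference between the two results is which row count the check consults, which mirrors the one-line nature of the fix.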

Repro from the issue:

import pandas as pd

df = pd.DataFrame(data=[[1, "a"], [2, "b"], ["c", 3]], columns=["col1", "col2"])
df = df.loc[[0, 1]].copy()  # slice away the row that forces mixed dtypes
df = df.convert_dtypes()
print(df["col2"].shape)  # (1,) before the fix; should be (2,)

moktamd added 2 commits March 19, 2026 14:34
…rames

After slicing a DataFrame with mixed dtypes and calling convert_dtypes(),
columns backed by ExtensionArrays (e.g. Arrow strings) could lose values
because Block.convert_dtypes used `self.shape[0]` (the original
pre-conversion block's row count) instead of `blk.shape[0]` (the
post-conversion block's row count) when deciding whether to split.

This caused an unnecessary _split() call on single-row ExtensionBlocks,
which sliced the 1-D backing array instead of selecting a row from a 2-D
array, silently truncating the data.

Closes pandas-dev#64702

Development

Successfully merging this pull request may close these issues.

BUG: Wrong dataframe data after convert_dtypes