BUG: Can't add a missing value to an int64 or Int64 column without it being upcast inconsistently #47214

Closed
2 of 3 tasks
Xnot opened this issue Jun 3, 2022 · 5 comments
Labels
Bug Duplicate Report Duplicate issue or pull request Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate NA - MaskedArrays Related to pd.NA and nullable extension arrays

Comments

@Xnot
Contributor

Xnot commented Jun 3, 2022

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import numpy as np
import pandas as pd

# gets upcast to object
df = pd.DataFrame({"a": [1, 2, 3]}, dtype="int64")
df.loc[4] = pd.NA

# gets upcast to float64
df = pd.DataFrame({"a": [1, 2, 3]}, dtype="int64")
df.loc[4] = np.nan

# gets upcast to object
df = pd.DataFrame({"a": [1, 2, 3]}, dtype="Int64")
df.loc[4] = pd.NA

# gets upcast to Float64
df = pd.DataFrame({"a": [1, 2, 3]}, dtype="Int64")
df.loc[4] = np.nan

# can hold a missing value when initialized with it and remain Int64
df = pd.DataFrame({"a": [1, 2, 3, pd.NA]}, dtype="Int64")
# then gets upcast anyway when you add a second missing value
df.loc[4] = pd.NA

Issue Description

Int64 can hold missing values. However, when a missing value is added to an int64 or Int64 column, the column gets upcast to float64, Float64, or object.

Expected Behavior

int64 should be upcast to Int64 and Int64 should not be upcast at all.
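
A minimal sketch of how the expected result can be obtained explicitly today, assuming the column is first converted to a nullable dtype: convert with astype("Int64"), then append a one-row frame of the same dtype with pd.concat. The helper name append_missing is hypothetical and exists only for illustration; this is a workaround, not a fix for the enlargement path reported above.

```python
import pandas as pd


def append_missing(df: pd.DataFrame, label) -> pd.DataFrame:
    """Append one all-missing row at `label`.

    Hypothetical helper for illustration; not part of pandas.
    Assumes every column already has a nullable (masked) dtype such as Int64.
    """
    # Build a one-row frame whose columns carry the same dtypes as `df`,
    # so that pd.concat has nothing to upcast.
    new_row = pd.DataFrame(
        {col: pd.array([pd.NA], dtype=df[col].dtype) for col in df.columns},
        index=[label],
    )
    return pd.concat([df, new_row])


# Start from a plain int64 column and opt in to the nullable dtype explicitly.
df = pd.DataFrame({"a": [1, 2, 3]}, dtype="int64").astype("Int64")
df = append_missing(df, 4)
print(df["a"].dtype)
```

Because both inputs to pd.concat share the Int64 dtype, the result keeps it, and the new row at label 4 holds pd.NA.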

Installed Versions

1.4.2

@Xnot Xnot added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Jun 3, 2022
@rhshadrach
Member

rhshadrach commented Jun 4, 2022

Thanks for the report! Certainly 1, 3, and 5 look like definitive bugs to me. For the other two:

@rhshadrach
Member

For 1, my thinking here is that if the user encounters pd.NA, then they are working with the nullable dtypes and so upcasting to them is okay.

@rhshadrach rhshadrach added Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate NA - MaskedArrays Related to pd.NA and nullable extension arrays and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Jun 4, 2022
@rhshadrach rhshadrach added this to the Contributions Welcome milestone Jun 4, 2022
@phofl
Member

phofl commented Jun 6, 2022

1 is not a bug currently; we treat pd.NA as object in numpy dtypes by design, as far as I am aware.

3 and 5 are bugs but have the same cause: we seem to cast to object when enlarging the DataFrame. This works as expected when overwriting an existing value.
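
A short sketch of the overwrite case mentioned above, reusing the frame from the report: assigning pd.NA to a row label that already exists leaves the Int64 dtype intact. The enlargement case (df.loc[4] = pd.NA) is deliberately left out, since its result is the behavior under discussion and may differ between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]}, dtype="Int64")

# Overwrite an existing label: the nullable column stores pd.NA directly,
# so no upcast is needed.
df.loc[2, "a"] = pd.NA
print(df["a"].dtype)
```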

@simonjayhawkins
Member

simonjayhawkins commented Jun 8, 2022

3 and 5 are bugs but have the same cause.

duplicate of #32346?

(There is already some discussion there on the conversion to object dtype. If 3 and 5 are covered by that discussion, 1 and 2 are not bugs, and 4 is also already covered, we should probably close this issue to help keep the discussion in one place.)

@phofl
Member

phofl commented Jun 8, 2022

Yes, you are correct. I'll try to look into this.

Can close here, but maybe copy the NA case over to add tests later.
