BUG: need to block split when filling na #14407

Open
jreback opened this issue Oct 12, 2016 · 7 comments
Labels: good first issue; Needs Tests (Unit test(s) needed to prevent regressions)

Comments

@jreback
Contributor

jreback commented Oct 12, 2016

xref #14400

In [49]: df = pd.DataFrame({'c1' : list('ABC'),
    ...:                            'c2' : list('123'),
    ...:                            'c3' : np.random.randn(3),
    ...:                            'c4' : np.arange(3)})

In [50]: df
Out[50]: 
  c1 c2        c3  c4
0  A  1  0.645628   0
1  B  2 -0.841708   1
2  C  3  0.207423   2

In [51]: df.dtypes
Out[51]: 
c1     object
c2     object
c3    float64
c4      int64
dtype: object

In [52]: df.apply(lambda x: pd.to_numeric(x, errors='coerce')).fillna(df).dtypes
Out[52]: 
c1    object
c2     int64
c3    object
c4     int64
dtype: object

In [53]: df.apply(lambda x: pd.to_numeric(x, errors='coerce')).fillna(df).apply(lambda x: pd.to_numeric(x, errors='ignore')).dtypes
Out[53]: 
c1     object
c2      int64
c3    float64
c4      int64
dtype: object

So [52] should already give the dtypes shown in [53], but since the filled block was originally float, I don't think the block split is happening.
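Until the split happens inside fillna, converting column by column sidesteps the problem. A minimal sketch, assuming a hypothetical helper `to_numeric_where_possible` (not a pandas API): it calls `pd.to_numeric` on each column and keeps the original column when parsing fails, so no float block has to be filled afterwards. The `c3` values are fixed here only to keep the example reproducible.

```python
import numpy as np
import pandas as pd


def to_numeric_where_possible(frame):
    """Convert each column to a numeric dtype, leaving unparseable columns as-is.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    converted = {}
    for name, column in frame.items():
        try:
            converted[name] = pd.to_numeric(column)
        except (ValueError, TypeError):
            # e.g. 'A', 'B', 'C' cannot be parsed, so keep the original column.
            converted[name] = column
    return pd.DataFrame(converted, index=frame.index)


df = pd.DataFrame({'c1': list('ABC'),
                   'c2': list('123'),
                   'c3': [0.645628, -0.841708, 0.207423],
                   'c4': np.arange(3)})

result = to_numeric_where_possible(df)
print(result.dtypes)
```

With this frame, c2 and c4 come back as integer columns and c3 as float, while c1 keeps its string values.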

@jreback added the Bug, Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate), Dtype Conversions (Unexpected or buggy dtype conversions), and Difficulty Advanced labels on Oct 12, 2016
@jreback added this to the Next Major Release milestone on Oct 12, 2016
@jbrockmendel added the Internals (Related to non-user accessible pandas implementation) label on Sep 21, 2020
@mroeschke removed this from the Contributions Welcome milestone on Oct 13, 2022
@mroeschke
Member

This looks to work on main. Could use a test I suppose
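A regression test for this could look roughly like the sketch below. It is only an outline under assumptions: `make_frame` and `mismatched_columns` are hypothetical helpers, the expected dtype kinds are taken from Out[53] above, and the thread has not settled where in the pandas test suite the test should live.

```python
import numpy as np
import pandas as pd


def make_frame():
    # Same shape as the frame in the report; c3 is fixed for reproducibility.
    return pd.DataFrame({'c1': list('ABC'),
                         'c2': list('123'),
                         'c3': [0.645628, -0.841708, 0.207423],
                         'c4': np.arange(3)})


def mismatched_columns(result):
    """Return the columns whose dtype kind differs from the one in Out[53]."""
    expected_kinds = {'c2': 'i', 'c3': 'f', 'c4': 'i'}
    return [name for name, kind in expected_kinds.items()
            if result[name].dtype.kind != kind]


def test_fillna_with_frame_keeps_numeric_dtypes():
    # pytest-style test; it passes only if fillna leaves the numeric
    # columns numeric, which is the behaviour requested in this issue.
    df = make_frame()
    result = df.apply(lambda x: pd.to_numeric(x, errors='coerce')).fillna(df)
    assert mismatched_columns(result) == []
```

Returning the list of offending columns, rather than asserting on each one, makes a failure message name exactly which columns lost their dtype.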

@mroeschke added the good first issue and Needs Tests (Unit test(s) needed to prevent regressions) labels, and removed the Bug, Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate), Dtype Conversions (Unexpected or buggy dtype conversions), and Internals (Related to non-user accessible pandas implementation) labels on Apr 26, 2023
@whendo

whendo commented May 20, 2023

Should this test drop into tests/base/test_fillna.py or maybe pandas/tests/dtypes somewhere?

@srkds
Contributor

srkds commented May 25, 2023

> Should this test drop into tests/base/test_fillna.py or maybe pandas/tests/dtypes somewhere?

Look into #writing-tests. That might help to decide.

@nirmalmuppiri

take

@ivonastojanovic
Contributor

@nirmalmuppiri Are you still working on this issue?

@rishikesanr

Issue still persists in pandas version 2.0.3

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'c1': list('ABC'),
    'c2': list('123'),
    'c3': np.random.randn(3),
    'c4': np.arange(3)
})

df.apply(lambda x: pd.to_numeric(x, errors='coerce')).fillna(df).dtypes

c1     object
c2     object
c3    float64
c4      int64
dtype: object

Despite c2 being convertible to int64, it stays as object. However, column c3 is correctly identified as float64.
It seems like the fillna(df) operation might be overriding the numeric conversion for c2.
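To check which step changes a column's dtype, the dtypes can be compared before and after each call. This is a rough diagnostic sketch; `dtype_steps` is a hypothetical helper name, and the "filled" column of its output depends on the installed pandas version, so no expected output is shown.

```python
import numpy as np
import pandas as pd


def dtype_steps(frame):
    """Collect the dtypes of each column at every stage of the pipeline."""
    coerced = frame.apply(lambda x: pd.to_numeric(x, errors='coerce'))
    filled = coerced.fillna(frame)
    return pd.DataFrame({'original': frame.dtypes,
                         'coerced': coerced.dtypes,
                         'filled': filled.dtypes})


df = pd.DataFrame({
    'c1': list('ABC'),
    'c2': list('123'),
    'c3': np.random.randn(3),
    'c4': np.arange(3)
})

steps = dtype_steps(df)
print(steps)
```

If c2 is already an integer column under "coerced" but not under "filled", the change comes from fillna(df) rather than from pd.to_numeric.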

Is someone working on this issue?

@rishikesanr

take
