Assigning scalars to nullable dataframe columns reverts to default dtypes. #31763
assign can't take additional kwargs; all the keywords are new column names. We should be able to fix this. |
this is as expected now; a single string is inferred to object dtype, so you have to be explicit about the dtype. -1 on making this more complex with options. This should work (though I am not sure we actually broadcast this): df['A'] = pd.array(['test'], dtype='string') |
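A hedged sketch of the explicit-dtype approach suggested above. Note that `pd.array` does not broadcast a length-1 array across a longer column, so the scalar is repeated to the column length here:

```python
import pandas as pd

df = pd.DataFrame({"A": [0.1, 0.2, 0.3]})

# A bare scalar string is inferred as the default object dtype.
df["B"] = "test"

# Being explicit with pd.array keeps the nullable string dtype
# (repeating the value, since a length-1 array is not broadcast).
df["C"] = pd.array(["test"] * len(df), dtype="string")

print(df.dtypes)
```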
Don't we have logic about whether an existing block can hold a new value? Or does that not work for scalars? |
no, when you assign a new column the dtype is completely new; however, with .loc it does consider the existing dtype (though I think we may have changed this) |
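A small sketch of the distinction described above (assuming pandas >= 1.0 behavior):

```python
import pandas as pd

df = pd.DataFrame({"A": [0.1, 0.2, 0.3]})

# .loc into an existing column writes into the existing float64 block,
# so the column's dtype is preserved (1 is stored as 1.0).
df.loc[0, "A"] = 1
print(df["A"].dtype)  # float64

# Assigning a brand-new column infers a fresh dtype from the value alone.
df["B"] = 1
print(df["B"].dtype)  # int64
```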
@jreback I tried |
I thought as well (or at least I seem to remember discussion about this), but it seems
That would have been a nice workaround otherwise .. |
yeah we changed this a while back to make .loc[:] behave the same as setitem |
If you specify explicitly the full slice (with start and/or end) instead of the implicit full slice (:), the existing dtype is preserved:

In [25]: df = pd.DataFrame({'A': [0.1, 0.2, 0.3]})

In [26]: df.dtypes
Out[26]:
A    float64
dtype: object

# df.iloc[0:, 0] also works
In [27]: df.iloc[0:-1, 0] = 1

In [28]: df.dtypes
Out[28]:
A    float64
dtype: object

Or with loc if you know the labels (or could do something like
(BTW, in retrospect, such corner cases seem to indicate to me that the change of letting .loc[:] behave the same as setitem may not have been ideal.) Now, this doesn't work yet for strings, as there is a bug in setitem .. -> #31772 |
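The label-based variant alluded to above might look like the following (my sketch, not from the original comment): an explicit bounded label slice assigns into the existing block rather than replacing the column, so the dtype survives.

```python
import pandas as pd

df = pd.DataFrame({"A": [0.1, 0.2, 0.3]})

# An explicit label slice (first label through last label) writes into
# the existing float64 column instead of replacing it wholesale.
df.loc[df.index[0]:df.index[-1], "A"] = 1
print(df["A"].dtype)  # float64
```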
Hi, I'm posting this here because it seems related.

import pandas as pd
import numpy as np
not_preserved = pd.DataFrame(np.arange(24).reshape((8, 3)), columns=list("ABC"))
not_preserved['F'] = np.random.random(8)
print("Mixed datatype DataFrame")
print(not_preserved.dtypes)
print("Single value loc: ")
print(not_preserved.loc[0, ['A', 'B', 'C']].dtypes)
print("Single value through slice: ")
print(not_preserved.loc[0:0, ['A', 'B', 'C']].dtypes)
# This prints:
# Mixed datatype DataFrame
# A int64
# B int64
# C int64
# F float64
# dtype: object
# Single value loc:
# float64
# Single value through slice:
# A int64
# B int64
# C int64
# dtype: object

Should I open an issue, or is this a different manifestation of the problem discussed here? |
This is also true in 0.24.1; it seems that what changed was the behavior of the |
@glyg I suggest opening up a new issue. |
Code Sample, a copy-pastable example if possible
Problem description
Because the new nullable dtypes are not yet the default, it's very tricky to convince dtypes to stay nullable. Helpful shortcuts for assigning repeated values to a dataframe column infer the default dtypes, even when the column currently has a nullable dtype and the new data can be represented in that same nullable dtype. I'm not sure whether it's best to wait until the nullable dtypes become the default and live with the workarounds shown above for now, or whether another keyword could be added to df.assign() to specify the intended dtype of a column. Alternatively, is there any scope for adding dtype information to scalars?
Something like:
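For reference, a sketch of the reversion being reported, using the nullable Int64 dtype (assuming pandas >= 1.0):

```python
import pandas as pd

# Start with a nullable Int64 column.
df = pd.DataFrame({"A": pd.array([1, 2, 3], dtype="Int64")})

# Assigning a scalar replaces the column with a default-dtype one,
# even though 5 is representable as Int64.
df["A"] = 5
print(df["A"].dtype)  # int64, not Int64

# The current workaround: be explicit about the dtype.
df["A"] = pd.array([5] * len(df), dtype="Int64")
print(df["A"].dtype)  # Int64
```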