BUG: Column of dtype Categorical in DataFrame encounters error when taking a row that includes nan in the column #58954
Comments
Thanks for the report - confirmed on main. Further investigations and PRs to fix are welcome!
Hey, could you please try the following code on your machine:
Here is the output on my machine:
Thank you!
Hi! I would like to attempt this issue if that's alright. :)
Initial thoughts -
Therefore the easiest way to fix this seems to be to make any column containing a NaN into an
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
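A minimal sketch of the situation described under Issue Description, assuming a hypothetical two-column frame: `a` of integer dtype and `b` categorical with one missing value. The column names and values are illustrative only and are not necessarily the reporter's original snippet. Whether the direct row access raises depends on the pandas version, so it is wrapped in `try`/`except`. Casting to `object` first is shown as one possible workaround, not a confirmed fix.

```python
import numpy as np
import pandas as pd

# Hypothetical frame (names and values are assumptions for illustration):
# an int column followed by a categorical column that contains a NaN.
df = pd.DataFrame(
    {
        "a": [1, 2],
        "b": pd.Categorical(["x", np.nan]),
    }
)

# Direct row access. A row is a Series, so pandas must pick one dtype
# for values coming from both columns. On affected versions this may raise.
try:
    row = df.iloc[1]
    direct_result = f"ok, dtype={row.dtype}"
except Exception as exc:
    direct_result = f"raised {type(exc).__name__}: {exc}"
print(direct_result)

# Possible workaround: cast every column to object before taking the row,
# so the resulting Series has dtype object regardless of column order.
row_obj = df.astype(object).iloc[1]
print(row_obj.dtype)  # prints: object
```

The cast copies the frame, so for large data it may be preferable to select only the needed columns before calling `astype(object)`.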
Issue Description
When a column is created as pd.Categorical, taking a row out of the DataFrame sometimes raises a strange error. A row is a pd.Series, which must have a single dtype for all of its elements. If the row contains np.nan, it might raise an error when an earlier column is of type int. Would it make sense to make the row ALWAYS take dtype object, since a row ALWAYS spans different columns and mixed types are therefore very common?
Expected Behavior
Taking a row out of a DataFrame that has a pd.Categorical column should not raise errors inconsistently depending on which earlier columns are present.
Installed Versions
INSTALLED VERSIONS
commit : ba1cccd
python : 3.11.5.final.0
python-bits : 64
OS : Darwin
OS-release : 23.4.0
Version : Darwin Kernel Version 23.4.0: Fri Mar 15 00:10:42 PDT 2024; root:xnu-10063.101.17~1/RELEASE_ARM64_T6000
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 2.1.0
numpy : 1.25.2
pytz : 2023.3
dateutil : 2.8.2
setuptools : 65.5.0
pip : 24.0
Cython : None
pytest : 7.4.0
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.3
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.14.0
pandas_datareader : None
bs4 : 4.12.2
bottleneck : None
dataframe-api-compat: None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.8.2
numba : None
numexpr : None
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.11.2
sqlalchemy : None
tables : None
tabulate : 0.9.0
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None