BUG: support decimal keyword for Float64Dtype in read_csv #52086

Open
@RobbertDM

Description

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import io
import pandas as pd

# Semicolon-separated input whose only value uses a comma as the decimal mark.
pd.read_csv(io.StringIO('id\n"1,5"\n'), dtype={'id': pd.Float64Dtype()}, sep=';', decimal=',')

or

import pandas as pd
pd.read_csv("./test.csv", dtype={'id':pd.Float64Dtype()}, sep=';', decimal=',')

where test.csv looks like

id
1,5

Issue Description

I have a semicolon-separated CSV file containing floats that use a comma as the decimal separator.
With that setup, read_csv fails when I specify the dtype of the column as Float64Dtype().

When I specify Python's built-in float as the dtype instead, it works.
When the file uses . as the decimal separator and I pass decimal='.', it also works with Float64Dtype().
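
The three cases above can be compared side by side. This is a minimal sketch, assuming the same one-column input as the reproducible example. The helper `try_read` is hypothetical and not part of pandas; it only catches the exception so that all cases can run in one script.

```python
import io
import pandas as pd

def try_read(text, dtype, decimal):
    """Parse a one-column CSV; return the first value or the raised exception."""
    try:
        df = pd.read_csv(
            io.StringIO(text), dtype={"id": dtype}, sep=";", decimal=decimal
        )
        return df["id"].iloc[0]
    except Exception as exc:  # keep the error so the cases can be compared
        return exc

results = {
    "float, decimal=','": try_read("id\n1,5\n", float, ","),
    "Float64Dtype, decimal='.'": try_read("id\n1.5\n", pd.Float64Dtype(), "."),
    "Float64Dtype, decimal=','": try_read("id\n1,5\n", pd.Float64Dtype(), ","),
}
for label, outcome in results.items():
    print(f"{label}: {outcome!r}")
```

On the versions listed under Installed Versions, only the last case is expected to report an exception; the outcome may differ on other pandas versions.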

Expected Behavior

The example above should parse successfully and yield the float 1.5 in a Float64 column.
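
Until that is supported, one possible workaround is to parse the column with the plain float dtype (which does honor decimal=',') and convert it to the nullable dtype afterwards. This is a sketch under that assumption, not a fix for read_csv itself:

```python
import io
import pandas as pd

# Step 1: parse with the built-in float dtype, which handles decimal=','.
df = pd.read_csv(
    io.StringIO("id\n1,5\n"), dtype={"id": float}, sep=";", decimal=","
)

# Step 2: convert the parsed column to the nullable Float64 extension dtype.
df = df.astype({"id": "Float64"})

print(df.dtypes)
print(df)
```

The cost is an extra conversion pass over the column after parsing.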

Installed Versions

/home/robbert/.local/lib/python3.10/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")

INSTALLED VERSIONS

commit : 2e218d1
python : 3.10.6.final.0
python-bits : 64
OS : Linux
OS-release : 5.19.0-35-generic
Version : #36~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Feb 17 15:17:25 UTC 2
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.5.3
numpy : 1.23.5
pytz : 2022.1
dateutil : 2.8.1
setuptools : 65.5.1
pip : 23.0.1
Cython : None
pytest : 7.2.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : 1.1
pymysql : None
psycopg2 : 2.9.5
jinja2 : 3.1.2
IPython : 8.6.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : 3.1.1
pandas_gbq : None
pyarrow : 10.0.1
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : 1.4.46
tables : None
tabulate : 0.9.0
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : 2022.7

Labels

  • Bug
  • Dtype Conversions: Unexpected or buggy dtype conversions
  • IO CSV: read_csv, to_csv
  • NA - MaskedArrays: Related to pd.NA and nullable extension arrays