DOC: floating point precision on writing/reading to csv #13159

Open
@FBartlett

Description

Code Sample

import pandas as pd

x0 = 18292498239.824
df1 = pd.DataFrame({'One': x0}, index=["bignum"])
df1.to_csv('repr_test.csv')
df2 = pd.DataFrame.from_csv('repr_test.csv')
df3 = pd.read_csv('repr_test.csv')
x1 = df1['One'][0]
x2 = df2['One'][0]
x3 = df3['One'][0]
# Parse the value straight from the file, bypassing the pandas readers
with open('repr_test.csv', 'rb') as fh:
    ll = fh.readlines()
x4 = float(ll[1].split(',')[1].split()[0])
print "x0 = %f; x1 = %f; Are they equal? %s" % (x0, x1, (x0 == x1))
print "x0 = %f; x2 = %f; Are they equal? %s" % (x0, x2, (x0 == x2))
print "x0 = %f; x3 = %f; Are they equal? %s" % (x0, x3, (x0 == x3))
print "x0 = %f; x4 = %f; Are they equal? %s" % (x0, x4, (x0 == x4))

Expected Output

x0 = 18292498239.824001; x1 = 18292498239.824001; Are they equal? True
x0 = 18292498239.824001; x2 = 18292498239.824001; Are they equal? True
x0 = 18292498239.824001; x3 = 18292498239.824001; Are they equal? True
x0 = 18292498239.824001; x4 = 18292498239.824001; Are they equal? True

output of pd.show_versions()

(Note that there are two, presented side-by-side, with results underneath)

INSTALLED VERSIONS                      INSTALLED VERSIONS
------------------                      ------------------
commit: None                            commit: None
python: 2.7.5.final.0                   python: 2.7.11.final.0
python-bits: 64                         python-bits: 64
OS: Linux                               OS: Linux
OS-release: 2.6.32-431.56.1.el6.x86_64  OS-release: 2.6.32-431.56.1.el6.x86_64
machine: x86_64                         machine: x86_64
processor: x86_64                       processor: x86_64
byteorder: little                       byteorder: little
LC_ALL: None                            LC_ALL: None
LANG: en_US.UTF-8                       LANG: en_US.UTF-8

pandas: 0.15.1                          pandas: 0.18.0
nose: 1.3.4                             nose: 1.3.7
Cython: 0.21.2                          Cython: 0.23.4
numpy: 1.9.1                            numpy: 1.10.4
scipy: 0.14.0                           scipy: 0.17.0                 
statsmodels: 0.6.0                      statsmodels: 0.6.1            
IPython: 2.3.0                          IPython: 4.1.2 
sphinx: 1.2.3                           sphinx: 1.3.5  
patsy: 0.3.0                            patsy: 0.4.0   
dateutil: 2.2                           dateutil: 2.5.1
pytz: 2014.9                            pytz: 2016.2   
bottleneck: None                        bottleneck: 1.0.0
tables: 3.1.1                           tables: 3.2.2    
numexpr: 2.4                            numexpr: 2.5     
matplotlib: 1.4.2                       matplotlib: 1.5.1
openpyxl: None                          openpyxl: 2.3.2  
xlrd: 0.9.3                             xlrd: 0.9.4      
xlwt: 0.7.5                             xlwt: 1.0.0      
xlsxwriter: 0.6.3                       xlsxwriter: 0.8.4
lxml: 3.3.3                             lxml: 3.6.0      
bs4: 4.3.2                              bs4: 4.4.1       
html5lib: None                          html5lib: None   
httplib2: None                          httplib2: None   
apiclient: None                         apiclient: None  
rpy2: None                              
sqlalchemy: None                        sqlalchemy: 1.0.12                                                    
pymysql: None                           pymysql: None 
psycopg2: None                          psycopg2: None
                                        pip: 8.1.1      
                                        xarray: None    
                                        setuptools: 20.3
                                        blosc: None     
                                        jinja2: 2.8     
                                        boto: 2.39.0    

Results from left setup (0.15.1):

x0 = 18292498239.824001; x1 = 18292498239.824001; Are they equal? True
x0 = 18292498239.824001; x2 = 18292498239.823997; Are they equal? False
x0 = 18292498239.824001; x3 = 18292498239.823997; Are they equal? False
x0 = 18292498239.824001; x4 = 18292498239.824001; Are they equal? True

Results from right setup (0.18.0):

x0 = 18292498239.824001; x1 = 18292498239.824001; Are they equal? True
x0 = 18292498239.824001; x2 = 18292498239.799999; Are they equal? False
x0 = 18292498239.824001; x3 = 18292498239.799999; Are they equal? False
x0 = 18292498239.824001; x4 = 18292498239.799999; Are they equal? False
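
The 0.18.0 numbers look like what you would get if the value were written with roughly 12 significant digits. That is an inference from the output above, not something confirmed against the pandas source. A minimal stdlib-only sketch of the effect, written for Python 3 (the names `short` and `full` are illustrative only):

```python
x0 = 18292498239.824

# Assumption: the CSV writer kept about 12 significant digits.
short = "%.12g" % x0
# 17 significant digits are enough to identify any float64 uniquely.
full = "%.17g" % x0

print(short, float(short) == x0)        # truncated text does not round-trip
print(full, float(full) == x0)          # 17 digits round-trip
print(repr(x0), float(repr(x0)) == x0)  # repr() also round-trips
```

If this is what happens, the information is already gone in the file, which would explain why x4 (parsed with plain float()) is also wrong on 0.18.0 but correct on 0.15.1.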

Expectations

I expect to be able to write a DataFrame to a CSV file and later read it back into a new DataFrame such that the two DataFrames are identical. The older version (0.15.1) behaves considerably better than the newer one: there I can round to three decimal places to get the expected results, or read the value from a filehandle instead of using from_csv() or read_csv(). The newer version (0.18.0) loses information (even the value read directly from the file differs), which is not acceptable.

Note that the documentation at http://pandas.pydata.org/pandas-docs/version/0.18.1/generated/pandas.DataFrame.from_csv.html reads:

It is preferable to use the more powerful pandas.read_csv() for most general purposes, but from_csv makes for an easy roundtrip to and from a file (the exact counterpart of to_csv), especially with a DataFrame of time series data.

But this does not describe what actually happens, as demonstrated above.
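
For anyone hitting the same thing, here is a possible workaround sketch. It assumes a current pandas on Python 3 (not the 0.15.1/0.18.0 setups reported above) and uses two documented options: `float_format` on `to_csv` to write 17 significant digits, and `float_precision="round_trip"` on `read_csv` to avoid the default fast float parser. Whether both are needed on a given version is untested here.

```python
import io

import pandas as pd

x0 = 18292498239.824
df1 = pd.DataFrame({"One": [x0]}, index=["bignum"])

# Write with 17 significant digits so the text identifies the float64 exactly.
csv_text = df1.to_csv(float_format="%.17g")

# Read back with the round-trip converter instead of the default fast parser.
df2 = pd.read_csv(
    io.StringIO(csv_text),
    index_col=0,
    float_precision="round_trip",
)

x2 = df2["One"].iloc[0]
print("x0 == x2:", x0 == x2)
```

An in-memory buffer is used instead of repr_test.csv only to keep the sketch self-contained; passing a file path to both calls should behave the same way.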
