Make pandas C parser xstrtod match float/np.float64 internal routine #2566
Comments
This is a well-known "Python problem": In [21]: from StringIO import StringIO
It varies by platform, here is using
Looks like the conversion algorithm differs in the result by one bit in the mantissa.
That was a quick response :)
The parsed numbers are less than 1e-17 apart, which is within the acceptable margin of error (typically 1e-14 or 1e-15) for double-precision floating-point numbers. I'm happy to make the results consistent, but it will require some digging to modify the new parser's C code for string-to-double conversion so that it exactly matches Python's internal version.
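To make the size of the discrepancy concrete, here is a minimal sketch using only the standard library. It takes the two `repr()` strings quoted in the report, counts how many representable doubles separate them, and checks them against a relative tolerance. The helper name `float_bits` and the tolerance `1e-14` are illustrative choices, not anything taken from pandas.

```python
import math
import struct


def float_bits(x: float) -> int:
    """Return the IEEE 754 binary64 bit pattern of x as a signed integer."""
    return struct.unpack("<q", struct.pack("<d", x))[0]


# The two values quoted in the report (17 significant digits each).
a = float("0.011276999999999999")  # reported result of read_csv()
b = float("0.011277000000000001")  # reported result of np.float64('0.011277')

# For two positive doubles, the difference of their bit patterns is the
# number of representable steps (ulps) between them.
ulp_gap = float_bits(b) - float_bits(a)
abs_gap = b - a

# Tolerance-based comparison instead of exact equality.
same_within_tol = math.isclose(a, b, rel_tol=1e-14)

print(ulp_gap, abs_gap, same_within_tol)
```

The two values are distinct doubles, so `a == b` is false, while a tolerance-based comparison treats them as equal.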
How can I write code which gives back '0.011277' from both floats? Should I use np.round(x, 14)?
use |
Thanks.
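The call suggested in the reply did not survive in this copy of the thread, so the following is only one possible approach, not necessarily the one that was recommended: format both floats with a fixed number of decimal places. The helper name `show` and the default of six digits are assumptions made for illustration.

```python
# The two values quoted in the report.
low = float("0.011276999999999999")
high = float("0.011277000000000001")


def show(x: float, digits: int = 6) -> str:
    """Format x with a fixed number of decimal places (hypothetical helper)."""
    return format(x, f".{digits}f")


# Both neighbouring doubles render as the same short decimal string.
print(show(low), show(high))
```

Fixed-width formatting rounds to the nearest decimal at the requested precision, so a one-bit difference in the mantissa does not reach the output string.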
The biggest issue here for me is that df.astype('i') doesn't work as before/expected, because truncation now sometimes returns a value that is smaller by 1. My ad hoc solution is to round before truncating: np.rint(df.acolumn).astype('i').
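A small sketch of why truncation is sensitive to the last bit while rounding is not. It assumes the truncation was done by scaling by `1e6` (based on the "1e-6 precision" mentioned in the report) and uses plain Python floats, with `int()` and `round()` standing in for `astype('i')` and `np.rint`. The names `SCALE`, `truncate`, and `round_then_int` are hypothetical.

```python
# The two values quoted in the report.
low = float("0.011276999999999999")
high = float("0.011277000000000001")

SCALE = 1e6  # assumed scale factor, from the "1e-6 precision" in the report


def truncate(x: float) -> int:
    """Scale and drop the fractional part, like astype('i') on a float column."""
    return int(x * SCALE)


def round_then_int(x: float) -> int:
    """Scale and round to the nearest integer first, like np.rint(...).astype('i')."""
    return int(round(x * SCALE))


print(truncate(low), truncate(high))              # the results differ by one
print(round_then_int(low), round_then_int(high))  # the results agree
```

The lower double lands just under 11277 after scaling, so truncation drops it to 11276, whereas rounding to the nearest integer absorbs the one-bit difference.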
Are you multiplying by a number then converting to integer? Or how does
Closing as not a bug.
I might be missing something here (no expert on floating-point storage), but isn't this still an issue? For highly precise values,
I guess I'm just wondering what the rationale is behind using
@amras1 I think that is exactly the reason: speed of parsing for floats. The difference is immaterial, as it's below the precision of floats anyhow.
read_csv() converts the string '0.011277' into a np.float64 whose repr() is:
'0.011276999999999999'.
However, repr(np.float64('0.011277')) returns:
'0.011277000000000001'
Also, in pandas v0.9.0, read_csv() produced '0.011277000000000001', while in pandas v0.10.0 read_csv() produces '0.011276999999999999'.
This problem showed up when I truncated the number (with 1e-6 precision).
System settings:
pandas v0.10.0
NumPy v1.7.0b2
Windows 7