-
-
Notifications
You must be signed in to change notification settings - Fork 5.6k
Fix fptrunc Float64 -> Float16 rounding through Float32 #57809
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Widening from Float32 to Float64 and then rounding (by any method) to Float16 will not introduce any error, but going from Float64 -> Float32 -> Float16 will round incorrectly if the intermediate Float32 is halfway between two Float16s. Fixes JuliaLang#57805. Co-authored-by: Jameson Nash <jameson@juliacomputing.com>
#define fintrinsic_write_float16(p, x) *(uint16_t *)p = float_to_half(x) | ||
#define fintrinsic_write_bfloat16(p, x) *(uint16_t *)p = float_to_bfloat(x) | ||
#define fintrinsic_write_float16(p, x) *(uint16_t *)p = double_to_half(x) | ||
#define fintrinsic_write_bfloat16(p, x) *(uint16_t *)p = double_to_bfloat(x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the same logic applies to bfloat16. Since bfloat16 is just a truncated float32, shouldn't it always be legal to go through Float32? If not, it'd be good to have a test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the double_to_bfloat code, the same correction to the rounding direction is required whenever the float32 (after truncation from float64) is exactly halfway between 2 bfloat16 values, but wasn't required for the subnormal case (as was demonstrated in the original issue for float16)
Widening from Float32 to Float64 and then rounding to Float16 will not introduce any error, but going from Float64 -> Float32 -> Float16 will round incorrectly if the intermediate Float32 is halfway between two Float16s. Fixes #57805. Thanks to @vtjnash for suggesting the fix. Co-authored-by: Jameson Nash <jameson@juliacomputing.com> (cherry picked from commit a676b12)
Widening from Float32 to Float64 and then rounding to Float16 will not introduce any error, but going from Float64 -> Float32 -> Float16 will round incorrectly if the intermediate Float32 is halfway between two Float16s.
Fixes #57805.
Thanks to @vtjnash for suggesting the fix.