Skip to content

fix spurious overflow for Float16(::Rational) #52395

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Feb 7, 2024
Merged

Conversation

stevengj
Copy link
Member

@stevengj stevengj commented Dec 5, 2023

Fixes #52394.

Update: also fixes Float32 for UInt128, since currently Float32((typemax(UInt128)-0x01) // typemax(UInt128)) gives Nan32.

@stevengj stevengj added bugfix This change fixes an existing bug float16 labels Dec 5, 2023
@giordano giordano added the rationals The Rational type and values thereof label Dec 5, 2023
@stevengj stevengj force-pushed the rational_to_float16 branch from 3c40acc to 4270c36 Compare December 13, 2023 20:58
@stevengj stevengj force-pushed the rational_to_float16 branch from 4270c36 to c34a75d Compare December 14, 2023 14:10
nsajko added a commit to nsajko/julia that referenced this pull request Jan 13, 2024
Implementation approach:

1. Convert the (numerator, denominator) pair to a (sign bit, integral
   significand, exponent) triplet using integer arithmetic. The integer
   type in question must be wide enough.

2. Convert the above triplet into an instance of the chosen FP type.
   There is special support for IEEE 754 floating-point and for
   `BigFloat`, otherwise a fallback using `ldexp` is used.

As a bonus, constructing a `BigFloat` from a `Rational` should now be
thread-safe when the rounding mode and precision are provided to the
constructor, because there is no access to the global precision or
rounding mode settings.

Updates JuliaLang#45213

Updates JuliaLang#50940

Updates JuliaLang#52507

Fixes JuliaLang#52394

Closes JuliaLang#52395

Fixes JuliaLang#52859
nsajko added a commit to nsajko/julia that referenced this pull request Jan 14, 2024
Implementation approach:

1. Convert the (numerator, denominator) pair to a (sign bit, integral
   significand, exponent) triplet using integer arithmetic. The integer
   type in question must be wide enough.

2. Convert the above triplet into an instance of the chosen FP type.
   There is special support for IEEE 754 floating-point and for
   `BigFloat`, otherwise a fallback using `ldexp` is used.

As a bonus, constructing a `BigFloat` from a `Rational` should now be
thread-safe when the rounding mode and precision are provided to the
constructor, because there is no access to the global precision or
rounding mode settings.

Updates JuliaLang#45213

Updates JuliaLang#50940

Updates JuliaLang#52507

Fixes JuliaLang#52394

Closes JuliaLang#52395

Fixes JuliaLang#52859
nsajko added a commit to nsajko/julia that referenced this pull request Jan 14, 2024
Implementation approach:

1. Convert the (numerator, denominator) pair to a (sign bit, integral
   significand, exponent) triplet using integer arithmetic. The integer
   type in question must be wide enough.

2. Convert the above triplet into an instance of the chosen FP type.
   There is special support for IEEE 754 floating-point and for
   `BigFloat`, otherwise a fallback using `ldexp` is used.

As a bonus, constructing a `BigFloat` from a `Rational` should now be
thread-safe when the rounding mode and precision are provided to the
constructor, because there is no access to the global precision or
rounding mode settings.

Updates JuliaLang#45213

Updates JuliaLang#50940

Updates JuliaLang#52507

Fixes JuliaLang#52394

Closes JuliaLang#52395

Fixes JuliaLang#52859
nsajko added a commit to nsajko/julia that referenced this pull request Jan 14, 2024
Implementation approach:

1. Convert the (numerator, denominator) pair to a (sign bit, integral
   significand, exponent) triplet using integer arithmetic. The integer
   type in question must be wide enough.

2. Convert the above triplet into an instance of the chosen FP type.
   There is special support for IEEE 754 floating-point and for
   `BigFloat`, otherwise a fallback using `ldexp` is used.

As a bonus, constructing a `BigFloat` from a `Rational` should now be
thread-safe when the rounding mode and precision are provided to the
constructor, because there is no access to the global precision or
rounding mode settings.

Updates JuliaLang#45213

Updates JuliaLang#50940

Updates JuliaLang#52507

Fixes JuliaLang#52394

Closes JuliaLang#52395

Fixes JuliaLang#52859
nsajko added a commit to nsajko/julia that referenced this pull request Jan 14, 2024
Implementation approach:

1. Convert the (numerator, denominator) pair to a (sign bit, integral
   significand, exponent) triplet using integer arithmetic. The integer
   type in question must be wide enough.

2. Convert the above triplet into an instance of the chosen FP type.
   There is special support for IEEE 754 floating-point and for
   `BigFloat`, otherwise a fallback using `ldexp` is used.

As a bonus, constructing a `BigFloat` from a `Rational` should now be
thread-safe when the rounding mode and precision are provided to the
constructor, because there is no access to the global precision or
rounding mode settings.

Updates JuliaLang#45213

Updates JuliaLang#50940

Updates JuliaLang#52507

Fixes JuliaLang#52394

Closes JuliaLang#52395

Fixes JuliaLang#52859
nsajko added a commit to nsajko/julia that referenced this pull request Jan 15, 2024
Implementation approach:

1. Convert the (numerator, denominator) pair to a (sign bit, integral
   significand, exponent) triplet using integer arithmetic. The integer
   type in question must be wide enough.

2. Convert the above triplet into an instance of the chosen FP type.
   There is special support for IEEE 754 floating-point and for
   `BigFloat`, otherwise a fallback using `ldexp` is used.

As a bonus, constructing a `BigFloat` from a `Rational` should now be
thread-safe when the rounding mode and precision are provided to the
constructor, because there is no access to the global precision or
rounding mode settings.

Updates JuliaLang#45213

Updates JuliaLang#50940

Updates JuliaLang#52507

Fixes JuliaLang#52394

Closes JuliaLang#52395

Fixes JuliaLang#52859
nsajko added a commit to nsajko/julia that referenced this pull request Jan 15, 2024
Constructing a floating-point number from a `Rational` should now be
correctly rounded.

Implementation approach:

1. Convert the (numerator, denominator) pair to a (sign bit, integral
   significand, exponent) triplet using integer arithmetic. The integer
   type in question must be wide enough.

2. Convert the above triplet into an instance of the chosen FP type.
   There is special support for IEEE 754 floating-point and for
   `BigFloat`, otherwise a fallback using `ldexp` is used.

As a bonus, constructing a `BigFloat` from a `Rational` should now be
thread-safe when the rounding mode and precision are provided to the
constructor, because there is no access to the global precision or
rounding mode settings.

Updates JuliaLang#45213

Updates JuliaLang#50940

Updates JuliaLang#52507

Fixes JuliaLang#52394

Closes JuliaLang#52395

Fixes JuliaLang#52859
nsajko added a commit to nsajko/julia that referenced this pull request Jan 15, 2024
Constructing a floating-point number from a `Rational` should now be
correctly rounded.

Implementation approach:

1. Convert the (numerator, denominator) pair to a (sign bit, integral
   significand, exponent) triplet using integer arithmetic. The integer
   type in question must be wide enough.

2. Convert the above triplet into an instance of the chosen FP type.
   There is special support for IEEE 754 floating-point and for
   `BigFloat`, otherwise a fallback using `ldexp` is used.

As a bonus, constructing a `BigFloat` from a `Rational` should now be
thread-safe when the rounding mode and precision are provided to the
constructor, because there is no access to the global precision or
rounding mode settings.

Updates JuliaLang#45213

Updates JuliaLang#50940

Updates JuliaLang#52507

Fixes JuliaLang#52394

Closes JuliaLang#52395

Fixes JuliaLang#52859
Float16(x::Rational{<:Union{Int128,UInt128}}) =
Float16(Float64(x)) # UInt128 overflows Float32, include Int128 for consistency
Float32(x::Rational{<:Union{Int128,UInt128}}) =
Float32(Float64(x)) # UInt128 overflows Float32, include Int128 for consistency
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't these suffer from double-rounding accuracy loss? @oscardssmith

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they would. Doing this properly is fairly hard though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also worth noting that Float16(num)/Float16(den) also potentially has triple rounding.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question is: is this problem worse than the problem this PR fixes? I.e. should we merge and improve this later, or should this wait until this PR does it "right" (whatever that means?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should merge and improve later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was my thought as well, but want to confirm.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would doing it "properly" be something like:

a = Float64(x)
b = Float32(a)
# check if we double-rounded in the wrong direction
if x > a && b < a && nextfloat(b) < a
    b = nextfloat(b)
elseif x < a && b > a && prevfloat(b) > a
    b = prevfloat(b)
end
return b

Copy link
Member Author

@stevengj stevengj Feb 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A example of a test case where double rounding gives a different result is:

julia> r = 12928845309018111//18014398509481984
12928845309018111//18014398509481984

julia> Float32(r) == Float32(Float64(r))
false

@vtjnash's code doesn't fix this, however — its output matches Float32(Float64(r)) on this example.

(Example constructed by tweaking a number almost exactly halfway between two Float32 values.)

@vtjnash vtjnash added the merge me PR is reviewed. Merge when all tests are passing label Feb 7, 2024
@vtjnash vtjnash merged commit bead1d3 into master Feb 7, 2024
@vtjnash vtjnash deleted the rational_to_float16 branch February 7, 2024 14:50
@oscardssmith oscardssmith removed the merge me PR is reviewed. Merge when all tests are passing label Feb 7, 2024
@nsajko
Copy link
Contributor

nsajko commented Feb 7, 2024

Was there a point to merging this? This PR was just a quick fix compared to the thorough fix of #49749.

@oscardssmith
Copy link
Member

Primarily that it was a quick fix, compared to the thorough fix of #49749 :). To be slightly more serious, I think we should merge the more thorough fix, but this was an improvement that was ready to merge first.

nsajko added a commit to nsajko/julia that referenced this pull request May 12, 2024
Constructing a floating-point number from a `Rational` should now be
correctly rounded.

Implementation approach:

1. Convert the (numerator, denominator) pair to a (sign bit, integral
   significand, exponent) triplet using integer arithmetic. The integer
   type in question must be wide enough.

2. Convert the above triplet into an instance of the chosen FP type.
   There is special support for IEEE 754 floating-point and for
   `BigFloat`, otherwise a fallback using `ldexp` is used.

As a bonus, constructing a `BigFloat` from a `Rational` should now be
thread-safe when the rounding mode and precision are provided to the
constructor, because there is no access to the global precision or
rounding mode settings.

Updates JuliaLang#45213

Updates JuliaLang#50940

Updates JuliaLang#52507

Fixes JuliaLang#52394

Closes JuliaLang#52395

Fixes JuliaLang#52859
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix This change fixes an existing bug float16 rationals The Rational type and values thereof
Projects
None yet
Development

Successfully merging this pull request may close these issues.

spurious overflow in Rational-to-Float16 conversion
7 participants