Skip to content

Add Float16-Int comparisons #29870

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 10 commits into from

Conversation

sam0410
Copy link
Contributor

@sam0410 sam0410 commented Oct 31, 2018

Hi ! I am new to Julia and this is my first contribution.
For Issue #29743

@KristofferC KristofferC added the needs tests Unit tests are required for this change label Oct 31, 2018
@ararslan ararslan requested a review from simonbyrne November 1, 2018 05:44
@ararslan ararslan added the maths Mathematical functions label Nov 1, 2018
Copy link
Member

@andyferris andyferris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Float64 can only exactly represent smaller integers like Int32, which means sometimes these comparisons will throw InexactErrors (see discussion below).

What’s the largest Float16 integer?

@pkofod
Copy link
Contributor

pkofod commented Nov 1, 2018

What’s the largest Float16 integer

The largest integer that can represented exactly by an IEEE 754 float is 2^(mantissa bits+1), so it should be 2048.

@andyferris
Copy link
Member

Hi ! I am new to Julia and this is my first contribution.

I forgot to say - welcome and thank you, @sam0410 🙂

The largest integer that can represented exactly by an IEEE 754 float is 2^(mantissa bits+1), so it should be 2048.

Yep :) So I would say (unless @simonbyrne has a better idea) something like this should be safe and reasonably fast:

==(a::Float16, b::Union{UInt64, UInt128}) = (b <= 2048) && Float64(a) == Float64(b)

@sam0410 We'd need to generalize this to the signed case, and treat signed and unsigned integers seperately. I'd suggest using e.g. Union{UInt64, UInt128} in the function signature instead of the for loop code generator. Do you feel up for that?

@StefanKarpinski
Copy link
Member

Float64 can only exactly represent smaller integers like Int32, which means sometimes these comparisons will throw InexactErrors.

Conversion to floating point types does not throw inexact errors, it rounds to the nearest float. Only conversion to "exact types" like integers throws inexact errors.

@StefanKarpinski
Copy link
Member

StefanKarpinski commented Nov 1, 2018

It's a bit tricky but I think these are correct. The key facts are these:

  • Float64(f) is always exact
  • Float64(i) may lose precision, but
    • only for i ∉ -2^53:2^53
    • is always finite (since typemax(UInt128) ≤ floatmax(Float64))

where f is the Float16 argument and i is the the integer argument.

Here are the cases to consider:

  • f is non-finite:
    • correct since Float64(i) is always finite
  • f is finite:
    • i < -2^53: smaller than all possible f
      • i.e. i < f is true
      • Float64(i) < Float64(f) will also be true
    • i ∈ -2^53:2^53: comparisons is exact
    • i > 2^53: larger than all possible f
      • i.e. i > f is true
      • Float64(i) > Float64(f) will also be true

@andyferris
Copy link
Member

andyferris commented Nov 1, 2018

Right you are, Stefan. My bad.

I still feel @eval is overkill here when a Union{...} type suffices, but other than that LGTM.

@JeffBezanson
Copy link
Member

Welcome, @sam0410 and thanks for working on this. Will be great to have this fixed.

base/float.jl Outdated
<=(x::Float16, y::$Ti) = Float64(x)<=Float64(y)
<=(x::$Ti, y::Float16) = Float64(x)<=Float64(y)
end
end
==(x::Float32, y::Union{Int32,UInt32}) = Float64(x)==Float64(y)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Float32 could be replaced with Union{Float16, Float32} in these 6 methods to handle 32-bit ints as well.

base/float.jl Outdated
<=(x::Float16, y::$Ti) = Float64(x)<=Float64(y)
<=(x::$Ti, y::Float16) = Float64(x)<=Float64(y)
end
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inf16 == typemax(UInt16) is also currently incorrect. We could fix it with methods like this that promote to Float32.

@simonbyrne
Copy link
Member

simonbyrne commented Nov 2, 2018

What’s the largest Float16 integer

The largest integer that can represented exactly by an IEEE 754 float is 2^(mantissa bits+1), so it should be 2048.

There's even a function for this:

julia> maxintfloat(Float16)
Float16(2048.0)

However the largest finite Float16 value is only 65504.0 (prevfloat(Float16(Inf))), so converting everything to Float64 should be fine (since any integer which aren't representable by Float64 are much larger than this).

@sam0410 sam0410 closed this Nov 3, 2018
@sam0410 sam0410 reopened this Nov 3, 2018
@sam0410 sam0410 mentioned this pull request Nov 3, 2018
@sam0410
Copy link
Contributor Author

sam0410 commented Nov 3, 2018

Hi, I think I messed with the commits, I added tests, made the changes suggested by @JeffBezanson also added Union instead of eval as suggested by @andyferris. I reopened my PR in #29916

@sam0410 sam0410 closed this Nov 3, 2018
@sam0410 sam0410 deleted the float16-int-comparison branch November 3, 2018 16:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
maths Mathematical functions needs tests Unit tests are required for this change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants