-
-
Notifications
You must be signed in to change notification settings - Fork 5.6k
Add Float16-Int comparisons #29870
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Float16-Int comparisons #29870
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Float64
can only exactly represent smaller integers like Int32
, which means sometimes these comparisons will throw InexactErrors (see discussion below).
What’s the largest Float16 integer?
The largest integer that can represented exactly by an IEEE 754 float is |
I forgot to say - welcome and thank you, @sam0410 🙂
Yep :) So I would say (unless @simonbyrne has a better idea) something like this should be safe and reasonably fast: ==(a::Float16, b::Union{UInt64, UInt128}) = (b <= 2048) && Float64(a) == Float64(b) @sam0410 We'd need to generalize this to the signed case, and treat signed and unsigned integers seperately. I'd suggest using e.g. |
Conversion to floating point types does not throw inexact errors, it rounds to the nearest float. Only conversion to "exact types" like integers throws inexact errors. |
It's a bit tricky but I think these are correct. The key facts are these:
where Here are the cases to consider:
|
Right you are, Stefan. My bad. I still feel |
Welcome, @sam0410 and thanks for working on this. Will be great to have this fixed. |
base/float.jl
Outdated
<=(x::Float16, y::$Ti) = Float64(x)<=Float64(y) | ||
<=(x::$Ti, y::Float16) = Float64(x)<=Float64(y) | ||
end | ||
end | ||
==(x::Float32, y::Union{Int32,UInt32}) = Float64(x)==Float64(y) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Float32
could be replaced with Union{Float16, Float32}
in these 6 methods to handle 32-bit ints as well.
base/float.jl
Outdated
<=(x::Float16, y::$Ti) = Float64(x)<=Float64(y) | ||
<=(x::$Ti, y::Float16) = Float64(x)<=Float64(y) | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inf16 == typemax(UInt16)
is also currently incorrect. We could fix it with methods like this that promote to Float32
.
There's even a function for this:
However the largest finite |
Hi, I think I messed with the commits, I added tests, made the changes suggested by @JeffBezanson also added Union instead of eval as suggested by @andyferris. I reopened my PR in #29916 |
Hi ! I am new to Julia and this is my first contribution.
For Issue #29743