# minor fma cleanup #57041
```diff
@@ -276,6 +276,9 @@ significantly more expensive than `x*y+z`. `fma` is used to improve accuracy in
 algorithms. See [`muladd`](@ref).
 """
 function fma end
+function fma_emulated(a::Float16, b::Float16, c::Float16)
+    Float16(muladd(Float32(a), Float32(b), Float32(c))) #don't use fma if the hardware doesn't have it.
+end
 function fma_emulated(a::Float32, b::Float32, c::Float32)::Float32
     ab = Float64(a) * b
     res = ab+c
```
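For context on the accuracy claim in the docstring, here is a small sketch (mine, not part of the diff) of a case where the fused operation differs from the naive one:

```julia
# The product (1e10 + 1) * (1e10 - 1) == 1e20 - 1 is exact mathematically,
# but 1e20 - 1 is not representable in Float64, so a plain multiply rounds
# it to 1e20 and the subsequent add loses the -1 entirely.
x = 1e10 + 1.0
y = 1e10 - 1.0
z = -1e20

x * y + z     # 0.0  (two roundings: the product already rounded to 1e20)
fma(x, y, z)  # -1.0 (exact product, a single rounding at the end)
```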
```diff
@@ -348,19 +351,14 @@ function fma_emulated(a::Float64, b::Float64,c::Float64)
     s = (abs(abhi) > abs(c)) ? (abhi-r+c+ablo) : (c-r+abhi+ablo)
     return r+s
 end
-fma_llvm(x::Float32, y::Float32, z::Float32) = fma_float(x, y, z)
-fma_llvm(x::Float64, y::Float64, z::Float64) = fma_float(x, y, z)
-
 # Disable LLVM's fma if it is incorrect, e.g. because LLVM falls back
 # onto a broken system libm; if so, use a software emulated fma
-@assume_effects :consistent fma(x::Float32, y::Float32, z::Float32) = Core.Intrinsics.have_fma(Float32) ? fma_llvm(x,y,z) : fma_emulated(x,y,z)
-@assume_effects :consistent fma(x::Float64, y::Float64, z::Float64) = Core.Intrinsics.have_fma(Float64) ? fma_llvm(x,y,z) : fma_emulated(x,y,z)
-
-function fma(a::Float16, b::Float16, c::Float16)
-    Float16(muladd(Float32(a), Float32(b), Float32(c))) #don't use fma if the hardware doesn't have it.
-end
+@assume_effects :consistent function fma(x::T, y::T, z::T) where {T<:IEEEFloat}
+    Core.Intrinsics.have_fma(T) ? fma_float(x,y,z) : fma_emulated(x,y,z)
+end

-# This is necessary at least on 32-bit Intel Linux, since fma_llvm may
+# This is necessary at least on 32-bit Intel Linux, since fma_float may
 # have called glibc, and some broken glibc fma implementations don't
 # properly restore the rounding mode
 Rounding.setrounding_raw(Float32, Rounding.JL_FE_TONEAREST)
```

**Review thread on the new `fma` definition:**

> My understanding is that … (see `julia/src/llvm-cpufeatures.cpp`, lines 50 to 76 in `9b1ea1a`).

> Actually, this may work on riscv64 with #57043, but I'm still not entirely sure about what's going on there.

> Sure, at which point this is NFC for such architectures, but that can be fixed in a separate PR.
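The dispatch in the hunk above can be exercised directly from the REPL. A hypothetical session (the `Bool` results depend on the host CPU):

```julia
# Whether LLVM's native fma can be trusted for each type; when false,
# fma falls back to the software-emulated path.
julia> Core.Intrinsics.have_fma(Float32), Core.Intrinsics.have_fma(Float64)
(true, true)

# Float64(0.1) is slightly above 1/10, so the exact residual of
# round(0.1)*10 - 1 == 2^-54 survives under a correctly rounded fma.
julia> fma(0.1, 10.0, -1.0)
5.551115123125783e-17
```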
**Review thread on the `Float16` `fma_emulated` method:**

> I think this can be simplified to … LLVM would automatically do the demotion to `float` as necessary nowadays.
> Ironically, on aarch64 with the fp16 extension, `muladd` is better than `fma` on Float16 because it doesn't force the Float16 -> Float32 -> Float16 dance: …
> No. `muladd` doesn't guarantee the accuracy that `fma` requires.
> I will also point out that a pure f16 fma is not super useful as an operation. Most accelerators will do an fp16 multiply with an f32 accumulator (and then potentially round back to f16 at the end).
> Sure, but that's not what `Base.fma` does.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That’s not really true of fp16, on aarch64 it’s a true type which supports everything (with twice the throughput on SIMD), bf16 is that though
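To make the accuracy point in the thread above concrete, a small sketch (my example, not from the thread): `fma` must keep the low-order bits of the exact product, an unfused multiply-add loses them, and `muladd` is free to return either answer depending on the hardware.

```julia
a = b = Float16(1) + eps(Float16)    # 1 + 2^-10
c = -(Float16(1) + Float16(2.0^-9))  # -(1 + 2^-9)

# Exact product: (1 + 2^-10)^2 == 1 + 2^-9 + 2^-20.
a * b + c        # Float16(0.0): a*b rounds to 1 + 2^-9, dropping the 2^-20 term
fma(a, b, c)     # ≈ 9.54e-7 == 2^-20: the residual a correct fma must keep
muladd(a, b, c)  # either result, depending on whether the hardware fuses
```

On hardware with a native fp16 fma, `muladd` here may fuse and match `fma`; otherwise it lowers to the unfused Float16 ops and returns `0.0`, which is exactly the accuracy gap being discussed.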