Satisfy TODO in Complex muladd #36140

jmert · 2020-06-03T22:19:19Z

This PR has been reduced in scope and its goal has changed. The original message is shown below, but the purpose now is simply to satisfy an old TODO in the definition of muladd for Complex arguments that mentions a "future" mulsub function. The mulsub function was never added (see #15985), and modern LLVM's optimizations are sufficient to turn a sequence of muladds and negations into the corresponding fused multiply-subtract where appropriate.

PR #15980 defined muladd for Complex (and various combinations of mixed Complex-Real) arguments. It seems the same could (should?) be done for fma as well.

The original TODO comments concerning a future mulsub no longer apply since LLVM appears to be capable of reconstructing the fused multiply-subtract, as already mentioned recently in #35562 (comment):

julia> code_native(fma, Tuple{ComplexF64, ComplexF64, ComplexF64}, debuginfo=:none)
        .text
        vmovsd  8(%rsi), %xmm1          # xmm1 = mem[0],zero
        vmovsd  8(%rdx), %xmm3          # xmm3 = mem[0],zero
        vmovsd  (%rcx), %xmm4           # xmm4 = mem[0],zero
        vmovsd  (%rsi), %xmm0           # xmm0 = mem[0],zero
        vmovsd  (%rdx), %xmm2           # xmm2 = mem[0],zero
        movq    %rdi, %rax
        vfmsub231sd     %xmm3, %xmm1, %xmm4 # xmm4 = (xmm1 * xmm3) - xmm4
        vfmsub231sd     %xmm2, %xmm0, %xmm4 # xmm4 = (xmm0 * xmm2) - xmm4
        vfmadd213sd     8(%rcx), %xmm1, %xmm2 # xmm2 = (xmm1 * xmm2) + mem
        vmovsd  %xmm4, (%rdi)
        vfmadd231sd     %xmm3, %xmm0, %xmm2 # xmm2 = (xmm0 * xmm3) + xmm2
        vmovsd  %xmm2, 8(%rdi)
        retq
        nopl    (%rax,%rax)

As motivation: I've been working on an algorithm where fma on real inputs is necessary to maintain as much numerical accuracy as possible, but the mathematical function is technically defined over the complex domain as well. Without appropriate definitions of fma for Complex, the algorithm currently fails for complex arguments (or requires an extra type-based branch to fall back to muladd, which would give divergent answers when the complex input lies on the real axis).

yuyichao · 2020-06-03T22:42:45Z

Is fma well defined for complex numbers? Unlike muladd, fma should be well defined and shouldn't just be defined as using as many fusions as seem convenient.

Is there a standard on this? If there is we should follow that.
From the "definition", fma should have a single rounding to the final result (one for the real part and one for the imaginary part should be fine), but it doesn't seem like the proposed implementation has this property.

simonbyrne · 2020-06-03T22:59:11Z

> Is there a standard on this? If there is we should follow that.

I don't think so. The IEEE spec doesn't discuss complex numbers, and the C spec doesn't define an FMA for them.

> From the "definition", fma should have a single rounding to the final result (one for the real part and one for the imaginary part should be fine), but it doesn't seem like the proposed implementation has this property.

Agreed: "fusing" is a necessary property of fma, and this doesn't do that.
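
For readers unfamiliar with the single-rounding property being discussed, here is a minimal real-valued sketch. The values `a` and `c` are chosen for illustration only and do not come from the PR; they are picked so that the exact product `a*a` is not representable in Float64.

```julia
# Real-valued illustration of the single-rounding property of fma.
a = 1.0 + 2.0^-27          # exactly representable in Float64
c = -(1.0 + 2.0^-26)       # exactly representable in Float64

# The exact product is a*a = 1 + 2^-26 + 2^-54, which needs more than 53 bits.
unfused = a * a + c        # product is rounded to 1 + 2^-26 first, so the sum is 0.0
fused   = fma(a, a, c)     # the exact a*a + c is rounded once, giving 2^-54

println("unfused = ", unfused)
println("fused   = ", fused)
```

A complex fma in this sense would need each of the real and imaginary components to be rounded only once, which a chain of real fma or muladd calls does not provide.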

jmert · 2020-06-03T23:13:48Z

OK, I was concerned that that might be the response, but I thought I'd give it a try.

muladd does effectively reduce to the fma operations I want on modern LLVM (when FMA is available as an architecture instruction), so I guess my generic way forward is to define a private fma/muladd function for real and complex arguments, respectively.

simonbyrne · 2020-06-03T23:32:56Z

Doesn’t the complex muladd already do that?

We should change the TODOs now that the mulsub issue is resolved: did you want to update the PR to do that?

jmert · 2020-06-03T23:38:45Z

> Doesn’t the complex muladd already do that?

Yes. What I meant was that for the real case I want to force fma, even when it is not the efficient version, because the accuracy is required; I then need an appropriate fallback path for Complex arguments. I can just multiple-dispatch my way to that with a private function.
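
A minimal sketch of the private-function approach described in this comment, assuming a package-internal helper. The name `accurate_fma` is hypothetical and is not part of Base or of this PR.

```julia
# `accurate_fma` is a hypothetical, package-private name used only for illustration.

# Real arguments: always use fma, so the result is rounded exactly once
# even on hardware without a native FMA instruction.
accurate_fma(x::Real, y::Real, z::Real) = fma(x, y, z)

# Any other numeric arguments (e.g. Complex): fall back to muladd,
# which may or may not fuse depending on the platform.
accurate_fma(x::Number, y::Number, z::Number) = muladd(x, y, z)

real_result    = accurate_fma(2.0, 3.0, 1.0)
complex_result = accurate_fma(2.0 + 1.0im, 3.0 - 1.0im, 1.0 + 0.0im)
```

Dispatch picks the `Real` method only when all three arguments are real; any complex argument selects the `Number` fallback.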

> We should change the TODOs now that the mulsub issue is resolved: did you want to update the PR to do that?

Sure, I can do that.

Modern LLVM has sufficient optimizations to translate muladds with multiple negations into the appropriate fused multiply-subtract instructions on x86_64.

jmert · 2020-06-04T00:03:41Z

OK, I've replaced the original commits with a new one which simply completes the muladd chain with further negated muladds. On x86_64 (at least on my machine), the complex-complex-complex case goes from a single FMA with additional packed vector instructions to 4 FMAs and a multiply. I don't have access to any aarch64 processors, though I think it looks OK according to Godbolt (though I admittedly don't know much about ARM processors).
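
As a rough illustration of what "completing the muladd chain with negated muladds" can look like for the complex-complex-complex case, here is a sketch. The function name `complex_muladd_sketch` is hypothetical, and the body is written from the description in this thread rather than copied from Base's source.

```julia
# Hypothetical name; a sketch of the negated-muladd chain, not a copy of Base's source.
function complex_muladd_sketch(z::Complex, w::Complex, x::Complex)
    zr, zi = reim(z)
    wr, wi = reim(w)
    xr, xi = reim(x)
    # Real part: zr*wr - zi*wi + xr, rewritten as zr*wr - (zi*wi - xr)
    # so that both products sit inside a muladd.
    re_part = muladd(zr, wr, -muladd(zi, wi, -xr))
    # Imaginary part: zr*wi + zi*wr + xi, a plain nested muladd chain.
    im_part = muladd(zr, wi, muladd(zi, wr, xi))
    return Complex(re_part, im_part)
end

result = complex_muladd_sketch(1.0 + 2.0im, 3.0 + 4.0im, 5.0 + 6.0im)
println(result)
```

In the REPL, `code_native(complex_muladd_sketch, Tuple{ComplexF64, ComplexF64, ComplexF64}, debuginfo=:none)` can be used to inspect the generated code; whether fused multiply-add and multiply-subtract instructions appear depends on the CPU and the LLVM version in use.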

simonbyrne · 2020-06-04T20:53:39Z

Thanks!

Satisfy Complex muladd TODO for mulsub with negated muladd

c164b99

Modern LLVM has sufficient optimizations to translate muladds with multiple negations into the appropriate fused multiply-subtract instructions on x86_64.

jmert force-pushed the complex_fma branch from 11e5675 to c164b99 Compare June 3, 2020 23:51

jmert changed the title ~~Define fma() for Complex arguments~~ Satisfy TODO in Complex muladd Jun 3, 2020

simonbyrne approved these changes Jun 4, 2020

simonbyrne merged commit b5868b9 into JuliaLang:master Jun 4, 2020

jmert deleted the complex_fma branch June 4, 2020 22:52

fre-hu mentioned this pull request Sep 1, 2025

Enable fused multiply-subtract in complex mul_add rust-num/num-complex#148

Open
