`<complex>`: avoid unnecessary conversion of real arguments to complex #2855

MattStephanson · 2022-07-11T06:15:04Z

Mixed real/complex multiplication and addition are commutative (and I don't think there are any observable side effects), so it's unnecessary to strictly follow the Returns clauses in [complex.ops]/2 and [complex.ops]/5. Converting a real parameter on the LHS to complex results in a full complex/complex operation that the optimizer is unable to clean up.

https://godbolt.org/z/KW3sWE6YY

Multiplication codegen

std::complex<double> Mult(double const &,std::complex<double> const &) PROC   ; Mult, COMDAT
        movups  xmm0, XMMWORD PTR [r8]
        mov     rax, rcx
        movsd   xmm1, QWORD PTR [rdx]
        unpcklpd xmm1, xmm1
        mulpd   xmm1, xmm0
        movups  XMMWORD PTR [rcx], xmm1
        ret     0
std::complex<double> Mult(double const &,std::complex<double> const &) ENDP   ; Mult

std::complex<double> std::operator*<double>(double const &,std::complex<double> const &) PROC       ; std::operator*<double>, COMDAT
        movsd   xmm5, QWORD PTR [rdx]
        xorps   xmm2, xmm2
        movsd   xmm4, QWORD PTR [r8+8]
        movaps  xmm1, xmm5
        movsd   xmm0, QWORD PTR [r8]
        mov     rax, rcx
        mulsd   xmm5, QWORD PTR [r8]
        mulsd   xmm1, xmm4
        mulsd   xmm4, xmm2
        mulsd   xmm0, xmm2
        subsd   xmm5, xmm4
        addsd   xmm1, xmm0
        movsd   QWORD PTR [rcx], xmm5
        movsd   QWORD PTR [rcx+8], xmm1
        ret     0
std::complex<double> std::operator*<double>(double const &,std::complex<double> const &) ENDP       ; std::operator*<double>

Addition codegen

std::complex<double> Add(double const &,std::complex<double> const &) PROC      ; Add, COMDAT
        movups  xmm0, XMMWORD PTR [r8]
        mov     rax, rcx
        movups  XMMWORD PTR [rcx], xmm0
        addsd   xmm0, QWORD PTR [rdx]
        movsd   QWORD PTR [rcx], xmm0
        ret     0
std::complex<double> Add(double const &,std::complex<double> const &) ENDP      ; Add

std::complex<double> std::operator+<double>(double const &,std::complex<double> const &) PROC       ; std::operator+<double>, COMDAT
        movsd   xmm0, QWORD PTR [rdx]
        mov     rax, rcx
        addsd   xmm0, QWORD PTR [r8]
        movsd   xmm1, QWORD PTR [r8+8]
        movsd   QWORD PTR [rcx], xmm0
        xorps   xmm0, xmm0
        addsd   xmm1, xmm0
        movsd   QWORD PTR [rcx+8], xmm1
        ret     0
std::complex<double> std::operator+<double>(double const &,std::complex<double> const &) ENDP       ; std::operator+<double>

…unnecessary conversion of the real argument to complex.

strega-nil-ms

I also compared the assembler output to the operator+(const complex<_Ty>&, const double&) implementation above (on line 1402) - the optimized assembler is exactly the same for both, and for debug the new version is slightly better (one function call instead of two). Would it be possible for you to modify those versions to use the new style of code? (not something that's necessary, hence the approval 😄 )

MattStephanson · 2022-07-11T18:27:52Z

I also compared the assembler output to the operator+(const complex<_Ty>&, const double&) implementation above (on line 1402) - the optimized assembler is exactly the same for both, and for debug the new version is slightly better (one function call instead of two). Would it be possible for you to modify those versions to use the new style of code? (not something that's necessary, hence the approval 😄 )

Will do. I'll also apply it to operator*(complex<T>, T) since it will also give fewer function calls in debug, and tends to produce mulpd instead of two mulsd in release.

This transformation doesn't assume commutativity; operand order is preserved. Like the changes for `complex + T` and `complex * T`, this replaces 2 and 4 function calls with 1.

StephanTLavavej · 2022-07-12T02:02:21Z

I've pushed a commit to complex - T and complex / T, following the changes you've made to complex + T and complex * T. (This transformation doesn't assume commutativity.) In release mode, I observe no codegen change for op-, and a codegen improvement for op/:

Before:

	movups	xmm0, XMMWORD PTR [rdx]
	mov	rax, rcx
	movups	XMMWORD PTR [rcx], xmm0
	divsd	xmm0, xmm2
	movsd	QWORD PTR [rcx], xmm0
	movsd	xmm0, QWORD PTR [rcx+8]
	divsd	xmm0, xmm2
	movsd	QWORD PTR [rcx+8], xmm0
	ret	0

After:

	movups	xmm1, XMMWORD PTR [rdx]
	mov	rax, rcx
	movaps	xmm0, xmm2
	unpcklpd xmm0, xmm0
	divpd	xmm1, xmm0
	movups	XMMWORD PTR [rcx], xmm1
	ret	0

Test program:

#include <complex>
using namespace std;

complex<double> meow_sub(const complex<double>& l, double r) {
    return l - r;
}

complex<double> meow_div(const complex<double>& l, double r) {
    return l / r;
}

StephanTLavavej · 2022-07-13T20:22:58Z

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

StephanTLavavej · 2022-07-14T22:40:48Z

Thanks for noticing and improving this codegen! 🚀 ✅ 🎉

microsoft#2855) Co-authored-by: Stephan T. Lavavej <stl@nuwen.net>

Swap arguments in commutative mixed complex/real operations to avoid …

c55dad6

…unnecessary conversion of the real argument to complex.

MattStephanson requested a review from a team as a code owner July 11, 2022 06:15

strega-nil-ms approved these changes Jul 11, 2022

View reviewed changes

review feedback

b3c2dbc

StephanTLavavej added the performance Must go faster label Jul 11, 2022

Call op= for complex - T and complex / T.

89f9054

This transformation doesn't assume commutativity; operand order is preserved. Like the changes for `complex + T` and `complex * T`, this replaces 2 and 4 function calls with 1.

StephanTLavavej approved these changes Jul 12, 2022

View reviewed changes

StephanTLavavej self-assigned this Jul 13, 2022

StephanTLavavej merged commit ef62d3f into microsoft:main Jul 14, 2022

MattStephanson deleted the improve_cplx_ops branch July 20, 2022 04:28

StephanTLavavej mentioned this pull request Jul 25, 2022

ASAN tests are sporadically failing #2908

Closed

fsb4000 pushed a commit to fsb4000/STL that referenced this pull request Aug 13, 2022

<complex>: avoid unnecessary conversion of real arguments to complex (

3e293ba

microsoft#2855) Co-authored-by: Stephan T. Lavavej <stl@nuwen.net>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`<complex>`: avoid unnecessary conversion of real arguments to complex #2855

`<complex>`: avoid unnecessary conversion of real arguments to complex #2855

MattStephanson commented Jul 11, 2022

strega-nil-ms left a comment •

edited

Loading

MattStephanson commented Jul 11, 2022

StephanTLavavej commented Jul 12, 2022

StephanTLavavej commented Jul 13, 2022

StephanTLavavej commented Jul 14, 2022

<complex>: avoid unnecessary conversion of real arguments to complex #2855

<complex>: avoid unnecessary conversion of real arguments to complex #2855

Conversation

MattStephanson commented Jul 11, 2022

strega-nil-ms left a comment • edited Loading

Choose a reason for hiding this comment

MattStephanson commented Jul 11, 2022

StephanTLavavej commented Jul 12, 2022

StephanTLavavej commented Jul 13, 2022

StephanTLavavej commented Jul 14, 2022

`<complex>`: avoid unnecessary conversion of real arguments to complex #2855

`<complex>`: avoid unnecessary conversion of real arguments to complex #2855

strega-nil-ms left a comment •

edited

Loading