Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

<complex>: avoid unnecessary conversion of real arguments to complex #2855

Merged
merged 3 commits into from
Jul 14, 2022

Conversation

MattStephanson
Copy link
Contributor

Mixed real/complex multiplication and addition are commutative (and I don't think there are any observable side effects), so it's unnecessary to strictly follow the Returns clauses in [complex.ops]/2 and [complex.ops]/5. Converting a real parameter on the LHS to complex results in a full complex/complex operation that the optimizer is unable to clean up.

https://godbolt.org/z/KW3sWE6YY

Multiplication codegen
std::complex<double> Mult(double const &,std::complex<double> const &) PROC   ; Mult, COMDAT
        movups  xmm0, XMMWORD PTR [r8]
        mov     rax, rcx
        movsd   xmm1, QWORD PTR [rdx]
        unpcklpd xmm1, xmm1
        mulpd   xmm1, xmm0
        movups  XMMWORD PTR [rcx], xmm1
        ret     0
std::complex<double> Mult(double const &,std::complex<double> const &) ENDP   ; Mult

std::complex<double> std::operator*<double>(double const &,std::complex<double> const &) PROC       ; std::operator*<double>, COMDAT
        movsd   xmm5, QWORD PTR [rdx]
        xorps   xmm2, xmm2
        movsd   xmm4, QWORD PTR [r8+8]
        movaps  xmm1, xmm5
        movsd   xmm0, QWORD PTR [r8]
        mov     rax, rcx
        mulsd   xmm5, QWORD PTR [r8]
        mulsd   xmm1, xmm4
        mulsd   xmm4, xmm2
        mulsd   xmm0, xmm2
        subsd   xmm5, xmm4
        addsd   xmm1, xmm0
        movsd   QWORD PTR [rcx], xmm5
        movsd   QWORD PTR [rcx+8], xmm1
        ret     0
std::complex<double> std::operator*<double>(double const &,std::complex<double> const &) ENDP       ; std::operator*<double>
Addition codegen
std::complex<double> Add(double const &,std::complex<double> const &) PROC      ; Add, COMDAT
        movups  xmm0, XMMWORD PTR [r8]
        mov     rax, rcx
        movups  XMMWORD PTR [rcx], xmm0
        addsd   xmm0, QWORD PTR [rdx]
        movsd   QWORD PTR [rcx], xmm0
        ret     0
std::complex<double> Add(double const &,std::complex<double> const &) ENDP      ; Add

std::complex<double> std::operator+<double>(double const &,std::complex<double> const &) PROC       ; std::operator+<double>, COMDAT
        movsd   xmm0, QWORD PTR [rdx]
        mov     rax, rcx
        addsd   xmm0, QWORD PTR [r8]
        movsd   xmm1, QWORD PTR [r8+8]
        movsd   QWORD PTR [rcx], xmm0
        xorps   xmm0, xmm0
        addsd   xmm1, xmm0
        movsd   QWORD PTR [rcx+8], xmm1
        ret     0
std::complex<double> std::operator+<double>(double const &,std::complex<double> const &) ENDP       ; std::operator+<double>

…unnecessary conversion of the real argument to complex.
@MattStephanson MattStephanson requested a review from a team as a code owner July 11, 2022 06:15
Copy link
Contributor

@strega-nil-ms strega-nil-ms left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also compared the assembler output to the operator+(const complex<_Ty>&, const double&) implementation above (on line 1402) - the optimized assembler is exactly the same for both, and for debug the new version is slightly better (one function call instead of two). Would it be possible for you to modify those versions to use the new style of code? (not something that's necessary, hence the approval 😄 )

@MattStephanson
Copy link
Contributor Author

I also compared the assembler output to the operator+(const complex<_Ty>&, const double&) implementation above (on line 1402) - the optimized assembler is exactly the same for both, and for debug the new version is slightly better (one function call instead of two). Would it be possible for you to modify those versions to use the new style of code? (not something that's necessary, hence the approval 😄 )

Will do. I'll also apply it to operator*(complex<T>, T) since it will also give fewer function calls in debug, and tends to produce mulpd instead of two mulsd in release.

@StephanTLavavej StephanTLavavej added the performance Must go faster label Jul 11, 2022
This transformation doesn't assume commutativity; operand order
is preserved. Like the changes for `complex + T` and `complex * T`,
this replaces 2 and 4 function calls with 1.
@StephanTLavavej
Copy link
Member

I've pushed a commit to complex - T and complex / T, following the changes you've made to complex + T and complex * T. (This transformation doesn't assume commutativity.) In release mode, I observe no codegen change for op-, and a codegen improvement for op/:

Before:

	movups	xmm0, XMMWORD PTR [rdx]
	mov	rax, rcx
	movups	XMMWORD PTR [rcx], xmm0
	divsd	xmm0, xmm2
	movsd	QWORD PTR [rcx], xmm0
	movsd	xmm0, QWORD PTR [rcx+8]
	divsd	xmm0, xmm2
	movsd	QWORD PTR [rcx+8], xmm0
	ret	0

After:

	movups	xmm1, XMMWORD PTR [rdx]
	mov	rax, rcx
	movaps	xmm0, xmm2
	unpcklpd xmm0, xmm0
	divpd	xmm1, xmm0
	movups	XMMWORD PTR [rcx], xmm1
	ret	0

Test program:

#include <complex>
using namespace std;

complex<double> meow_sub(const complex<double>& l, double r) {
    return l - r;
}

complex<double> meow_div(const complex<double>& l, double r) {
    return l / r;
}

@StephanTLavavej StephanTLavavej self-assigned this Jul 13, 2022
@StephanTLavavej
Copy link
Member

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

@StephanTLavavej StephanTLavavej merged commit ef62d3f into microsoft:main Jul 14, 2022
@StephanTLavavej
Copy link
Member

Thanks for noticing and improving this codegen! 🚀 ✅ 🎉

@MattStephanson MattStephanson deleted the improve_cplx_ops branch July 20, 2022 04:28
fsb4000 pushed a commit to fsb4000/STL that referenced this pull request Aug 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants