Complex division is not optimised with -ffast-math #31220

Closed
@lesshaste

Bugzilla Link 31872
Version trunk
OS Linux
CC @hyp,@compnerd,@lesshaste,@hfinkel,@joker-eph,@RKSimon,@rotateright,@TNorthover

Extended Description

Consider:

#include <complex.h>
complex float f(complex float x, complex float y) {
return x/y;
}

clang trunk with -O3 -march=core-avx2, with or without -ffast-math, gives:

f: # @f
vmovaps xmm2, xmm1
vmovshdup xmm1, xmm0 # xmm1 = xmm0[1,1,3,3]
vmovshdup xmm3, xmm2 # xmm3 = xmm2[1,1,3,3]
jmp __divsc3 # TAILCALL

However, both gcc and ICC attempt to optimise this code when -ffast-math (or equivalent) is enabled.

ICC appears to give the fastest code:

f:
vcvtps2pd xmm2, xmm1 #3.12
vcvtps2pd xmm4, xmm0 #3.12
vmulpd xmm8, xmm2, xmm2 #3.12
vunpckhpd xmm3, xmm2, xmm2 #3.12
vmulpd xmm6, xmm3, xmm4 #3.12
vmovddup xmm7, xmm2 #3.12
vshufpd xmm5, xmm4, xmm4, 1 #3.12
vshufpd xmm9, xmm8, xmm8, 1 #3.12
vfmaddsub213pd xmm7, xmm5, xmm6 #3.12
vaddpd xmm11, xmm8, xmm9 #3.12
vshufpd xmm10, xmm7, xmm7, 1 #3.12
vdivpd xmm12, xmm10, xmm11 #3.12
vcvtpd2ps xmm0, xmm12 #3.12
ret
ret
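
For reference, the ICC sequence above reads as the textbook formula (a+bi)/(c+di) = ((ac+bd) + (bc-ad)i) / (c²+d²), evaluated in double precision and narrowed back to float, with none of the NaN/infinity recovery that __divsc3 performs. A rough C equivalent is sketched below. The name fast_cdiv is hypothetical and only for illustration; this is not what any of the compilers literally implement internally.

```c
#include <complex.h>

/* Hypothetical helper (not part of clang, gcc, or ICC): a sketch of the
 * straight-line complex division a compiler may emit under -ffast-math.
 * It assumes finite inputs and a non-zero divisor, and does no scaling
 * or NaN/infinity fix-up. */
float complex fast_cdiv(float complex x, float complex y) {
    /* Widen to double, mirroring the vcvtps2pd instructions. */
    double a = crealf(x), b = cimagf(x);
    double c = crealf(y), d = cimagf(y);

    double denom = c * c + d * d;        /* |y|^2, the vmulpd/vaddpd pair */
    double re = (a * c + b * d) / denom; /* real part of x/y */
    double im = (b * c - a * d) / denom; /* imaginary part of x/y */

    /* Narrow back to float, mirroring vcvtpd2ps. */
    return (float)re + (float)im * I;
}
```

Doing the arithmetic in double avoids most intermediate overflow and underflow for float inputs, which is presumably why ICC widens instead of rescaling the operands.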
