Description
This is the link to godbolt with the full reproducer: https://godbolt.org/z/qYczcba39
The problem is that the pragma doesn't switch the floating-point mode when the intrinsics are used directly, but it does work when the operators on the __m128 types are used.
I originally discovered this in clang 14.0.1.
The code that shows the problem is below (compiled with -Ofast -msse4.2 -mrecip=none):
#include <immintrin.h>

__m128 func(__m128 d, float oldLen, float newLen) {
#pragma float_control(precise, on)
    return _mm_div_ps(
        _mm_mul_ps(d, _mm_set1_ps(oldLen)),
        _mm_set1_ps(newLen)
    );
}

__m128 func1(__m128 d, float oldLen, float newLen) {
#pragma float_control(precise, on)
    return d*oldLen/newLen;
}
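It leads to the assembly below. In func the division is replaced by a multiplication with the reciprocal (divss followed by mulps), while func1 keeps the divps: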
.LCPI1_0:
        .long   0x3f800000                      # float 1
func(float __vector(4), float, float):          # @func(float __vector(4), float, float)
        shufps  xmm1, xmm1, 0                   # xmm1 = xmm1[0,0,0,0]
        mulps   xmm0, xmm1
        movss   xmm1, dword ptr [rip + .LCPI1_0] # xmm1 = mem[0],zero,zero,zero
        divss   xmm1, xmm2
        shufps  xmm1, xmm1, 0                   # xmm1 = xmm1[0,0,0,0]
        mulps   xmm0, xmm1
        ret
func1(float __vector(4), float, float):         # @func1(float __vector(4), float, float)
        shufps  xmm1, xmm1, 0                   # xmm1 = xmm1[0,0,0,0]
        mulps   xmm0, xmm1
        shufps  xmm2, xmm2, 0                   # xmm2 = xmm2[0,0,0,0]
        divps   xmm0, xmm2
        ret
In general, the *(1/a) optimization seems questionable here, and clang doesn't apply it to scalars, only to vector/SIMD types. Is this another bug that needs to be reported separately?
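
To make the concern about the *(1/a) rewrite concrete, here is a minimal scalar sketch of why it is not value-preserving. It is not taken from the godbolt reproducer: the function names are hypothetical, and it assumes a build without -ffast-math/-Ofast so that the compiler evaluates both forms exactly as written in IEEE single precision.

```c
#include <stdbool.h>

/* Hypothetical helper: the division as written in the source.
   One correctly rounded operation. */
float div_direct(float x, float a) {
    return x / a;
}

/* Hypothetical helper: the rewritten form x * (1/a).
   The reciprocal is rounded once, then the product is rounded again. */
float div_via_reciprocal(float x, float a) {
    float r = 1.0f / a;
    return x * r;
}

/* Returns true when the rewrite changes the result for this input pair. */
bool rewrite_changes_result(float x, float a) {
    return div_direct(x, a) != div_via_reciprocal(x, a);
}
```

For example, rewrite_changes_result(5.0f, 3.0f) should be true (the two results are one ulp apart), while rewrite_changes_result(6.0f, 2.0f) is false because 1/2 is exactly representable. This is the kind of difference that float_control(precise, on) is expected to rule out.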