#pragma float_control(precise, on) doesn't work for SSE intrinsics #55713

Open
@obfuscated

This is the link to godbolt with the full reproducer: https://godbolt.org/z/qYczcba39

The problem is that the pragma has no effect when the SSE intrinsics are called directly, but it works when the built-in operators are used on __m128 values.

I originally discovered this in clang 14.0.1.

The code to see the problem is this (compiled with -Ofast -msse4.2 -mrecip=none):

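#include <immintrin.h>

// Intrinsics: the pragma is ignored, the division becomes a multiply by 1/newLen.
__m128 func(__m128 d, float oldLen, float newLen) {
	#pragma float_control(precise, on)
	return _mm_div_ps(
		_mm_mul_ps(d, _mm_set1_ps(oldLen)),
		_mm_set1_ps(newLen)
	);
}

// Operators: the pragma is honored, a real divps is emitted.
__m128 func1(__m128 d, float oldLen, float newLen) {
	#pragma float_control(precise, on)
	return d*oldLen/newLen;
}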

And it leads to this assembly:

.LCPI1_0:
        .long   0x3f800000                      # float 1
func(float __vector(4), float, float):                         # @func(float __vector(4), float, float)
        shufps  xmm1, xmm1, 0                   # xmm1 = xmm1[0,0,0,0]
        mulps   xmm0, xmm1
        movss   xmm1, dword ptr [rip + .LCPI1_0] # xmm1 = mem[0],zero,zero,zero
        divss   xmm1, xmm2
        shufps  xmm1, xmm1, 0                   # xmm1 = xmm1[0,0,0,0]
        mulps   xmm0, xmm1
        ret
func1(float __vector(4), float, float):                        # @func1(float __vector(4), float, float)
        shufps  xmm1, xmm1, 0                   # xmm1 = xmm1[0,0,0,0]
        mulps   xmm0, xmm1
        shufps  xmm2, xmm2, 0                   # xmm2 = xmm2[0,0,0,0]
        divps   xmm0, xmm2
        ret
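
Since func1 keeps the divps, one possible workaround is to route only the division through the operator form. The following is a minimal sketch, not something confirmed in this report: div_ps_precise and scale_precise are hypothetical helper names, and whether the precise division survives inlining under -Ofast would still need to be checked on godbolt.

```c
#include <assert.h>
#include <xmmintrin.h>

/* Hypothetical helper (not from the report): divide with the built-in
 * vector operator so that, under clang, the pragma applies to the division. */
static inline __m128 div_ps_precise(__m128 a, __m128 b) {
#pragma float_control(precise, on)
    return a / b; /* same operator form that func1 uses */
}

/* Same computation as func, with only the division routed through the helper. */
static inline __m128 scale_precise(__m128 d, float oldLen, float newLen) {
    assert(newLen != 0.0f);
    return div_ps_precise(_mm_mul_ps(d, _mm_set1_ps(oldLen)),
                          _mm_set1_ps(newLen));
}
```

Compilers that do not know float_control may warn about an unknown pragma; the sketch assumes clang targeting x86 with SSE.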

In general, the x*(1/a) rewrite seems questionable here, and clang doesn't apply it to scalars, only to vector/SIMD types. Is this a separate bug that should be reported on its own?
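
To illustrate why the rewrite matters for precise mode, here is a small scalar sketch comparing x/a with x*(1/a) in single precision. The function names are made up for illustration, and it assumes a build without fast-math flags so that each operation is rounded to float.

```c
#include <assert.h>

/* Correctly rounded single-precision quotient (one rounding). */
static float divide_direct(float x, float a) {
    assert(a != 0.0f);
    volatile float q = x / a;
    return q;
}

/* The rewritten form: 1/a is rounded to float first, then multiplied. */
static float divide_via_reciprocal(float x, float a) {
    assert(a != 0.0f);
    volatile float r = 1.0f / a; /* first rounding */
    volatile float q = x * r;    /* second rounding */
    return q;
}

/* Count the integers x in [1, n] for which the two forms disagree. */
static int count_mismatches(float a, int n) {
    int mismatches = 0;
    for (int x = 1; x <= n; ++x) {
        if (divide_direct((float)x, a) != divide_via_reciprocal((float)x, a)) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

For a = 3 the two forms already disagree at x = 5: 5.0f / 3.0f rounds to 13981013/2^23, while 5.0f * (1.0f / 3.0f) rounds to 13981014/2^23, one ulp higher.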

Labels: clang:headers (Headers provided by Clang, e.g. for intrinsics)