Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[X86] LLVM >= 17 generates call to __extendhfsf2 for bitcast -> and -> bitcast #104914

Closed
overmighty opened this issue Aug 20, 2024 · 4 comments
Closed
Labels

Comments

@overmighty
Copy link
Member

https://godbolt.org/z/exb7PbjnK

C++ code:

_Float16 fabsf16(_Float16 x) {
    return __builtin_bit_cast(
        _Float16,
        static_cast<unsigned short>(
            __builtin_bit_cast(unsigned short, x) & 0x7fff
        )
    );
}

IR:

define dso_local noundef half @fabsf16(_Float16)(half noundef %x) local_unnamed_addr {
entry:
  %0 = bitcast half %x to i16
  %1 = and i16 %0, 32767
  %2 = bitcast i16 %1 to half
  ret half %2
}

Clang 16 output assembly with -O3:

fabsf16(_Float16):
        pextrw  eax, xmm0, 0
        and     eax, 32767
        pinsrw  xmm0, eax, 0
        ret

Clang 17 output assembly with -O3:

.LCPI0_0:
        .long   0x7fffffff
        .long   0x7fffffff
        .long   0x7fffffff
        .long   0x7fffffff
fabsf16(_Float16):
        push    rax
        call    __extendhfsf2@PLT
        andps   xmm0, xmmword ptr [rip + .LCPI0_0]
        call    __truncsfhf2@PLT
        pop     rax
        ret

Related: #104869 (comment).

cc @arsenm @lntue

@llvmbot
Copy link
Collaborator

llvmbot commented Aug 20, 2024

@llvm/issue-subscribers-backend-x86

Author: OverMighty (overmighty)

https://godbolt.org/z/exb7PbjnK

C++ code:

_Float16 fabsf16(_Float16 x) {
    return __builtin_bit_cast(
        _Float16,
        static_cast&lt;unsigned short&gt;(
            __builtin_bit_cast(unsigned short, x) &amp; 0x7fff
        )
    );
}

IR:

define dso_local noundef half @<!-- -->fabsf16(_Float16)(half noundef %x) local_unnamed_addr {
entry:
  %0 = bitcast half %x to i16
  %1 = and i16 %0, 32767
  %2 = bitcast i16 %1 to half
  ret half %2
}

Clang 16 output assembly with -O3:

fabsf16(_Float16):
        pextrw  eax, xmm0, 0
        and     eax, 32767
        pinsrw  xmm0, eax, 0
        ret

Clang 17 output assembly with -O3:

.LCPI0_0:
        .long   0x7fffffff
        .long   0x7fffffff
        .long   0x7fffffff
        .long   0x7fffffff
fabsf16(_Float16):
        push    rax
        call    __extendhfsf2@<!-- -->PLT
        andps   xmm0, xmmword ptr [rip + .LCPI0_0]
        call    __truncsfhf2@<!-- -->PLT
        pop     rax
        ret

Related: #104869 (comment).

cc @arsenm @lntue

@arsenm
Copy link
Contributor

arsenm commented Aug 20, 2024

Duplicate #104915, this version is just peeling out a layer of implementation detail

@arsenm arsenm closed this as not planned Won't fix, can't repro, duplicate, stale Aug 20, 2024
@overmighty
Copy link
Member Author

There are cases where Clang/LLVM 17 does not do this but 18 does: https://godbolt.org/z/3as5jeG46. But maybe LLVM 17 just added a DAG pattern for x86 that's not as reliable as the InstCombine change in LLVM 18, and both issues are caused by FABS ending up being used and FABS for f16 promoting the operand to f32 when AVX-512 FP16 is unavailable.

llvm/lib/Target/X86/X86ISelLowering.cpp has:

  auto setF16Action = [&] (MVT VT, LegalizeAction Action) {
    setOperationAction(ISD::FABS, VT, Action);
    // ...
  };

  // ...

    // Half type will be promoted by default.
    setF16Action(MVT::f16, Promote);

@EugeneZelenko EugeneZelenko added the duplicate Resolved as duplicate label Aug 20, 2024
@arsenm
Copy link
Contributor

arsenm commented Aug 20, 2024

There are cases where Clang/LLVM 17 does not do this but 18 does:

The DAG always did this (only as an optimization), depending on the old target hook hasBitPreservingFPLogic, which was removed in 09dd4d8. The underlying promotion for fabs was always broken, just hidden by the optimization

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants