[SDAG] Fix fmaximum legalization errors #142170

nikic · 2025-05-30T15:33:18Z

FMAXIMUM is currently legalized via IS_FPCLASS for the signed zero handling. This is problematic, because it assumes the equivalent integer type is legal. Many targets have legal fp128, but illegal i128, so this results in legalization failures.

Fix this by replacing IS_FPCLASS with checking the bitcast to integer instead. In that case it is sufficient to use any legal integer type, as we're just interested in the sign bit. This can be obtained via a stack temporary cast. There is existing FloatSignAsInt functionality used for legalization of FABS and similar we can use for this purpose.

Fixes #139380.
Fixes #139381.
Fixes #140445.

llvmbot · 2025-05-30T15:33:51Z

@llvm/pr-subscribers-llvm-selectiondag
@llvm/pr-subscribers-backend-aarch64
@llvm/pr-subscribers-backend-nvptx

@llvm/pr-subscribers-backend-powerpc

Author: Nikita Popov (nikic)

Changes

FMAXIMUM is currently legalized via IS_FPCLASS for the signed zero handling. This is problematic, because it assumes the equivalent integer type is legal. Many targets have legal fp128, but illegal i128, so this results in legalization failures.

Fix this by replacing IS_FPCLASS with checking the bitcast to integer instead. In that case it is sufficient to use any legal integer type, as we're just interested in the sign bit. This can be obtained via a stack temporary cast. There is existing FloatSignAsInt functionality used for legalization of FABS and similar we can use for this purpose.

(Happy to land the FloatSignAsInt helper move separately if the general direction here seems ok.)

Patch is 65.59 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/142170.diff

11 Files Affected:

(modified) llvm/include/llvm/CodeGen/SelectionDAG.h (+24)
(modified) llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp (+7-93)
(modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp (+60)
(modified) llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp (+10-8)
(modified) llvm/test/CodeGen/AArch64/fmaximum-legalization.ll (+56)
(added) llvm/test/CodeGen/ARM/fp-maximum-legalization.ll (+49)
(modified) llvm/test/CodeGen/NVPTX/bf16-instructions.ll (+11-11)
(modified) llvm/test/CodeGen/NVPTX/math-intrins.ll (+64-64)
(modified) llvm/test/CodeGen/PowerPC/fminimum-fmaximum-f128.ll (+46-50)
(modified) llvm/test/CodeGen/PowerPC/fminimum-fmaximum.ll (+149-159)
(modified) llvm/test/CodeGen/X86/fminimum-fmaximum.ll (+177)

diff --git a/llvm/include/llvm/CodeGen/SelectionDAG.h b/llvm/include/llvm/CodeGen/SelectionDAG.h
index a98e46c587273..7dc70d5c50c03 100644
--- a/llvm/include/llvm/CodeGen/SelectionDAG.h
+++ b/llvm/include/llvm/CodeGen/SelectionDAG.h
@@ -215,6 +215,20 @@ class SDDbgInfo {
 
 LLVM_ABI void checkForCycles(const SelectionDAG *DAG, bool force = false);
 
+/// Keeps track of state when getting the sign of a floating-point value as an
+/// integer.
+struct FloatSignAsInt {
+  EVT FloatVT;
+  SDValue Chain;
+  SDValue FloatPtr;
+  SDValue IntPtr;
+  MachinePointerInfo IntPointerInfo;
+  MachinePointerInfo FloatPointerInfo;
+  SDValue IntValue;
+  APInt SignMask;
+  uint8_t SignBit;
+};
+
 /// This is used to represent a portion of an LLVM function in a low-level
 /// Data Dependence DAG representation suitable for instruction selection.
 /// This DAG is constructed as the first step of instruction selection in order
@@ -2017,6 +2031,16 @@ class SelectionDAG {
   /// value types.
   LLVM_ABI SDValue CreateStackTemporary(EVT VT1, EVT VT2);
 
+  /// Bitcast a floating-point value to an integer value. Only bitcast the part
+  /// containing the sign bit if the target has no integer value capable of
+  /// holding all bits of the floating-point value.
+  void getSignAsIntValue(FloatSignAsInt &State, const SDLoc &DL, SDValue Value);
+
+  /// Replace the integer value produced by getSignAsIntValue() with a new value
+  /// and cast the result back to a floating-point type.
+  SDValue modifySignAsInt(const FloatSignAsInt &State, const SDLoc &DL,
+                          SDValue NewIntValue);
+
   LLVM_ABI SDValue FoldSymbolOffset(unsigned Opcode, EVT VT,
                                     const GlobalAddressSDNode *GA,
                                     const SDNode *N2);
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
index 528c07cc5549d..2c41a871b6d6c 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
@@ -59,20 +59,6 @@ using namespace llvm;
 
 namespace {
 
-/// Keeps track of state when getting the sign of a floating-point value as an
-/// integer.
-struct FloatSignAsInt {
-  EVT FloatVT;
-  SDValue Chain;
-  SDValue FloatPtr;
-  SDValue IntPtr;
-  MachinePointerInfo IntPointerInfo;
-  MachinePointerInfo FloatPointerInfo;
-  SDValue IntValue;
-  APInt SignMask;
-  uint8_t SignBit;
-};
-
 //===----------------------------------------------------------------------===//
 /// This takes an arbitrary SelectionDAG as input and
 /// hacks on it until the target machine can handle it.  This involves
@@ -166,10 +152,6 @@ class SelectionDAGLegalize {
   SDValue ExpandSCALAR_TO_VECTOR(SDNode *Node);
   void ExpandDYNAMIC_STACKALLOC(SDNode *Node,
                                 SmallVectorImpl<SDValue> &Results);
-  void getSignAsIntValue(FloatSignAsInt &State, const SDLoc &DL,
-                         SDValue Value) const;
-  SDValue modifySignAsInt(const FloatSignAsInt &State, const SDLoc &DL,
-                          SDValue NewIntValue) const;
   SDValue ExpandFCOPYSIGN(SDNode *Node) const;
   SDValue ExpandFABS(SDNode *Node) const;
   SDValue ExpandFNEG(SDNode *Node) const;
@@ -1620,74 +1602,6 @@ SDValue SelectionDAGLegalize::ExpandVectorBuildThroughStack(SDNode* Node) {
   return DAG.getLoad(VT, dl, StoreChain, FIPtr, PtrInfo);
 }
 
-/// Bitcast a floating-point value to an integer value. Only bitcast the part
-/// containing the sign bit if the target has no integer value capable of
-/// holding all bits of the floating-point value.
-void SelectionDAGLegalize::getSignAsIntValue(FloatSignAsInt &State,
-                                             const SDLoc &DL,
-                                             SDValue Value) const {
-  EVT FloatVT = Value.getValueType();
-  unsigned NumBits = FloatVT.getScalarSizeInBits();
-  State.FloatVT = FloatVT;
-  EVT IVT = EVT::getIntegerVT(*DAG.getContext(), NumBits);
-  // Convert to an integer of the same size.
-  if (TLI.isTypeLegal(IVT)) {
-    State.IntValue = DAG.getNode(ISD::BITCAST, DL, IVT, Value);
-    State.SignMask = APInt::getSignMask(NumBits);
-    State.SignBit = NumBits - 1;
-    return;
-  }
-
-  auto &DataLayout = DAG.getDataLayout();
-  // Store the float to memory, then load the sign part out as an integer.
-  MVT LoadTy = TLI.getRegisterType(MVT::i8);
-  // First create a temporary that is aligned for both the load and store.
-  SDValue StackPtr = DAG.CreateStackTemporary(FloatVT, LoadTy);
-  int FI = cast<FrameIndexSDNode>(StackPtr.getNode())->getIndex();
-  // Then store the float to it.
-  State.FloatPtr = StackPtr;
-  MachineFunction &MF = DAG.getMachineFunction();
-  State.FloatPointerInfo = MachinePointerInfo::getFixedStack(MF, FI);
-  State.Chain = DAG.getStore(DAG.getEntryNode(), DL, Value, State.FloatPtr,
-                             State.FloatPointerInfo);
-
-  SDValue IntPtr;
-  if (DataLayout.isBigEndian()) {
-    assert(FloatVT.isByteSized() && "Unsupported floating point type!");
-    // Load out a legal integer with the same sign bit as the float.
-    IntPtr = StackPtr;
-    State.IntPointerInfo = State.FloatPointerInfo;
-  } else {
-    // Advance the pointer so that the loaded byte will contain the sign bit.
-    unsigned ByteOffset = (NumBits / 8) - 1;
-    IntPtr =
-        DAG.getMemBasePlusOffset(StackPtr, TypeSize::getFixed(ByteOffset), DL);
-    State.IntPointerInfo = MachinePointerInfo::getFixedStack(MF, FI,
-                                                             ByteOffset);
-  }
-
-  State.IntPtr = IntPtr;
-  State.IntValue = DAG.getExtLoad(ISD::EXTLOAD, DL, LoadTy, State.Chain, IntPtr,
-                                  State.IntPointerInfo, MVT::i8);
-  State.SignMask = APInt::getOneBitSet(LoadTy.getScalarSizeInBits(), 7);
-  State.SignBit = 7;
-}
-
-/// Replace the integer value produced by getSignAsIntValue() with a new value
-/// and cast the result back to a floating-point type.
-SDValue SelectionDAGLegalize::modifySignAsInt(const FloatSignAsInt &State,
-                                              const SDLoc &DL,
-                                              SDValue NewIntValue) const {
-  if (!State.Chain)
-    return DAG.getNode(ISD::BITCAST, DL, State.FloatVT, NewIntValue);
-
-  // Override the part containing the sign bit in the value stored on the stack.
-  SDValue Chain = DAG.getTruncStore(State.Chain, DL, NewIntValue, State.IntPtr,
-                                    State.IntPointerInfo, MVT::i8);
-  return DAG.getLoad(State.FloatVT, DL, Chain, State.FloatPtr,
-                     State.FloatPointerInfo);
-}
-
 SDValue SelectionDAGLegalize::ExpandFCOPYSIGN(SDNode *Node) const {
   SDLoc DL(Node);
   SDValue Mag = Node->getOperand(0);
@@ -1695,7 +1609,7 @@ SDValue SelectionDAGLegalize::ExpandFCOPYSIGN(SDNode *Node) const {
 
   // Get sign bit into an integer value.
   FloatSignAsInt SignAsInt;
-  getSignAsIntValue(SignAsInt, DL, Sign);
+  DAG.getSignAsIntValue(SignAsInt, DL, Sign);
 
   EVT IntVT = SignAsInt.IntValue.getValueType();
   SDValue SignMask = DAG.getConstant(SignAsInt.SignMask, DL, IntVT);
@@ -1716,7 +1630,7 @@ SDValue SelectionDAGLegalize::ExpandFCOPYSIGN(SDNode *Node) const {
 
   // Transform Mag value to integer, and clear the sign bit.
   FloatSignAsInt MagAsInt;
-  getSignAsIntValue(MagAsInt, DL, Mag);
+  DAG.getSignAsIntValue(MagAsInt, DL, Mag);
   EVT MagVT = MagAsInt.IntValue.getValueType();
   SDValue ClearSignMask = DAG.getConstant(~MagAsInt.SignMask, DL, MagVT);
   SDValue ClearedSign = DAG.getNode(ISD::AND, DL, MagVT, MagAsInt.IntValue,
@@ -1746,14 +1660,14 @@ SDValue SelectionDAGLegalize::ExpandFCOPYSIGN(SDNode *Node) const {
   SDValue CopiedSign = DAG.getNode(ISD::OR, DL, MagVT, ClearedSign, SignBit,
                                    SDNodeFlags::Disjoint);
 
-  return modifySignAsInt(MagAsInt, DL, CopiedSign);
+  return DAG.modifySignAsInt(MagAsInt, DL, CopiedSign);
 }
 
 SDValue SelectionDAGLegalize::ExpandFNEG(SDNode *Node) const {
   // Get the sign bit as an integer.
   SDLoc DL(Node);
   FloatSignAsInt SignAsInt;
-  getSignAsIntValue(SignAsInt, DL, Node->getOperand(0));
+  DAG.getSignAsIntValue(SignAsInt, DL, Node->getOperand(0));
   EVT IntVT = SignAsInt.IntValue.getValueType();
 
   // Flip the sign.
@@ -1762,7 +1676,7 @@ SDValue SelectionDAGLegalize::ExpandFNEG(SDNode *Node) const {
       DAG.getNode(ISD::XOR, DL, IntVT, SignAsInt.IntValue, SignMask);
 
   // Convert back to float.
-  return modifySignAsInt(SignAsInt, DL, SignFlip);
+  return DAG.modifySignAsInt(SignAsInt, DL, SignFlip);
 }
 
 SDValue SelectionDAGLegalize::ExpandFABS(SDNode *Node) const {
@@ -1778,12 +1692,12 @@ SDValue SelectionDAGLegalize::ExpandFABS(SDNode *Node) const {
 
   // Transform value to integer, clear the sign bit and transform back.
   FloatSignAsInt ValueAsInt;
-  getSignAsIntValue(ValueAsInt, DL, Value);
+  DAG.getSignAsIntValue(ValueAsInt, DL, Value);
   EVT IntVT = ValueAsInt.IntValue.getValueType();
   SDValue ClearSignMask = DAG.getConstant(~ValueAsInt.SignMask, DL, IntVT);
   SDValue ClearedSign = DAG.getNode(ISD::AND, DL, IntVT, ValueAsInt.IntValue,
                                     ClearSignMask);
-  return modifySignAsInt(ValueAsInt, DL, ClearedSign);
+  return DAG.modifySignAsInt(ValueAsInt, DL, ClearedSign);
 }
 
 void SelectionDAGLegalize::ExpandDYNAMIC_STACKALLOC(SDNode* Node,
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index 1506bc4ee187d..023d53f24ed19 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -2760,6 +2760,66 @@ SDValue SelectionDAG::CreateStackTemporary(EVT VT1, EVT VT2) {
   return CreateStackTemporary(Bytes, Align);
 }
 
+void SelectionDAG::getSignAsIntValue(FloatSignAsInt &State, const SDLoc &DL,
+                                     SDValue Value) {
+  EVT FloatVT = Value.getValueType();
+  unsigned NumBits = FloatVT.getScalarSizeInBits();
+  State.FloatVT = FloatVT;
+  EVT IVT = FloatVT.changeTypeToInteger();
+  // Convert to an integer of the same size.
+  if (TLI->isTypeLegal(IVT)) {
+    State.IntValue = getNode(ISD::BITCAST, DL, IVT, Value);
+    State.SignMask = APInt::getSignMask(NumBits);
+    State.SignBit = NumBits - 1;
+    return;
+  }
+
+  auto &DataLayout = getDataLayout();
+  // Store the float to memory, then load the sign part out as an integer.
+  MVT LoadTy = TLI->getRegisterType(MVT::i8);
+  // First create a temporary that is aligned for both the load and store.
+  SDValue StackPtr = CreateStackTemporary(FloatVT, LoadTy);
+  int FI = cast<FrameIndexSDNode>(StackPtr.getNode())->getIndex();
+  // Then store the float to it.
+  State.FloatPtr = StackPtr;
+  MachineFunction &MF = getMachineFunction();
+  State.FloatPointerInfo = MachinePointerInfo::getFixedStack(MF, FI);
+  State.Chain = getStore(getEntryNode(), DL, Value, State.FloatPtr,
+                         State.FloatPointerInfo);
+
+  SDValue IntPtr;
+  if (DataLayout.isBigEndian()) {
+    assert(FloatVT.isByteSized() && "Unsupported floating point type!");
+    // Load out a legal integer with the same sign bit as the float.
+    IntPtr = StackPtr;
+    State.IntPointerInfo = State.FloatPointerInfo;
+  } else {
+    // Advance the pointer so that the loaded byte will contain the sign bit.
+    unsigned ByteOffset = (NumBits / 8) - 1;
+    IntPtr = getMemBasePlusOffset(StackPtr, TypeSize::getFixed(ByteOffset), DL);
+    State.IntPointerInfo =
+        MachinePointerInfo::getFixedStack(MF, FI, ByteOffset);
+  }
+
+  State.IntPtr = IntPtr;
+  State.IntValue = getExtLoad(ISD::EXTLOAD, DL, LoadTy, State.Chain, IntPtr,
+                              State.IntPointerInfo, MVT::i8);
+  State.SignMask = APInt::getOneBitSet(LoadTy.getScalarSizeInBits(), 7);
+  State.SignBit = 7;
+}
+
+SDValue SelectionDAG::modifySignAsInt(const FloatSignAsInt &State,
+                                      const SDLoc &DL, SDValue NewIntValue) {
+  if (!State.Chain)
+    return getNode(ISD::BITCAST, DL, State.FloatVT, NewIntValue);
+
+  // Override the part containing the sign bit in the value stored on the stack.
+  SDValue Chain = getTruncStore(State.Chain, DL, NewIntValue, State.IntPtr,
+                                State.IntPointerInfo, MVT::i8);
+  return getLoad(State.FloatVT, DL, Chain, State.FloatPtr,
+                 State.FloatPointerInfo);
+}
+
 SDValue SelectionDAG::FoldSetCC(EVT VT, SDValue N1, SDValue N2,
                                 ISD::CondCode Cond, const SDLoc &dl) {
   EVT OpVT = N1.getValueType();
diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
index 22d0bc9914585..13315fed7ed2a 100644
--- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
@@ -8610,16 +8610,18 @@ SDValue TargetLowering::expandFMINIMUM_FMAXIMUM(SDNode *N,
   // fminimum/fmaximum requires -0.0 less than +0.0
   if (!MinMaxMustRespectOrderedZero && !N->getFlags().hasNoSignedZeros() &&
       !DAG.isKnownNeverZeroFloat(RHS) && !DAG.isKnownNeverZeroFloat(LHS)) {
+    auto IsSpecificZero = [&](SDValue F) {
+      FloatSignAsInt State;
+      DAG.getSignAsIntValue(State, DL, F);
+      return DAG.getSetCC(DL, CCVT, State.IntValue,
+                          DAG.getConstant(0, DL, State.IntValue.getValueType()),
+                          IsMax ? ISD::SETEQ : ISD::SETNE);
+    };
     SDValue IsZero = DAG.getSetCC(DL, CCVT, MinMax,
                                   DAG.getConstantFP(0.0, DL, VT), ISD::SETOEQ);
-    SDValue TestZero =
-        DAG.getTargetConstant(IsMax ? fcPosZero : fcNegZero, DL, MVT::i32);
-    SDValue LCmp = DAG.getSelect(
-        DL, VT, DAG.getNode(ISD::IS_FPCLASS, DL, CCVT, LHS, TestZero), LHS,
-        MinMax, Flags);
-    SDValue RCmp = DAG.getSelect(
-        DL, VT, DAG.getNode(ISD::IS_FPCLASS, DL, CCVT, RHS, TestZero), RHS,
-        LCmp, Flags);
+    SDValue LCmp =
+        DAG.getSelect(DL, VT, IsSpecificZero(LHS), LHS, MinMax, Flags);
+    SDValue RCmp = DAG.getSelect(DL, VT, IsSpecificZero(RHS), RHS, LCmp, Flags);
     MinMax = DAG.getSelect(DL, VT, IsZero, RCmp, MinMax, Flags);
   }
 
diff --git a/llvm/test/CodeGen/AArch64/fmaximum-legalization.ll b/llvm/test/CodeGen/AArch64/fmaximum-legalization.ll
index 86c1474068482..9f542abcb80f7 100644
--- a/llvm/test/CodeGen/AArch64/fmaximum-legalization.ll
+++ b/llvm/test/CodeGen/AArch64/fmaximum-legalization.ll
@@ -41,3 +41,59 @@ define <4 x half> @fmaximum_v4f16(<4 x half> %x, <4 x half> %y) {
   %r = call <4 x half> @llvm.maximum.v4f16(<4 x half> %x, <4 x half> %y)
   ret <4 x half> %r
 }
+
+define fp128 @maximum_fp128(fp128 %x, fp128 %y) nounwind {
+; CHECK-LABEL: maximum_fp128:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    sub sp, sp, #96
+; CHECK-NEXT:    str x30, [sp, #80] // 8-byte Folded Spill
+; CHECK-NEXT:    stp q0, q1, [sp] // 32-byte Folded Spill
+; CHECK-NEXT:    stp q1, q0, [sp, #48]
+; CHECK-NEXT:    bl __gttf2
+; CHECK-NEXT:    ldp q0, q1, [sp] // 32-byte Folded Reload
+; CHECK-NEXT:    cmp w0, #0
+; CHECK-NEXT:    b.le .LBB1_2
+; CHECK-NEXT:  // %bb.1:
+; CHECK-NEXT:    mov v1.16b, v0.16b
+; CHECK-NEXT:  .LBB1_2:
+; CHECK-NEXT:    str q1, [sp, #32] // 16-byte Folded Spill
+; CHECK-NEXT:    ldr q1, [sp, #16] // 16-byte Folded Reload
+; CHECK-NEXT:    bl __unordtf2
+; CHECK-NEXT:    ldr q0, [sp, #32] // 16-byte Folded Reload
+; CHECK-NEXT:    cmp w0, #0
+; CHECK-NEXT:    b.eq .LBB1_4
+; CHECK-NEXT:  // %bb.3:
+; CHECK-NEXT:    adrp x8, .LCPI1_0
+; CHECK-NEXT:    ldr q0, [x8, :lo12:.LCPI1_0]
+; CHECK-NEXT:  .LBB1_4:
+; CHECK-NEXT:    ldrb w8, [sp, #79]
+; CHECK-NEXT:    mov v1.16b, v0.16b
+; CHECK-NEXT:    cmp w8, #0
+; CHECK-NEXT:    b.ne .LBB1_6
+; CHECK-NEXT:  // %bb.5:
+; CHECK-NEXT:    ldr q1, [sp] // 16-byte Folded Reload
+; CHECK-NEXT:  .LBB1_6:
+; CHECK-NEXT:    ldrb w8, [sp, #63]
+; CHECK-NEXT:    cmp w8, #0
+; CHECK-NEXT:    b.ne .LBB1_8
+; CHECK-NEXT:  // %bb.7:
+; CHECK-NEXT:    ldr q1, [sp, #16] // 16-byte Folded Reload
+; CHECK-NEXT:  .LBB1_8:
+; CHECK-NEXT:    adrp x8, .LCPI1_1
+; CHECK-NEXT:    str q0, [sp, #32] // 16-byte Folded Spill
+; CHECK-NEXT:    str q1, [sp, #16] // 16-byte Folded Spill
+; CHECK-NEXT:    ldr q1, [x8, :lo12:.LCPI1_1]
+; CHECK-NEXT:    ldr q0, [sp, #32] // 16-byte Folded Reload
+; CHECK-NEXT:    bl __eqtf2
+; CHECK-NEXT:    ldr q0, [sp, #32] // 16-byte Folded Reload
+; CHECK-NEXT:    cmp w0, #0
+; CHECK-NEXT:    b.ne .LBB1_10
+; CHECK-NEXT:  // %bb.9:
+; CHECK-NEXT:    ldr q0, [sp, #16] // 16-byte Folded Reload
+; CHECK-NEXT:  .LBB1_10:
+; CHECK-NEXT:    ldr x30, [sp, #80] // 8-byte Folded Reload
+; CHECK-NEXT:    add sp, sp, #96
+; CHECK-NEXT:    ret
+  %res = call fp128 @llvm.maximum.f128(fp128 %x, fp128 %y)
+  ret fp128 %res
+}
diff --git a/llvm/test/CodeGen/ARM/fp-maximum-legalization.ll b/llvm/test/CodeGen/ARM/fp-maximum-legalization.ll
new file mode 100644
index 0000000000000..a3ab144356e16
--- /dev/null
+++ b/llvm/test/CodeGen/ARM/fp-maximum-legalization.ll
@@ -0,0 +1,49 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s -mtriple=armv7 | FileCheck %s
+
+define double @maximum_double(double %x, double %y) nounwind {
+; CHECK-LABEL: maximum_double:
+; CHECK:       @ %bb.0:
+; CHECK-NEXT:    sub sp, sp, #16
+; CHECK-NEXT:    vmov d17, r2, r3
+; CHECK-NEXT:    mov r2, #0
+; CHECK-NEXT:    vmov d16, r0, r1
+; CHECK-NEXT:    mov r3, #0
+; CHECK-NEXT:    vcmp.f64 d16, d17
+; CHECK-NEXT:    mov r0, #0
+; CHECK-NEXT:    vmrs APSR_nzcv, fpscr
+; CHECK-NEXT:    vstr d16, [sp, #8]
+; CHECK-NEXT:    vstr d17, [sp]
+; CHECK-NEXT:    ldrb r1, [sp, #15]
+; CHECK-NEXT:    vmov.f64 d19, d17
+; CHECK-NEXT:    clz r1, r1
+; CHECK-NEXT:    vldr d18, .LCPI0_0
+; CHECK-NEXT:    movwvs r2, #1
+; CHECK-NEXT:    movwgt r3, #1
+; CHECK-NEXT:    cmp r3, #0
+; CHECK-NEXT:    vmovne.f64 d19, d16
+; CHECK-NEXT:    cmp r2, #0
+; CHECK-NEXT:    ldrb r2, [sp, #7]
+; CHECK-NEXT:    vmovne.f64 d19, d18
+; CHECK-NEXT:    lsrs r1, r1, #5
+; CHECK-NEXT:    clz r1, r2
+; CHECK-NEXT:    vcmp.f64 d19, #0
+; CHECK-NEXT:    vmov.f64 d18, d19
+; CHECK-NEXT:    vmovne.f64 d18, d16
+; CHECK-NEXT:    lsrs r1, r1, #5
+; CHECK-NEXT:    vmovne.f64 d18, d17
+; CHECK-NEXT:    vmrs APSR_nzcv, fpscr
+; CHECK-NEXT:    movweq r0, #1
+; CHECK-NEXT:    cmp r0, #0
+; CHECK-NEXT:    vmovne.f64 d19, d18
+; CHECK-NEXT:    vmov r0, r1, d19
+; CHECK-NEXT:    add sp, sp, #16
+; CHECK-NEXT:    bx lr
+; CHECK-NEXT:    .p2align 3
+; CHECK-NEXT:  @ %bb.1:
+; CHECK-NEXT:  .LCPI0_0:
+; CHECK-NEXT:    .long 0 @ double NaN
+; CHECK-NEXT:    .long 2146959360
+  %res = call double @llvm.maximum(double %x, double %y)
+  ret double %res
+}
diff --git a/llvm/test/CodeGen/NVPTX/bf16-instructions.ll b/llvm/test/CodeGen/NVPTX/bf16-instructions.ll
index 32225ed04e2d9..096649e5bde43 100644
--- a/llvm/test/CodeGen/NVPTX/bf16-instructions.ll
+++ b/llvm/test/CodeGen/NVPTX/bf16-instructions.ll
@@ -1476,7 +1476,7 @@ define <2 x bfloat> @test_maximum_v2(<2 x bfloat> %a, <2 x bfloat> %b) {
 ; SM70-LABEL: test_maximum_v2(
 ; SM70:       {
 ; SM70-NEXT:    .reg .pred %p<11>;
-; SM70-NEXT:    .reg .b16 %rs<15>;
+; SM70-NEXT:    .reg .b16 %rs<19>;
 ; SM70-NEXT:    .reg .b32 %r<16>;
 ; SM70-EMPTY:
 ; SM70-NEXT:  // %bb.0:
@@ -1493,30 +1493,30 @@ define <2 x bfloat> @test_maximum_v2(<2 x bfloat> %a, <2 x bfloat> %b) {
 ; SM70-NEXT:    setp.nan.f32 %p2, %r6, %r4;
 ; SM70-NEXT:    selp.b16 %rs6, 0x7FC0, %rs5, %p2;
 ; SM70-NEXT:    setp.eq.s16 %p3, %rs4, 0;
-; SM70-NEXT:    selp.b16 %rs7, %rs4, %rs6, %p3;
+; SM70-NEXT:    selp.b16 %rs9, %rs4, %rs6, %p3;
 ; SM70-NEXT:    setp.eq.s16 %p4, %rs2, 0;
-; SM70-NEXT:    selp.b16 %rs8, %rs2, %rs7, %p4;
+; SM70-NEXT:    selp.b16 %rs12, %rs2, %rs9, %p4;
 ; SM70-NEXT:    cvt.u32.u16 %r7, %rs6;
 ; SM70-NEXT:    shl.b32 %r8, %r7, 16;
 ; SM70-NEXT:    setp.eq.f32 %p5, %r8, 0f00000000;
-; SM70-NEXT:    selp.b16 %rs9, %rs8, %rs6, %p5;
+; SM70-NEXT:    selp.b16 %rs13, %rs12, %rs6, %p5;
 ; SM70-NEXT:    cvt.u32.u16 %r9, %rs1;
 ; SM70-NEXT:    shl.b32 %r10, %r9, 16;
 ; SM70-NEXT:    cvt.u32.u16 %r11, %rs3;
 ; SM70-NEXT:    shl.b32 %r12, %r11, 16;
 ; SM70-NEXT:    setp.gt.f32 %p6, %r12, %r10;
-; SM70-NEXT:    selp.b16 %rs10, %rs3, %rs1, %p6;
+; SM70-NEXT:    selp.b16 %rs14, %rs3, %rs1, %p6;
 ; SM70-NEXT:    setp.nan.f32 %p7, %r12, %r10;
-; SM70-NEXT:    selp.b16 %rs11, 0x7FC0, %rs10, %p7;
+;...
[truncated]

llvmbot · 2025-05-30T15:33:52Z

@llvm/pr-subscribers-backend-arm

Author: Nikita Popov (nikic)

Changes

FMAXIMUM is currently legalized via IS_FPCLASS for the signed zero handling. This is problematic, because it assumes the equivalent integer type is legal. Many targets have legal fp128, but illegal i128, so this results in legalization failures.

Fix this by replacing IS_FPCLASS with checking the bitcast to integer instead. In that case it is sufficient to use any legal integer type, as we're just interested in the sign bit. This can be obtained via a stack temporary cast. There is existing FloatSignAsInt functionality used for legalization of FABS and similar we can use for this purpose.

(Happy to land the FloatSignAsInt helper move separately if the general direction here seems ok.)

Patch is 65.59 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/142170.diff

11 Files Affected:

(modified) llvm/include/llvm/CodeGen/SelectionDAG.h (+24)
(modified) llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp (+7-93)
(modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp (+60)
(modified) llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp (+10-8)
(modified) llvm/test/CodeGen/AArch64/fmaximum-legalization.ll (+56)
(added) llvm/test/CodeGen/ARM/fp-maximum-legalization.ll (+49)
(modified) llvm/test/CodeGen/NVPTX/bf16-instructions.ll (+11-11)
(modified) llvm/test/CodeGen/NVPTX/math-intrins.ll (+64-64)
(modified) llvm/test/CodeGen/PowerPC/fminimum-fmaximum-f128.ll (+46-50)
(modified) llvm/test/CodeGen/PowerPC/fminimum-fmaximum.ll (+149-159)
(modified) llvm/test/CodeGen/X86/fminimum-fmaximum.ll (+177)

diff --git a/llvm/include/llvm/CodeGen/SelectionDAG.h b/llvm/include/llvm/CodeGen/SelectionDAG.h
index a98e46c587273..7dc70d5c50c03 100644
--- a/llvm/include/llvm/CodeGen/SelectionDAG.h
+++ b/llvm/include/llvm/CodeGen/SelectionDAG.h
@@ -215,6 +215,20 @@ class SDDbgInfo {
 
 LLVM_ABI void checkForCycles(const SelectionDAG *DAG, bool force = false);
 
+/// Keeps track of state when getting the sign of a floating-point value as an
+/// integer.
+struct FloatSignAsInt {
+  EVT FloatVT;
+  SDValue Chain;
+  SDValue FloatPtr;
+  SDValue IntPtr;
+  MachinePointerInfo IntPointerInfo;
+  MachinePointerInfo FloatPointerInfo;
+  SDValue IntValue;
+  APInt SignMask;
+  uint8_t SignBit;
+};
+
 /// This is used to represent a portion of an LLVM function in a low-level
 /// Data Dependence DAG representation suitable for instruction selection.
 /// This DAG is constructed as the first step of instruction selection in order
@@ -2017,6 +2031,16 @@ class SelectionDAG {
   /// value types.
   LLVM_ABI SDValue CreateStackTemporary(EVT VT1, EVT VT2);
 
+  /// Bitcast a floating-point value to an integer value. Only bitcast the part
+  /// containing the sign bit if the target has no integer value capable of
+  /// holding all bits of the floating-point value.
+  void getSignAsIntValue(FloatSignAsInt &State, const SDLoc &DL, SDValue Value);
+
+  /// Replace the integer value produced by getSignAsIntValue() with a new value
+  /// and cast the result back to a floating-point type.
+  SDValue modifySignAsInt(const FloatSignAsInt &State, const SDLoc &DL,
+                          SDValue NewIntValue);
+
   LLVM_ABI SDValue FoldSymbolOffset(unsigned Opcode, EVT VT,
                                     const GlobalAddressSDNode *GA,
                                     const SDNode *N2);
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
index 528c07cc5549d..2c41a871b6d6c 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
@@ -59,20 +59,6 @@ using namespace llvm;
 
 namespace {
 
-/// Keeps track of state when getting the sign of a floating-point value as an
-/// integer.
-struct FloatSignAsInt {
-  EVT FloatVT;
-  SDValue Chain;
-  SDValue FloatPtr;
-  SDValue IntPtr;
-  MachinePointerInfo IntPointerInfo;
-  MachinePointerInfo FloatPointerInfo;
-  SDValue IntValue;
-  APInt SignMask;
-  uint8_t SignBit;
-};
-
 //===----------------------------------------------------------------------===//
 /// This takes an arbitrary SelectionDAG as input and
 /// hacks on it until the target machine can handle it.  This involves
@@ -166,10 +152,6 @@ class SelectionDAGLegalize {
   SDValue ExpandSCALAR_TO_VECTOR(SDNode *Node);
   void ExpandDYNAMIC_STACKALLOC(SDNode *Node,
                                 SmallVectorImpl<SDValue> &Results);
-  void getSignAsIntValue(FloatSignAsInt &State, const SDLoc &DL,
-                         SDValue Value) const;
-  SDValue modifySignAsInt(const FloatSignAsInt &State, const SDLoc &DL,
-                          SDValue NewIntValue) const;
   SDValue ExpandFCOPYSIGN(SDNode *Node) const;
   SDValue ExpandFABS(SDNode *Node) const;
   SDValue ExpandFNEG(SDNode *Node) const;
@@ -1620,74 +1602,6 @@ SDValue SelectionDAGLegalize::ExpandVectorBuildThroughStack(SDNode* Node) {
   return DAG.getLoad(VT, dl, StoreChain, FIPtr, PtrInfo);
 }
 
-/// Bitcast a floating-point value to an integer value. Only bitcast the part
-/// containing the sign bit if the target has no integer value capable of
-/// holding all bits of the floating-point value.
-void SelectionDAGLegalize::getSignAsIntValue(FloatSignAsInt &State,
-                                             const SDLoc &DL,
-                                             SDValue Value) const {
-  EVT FloatVT = Value.getValueType();
-  unsigned NumBits = FloatVT.getScalarSizeInBits();
-  State.FloatVT = FloatVT;
-  EVT IVT = EVT::getIntegerVT(*DAG.getContext(), NumBits);
-  // Convert to an integer of the same size.
-  if (TLI.isTypeLegal(IVT)) {
-    State.IntValue = DAG.getNode(ISD::BITCAST, DL, IVT, Value);
-    State.SignMask = APInt::getSignMask(NumBits);
-    State.SignBit = NumBits - 1;
-    return;
-  }
-
-  auto &DataLayout = DAG.getDataLayout();
-  // Store the float to memory, then load the sign part out as an integer.
-  MVT LoadTy = TLI.getRegisterType(MVT::i8);
-  // First create a temporary that is aligned for both the load and store.
-  SDValue StackPtr = DAG.CreateStackTemporary(FloatVT, LoadTy);
-  int FI = cast<FrameIndexSDNode>(StackPtr.getNode())->getIndex();
-  // Then store the float to it.
-  State.FloatPtr = StackPtr;
-  MachineFunction &MF = DAG.getMachineFunction();
-  State.FloatPointerInfo = MachinePointerInfo::getFixedStack(MF, FI);
-  State.Chain = DAG.getStore(DAG.getEntryNode(), DL, Value, State.FloatPtr,
-                             State.FloatPointerInfo);
-
-  SDValue IntPtr;
-  if (DataLayout.isBigEndian()) {
-    assert(FloatVT.isByteSized() && "Unsupported floating point type!");
-    // Load out a legal integer with the same sign bit as the float.
-    IntPtr = StackPtr;
-    State.IntPointerInfo = State.FloatPointerInfo;
-  } else {
-    // Advance the pointer so that the loaded byte will contain the sign bit.
-    unsigned ByteOffset = (NumBits / 8) - 1;
-    IntPtr =
-        DAG.getMemBasePlusOffset(StackPtr, TypeSize::getFixed(ByteOffset), DL);
-    State.IntPointerInfo = MachinePointerInfo::getFixedStack(MF, FI,
-                                                             ByteOffset);
-  }
-
-  State.IntPtr = IntPtr;
-  State.IntValue = DAG.getExtLoad(ISD::EXTLOAD, DL, LoadTy, State.Chain, IntPtr,
-                                  State.IntPointerInfo, MVT::i8);
-  State.SignMask = APInt::getOneBitSet(LoadTy.getScalarSizeInBits(), 7);
-  State.SignBit = 7;
-}
-
-/// Replace the integer value produced by getSignAsIntValue() with a new value
-/// and cast the result back to a floating-point type.
-SDValue SelectionDAGLegalize::modifySignAsInt(const FloatSignAsInt &State,
-                                              const SDLoc &DL,
-                                              SDValue NewIntValue) const {
-  if (!State.Chain)
-    return DAG.getNode(ISD::BITCAST, DL, State.FloatVT, NewIntValue);
-
-  // Override the part containing the sign bit in the value stored on the stack.
-  SDValue Chain = DAG.getTruncStore(State.Chain, DL, NewIntValue, State.IntPtr,
-                                    State.IntPointerInfo, MVT::i8);
-  return DAG.getLoad(State.FloatVT, DL, Chain, State.FloatPtr,
-                     State.FloatPointerInfo);
-}
-
 SDValue SelectionDAGLegalize::ExpandFCOPYSIGN(SDNode *Node) const {
   SDLoc DL(Node);
   SDValue Mag = Node->getOperand(0);
@@ -1695,7 +1609,7 @@ SDValue SelectionDAGLegalize::ExpandFCOPYSIGN(SDNode *Node) const {
 
   // Get sign bit into an integer value.
   FloatSignAsInt SignAsInt;
-  getSignAsIntValue(SignAsInt, DL, Sign);
+  DAG.getSignAsIntValue(SignAsInt, DL, Sign);
 
   EVT IntVT = SignAsInt.IntValue.getValueType();
   SDValue SignMask = DAG.getConstant(SignAsInt.SignMask, DL, IntVT);
@@ -1716,7 +1630,7 @@ SDValue SelectionDAGLegalize::ExpandFCOPYSIGN(SDNode *Node) const {
 
   // Transform Mag value to integer, and clear the sign bit.
   FloatSignAsInt MagAsInt;
-  getSignAsIntValue(MagAsInt, DL, Mag);
+  DAG.getSignAsIntValue(MagAsInt, DL, Mag);
   EVT MagVT = MagAsInt.IntValue.getValueType();
   SDValue ClearSignMask = DAG.getConstant(~MagAsInt.SignMask, DL, MagVT);
   SDValue ClearedSign = DAG.getNode(ISD::AND, DL, MagVT, MagAsInt.IntValue,
@@ -1746,14 +1660,14 @@ SDValue SelectionDAGLegalize::ExpandFCOPYSIGN(SDNode *Node) const {
   SDValue CopiedSign = DAG.getNode(ISD::OR, DL, MagVT, ClearedSign, SignBit,
                                    SDNodeFlags::Disjoint);
 
-  return modifySignAsInt(MagAsInt, DL, CopiedSign);
+  return DAG.modifySignAsInt(MagAsInt, DL, CopiedSign);
 }
 
 SDValue SelectionDAGLegalize::ExpandFNEG(SDNode *Node) const {
   // Get the sign bit as an integer.
   SDLoc DL(Node);
   FloatSignAsInt SignAsInt;
-  getSignAsIntValue(SignAsInt, DL, Node->getOperand(0));
+  DAG.getSignAsIntValue(SignAsInt, DL, Node->getOperand(0));
   EVT IntVT = SignAsInt.IntValue.getValueType();
 
   // Flip the sign.
@@ -1762,7 +1676,7 @@ SDValue SelectionDAGLegalize::ExpandFNEG(SDNode *Node) const {
       DAG.getNode(ISD::XOR, DL, IntVT, SignAsInt.IntValue, SignMask);
 
   // Convert back to float.
-  return modifySignAsInt(SignAsInt, DL, SignFlip);
+  return DAG.modifySignAsInt(SignAsInt, DL, SignFlip);
 }
 
 SDValue SelectionDAGLegalize::ExpandFABS(SDNode *Node) const {
@@ -1778,12 +1692,12 @@ SDValue SelectionDAGLegalize::ExpandFABS(SDNode *Node) const {
 
   // Transform value to integer, clear the sign bit and transform back.
   FloatSignAsInt ValueAsInt;
-  getSignAsIntValue(ValueAsInt, DL, Value);
+  DAG.getSignAsIntValue(ValueAsInt, DL, Value);
   EVT IntVT = ValueAsInt.IntValue.getValueType();
   SDValue ClearSignMask = DAG.getConstant(~ValueAsInt.SignMask, DL, IntVT);
   SDValue ClearedSign = DAG.getNode(ISD::AND, DL, IntVT, ValueAsInt.IntValue,
                                     ClearSignMask);
-  return modifySignAsInt(ValueAsInt, DL, ClearedSign);
+  return DAG.modifySignAsInt(ValueAsInt, DL, ClearedSign);
 }
 
 void SelectionDAGLegalize::ExpandDYNAMIC_STACKALLOC(SDNode* Node,
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index 1506bc4ee187d..023d53f24ed19 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -2760,6 +2760,66 @@ SDValue SelectionDAG::CreateStackTemporary(EVT VT1, EVT VT2) {
   return CreateStackTemporary(Bytes, Align);
 }
 
+void SelectionDAG::getSignAsIntValue(FloatSignAsInt &State, const SDLoc &DL,
+                                     SDValue Value) {
+  EVT FloatVT = Value.getValueType();
+  unsigned NumBits = FloatVT.getScalarSizeInBits();
+  State.FloatVT = FloatVT;
+  EVT IVT = FloatVT.changeTypeToInteger();
+  // Convert to an integer of the same size.
+  if (TLI->isTypeLegal(IVT)) {
+    State.IntValue = getNode(ISD::BITCAST, DL, IVT, Value);
+    State.SignMask = APInt::getSignMask(NumBits);
+    State.SignBit = NumBits - 1;
+    return;
+  }
+
+  auto &DataLayout = getDataLayout();
+  // Store the float to memory, then load the sign part out as an integer.
+  MVT LoadTy = TLI->getRegisterType(MVT::i8);
+  // First create a temporary that is aligned for both the load and store.
+  SDValue StackPtr = CreateStackTemporary(FloatVT, LoadTy);
+  int FI = cast<FrameIndexSDNode>(StackPtr.getNode())->getIndex();
+  // Then store the float to it.
+  State.FloatPtr = StackPtr;
+  MachineFunction &MF = getMachineFunction();
+  State.FloatPointerInfo = MachinePointerInfo::getFixedStack(MF, FI);
+  State.Chain = getStore(getEntryNode(), DL, Value, State.FloatPtr,
+                         State.FloatPointerInfo);
+
+  SDValue IntPtr;
+  if (DataLayout.isBigEndian()) {
+    assert(FloatVT.isByteSized() && "Unsupported floating point type!");
+    // Load out a legal integer with the same sign bit as the float.
+    IntPtr = StackPtr;
+    State.IntPointerInfo = State.FloatPointerInfo;
+  } else {
+    // Advance the pointer so that the loaded byte will contain the sign bit.
+    unsigned ByteOffset = (NumBits / 8) - 1;
+    IntPtr = getMemBasePlusOffset(StackPtr, TypeSize::getFixed(ByteOffset), DL);
+    State.IntPointerInfo =
+        MachinePointerInfo::getFixedStack(MF, FI, ByteOffset);
+  }
+
+  State.IntPtr = IntPtr;
+  State.IntValue = getExtLoad(ISD::EXTLOAD, DL, LoadTy, State.Chain, IntPtr,
+                              State.IntPointerInfo, MVT::i8);
+  State.SignMask = APInt::getOneBitSet(LoadTy.getScalarSizeInBits(), 7);
+  State.SignBit = 7;
+}
+
+SDValue SelectionDAG::modifySignAsInt(const FloatSignAsInt &State,
+                                      const SDLoc &DL, SDValue NewIntValue) {
+  if (!State.Chain)
+    return getNode(ISD::BITCAST, DL, State.FloatVT, NewIntValue);
+
+  // Override the part containing the sign bit in the value stored on the stack.
+  SDValue Chain = getTruncStore(State.Chain, DL, NewIntValue, State.IntPtr,
+                                State.IntPointerInfo, MVT::i8);
+  return getLoad(State.FloatVT, DL, Chain, State.FloatPtr,
+                 State.FloatPointerInfo);
+}
+
 SDValue SelectionDAG::FoldSetCC(EVT VT, SDValue N1, SDValue N2,
                                 ISD::CondCode Cond, const SDLoc &dl) {
   EVT OpVT = N1.getValueType();
diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
index 22d0bc9914585..13315fed7ed2a 100644
--- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
@@ -8610,16 +8610,18 @@ SDValue TargetLowering::expandFMINIMUM_FMAXIMUM(SDNode *N,
   // fminimum/fmaximum requires -0.0 less than +0.0
   if (!MinMaxMustRespectOrderedZero && !N->getFlags().hasNoSignedZeros() &&
       !DAG.isKnownNeverZeroFloat(RHS) && !DAG.isKnownNeverZeroFloat(LHS)) {
+    auto IsSpecificZero = [&](SDValue F) {
+      FloatSignAsInt State;
+      DAG.getSignAsIntValue(State, DL, F);
+      return DAG.getSetCC(DL, CCVT, State.IntValue,
+                          DAG.getConstant(0, DL, State.IntValue.getValueType()),
+                          IsMax ? ISD::SETEQ : ISD::SETNE);
+    };
     SDValue IsZero = DAG.getSetCC(DL, CCVT, MinMax,
                                   DAG.getConstantFP(0.0, DL, VT), ISD::SETOEQ);
-    SDValue TestZero =
-        DAG.getTargetConstant(IsMax ? fcPosZero : fcNegZero, DL, MVT::i32);
-    SDValue LCmp = DAG.getSelect(
-        DL, VT, DAG.getNode(ISD::IS_FPCLASS, DL, CCVT, LHS, TestZero), LHS,
-        MinMax, Flags);
-    SDValue RCmp = DAG.getSelect(
-        DL, VT, DAG.getNode(ISD::IS_FPCLASS, DL, CCVT, RHS, TestZero), RHS,
-        LCmp, Flags);
+    SDValue LCmp =
+        DAG.getSelect(DL, VT, IsSpecificZero(LHS), LHS, MinMax, Flags);
+    SDValue RCmp = DAG.getSelect(DL, VT, IsSpecificZero(RHS), RHS, LCmp, Flags);
     MinMax = DAG.getSelect(DL, VT, IsZero, RCmp, MinMax, Flags);
   }
 
diff --git a/llvm/test/CodeGen/AArch64/fmaximum-legalization.ll b/llvm/test/CodeGen/AArch64/fmaximum-legalization.ll
index 86c1474068482..9f542abcb80f7 100644
--- a/llvm/test/CodeGen/AArch64/fmaximum-legalization.ll
+++ b/llvm/test/CodeGen/AArch64/fmaximum-legalization.ll
@@ -41,3 +41,59 @@ define <4 x half> @fmaximum_v4f16(<4 x half> %x, <4 x half> %y) {
   %r = call <4 x half> @llvm.maximum.v4f16(<4 x half> %x, <4 x half> %y)
   ret <4 x half> %r
 }
+
+define fp128 @maximum_fp128(fp128 %x, fp128 %y) nounwind {
+; CHECK-LABEL: maximum_fp128:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    sub sp, sp, #96
+; CHECK-NEXT:    str x30, [sp, #80] // 8-byte Folded Spill
+; CHECK-NEXT:    stp q0, q1, [sp] // 32-byte Folded Spill
+; CHECK-NEXT:    stp q1, q0, [sp, #48]
+; CHECK-NEXT:    bl __gttf2
+; CHECK-NEXT:    ldp q0, q1, [sp] // 32-byte Folded Reload
+; CHECK-NEXT:    cmp w0, #0
+; CHECK-NEXT:    b.le .LBB1_2
+; CHECK-NEXT:  // %bb.1:
+; CHECK-NEXT:    mov v1.16b, v0.16b
+; CHECK-NEXT:  .LBB1_2:
+; CHECK-NEXT:    str q1, [sp, #32] // 16-byte Folded Spill
+; CHECK-NEXT:    ldr q1, [sp, #16] // 16-byte Folded Reload
+; CHECK-NEXT:    bl __unordtf2
+; CHECK-NEXT:    ldr q0, [sp, #32] // 16-byte Folded Reload
+; CHECK-NEXT:    cmp w0, #0
+; CHECK-NEXT:    b.eq .LBB1_4
+; CHECK-NEXT:  // %bb.3:
+; CHECK-NEXT:    adrp x8, .LCPI1_0
+; CHECK-NEXT:    ldr q0, [x8, :lo12:.LCPI1_0]
+; CHECK-NEXT:  .LBB1_4:
+; CHECK-NEXT:    ldrb w8, [sp, #79]
+; CHECK-NEXT:    mov v1.16b, v0.16b
+; CHECK-NEXT:    cmp w8, #0
+; CHECK-NEXT:    b.ne .LBB1_6
+; CHECK-NEXT:  // %bb.5:
+; CHECK-NEXT:    ldr q1, [sp] // 16-byte Folded Reload
+; CHECK-NEXT:  .LBB1_6:
+; CHECK-NEXT:    ldrb w8, [sp, #63]
+; CHECK-NEXT:    cmp w8, #0
+; CHECK-NEXT:    b.ne .LBB1_8
+; CHECK-NEXT:  // %bb.7:
+; CHECK-NEXT:    ldr q1, [sp, #16] // 16-byte Folded Reload
+; CHECK-NEXT:  .LBB1_8:
+; CHECK-NEXT:    adrp x8, .LCPI1_1
+; CHECK-NEXT:    str q0, [sp, #32] // 16-byte Folded Spill
+; CHECK-NEXT:    str q1, [sp, #16] // 16-byte Folded Spill
+; CHECK-NEXT:    ldr q1, [x8, :lo12:.LCPI1_1]
+; CHECK-NEXT:    ldr q0, [sp, #32] // 16-byte Folded Reload
+; CHECK-NEXT:    bl __eqtf2
+; CHECK-NEXT:    ldr q0, [sp, #32] // 16-byte Folded Reload
+; CHECK-NEXT:    cmp w0, #0
+; CHECK-NEXT:    b.ne .LBB1_10
+; CHECK-NEXT:  // %bb.9:
+; CHECK-NEXT:    ldr q0, [sp, #16] // 16-byte Folded Reload
+; CHECK-NEXT:  .LBB1_10:
+; CHECK-NEXT:    ldr x30, [sp, #80] // 8-byte Folded Reload
+; CHECK-NEXT:    add sp, sp, #96
+; CHECK-NEXT:    ret
+  %res = call fp128 @llvm.maximum.f128(fp128 %x, fp128 %y)
+  ret fp128 %res
+}
diff --git a/llvm/test/CodeGen/ARM/fp-maximum-legalization.ll b/llvm/test/CodeGen/ARM/fp-maximum-legalization.ll
new file mode 100644
index 0000000000000..a3ab144356e16
--- /dev/null
+++ b/llvm/test/CodeGen/ARM/fp-maximum-legalization.ll
@@ -0,0 +1,49 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s -mtriple=armv7 | FileCheck %s
+
+define double @maximum_double(double %x, double %y) nounwind {
+; CHECK-LABEL: maximum_double:
+; CHECK:       @ %bb.0:
+; CHECK-NEXT:    sub sp, sp, #16
+; CHECK-NEXT:    vmov d17, r2, r3
+; CHECK-NEXT:    mov r2, #0
+; CHECK-NEXT:    vmov d16, r0, r1
+; CHECK-NEXT:    mov r3, #0
+; CHECK-NEXT:    vcmp.f64 d16, d17
+; CHECK-NEXT:    mov r0, #0
+; CHECK-NEXT:    vmrs APSR_nzcv, fpscr
+; CHECK-NEXT:    vstr d16, [sp, #8]
+; CHECK-NEXT:    vstr d17, [sp]
+; CHECK-NEXT:    ldrb r1, [sp, #15]
+; CHECK-NEXT:    vmov.f64 d19, d17
+; CHECK-NEXT:    clz r1, r1
+; CHECK-NEXT:    vldr d18, .LCPI0_0
+; CHECK-NEXT:    movwvs r2, #1
+; CHECK-NEXT:    movwgt r3, #1
+; CHECK-NEXT:    cmp r3, #0
+; CHECK-NEXT:    vmovne.f64 d19, d16
+; CHECK-NEXT:    cmp r2, #0
+; CHECK-NEXT:    ldrb r2, [sp, #7]
+; CHECK-NEXT:    vmovne.f64 d19, d18
+; CHECK-NEXT:    lsrs r1, r1, #5
+; CHECK-NEXT:    clz r1, r2
+; CHECK-NEXT:    vcmp.f64 d19, #0
+; CHECK-NEXT:    vmov.f64 d18, d19
+; CHECK-NEXT:    vmovne.f64 d18, d16
+; CHECK-NEXT:    lsrs r1, r1, #5
+; CHECK-NEXT:    vmovne.f64 d18, d17
+; CHECK-NEXT:    vmrs APSR_nzcv, fpscr
+; CHECK-NEXT:    movweq r0, #1
+; CHECK-NEXT:    cmp r0, #0
+; CHECK-NEXT:    vmovne.f64 d19, d18
+; CHECK-NEXT:    vmov r0, r1, d19
+; CHECK-NEXT:    add sp, sp, #16
+; CHECK-NEXT:    bx lr
+; CHECK-NEXT:    .p2align 3
+; CHECK-NEXT:  @ %bb.1:
+; CHECK-NEXT:  .LCPI0_0:
+; CHECK-NEXT:    .long 0 @ double NaN
+; CHECK-NEXT:    .long 2146959360
+  %res = call double @llvm.maximum(double %x, double %y)
+  ret double %res
+}
diff --git a/llvm/test/CodeGen/NVPTX/bf16-instructions.ll b/llvm/test/CodeGen/NVPTX/bf16-instructions.ll
index 32225ed04e2d9..096649e5bde43 100644
--- a/llvm/test/CodeGen/NVPTX/bf16-instructions.ll
+++ b/llvm/test/CodeGen/NVPTX/bf16-instructions.ll
@@ -1476,7 +1476,7 @@ define <2 x bfloat> @test_maximum_v2(<2 x bfloat> %a, <2 x bfloat> %b) {
 ; SM70-LABEL: test_maximum_v2(
 ; SM70:       {
 ; SM70-NEXT:    .reg .pred %p<11>;
-; SM70-NEXT:    .reg .b16 %rs<15>;
+; SM70-NEXT:    .reg .b16 %rs<19>;
 ; SM70-NEXT:    .reg .b32 %r<16>;
 ; SM70-EMPTY:
 ; SM70-NEXT:  // %bb.0:
@@ -1493,30 +1493,30 @@ define <2 x bfloat> @test_maximum_v2(<2 x bfloat> %a, <2 x bfloat> %b) {
 ; SM70-NEXT:    setp.nan.f32 %p2, %r6, %r4;
 ; SM70-NEXT:    selp.b16 %rs6, 0x7FC0, %rs5, %p2;
 ; SM70-NEXT:    setp.eq.s16 %p3, %rs4, 0;
-; SM70-NEXT:    selp.b16 %rs7, %rs4, %rs6, %p3;
+; SM70-NEXT:    selp.b16 %rs9, %rs4, %rs6, %p3;
 ; SM70-NEXT:    setp.eq.s16 %p4, %rs2, 0;
-; SM70-NEXT:    selp.b16 %rs8, %rs2, %rs7, %p4;
+; SM70-NEXT:    selp.b16 %rs12, %rs2, %rs9, %p4;
 ; SM70-NEXT:    cvt.u32.u16 %r7, %rs6;
 ; SM70-NEXT:    shl.b32 %r8, %r7, 16;
 ; SM70-NEXT:    setp.eq.f32 %p5, %r8, 0f00000000;
-; SM70-NEXT:    selp.b16 %rs9, %rs8, %rs6, %p5;
+; SM70-NEXT:    selp.b16 %rs13, %rs12, %rs6, %p5;
 ; SM70-NEXT:    cvt.u32.u16 %r9, %rs1;
 ; SM70-NEXT:    shl.b32 %r10, %r9, 16;
 ; SM70-NEXT:    cvt.u32.u16 %r11, %rs3;
 ; SM70-NEXT:    shl.b32 %r12, %r11, 16;
 ; SM70-NEXT:    setp.gt.f32 %p6, %r12, %r10;
-; SM70-NEXT:    selp.b16 %rs10, %rs3, %rs1, %p6;
+; SM70-NEXT:    selp.b16 %rs14, %rs3, %rs1, %p6;
 ; SM70-NEXT:    setp.nan.f32 %p7, %r12, %r10;
-; SM70-NEXT:    selp.b16 %rs11, 0x7FC0, %rs10, %p7;
+;...
[truncated]

llvm/test/CodeGen/X86/fminimum-fmaximum.ll

gonzalobg · 2025-05-30T18:34:02Z

llvm/test/CodeGen/NVPTX/bf16-instructions.ll

@@ -1476,7 +1476,7 @@ define <2 x bfloat> @test_maximum_v2(<2 x bfloat> %a, <2 x bfloat> %b) {
 ; SM70-LABEL: test_maximum_v2(
 ; SM70:       {
 ; SM70-NEXT:    .reg .pred %p<11>;
-; SM70-NEXT:    .reg .b16 %rs<15>;
+; SM70-NEXT:    .reg .b16 %rs<19>;


@AlexMaclean this seems to increase the register usage ?

These numbers can be a bit misleading since there are sometimes holes in the register range. That is what seems to be happening here. NVPTX changes look like improvements or no-ops to me fwiw.

nikic · 2025-05-30T19:40:51Z

llvm/test/CodeGen/X86/fminimum-fmaximum.ll

+; X86-NEXT:    leal {{[0-9]+}}(%esp), %eax
+; X86-NEXT:    movl %eax, (%esp)
+; X86-NEXT:    vzeroupper
+; X86-NEXT:    calll fmaximuml


Side note: This is the wrong libcall for x86. I'll fix it separately.

I think #76558 will cover this, whenever that is closer to ready.

nikic · 2025-05-30T19:54:28Z

One thing this makes me wonder is why we're not preferring libcall legalization for scalar fmaximum. The generated code is quite complex and probably not better than using a libcall.

arsenm · 2025-05-30T20:03:42Z

One thing this makes me wonder is why we're not preferring libcall legalization for scalar fmaximum. The generated code is quite complex and probably not better than using a libcall.

Are there even libcalls? We unfortunately don't know the target runtime versions and these are really new calls

nikic · 2025-05-30T20:17:15Z

One thing this makes me wonder is why we're not preferring libcall legalization for scalar fmaximum. The generated code is quite complex and probably not better than using a libcall.

Are there even libcalls? We unfortunately don't know the target runtime versions and these are really new calls

I'd expect that if the ops exist in IR, then the libcalls do as well. Either because it originated from a libcall in new enough C, or because it's a non-C language that is capable of providing its own builtins (a requirement whenever you're exposing more ops or types than C does).

nikic · 2025-06-02T10:37:40Z

Same issue exists for fmaximumnum. I have a fix for that, but #139237 needs to land first.

This disables the f64 min/max tests for the arm-unknown-linux-gnueabihf job. The next release will be supporting cross-compiled doctests, and these tests fail on that platform. It looks like this was just fixed via llvm/llvm-project#142170, but I assume that will not trickle down to our copy of llvm in the next couple of weeks. Assuming that does get fixed when llvm is updated, then these can be removed. cc rust-lang#141087

This disables the f64 minimum/maximum tests for the arm-unknown-linux-gnueabihf job. The next release will be supporting cross-compiled doctests, and these tests fail on that platform. It looks like this was just fixed via llvm/llvm-project#142170, but I assume that will not trickle down to our copy of llvm in the next couple of weeks. Assuming that does get fixed when llvm is updated, then these can be removed. cc rust-lang#141087

Disable f64 minimum/maximum tests for arm 32 This disables the f64 minimum/maximum tests for the arm-unknown-linux-gnueabihf job. The next release will be supporting cross-compiled doctests, and these tests fail on that platform. It looks like this was just fixed via llvm/llvm-project#142170, but I assume that will not trickle down to our copy of llvm in the next couple of weeks. Assuming that does get fixed when llvm is updated, then these can be removed. cc rust-lang#141087

Rollup merge of #141921 - ehuss:arm-min-max, r=tgross35 Disable f64 minimum/maximum tests for arm 32 This disables the f64 minimum/maximum tests for the arm-unknown-linux-gnueabihf job. The next release will be supporting cross-compiled doctests, and these tests fail on that platform. It looks like this was just fixed via llvm/llvm-project#142170, but I assume that will not trickle down to our copy of llvm in the next couple of weeks. Assuming that does get fixed when llvm is updated, then these can be removed. cc #141087

This disables the f64 minimum/maximum tests for the arm-unknown-linux-gnueabihf job. The next release will be supporting cross-compiled doctests, and these tests fail on that platform. It looks like this was just fixed via llvm/llvm-project#142170, but I assume that will not trickle down to our copy of llvm in the next couple of weeks. Assuming that does get fixed when llvm is updated, then these can be removed. cc rust-lang#141087

wzssyqa · 2025-06-04T00:28:42Z

Same issue exists for fmaximumnum. I have a fix for that, but #139237 needs to land first.

I have another proposal with FP_ROUND: ROUND to FP32 first.
#140193

wzssyqa · 2025-06-04T00:58:46Z

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp

-        LCmp, Flags);
+    SDValue LCmp =
+        DAG.getSelect(DL, VT, IsSpecificZero(LHS), LHS, MinMax, Flags);
+    SDValue RCmp = DAG.getSelect(DL, VT, IsSpecificZero(RHS), RHS, LCmp, Flags);


In fact, we need only one IsSpecificZero here.
When Max, if LHS is positive, we can select it.
When Min, if LHS is positive, we can just select MinMax, since when have already preferred RHS if LHS==RHS.

Good point, I submitted #142732 to drop one the comparisons.

When Min, if LHS is positive, we can just select MinMax, since when have already preferred RHS if LHS==RHS.

This is only guaranteed if the main legalization used the select form, rather than one of the min/max opcodes. ~~But we also don't need to rely on it as we can directly select RHS instead of MinMax.~~

d0k · 2025-06-04T12:32:18Z

I think this is wrong for fminimum. We check that LHS is not equal integer 0, but we need to check that it's exactly -0.0 or it might pick a random value in LHS. e.g. fmin(-1.0, 0.0)

This reverts commit 58cc167. I also made the incorrect assumption that we know both values are +/-0.0 here as well. Revert for now.

nikic · 2025-06-04T12:39:55Z

Yeah, I made the same incorrect assumption that we already know both values are +/-0.0 here as well. Reverted for now.

nikic added 4 commits May 30, 2025 16:31

Use bitcast instead of IS_FPCLASS

ee1c82f

Move FloatSignAsInt helpers to SelectionDAG.h

0020c2b

Use FloatSignAsInt helper

e0da636

Add tests for issues

468a893

nikic requested review from arsenm, wzssyqa, RKSimon and topperc May 30, 2025 15:33

llvmbot added backend:ARM backend:AArch64 backend:PowerPC backend:X86 backend:NVPTX llvm:SelectionDAG SelectionDAGISel as well labels May 30, 2025

arsenm reviewed May 30, 2025

View reviewed changes

llvm/test/CodeGen/X86/fminimum-fmaximum.ll Show resolved Hide resolved

Add i686 test

babd339

gonzalobg reviewed May 30, 2025

View reviewed changes

arsenm approved these changes May 30, 2025

View reviewed changes

nikic commented May 30, 2025

View reviewed changes

nikic merged commit 58cc167 into llvm:main Jun 2, 2025
11 checks passed

nikic deleted the maximum-is-fpclass branch June 2, 2025 08:14

ehuss mentioned this pull request Jun 2, 2025

Disable f64 minimum/maximum tests for arm 32 rust-lang/rust#141921

Merged

wzssyqa reviewed Jun 4, 2025

View reviewed changes

nikic added a commit that referenced this pull request Jun 4, 2025

Revert "[SDAG] Fix fmaximum legalization errors (#142170)"

d74831e

This reverts commit 58cc167. I also made the incorrect assumption that we know both values are +/-0.0 here as well. Revert for now.

[SDAG] Fix fmaximum legalization errors #142170

[SDAG] Fix fmaximum legalization errors #142170

Uh oh!

Conversation

nikic commented May 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented May 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented May 30, 2025

Uh oh!

Uh oh!

gonzalobg May 30, 2025

Choose a reason for hiding this comment

Uh oh!

AlexMaclean May 30, 2025

Choose a reason for hiding this comment

Uh oh!

nikic May 30, 2025

Choose a reason for hiding this comment

Uh oh!

tgross35 Jun 2, 2025

Choose a reason for hiding this comment

Uh oh!

nikic commented May 30, 2025

Uh oh!

arsenm commented May 30, 2025

Uh oh!

nikic commented May 30, 2025

Uh oh!

Uh oh!

nikic commented Jun 2, 2025

Uh oh!

wzssyqa commented Jun 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

wzssyqa Jun 4, 2025

Choose a reason for hiding this comment

Uh oh!

nikic Jun 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

d0k commented Jun 4, 2025

Uh oh!

nikic commented Jun 4, 2025

Uh oh!

Uh oh!

nikic commented May 30, 2025 •

edited

Loading

llvmbot commented May 30, 2025 •

edited

Loading

wzssyqa commented Jun 4, 2025 •

edited

Loading

nikic Jun 4, 2025 •

edited

Loading