[DAGCombiner] Attempt to fold 'add' nodes to funnel-shift or rotate #125612

Open: wants to merge 3 commits into main from dev/amaclean/upstream-rotate-add-sdag

Conversation

AlexMaclean
Member

Almost all of the rotate idioms that are valid for an 'or' are also valid when the halves are combined with an 'add'. Further, many of these cases are not caught by common-bits tracking, meaning the 'add' is not converted to a disjoint 'or'.
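As an illustration (a hypothetical check, not part of the patch): for any fixed shift amount 0 < k < 32 the two shifted halves occupy disjoint bit ranges, so combining them with '+' gives the same result as combining them with '|', and both equal a rotate.

```python
# Sketch (assumption: 32-bit wrap-around arithmetic), showing that the
# 'add' form of the rotate idiom matches the 'or' form for 0 < k < 32.
MASK = 0xFFFFFFFF

def rotl32(x, k):
    # Reference rotate-left, the 'or' form of the idiom.
    return (((x << k) & MASK) | (x >> (32 - k))) & MASK

def add_idiom(x, k):
    # The 'add' form this PR teaches DAGCombiner to recognize.
    return (((x << k) & MASK) + (x >> (32 - k))) & MASK

for x in (0, 1, 0xDEADBEEF, 0xFFFFFFFF):
    for k in range(1, 32):
        assert add_idiom(x, k) == rotl32(x, k)
```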

@AlexMaclean AlexMaclean self-assigned this Feb 4, 2025
@llvmbot added the backend:NVPTX and llvm:SelectionDAG labels on Feb 4, 2025
@llvmbot
Member

llvmbot commented Feb 4, 2025

@llvm/pr-subscribers-backend-x86
@llvm/pr-subscribers-backend-amdgpu
@llvm/pr-subscribers-backend-arm
@llvm/pr-subscribers-backend-nvptx

@llvm/pr-subscribers-llvm-selectiondag

Author: Alex MacLean (AlexMaclean)

Changes

Almost all of the rotate idioms that are valid for an 'or' are also valid when the halves are combined with an 'add'. Further, many of these cases are not caught by common-bits tracking, meaning the 'add' is not converted to a disjoint 'or'.


Full diff: https://github.com/llvm/llvm-project/pull/125612.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (+53-43)
  • (added) llvm/test/CodeGen/NVPTX/add-rotate.ll (+118)
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index f4caaf426de6a07..d671a6d555621c1 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -662,14 +662,15 @@ namespace {
                                bool DemandHighBits = true);
     SDValue MatchBSwapHWord(SDNode *N, SDValue N0, SDValue N1);
     SDValue MatchRotatePosNeg(SDValue Shifted, SDValue Pos, SDValue Neg,
-                              SDValue InnerPos, SDValue InnerNeg, bool HasPos,
-                              unsigned PosOpcode, unsigned NegOpcode,
-                              const SDLoc &DL);
+                              SDValue InnerPos, SDValue InnerNeg, bool FromAdd,
+                              bool HasPos, unsigned PosOpcode,
+                              unsigned NegOpcode, const SDLoc &DL);
     SDValue MatchFunnelPosNeg(SDValue N0, SDValue N1, SDValue Pos, SDValue Neg,
-                              SDValue InnerPos, SDValue InnerNeg, bool HasPos,
-                              unsigned PosOpcode, unsigned NegOpcode,
-                              const SDLoc &DL);
-    SDValue MatchRotate(SDValue LHS, SDValue RHS, const SDLoc &DL);
+                              SDValue InnerPos, SDValue InnerNeg, bool FromAdd,
+                              bool HasPos, unsigned PosOpcode,
+                              unsigned NegOpcode, const SDLoc &DL);
+    SDValue MatchRotate(SDValue LHS, SDValue RHS, const SDLoc &DL,
+                        bool FromAdd);
     SDValue MatchLoadCombine(SDNode *N);
     SDValue mergeTruncStores(StoreSDNode *N);
     SDValue reduceLoadWidth(SDNode *N);
@@ -2992,6 +2993,9 @@ SDValue DAGCombiner::visitADD(SDNode *N) {
   if (SDValue V = foldAddSubOfSignBit(N, DL, DAG))
     return V;
 
+  if (SDValue V = MatchRotate(N0, N1, SDLoc(N), /*FromAdd=*/true))
+    return V;
+
   // Try to match AVGFLOOR fixedwidth pattern
   if (SDValue V = foldAddToAvg(N, DL))
     return V;
@@ -8161,7 +8165,7 @@ SDValue DAGCombiner::visitOR(SDNode *N) {
       return V;
 
   // See if this is some rotate idiom.
-  if (SDValue Rot = MatchRotate(N0, N1, DL))
+  if (SDValue Rot = MatchRotate(N0, N1, DL, /*FromAdd=*/false))
     return Rot;
 
   if (SDValue Load = MatchLoadCombine(N))
@@ -8350,7 +8354,7 @@ static SDValue extractShiftForRotate(SelectionDAG &DAG, SDValue OppShift,
 // The IsRotate flag should be set when the LHS of both shifts is the same.
 // Otherwise if matching a general funnel shift, it should be clear.
 static bool matchRotateSub(SDValue Pos, SDValue Neg, unsigned EltSize,
-                           SelectionDAG &DAG, bool IsRotate) {
+                           SelectionDAG &DAG, bool IsRotate, bool FromAdd) {
   const auto &TLI = DAG.getTargetLoweringInfo();
   // If EltSize is a power of 2 then:
   //
@@ -8389,7 +8393,7 @@ static bool matchRotateSub(SDValue Pos, SDValue Neg, unsigned EltSize,
   // NOTE: We can only do this when matching operations which won't modify the
   // least Log2(EltSize) significant bits and not a general funnel shift.
   unsigned MaskLoBits = 0;
-  if (IsRotate && isPowerOf2_64(EltSize)) {
+  if (IsRotate && !FromAdd && isPowerOf2_64(EltSize)) {
     unsigned Bits = Log2_64(EltSize);
     unsigned NegBits = Neg.getScalarValueSizeInBits();
     if (NegBits >= Bits) {
@@ -8472,9 +8476,9 @@ static bool matchRotateSub(SDValue Pos, SDValue Neg, unsigned EltSize,
 // Neg with outer conversions stripped away.
 SDValue DAGCombiner::MatchRotatePosNeg(SDValue Shifted, SDValue Pos,
                                        SDValue Neg, SDValue InnerPos,
-                                       SDValue InnerNeg, bool HasPos,
-                                       unsigned PosOpcode, unsigned NegOpcode,
-                                       const SDLoc &DL) {
+                                       SDValue InnerNeg, bool FromAdd,
+                                       bool HasPos, unsigned PosOpcode,
+                                       unsigned NegOpcode, const SDLoc &DL) {
   // fold (or (shl x, (*ext y)),
   //          (srl x, (*ext (sub 32, y)))) ->
   //   (rotl x, y) or (rotr x, (sub 32, y))
@@ -8484,10 +8488,9 @@ SDValue DAGCombiner::MatchRotatePosNeg(SDValue Shifted, SDValue Pos,
   //   (rotr x, y) or (rotl x, (sub 32, y))
   EVT VT = Shifted.getValueType();
   if (matchRotateSub(InnerPos, InnerNeg, VT.getScalarSizeInBits(), DAG,
-                     /*IsRotate*/ true)) {
+                     /*IsRotate*/ true, FromAdd))
     return DAG.getNode(HasPos ? PosOpcode : NegOpcode, DL, VT, Shifted,
                        HasPos ? Pos : Neg);
-  }
 
   return SDValue();
 }
@@ -8500,9 +8503,9 @@ SDValue DAGCombiner::MatchRotatePosNeg(SDValue Shifted, SDValue Pos,
 // TODO: Merge with MatchRotatePosNeg.
 SDValue DAGCombiner::MatchFunnelPosNeg(SDValue N0, SDValue N1, SDValue Pos,
                                        SDValue Neg, SDValue InnerPos,
-                                       SDValue InnerNeg, bool HasPos,
-                                       unsigned PosOpcode, unsigned NegOpcode,
-                                       const SDLoc &DL) {
+                                       SDValue InnerNeg, bool FromAdd,
+                                       bool HasPos, unsigned PosOpcode,
+                                       unsigned NegOpcode, const SDLoc &DL) {
   EVT VT = N0.getValueType();
   unsigned EltBits = VT.getScalarSizeInBits();
 
@@ -8513,10 +8516,10 @@ SDValue DAGCombiner::MatchFunnelPosNeg(SDValue N0, SDValue N1, SDValue Pos,
   // fold (or (shl x0, (*ext (sub 32, y))),
   //          (srl x1, (*ext y))) ->
   //   (fshr x0, x1, y) or (fshl x0, x1, (sub 32, y))
-  if (matchRotateSub(InnerPos, InnerNeg, EltBits, DAG, /*IsRotate*/ N0 == N1)) {
+  if (matchRotateSub(InnerPos, InnerNeg, EltBits, DAG, /*IsRotate*/ N0 == N1,
+                     FromAdd))
     return DAG.getNode(HasPos ? PosOpcode : NegOpcode, DL, VT, N0, N1,
                        HasPos ? Pos : Neg);
-  }
 
   // Matching the shift+xor cases, we can't easily use the xor'd shift amount
   // so for now just use the PosOpcode case if its legal.
@@ -8561,11 +8564,12 @@ SDValue DAGCombiner::MatchFunnelPosNeg(SDValue N0, SDValue N1, SDValue Pos,
   return SDValue();
 }
 
-// MatchRotate - Handle an 'or' of two operands.  If this is one of the many
-// idioms for rotate, and if the target supports rotation instructions, generate
-// a rot[lr]. This also matches funnel shift patterns, similar to rotation but
-// with different shifted sources.
-SDValue DAGCombiner::MatchRotate(SDValue LHS, SDValue RHS, const SDLoc &DL) {
+// MatchRotate - Handle an 'or' or 'add' of two operands.  If this is one of the
+// many idioms for rotate, and if the target supports rotation instructions,
+// generate a rot[lr]. This also matches funnel shift patterns, similar to
+// rotation but with different shifted sources.
+SDValue DAGCombiner::MatchRotate(SDValue LHS, SDValue RHS, const SDLoc &DL,
+                                 bool FromAdd) {
   EVT VT = LHS.getValueType();
 
   // The target must have at least one rotate/funnel flavor.
@@ -8592,9 +8596,9 @@ SDValue DAGCombiner::MatchRotate(SDValue LHS, SDValue RHS, const SDLoc &DL) {
   if (LHS.getOpcode() == ISD::TRUNCATE && RHS.getOpcode() == ISD::TRUNCATE &&
       LHS.getOperand(0).getValueType() == RHS.getOperand(0).getValueType()) {
     assert(LHS.getValueType() == RHS.getValueType());
-    if (SDValue Rot = MatchRotate(LHS.getOperand(0), RHS.getOperand(0), DL)) {
+    if (SDValue Rot =
+            MatchRotate(LHS.getOperand(0), RHS.getOperand(0), DL, FromAdd))
       return DAG.getNode(ISD::TRUNCATE, SDLoc(LHS), LHS.getValueType(), Rot);
-    }
   }
 
   // Match "(X shl/srl V1) & V2" where V2 may not be present.
@@ -8773,30 +8777,36 @@ SDValue DAGCombiner::MatchRotate(SDValue LHS, SDValue RHS, const SDLoc &DL) {
     RExtOp0 = RHSShiftAmt.getOperand(0);
   }
 
+  // // If we are here from visitADD() we must ensure the Right-Shift Amt is
+  // // non-zero when the pattern includes AND op. So, allow optimizing to ROTL
+  // // only if it is recognized as a non-zero constant. Same for ROTR.
+  // auto RotateSafe = [FromAdd](const SDValue& ExtOp0) {
+  //   if (!FromAdd || ExtOp0.getOpcode() != ISD::AND)
+  //     return true;
+  //   auto *ExtOp0Const = dyn_cast<ConstantSDNode>(ExtOp0);
+  //   return ExtOp0Const && !ExtOp0Const->isZero();
+  // };
+
   if (IsRotate && (HasROTL || HasROTR)) {
-    SDValue TryL =
-        MatchRotatePosNeg(LHSShiftArg, LHSShiftAmt, RHSShiftAmt, LExtOp0,
-                          RExtOp0, HasROTL, ISD::ROTL, ISD::ROTR, DL);
-    if (TryL)
+    if (SDValue TryL = MatchRotatePosNeg(LHSShiftArg, LHSShiftAmt, RHSShiftAmt,
+                                         LExtOp0, RExtOp0, FromAdd, HasROTL,
+                                         ISD::ROTL, ISD::ROTR, DL))
       return TryL;
 
-    SDValue TryR =
-        MatchRotatePosNeg(RHSShiftArg, RHSShiftAmt, LHSShiftAmt, RExtOp0,
-                          LExtOp0, HasROTR, ISD::ROTR, ISD::ROTL, DL);
-    if (TryR)
+    if (SDValue TryR = MatchRotatePosNeg(RHSShiftArg, RHSShiftAmt, LHSShiftAmt,
+                                         RExtOp0, LExtOp0, FromAdd, HasROTR,
+                                         ISD::ROTR, ISD::ROTL, DL))
       return TryR;
   }
 
-  SDValue TryL =
-      MatchFunnelPosNeg(LHSShiftArg, RHSShiftArg, LHSShiftAmt, RHSShiftAmt,
-                        LExtOp0, RExtOp0, HasFSHL, ISD::FSHL, ISD::FSHR, DL);
-  if (TryL)
+  if (SDValue TryL = MatchFunnelPosNeg(LHSShiftArg, RHSShiftArg, LHSShiftAmt,
+                                       RHSShiftAmt, LExtOp0, RExtOp0, FromAdd,
+                                       HasFSHL, ISD::FSHL, ISD::FSHR, DL))
     return TryL;
 
-  SDValue TryR =
-      MatchFunnelPosNeg(LHSShiftArg, RHSShiftArg, RHSShiftAmt, LHSShiftAmt,
-                        RExtOp0, LExtOp0, HasFSHR, ISD::FSHR, ISD::FSHL, DL);
-  if (TryR)
+  if (SDValue TryR = MatchFunnelPosNeg(LHSShiftArg, RHSShiftArg, RHSShiftAmt,
+                                       LHSShiftAmt, RExtOp0, LExtOp0, FromAdd,
+                                       HasFSHR, ISD::FSHR, ISD::FSHL, DL))
     return TryR;
 
   return SDValue();
diff --git a/llvm/test/CodeGen/NVPTX/add-rotate.ll b/llvm/test/CodeGen/NVPTX/add-rotate.ll
new file mode 100644
index 000000000000000..f777ac964154343
--- /dev/null
+++ b/llvm/test/CodeGen/NVPTX/add-rotate.ll
@@ -0,0 +1,118 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s -march=nvptx64 -mcpu=sm_50 | FileCheck %s
+
+target triple = "nvptx64-nvidia-cuda"
+
+define i32 @test_rotl(i32 %x) {
+; CHECK-LABEL: test_rotl(
+; CHECK:       {
+; CHECK-NEXT:    .reg .b32 %r<3>;
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    ld.param.u32 %r1, [test_rotl_param_0];
+; CHECK-NEXT:    shf.l.wrap.b32 %r2, %r1, %r1, 7;
+; CHECK-NEXT:    st.param.b32 [func_retval0], %r2;
+; CHECK-NEXT:    ret;
+  %shl = shl i32 %x, 7
+  %shr = lshr i32 %x, 25
+  %add = add i32 %shl, %shr
+  ret i32 %add
+}
+
+define i32 @test_rotr(i32 %x) {
+; CHECK-LABEL: test_rotr(
+; CHECK:       {
+; CHECK-NEXT:    .reg .b32 %r<3>;
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    ld.param.u32 %r1, [test_rotr_param_0];
+; CHECK-NEXT:    shf.l.wrap.b32 %r2, %r1, %r1, 25;
+; CHECK-NEXT:    st.param.b32 [func_retval0], %r2;
+; CHECK-NEXT:    ret;
+  %shr = lshr i32 %x, 7
+  %shl = shl i32 %x, 25
+  %add = add i32 %shr, %shl
+  ret i32 %add
+}
+
+define i32 @test_rotl_var(i32 %x, i32 %y) {
+; CHECK-LABEL: test_rotl_var(
+; CHECK:       {
+; CHECK-NEXT:    .reg .b32 %r<4>;
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    ld.param.u32 %r1, [test_rotl_var_param_0];
+; CHECK-NEXT:    ld.param.u32 %r2, [test_rotl_var_param_1];
+; CHECK-NEXT:    shf.l.wrap.b32 %r3, %r1, %r1, %r2;
+; CHECK-NEXT:    st.param.b32 [func_retval0], %r3;
+; CHECK-NEXT:    ret;
+  %shl = shl i32 %x, %y
+  %sub = sub i32 32, %y
+  %shr = lshr i32 %x, %sub
+  %add = add i32 %shl, %shr
+  ret i32 %add
+}
+
+define i32 @test_rotr_var(i32 %x, i32 %y) {
+; CHECK-LABEL: test_rotr_var(
+; CHECK:       {
+; CHECK-NEXT:    .reg .b32 %r<4>;
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    ld.param.u32 %r1, [test_rotr_var_param_0];
+; CHECK-NEXT:    ld.param.u32 %r2, [test_rotr_var_param_1];
+; CHECK-NEXT:    shf.r.wrap.b32 %r3, %r1, %r1, %r2;
+; CHECK-NEXT:    st.param.b32 [func_retval0], %r3;
+; CHECK-NEXT:    ret;
+  %shr = lshr i32 %x, %y
+  %sub = sub i32 32, %y
+  %shl = shl i32 %x, %sub
+  %add = add i32 %shr, %shl
+  ret i32 %add
+}
+
+define i32 @test_rotl_var_and(i32 %x, i32 %y) {
+; CHECK-LABEL: test_rotl_var_and(
+; CHECK:       {
+; CHECK-NEXT:    .reg .b32 %r<8>;
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    ld.param.u32 %r1, [test_rotl_var_and_param_0];
+; CHECK-NEXT:    ld.param.u32 %r2, [test_rotl_var_and_param_1];
+; CHECK-NEXT:    shl.b32 %r3, %r1, %r2;
+; CHECK-NEXT:    neg.s32 %r4, %r2;
+; CHECK-NEXT:    and.b32 %r5, %r4, 31;
+; CHECK-NEXT:    shr.u32 %r6, %r1, %r5;
+; CHECK-NEXT:    add.s32 %r7, %r6, %r3;
+; CHECK-NEXT:    st.param.b32 [func_retval0], %r7;
+; CHECK-NEXT:    ret;
+  %shr = shl i32 %x, %y
+  %sub = sub nsw i32 0, %y
+  %and = and i32 %sub, 31
+  %shl = lshr i32 %x, %and
+  %add = add i32 %shl, %shr
+  ret i32 %add
+}
+
+define i32 @test_rotr_var_and(i32 %x, i32 %y) {
+; CHECK-LABEL: test_rotr_var_and(
+; CHECK:       {
+; CHECK-NEXT:    .reg .b32 %r<8>;
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    ld.param.u32 %r1, [test_rotr_var_and_param_0];
+; CHECK-NEXT:    ld.param.u32 %r2, [test_rotr_var_and_param_1];
+; CHECK-NEXT:    shr.u32 %r3, %r1, %r2;
+; CHECK-NEXT:    neg.s32 %r4, %r2;
+; CHECK-NEXT:    and.b32 %r5, %r4, 31;
+; CHECK-NEXT:    shl.b32 %r6, %r1, %r5;
+; CHECK-NEXT:    add.s32 %r7, %r3, %r6;
+; CHECK-NEXT:    st.param.b32 [func_retval0], %r7;
+; CHECK-NEXT:    ret;
+  %shr = lshr i32 %x, %y
+  %sub = sub nsw i32 0, %y
+  %and = and i32 %sub, 31
+  %shl = shl i32 %x, %and
+  %add = add i32 %shr, %shl
+  ret i32 %add
+}

@Bhramar-vatsa
Contributor

Thanks @AlexMaclean. We had a situation downstream which needed this transformation at the SelectionDAG level too. I just wanted to mention that I tried the following, similar to your change at the IR level in 59ced72bc211f (using the pattern-matching framework at the SD level).

static bool haveNoCommonBitsSetCommutative(SDValue A, SDValue B) {
  ...
+  SDValue V;
+  APInt R;
+    if (((sd_match(B, m_Shl(m_Value(), m_Sub(m_ConstInt(R), m_Value(V)))) &&
+          sd_match(A, m_Srl(m_Value(), m_Specific(V)))) ||
+         (sd_match(B, m_Srl(m_Value(), m_Sub(m_ConstInt(R), m_Value(V)))) &&
+          sd_match(A, m_Shl(m_Value(), m_Specific(V))))) &&
+        R.uge(A.getValueType().getScalarSizeInBits()))
+      return true;
      
  return false;
}

This then triggers the 'add' to 'or' transformation in DAGCombiner, which in turn helps in the selection of the rotate instruction.
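For intuition behind the 'no common bits' condition in the sketch above (a hypothetical check, not the patch itself): when the shift amounts are y and 32 - y, the two halves never share a set bit, which is exactly what lets the 'add' be rewritten as a disjoint 'or'.

```python
# Sketch (assumption: i32, 1 <= y <= 31): the shl half sets only bits
# (32 - y)..31 and the srl half sets only bits 0..(31 - y), so the halves
# are disjoint and 'add' behaves as 'or'.
MASK = 0xFFFFFFFF

def shifted_halves(x, y):
    hi = (x << (32 - y)) & MASK  # set bits only in positions (32 - y)..31
    lo = x >> y                  # set bits only in positions 0..(31 - y)
    return hi, lo

for x in (0x12345678, 0xFFFFFFFF):
    for y in range(1, 32):
        hi, lo = shifted_halves(x, y)
        assert hi & lo == 0            # no common bits set...
        assert (hi + lo) == (hi | lo)  # ...so the 'add' is a disjoint 'or'
```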

@AlexMaclean force-pushed the dev/amaclean/upstream-rotate-add-sdag branch from 31bc049 to e128108 on February 4, 2025
@AlexMaclean
Member Author

(Replying to @Bhramar-vatsa's comment above.)

Nice, the SDAG matching is new to me and does seem quite useful. However, I think at the SelectionDAG level it makes more sense to fold 'add' to rotate directly, reusing the existing rotate matching code which looks for many different rotate idioms beyond just this simple case (ex. https://github.com/llvm/llvm-project/pull/125612/files#diff-7cb0d03bd314924f121a96a010f963383fe24079e25c71399b68ba979e8453abR140).

@AlexMaclean
Member Author

@topperc / @RKSimon ping for review

@AlexMaclean
Member Author

@topperc @RKSimon @arsenm Ping for review, or for any thoughts on who else might be able to take a look at this change?

@topperc
Collaborator

topperc commented Feb 18, 2025

Can we update comments to show the add cases too? For example,

  // fold (or (shl x, (*ext y)),                                                 
  //          (srl x, (*ext (sub 32, y)))) ->                                    
  //   (rotl x, y) or (rotr x, (sub 32, y))

@AlexMaclean
Member Author

(Replying to @topperc's comment above.)

I've updated all comments to reflect the add case as well.

@AlexMaclean AlexMaclean requested a review from Artem-B February 26, 2025 16:50
@AlexMaclean
Member Author

@topperc @RKSimon @arsenm Ping for review

@RKSimon
Collaborator

RKSimon commented Feb 27, 2025

please can we get some alive2 coverage for these? I'd still be a lot more comfortable with getting the new tests committed with existing codegen and see how far the haveNoCommonBitsSetCommutative improvements would get us tbh.

@AlexMaclean
Member Author

please can we get some alive2 coverage for these? I'd still be a lot more comfortable with getting the new tests committed with existing codegen and see how far the haveNoCommonBitsSetCommutative improvements would get us tbh.

Here are alive2 proofs for the cases this PR adds: https://alive2.llvm.org/ce/z/yCPZtE

Adding the support in haveNoCommonBitsSetCommutative seems like it will require essentially reproducing the logic in MatchRotate. Even with the match syntax this seems like it will be pretty significant if we want to cover all the cases. Is there anything you've found that is incorrect in the proposed implementation to handle ADD nodes directly?

@AlexMaclean force-pushed the dev/amaclean/upstream-rotate-add-sdag branch from 982d35b to 7d01ea5 on March 7, 2025
@AlexMaclean
Member Author

(Quoting the alive2 proofs and haveNoCommonBitsSetCommutative discussion above.)

@RKSimon ping on this?

@@ -8379,7 +8383,7 @@ static bool matchRotateSub(SDValue Pos, SDValue Neg, unsigned EltSize,
 // NOTE: We can only do this when matching operations which won't modify the
 // least Log2(EltSize) significant bits and not a general funnel shift.
 unsigned MaskLoBits = 0;
-if (IsRotate && isPowerOf2_64(EltSize)) {
+if (IsRotate && !FromAdd && isPowerOf2_64(EltSize)) {
Contributor

Is it incorrect to do this for add? Or do we just not want to?

Member Author

It is incorrect to do this for an ADD or for a non-rotate funnel shift, because when the shift amount is 0 the LHS and RHS are combined with each other. If they are equal (IsRotate) and the combining operation is an OR (!FromAdd), this is a no-op (LHS = RHS = Result), which is consistent with a funnel shift by 0. If either condition is not met, the two sides cannot be safely combined this way in the 0 case.
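To make the corner case concrete (a hypothetical check, not part of the patch): with a masked negated shift amount and y == 0, both "halves" are x itself, so an 'or' is a rotate-by-0 no-op while an 'add' doubles x.

```python
# Sketch (assumption: i32 with wrap-around): the shift-amount-0 corner case
# that makes mask-stripping unsafe when the combining operation is 'add'.
MASK = 0xFFFFFFFF
x, y = 0xDEADBEEF, 0

shl = (x << y) & MASK
shr = x >> ((-y) & 31)  # masked negated amount: also 0 when y == 0

assert (shl | shr) == x           # 'or' case: rotate by 0 is a no-op
assert ((shl + shr) & MASK) != x  # 'add' case: 2*x, not a rotate
```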

@@ -8340,7 +8344,7 @@ static SDValue extractShiftForRotate(SelectionDAG &DAG, SDValue OppShift,
// The IsRotate flag should be set when the LHS of both shifts is the same.
// Otherwise if matching a general funnel shift, it should be clear.
static bool matchRotateSub(SDValue Pos, SDValue Neg, unsigned EltSize,
Contributor

Suggested change:
-static bool matchRotateSub(SDValue Pos, SDValue Neg, unsigned EltSize,
+static bool matchRotateSubAdd(SDValue Pos, SDValue Neg, unsigned EltSize,

Member Author

I don't think this rename makes sense. The Sub here reflects how the shift amounts are related to each other, not something about the combining operation. The fact that we can now handle ADD cases in this function doesn't mean we should add an Add suffix, just as the fact that we already handle OR cases doesn't mean we should have an Or suffix.

@AlexMaclean
Member Author

Hi All,

If any of the reviewers have some spare time I'd really appreciate another round of review on this change. I've addressed all the specific stylistic issues raised.

It seems like the main hesitation around this change might be that it can be a little hard to tell if all the rotate folds are going to be correct for an ADD node. I'm guessing this is what @RKSimon's comment (#125612 (comment)) is getting at. However, I think more trivially correct solutions are a bit less efficient and will require lots of logic duplication to match the full set of transforms this approach allows. To try to address this concern, I've written up a set of alive2 proofs that demonstrate the transforms that could occur (https://alive2.llvm.org/ce/z/yCPZtE). Further, for what it's worth, this change has been present in the internal fork of llvm used by NVIDIA for a fairly long time and we haven't noticed any correctness issues. If there are edge cases I'm missing though, I'll be happy to revert the change ASAP. Are there any further objections on these grounds? Is there anything I can do to assuage them?

Beyond this, I don't think there are any open issues or blockers for this change.

Also, if there is anyone else you can think of who should take a look at this change, please add them or let me know!

Member

@Artem-B Artem-B left a comment

LGTM in principle, but please wait if someone else wants to chime in.

If there's no objection, I'd suggest landing the patch early next week and watch the bots in case something unexpected pops up.

I'm not sure if there are existing tests covering add/shift on other platforms. Given that there are no changes in existing tests that code path may not have enough coverage. It may be useful to add the tests for NVPTX and some common targets first, so we can observe the effects of this patch. It's much easier to reason about the impact of the patch when the test diffs show what changed, as opposed to the new state of affairs only.

@AlexMaclean
Member Author

(Replying to @Artem-B's review above.)

I've pre-committed the tests within this PR, so c9d6c84 shows only the diff associated with the code change: c9d6c84#diff-7cb0d03bd314924f121a96a010f963383fe24079e25c71399b68ba979e8453ab. Are you asking that I land the pre-commit on main as a separate commit? Happy to do so if this would help. I can also add this same IR with a different triple to demonstrate the impact on other targets, if there are particular back-ends people would like to see coverage of.

Collaborator

@RKSimon RKSimon left a comment

LGTM - I am still worried that we're possibly missing a corner case with the ADD nodes, but I can't see anything. I'd still have preferred we tried haveNoCommonBitsSet first, tbh, as although there's duplication, it's a lot more self-contained.

@Artem-B
Member

Artem-B commented Mar 21, 2025

Are you asking that I land the pre-commit on main as a separate commit?

Not quite. Or, rather, that, and a bit more. We're concerned about corner cases that may be exposed by the change, which could benefit from wider testing. For testing to be useful we need confidence that the tests themselves are correct, and it helps to have an established known-good baseline. For this particular mix of instructions we appear to have no tests at the moment, but we do know that the current code in LLVM has been around for a while and is likely correct.

What I'm proposing is to create a set of tests capturing the status quo, across multiple "important" architectures. I would target x86, arm, amdgpu and nvptx.

Then land your patch, and see what changes in those baseline tests and verify that the changes do work and make sense.

It's not, strictly speaking, necessary to do it that way, but it's a relatively easy way to capture potential issues early.

@AlexMaclean
Member Author

(Quoting @Artem-B's proposal above to pre-commit baseline tests across multiple targets.)

Sounds good, I've opened #132842 to cover the backends you've suggested, please let me know what you think! I've also rebased this PR atop that branch so you can see the impact this change would have across those targets.

@Artem-B
Member

Artem-B commented Mar 24, 2025

I've also rebased this PR atop that branch so you can see the impact this change would have across those targets.

For some reason I see only complete test files added. It looks like rebase didn't make it.

@AlexMaclean
Member Author

I've also rebased this PR atop that branch so you can see the impact this change would have across those targets.

For some reason I see only complete test files added. It looks like rebase didn't make it.

The diff on the tests should be viewable within this commit 0f8aef9. Does that work? Once I land #132842 the diff will be apparent in the main PR diff.

@Artem-B
Member

Artem-B commented Mar 25, 2025

The diff on the tests should be viewable within this commit 0f8aef9. Does that work?

Yes, I can see test diffs now. That looks like a very nice improvement across all tested platforms.

AlexMaclean added a commit that referenced this pull request Mar 26, 2025
Add tests to various targets covering rotate idioms where an 'ADD' node
is used to combine the halves instead of an 'OR'. Some of these cases
will be better optimized following #125612, while others are already
well optimized or do not have a valid fold to a rotate or funnel-shift.