Skip to content

[DAGCombiner] Option --combiner-select-seq #134813

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

ppetrovic98
Copy link

@ppetrovic98 ppetrovic98 commented Apr 8, 2025

This PR reintroduces the patch initially proposed by Anmol Paralkar (@anmolparalkar-nxp ) on Phabricator as D136047. The original patch aimed to minimize the condition-code lifetime for select sequences where constants form an arithmetic progression, improving performance by reducing the number of comparisons.

Modifications:

Updated the patch to work with the latest LLVM version.
Adapted the test case (combiner-select-seq.ll) to match the current output format from the LLVM main branch.
Tested the changes.

Reopening the discussion:

The original patch received a comment from @RKSimon suggesting that the transformation might be more suitable in CodeGenPrepare rather than DAGCombiner. Since the discussion was not concluded, I would like to reopen the topic for community input.

Credits:

Original author: Anmol P. Paralkar (@anmolparalkar-nxp )
Link to original Phabricator discussion: https://reviews.llvm.org/D136047
Link to original problem (bug): https://bugs.llvm.org/show_bug.cgi?id=51147

Copy link

github-actions bot commented Apr 8, 2025

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added backend:AArch64 llvm:SelectionDAG SelectionDAGISel as well labels Apr 8, 2025
@llvmbot
Copy link
Member

llvmbot commented Apr 8, 2025

@llvm/pr-subscribers-llvm-selectiondag

@llvm/pr-subscribers-backend-aarch64

Author: None (ppetrovic98)

Changes

This PR reintroduces the patch initially proposed by Anmol Paralkar (@anmolparalkar-nxp ) on Phabricator as D136047. The original patch aimed to minimize the condition-code lifetime for select sequences where constants form an arithmetic progression, improving performance by reducing the number of comparisons.

Modifications:

Updated the patch to work with the latest LLVM version.
Adapted the test case (combiner-select-seq.ll) to match the current output format from the LLVM main branch.
Tested the changes.

Reopening the discussion:

The original patch received a comment from @RKSimon suggesting that the transformation might be more suitable in CodeGenPrepare rather than DAGCombiner. Since the discussion was not concluded, I would like to reopen the topic for community input.

Credits:

Original author: Anmol P. Paralkar (@anmolparalkar-nxp )
Updates and modifications: Petar I. Petrovic (@ppetrovic98 )
Link to original Phabricator discussion: https://reviews.llvm.org/D136047
Link to original problem (bug): https://bugs.llvm.org/show_bug.cgi?id=51147


Patch is 37.17 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/134813.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (+575)
  • (added) llvm/test/CodeGen/AArch64/combiner-select-seq.ll (+224)
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 38376de5783ae..6a21579995fd3 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -149,6 +149,18 @@ static cl::opt<bool> EnableShrinkLoadReplaceStoreWithStore(
     cl::desc("DAG combiner enable load/<replace bytes>/store with "
              "a narrower store"));
 
+static cl::opt<bool> SelectSeq(
+    "combiner-select-seq", cl::Hidden, cl::init(false),
+    cl::desc("DAG combiner minimize condition-code lifetime for select "
+             "sequences over constants in arithmetic progressions"),
+    cl::ZeroOrMore);
+
+static cl::opt<unsigned> SelectSeqMinCostBenefit(
+    "combiner-select-seq-min-cost-benefit", cl::Hidden, cl::init(1),
+    cl::desc("Transform only when the cost benefit (instruction count to be "
+             "reduced by) is at-least as specified; default: 1)"),
+    cl::ZeroOrMore);
+
 namespace {
 
   class DAGCombiner {
@@ -871,6 +883,10 @@ namespace {
     void ExtendSetCCUses(const SmallVectorImpl<SDNode *> &SetCCs,
                          SDValue OrigLoad, SDValue ExtLoad,
                          ISD::NodeType ExtType);
+
+    /// Rewrite sequences of selects minimizing dependency on the condition code
+    /// register when possible.
+    bool SelectSeqMinCCLifetime(void);
   };
 
 /// This class is a DAGUpdateListener that removes any deleted
@@ -1725,11 +1741,570 @@ bool DAGCombiner::recursivelyDeleteUnusedNodes(SDNode *N) {
   return true;
 }
 
+////////////////////////////////////////////////////////////////////////////////
+//
+// --combiner-select-seq:
+// ======================
+//
+// Summary:
+// --------
+//
+// We deal with the special case of a sequence of select instructions that use
+// the condition code register to decide between corresponding elements of two
+// sequences of constants that are each, in arithmetic progression. Typically,
+// the arguments to the selects would form a sequence of accesses to fields of
+// a struct or the elements of an array. Instead of having the CC tied up across
+// the expanse of the sequence for the pointwise selection of the constants, we
+// transform the code to instead, decide upfront between the defining constructs
+// of the two sequences of constants and generating the selected sequence
+// elements via a corresponding sequence of add instructions.
+//
+// Introduction:
+// -------------
+//
+// Consider a sequence of selects "S", consuming a setcc such as:
+//
+//   cond = setcc ...
+//     ...
+//   S[0]: reg[0] = select cond t[0] f[0]
+//     ...
+//   S[n-1]: reg[n-1] = select cond t[n-1] f[n-1]
+//     ...
+//
+// Two sequences arise from the operands of each select:
+//
+//   t[0], t[1], ..., t[n-1]
+//   f[0], f[1], ..., f[n-1]
+//
+// Denote either sequence by (say) "X". (PS: Read ':=' as "is of the form")
+//
+// If,
+//
+//   X[i] := offset[i] (forall i : i <- [0, n))
+//
+// or,
+//
+//   X[i] := reg[i] (forall i : i <- [0, n))
+//   where:
+//     reg[0] := base-register
+//            |  add base-register, offset[0]
+//     reg[i] := add base-register, offset[i] (forall i : i <- (0, n))
+//     where:
+//       base-register := BaseRegDef <>, Register:(i32|i64) %<>
+//
+// where:
+//   offset[] := Constant:i64<>
+//            |  Constant:i32<>
+//   BaseRegDef := CopyFromReg | Load
+//
+// Define:
+//
+//   Offset(X)[0] = Constant:(<i32>|<i64>)<0>
+//     if X[0] := (reg[0] := base-register)
+//   Offset(X)[i] = offset[i]
+//     if X[i] := offset[i] (forall i : i <- [0, n))
+//   Offset(X)[i] = offset[i]
+//     if X[i] := add base-register, offset[i] (forall i : i <- [0, n))
+//
+// Now, (for: n > 1) if Offset(X) is an arithmetic progression, i.e:
+//
+//   Offset(X)[i] := InitialValue(X) (if: i == 0)
+//                |  Offset(X)[i-1] + Delta(X) (forall i : i <- (0, n))
+//     where:
+//       InitialValue(X) = k (k is an arbitrary integer constant)
+//       Delta(X) = (Offset(X)[1] - Offset(X)[0])
+//
+//   Further define:
+//       BaseReg(X) = Constant:(<i32>|<i64>)<0>
+//         if X[0] := offset[0]
+//       BaseReg(X) = base-register
+//         if X[0] := base-register
+//                 |  add base-register, offset[0]
+//
+// PS: In practice, we also accept the reverse-form:
+//
+//   reg[n-1] := base-register
+//            |  add base-register, offset[n-1]
+//   reg[i] := add base-register, offset[i] (forall i : i <- [0, n))
+//   where:
+//     base-register := BaseRegDef <>, Register:(i32<>|i64) %<>
+//
+//   However, the essential idea remains as above.
+//
+// Proposition:
+// ------------
+//
+// Then, the sequence of selects "S", can be rewritten as sequence "S'", where
+// a choice is made between the two sequences via selects on the base-register,
+// the initial-value and the constant-difference (delta) between the successive
+// elements - i.e. the constructs that define such a sequence, and then these
+// are used to generate each access via a sequence of adds:
+//
+//   cond = setcc ...
+//   # Insert: 0
+//   BaseReg(S) = select cond BaseReg(t) BaseReg(f)
+//   # Insert: 1
+//   InitialValue(S) = select cond InitialValue(t) InitialValue(f)
+//   # Insert: 2
+//   Delta(S) = select cond Delta(t) Delta(f)
+//     ...
+//   # Rewrite: 0
+//   S'[0]: reg[0] = add BaseReg(S) InitialValue(S)
+//     ...
+//   # Rewrite: i
+//   S'[i]: reg[i] = add reg[i-1] Delta(S)
+//     ...
+//   # Rewrite: n-1
+//   S'[n-1]: reg[n-1] = add reg[n-2] Delta(S)
+//     ...
+//
+// Conclusion:
+// -----------
+//
+// The above code transformation has two effects:
+//
+// a. Minimization of the setcc lifetime.
+//
+//    The Rewrites: [0, n) do not have a dependency on the setcc. This is the
+//    primary motivation for performing this transformation; see:
+//
+//      [AArch64] Extremely slow code generation for series of function
+//                calls/addition #50491
+//      a.k.a: https://bugs.llvm.org/show_bug.cgi?id=51147
+//
+//    As the length of "S" grows, the lifetime of the setcc grows, creating an
+//    interference with all the other computations in the DAG, thus causing
+//    long build times. Ths transformation helps keep the setcc lifetime minimal
+//    in this case.
+//
+// b. Code size reduction.
+//
+//    Also, while we have (upto) three new selects (Inserts [0-2]), we
+//    eliminate one select and (upto) two adds with each rewrite [0, n) (which
+//    brings in an add) therefore, reducing overall code size, and potentially
+//    improving the runtime performance of the code.
+//
+//      NewCodeSize = OldCodeSize - (3 * n) + (3 + n) (Best case).
+//
+// Notes:
+// ------
+//
+// As a future extension, extend the transformation to include sequences not
+// in arithmetic progression by creating two lookup tables for storing the
+// constant offsets for the [tf]-sequences, and selecting the appropriate
+// table once, based on the setcc value. Then, the offset to add to the base
+// register can be looked up in its entry in the appropriate table; the idea
+// being similar to the scheme above.
+//
+////////////////////////////////////////////////////////////////////////////////
+bool DAGCombiner::SelectSeqMinCCLifetime(void) {
+
+  LLVM_DEBUG(dbgs() << "SelectSeqMinCCLifetime:\n");
+
+  LLVM_DEBUG(dbgs() << "DAG PRE:\n");
+  for (SDNode &Node : DAG.allnodes())
+    LLVM_DEBUG(dbgs() << ""; Node.dump());
+
+  // Run through the DAG, looking for selects that use setcc,
+  // collect the setcc operand if we have not collected it already:
+  SmallSet<SDValue, 16> CandidateSetcc;
+  for (SDNode &Node : DAG.allnodes()) {
+    if (Node.getOpcode() != ISD::SELECT)
+      continue;
+    SDValue Op0 = Node.getOperand(0);
+    if (Op0.getOpcode() != ISD::SETCC)
+      continue;
+    if (!CandidateSetcc.contains(Op0))
+      CandidateSetcc.insert(Op0);
+  }
+
+  auto ProcessSetcc = [this](SDValue N) -> bool {
+    bool DAGModified = false;
+    assert(N.getOpcode() == ISD::SETCC);
+    LLVM_DEBUG(dbgs() << "Setcc: "; N.dump());
+    // SelectUser: All the select instructions that use this setcc.
+    // SelectUser lists the selects in the order they appear in the DAG i.e.
+    // going from top to down in the DAG, they appear in left to right order, 0
+    // onwards ... Ultimately, the accesses are generated using SelectUser.
+    SmallVector<SDNode *, 64> SelectUser;
+    SmallVector<SDValue, 64> TSeq; // Operand 1 of each SelectUser
+    SmallVector<SDValue, 64> FSeq; // Operand 2 of each SelectUser
+    SDNodeFlags AccessAddrFlags;
+    unsigned i = 0;
+    // Collect the 't' & 'f' operands:
+    for (SDNode *User : N->users()) {
+      // NOTE: The SDNode::use_iterator presents the use's in *reverse* order.
+      ++i;
+      if (User->getOpcode() != ISD::SELECT)
+        continue;
+      SDValue N1 = User->getOperand(1);
+      SDValue N2 = User->getOperand(2);
+      auto itU = SelectUser.begin();
+      SelectUser.insert(itU, User);
+      auto itO1 = TSeq.begin();
+      TSeq.insert(itO1, N1);
+      auto itO2 = FSeq.begin();
+      FSeq.insert(itO2, N2);
+      SDNodeFlags N1Flags = N1->getFlags();
+      SDNodeFlags N2Flags = N2->getFlags();
+      // Note the flags for later use ...
+      if (N1.getOpcode() == ISD::ADD)
+        AccessAddrFlags = N1Flags;
+      if (N2.getOpcode() == ISD::ADD)
+        AccessAddrFlags = N2Flags;
+      if (N1.getOpcode() != ISD::ADD || N2.getOpcode() != ISD::ADD)
+        continue;
+      // Both operands are ISD::ADD's Ascertain that either: Both operands have
+      // the same (NoSigned/NoUnsigned)Wrap flags or both operands have no
+      // (NoSigned/NoUnsigned)Wrap flags:
+      if (!((N1Flags.hasNoSignedWrap() && N2Flags.hasNoSignedWrap()) ||
+            (N1Flags.hasNoUnsignedWrap() && N2Flags.hasNoUnsignedWrap()) ||
+            ((!N1Flags.hasNoSignedWrap() && !N2Flags.hasNoSignedWrap()) &&
+             (!N1Flags.hasNoUnsignedWrap() && !N2Flags.hasNoUnsignedWrap())))) {
+        LLVM_DEBUG(
+            dbgs() << "Operand (Signed/Unsigned)flags mismatch; skip.\n");
+        return false;
+      }
+    }
+    LLVM_DEBUG(dbgs() << "User:\n");
+    for (unsigned i = 0; i < SelectUser.size(); ++i) {
+      LLVM_DEBUG(dbgs() << i << " "; SelectUser[i]->dump());
+      for (const SDValue &Op : SelectUser[i]->op_values())
+        LLVM_DEBUG(dbgs() << "  "; Op.dump());
+    }
+    auto WellFormedAP = [this](const SmallVector<SDValue, 64> &OpSeq,
+                               SDValue &BaseReg, int64_t &InitialVal,
+                               int64_t &Delta, uint64_t &ADDCount) -> bool {
+      auto AscertainConstDiff = [](const SmallVector<int64_t, 64> &RawOffsets,
+                                   int64_t diff) -> bool {
+        bool WF = true;
+        for (unsigned i = 1; WF && (i < RawOffsets.size()); ++i)
+          WF &= (diff == (RawOffsets[i] - RawOffsets[i - 1]));
+        return WF;
+      };
+      if (OpSeq.size() < 2) {
+        LLVM_DEBUG(dbgs() << "Need at least two elements in an arithmetic "
+                             "progression; skip.\n");
+        return false;
+      }
+      bool WF = true;
+      SmallVector<int64_t, 64> RawOffsets;
+      if (OpSeq[0].getOpcode() == ISD::Constant) {
+        for (auto V : OpSeq) {
+          WF &= ((V.getOpcode() == ISD::Constant));
+          if (WF) {
+            auto *C = dyn_cast<ConstantSDNode>(V);
+            if (C && C->getAPIntValue().getSignificantBits() <= 64)
+              RawOffsets.push_back(C->getSExtValue());
+            else {
+              LLVM_DEBUG(dbgs() << "Unable to obtain value; skip.\n");
+              return false;
+            }
+          } else {
+            LLVM_DEBUG(dbgs()
+                       << "If the zeroeth element is a constant, so must be "
+                          "each element in the sequence; skip.\n");
+            return false;
+          }
+        }
+        BaseReg = DAG.getConstant(0, SDLoc(OpSeq[0]), OpSeq[0].getValueType());
+        auto *C0 = dyn_cast<ConstantSDNode>(OpSeq[0]);
+        auto *C1 = dyn_cast<ConstantSDNode>(OpSeq[1]);
+        if ((C0 && C0->getAPIntValue().getSignificantBits() <= 64) &&
+            (C1 && C1->getAPIntValue().getSignificantBits() <= 64)) {
+          InitialVal = C0->getSExtValue();
+          Delta = (C1->getSExtValue()) - InitialVal;
+          WF &= AscertainConstDiff(RawOffsets, Delta);
+          return WF;
+        } else {
+          LLVM_DEBUG(dbgs() << "Unable to obtain value; skip.\n");
+          return false;
+        }
+      } else {
+        // Three cases arise:
+        // 0. All the elements are ISD::ADD's
+        // 1. The zeroeth element is a base register definition and the rest are
+        //    ISD::ADD's
+        // 2. The OpSeq.size()-1'th element is a base register definition and
+        //    the rest are ISD::ADD's
+        // l: inclusive lower bound
+        // u: non-inclusive upper bound
+        auto isBaseRegDef = [](const SDValue V) -> bool {
+          return (V.getOpcode() == ISD::CopyFromReg) ||
+                 (V.getOpcode() == ISD::LOAD);
+        };
+        unsigned l, u;
+        if (OpSeq[0].getOpcode() == ISD::ADD &&
+            OpSeq[OpSeq.size() - 1].getOpcode() == ISD::ADD) {
+          l = 0;
+          u = OpSeq.size();
+          if (isBaseRegDef(OpSeq[0].getOperand(0))) {
+            BaseReg = OpSeq[0].getOperand(0);
+          } else {
+            LLVM_DEBUG(dbgs() << "Unable to get BaseReg; skip.\n");
+            return false;
+          }
+        } else if (isBaseRegDef(OpSeq[0])) {
+          l = 1;
+          u = OpSeq.size();
+          BaseReg = OpSeq[0];
+        } else if (isBaseRegDef(OpSeq[OpSeq.size() - 1])) {
+          l = 0;
+          u = OpSeq.size() - 1;
+          BaseReg = OpSeq[OpSeq.size() - 1];
+        } else {
+          LLVM_DEBUG(
+              dbgs()
+              << "Sequence not in (Add)+|(BaseRegDef)(Add)+|(Add)+(BaseRegDef) "
+                 "form; skip.\n");
+          return false;
+        }
+        // Ascertain that the elements in OpSeq[l, u) are all ISD::ADD's in the
+        // form: BaseReg + (constant offset)
+        for (unsigned i = l; WF && (i < u); ++i) {
+          WF &= ((OpSeq[i].getOpcode() == ISD::ADD) &&
+                 (OpSeq[i].getOperand(0) == BaseReg) &&
+                 (OpSeq[i].getOperand(1).getOpcode() == ISD::Constant));
+          if (WF) {
+            ++ADDCount;
+            auto *C = dyn_cast<ConstantSDNode>(OpSeq[i].getOperand(1));
+            if (C && C->getAPIntValue().getSignificantBits() <= 64)
+              RawOffsets.push_back(C->getSExtValue());
+            else {
+              LLVM_DEBUG(dbgs() << "Unable to obtain value; skip.\n");
+              return false;
+            }
+          } else {
+            LLVM_DEBUG(dbgs() << "Sequence not in (Add = BaseReg + <constant "
+                                 "offset>)+ form; skip.\n");
+            return false;
+          }
+        }
+        if (!WF)
+          return WF;
+        if (isBaseRegDef(OpSeq[0])) {
+          InitialVal = 0;
+          // Element 1 is guaranteed to be ISD::ADD in the form:
+          // BaseReg + (constant offset)
+          auto *C = dyn_cast<ConstantSDNode>(OpSeq[1].getOperand(1));
+          if (C && C->getAPIntValue().getSignificantBits() <= 64)
+            Delta = (C->getSExtValue()) - InitialVal;
+          else {
+            LLVM_DEBUG(dbgs() << "Unable to obtain value; skip.\n");
+            return false;
+          }
+        } else { // Element 0 is an ISD::ADD
+          auto *C = dyn_cast<ConstantSDNode>(OpSeq[0].getOperand(1));
+          if (C && C->getAPIntValue().getSignificantBits() <= 64) {
+            InitialVal = C->getSExtValue();
+            if (OpSeq[1].getOpcode() == ISD::ADD) {
+              auto *C = dyn_cast<ConstantSDNode>(OpSeq[1].getOperand(1));
+              if (C && C->getAPIntValue().getSignificantBits() <= 64) {
+                Delta = (C->getSExtValue()) - InitialVal;
+              } else {
+                LLVM_DEBUG(dbgs() << "Unable to obtain value; skip.\n");
+                return false;
+              }
+            } else { // Element 1 is an BaseRegDef
+              Delta = 0 - InitialVal;
+            }
+          } else {
+            LLVM_DEBUG(dbgs() << "Unable to obtain value; skip.\n");
+            return false;
+          }
+        }
+        WF &= AscertainConstDiff(RawOffsets, Delta);
+        return WF;
+      }
+    };
+    uint64_t TSeqADDCount = 0;
+    SDValue TSeqBaseReg;
+    int64_t TSeqInitialVal = 0;
+    int64_t TSeqDelta = 0;
+    bool TSeqValid = WellFormedAP(TSeq, TSeqBaseReg, TSeqInitialVal, TSeqDelta,
+                                  TSeqADDCount);
+    uint64_t FSeqADDCount = 0;
+    SDValue FSeqBaseReg;
+    int64_t FSeqInitialVal = 0;
+    int64_t FSeqDelta = 0;
+    bool FSeqValid = WellFormedAP(FSeq, FSeqBaseReg, FSeqInitialVal, FSeqDelta,
+                                  FSeqADDCount);
+    if (!TSeqValid || !FSeqValid) {
+      LLVM_DEBUG(dbgs() << "Operands not well formed or not in aritmetic "
+                           "progression; skip.\n");
+      return false;
+    }
+    EVT TSeqEltVT = TSeqBaseReg.getValueType();
+    EVT FSeqEltVT = FSeqBaseReg.getValueType();
+    EVT TSeqEltSTVT = TSeqEltVT.getScalarType();
+    EVT FSeqEltSTVT = FSeqEltVT.getScalarType();
+    SDValue TSeqInitialValReg = DAG.getConstant(
+        APInt(TSeqEltSTVT.getSizeInBits(), (uint64_t)TSeqInitialVal, true),
+        SDLoc(N), TSeqEltVT);
+    SDValue TSeqDeltaReg = DAG.getConstant(
+        APInt(TSeqEltSTVT.getSizeInBits(), (uint64_t)TSeqDelta, true), SDLoc(N),
+        TSeqEltVT);
+    SDValue FSeqInitialValReg = DAG.getConstant(
+        APInt(FSeqEltSTVT.getSizeInBits(), (uint64_t)FSeqInitialVal, true),
+        SDLoc(N), FSeqEltVT);
+    SDValue FSeqDeltaReg = DAG.getConstant(
+        APInt(FSeqEltSTVT.getSizeInBits(), (uint64_t)FSeqDelta, true), SDLoc(N),
+        FSeqEltVT);
+    // Sequence selector nodes:
+    // Construct a select for the base register:
+    SDValue BaseRegSelect = DAG.getSelect(
+        SDLoc(N),
+        TSeqBaseReg.getValueType(), // Note: We could use FSeqBaseReg as well.
+        N, TSeqBaseReg, FSeqBaseReg);
+    LLVM_DEBUG(dbgs() << "BaseReg: "; BaseRegSelect->dump());
+    // Construct a select for the initial value:
+    SDValue InitialValSelect = DAG.getSelect(
+        SDLoc(N),
+        TSeqInitialValReg
+            .getValueType(), // Note: We could use FSeqInitialValReg as well.
+        N, TSeqInitialValReg, FSeqInitialValReg);
+    LLVM_DEBUG(dbgs() << "InitialVal: "; InitialValSelect->dump());
+    // Construct a select for the delta value:
+    SDValue DeltaSelect =
+        DAG.getSelect(SDLoc(N), TSeqDeltaReg.getValueType(),
+                      // Note: We could use FSeqDeltaReg as well.
+                      N, TSeqDeltaReg, FSeqDeltaReg);
+    LLVM_DEBUG(dbgs() << "Delta: "; DeltaSelect->dump());
+    // Check if any of the sequence selector nodes correspond to the SelectUser
+    // nodes. If so, we need to bail out here itself as we will end-up creating
+    // a cycle in the generated access address nodes, after we substitute the
+    // the generated access addresss in the uses of the select nodes that we are
+    // eliminating. The exception to this is when the BaseReg is SelectUser[0]
+    // and the InitialValue is the constant 0; in this case, AccessAddr[0] will
+    // reduce to the BaseReg, so this is not an issue.
+    auto Overlap = [](const SmallVector<SDNode *, 64> NodeList,
+                      const SDNode *Node) -> unsigned {
+      unsigned Index = 0;
+      for (auto N : NodeList)
+        if (N == Node)
+          return Index;
+        else
+          ++Index;
+      return Index;
+    };
+    auto isZero = [](SDValue V) -> bool {
+      if (V.getOpcode() == ISD::Constant) {
+        auto *C = dyn_cast<ConstantSDNode>(V);
+        if (C && C->getAPIntValue().getSignificantBits() <= 64)
+          return C->getSExtValue() == 0;
+        else {
+          LLVM_DEBUG(dbgs() << "Unable to obtain value; skip.\n");
+          return false;
+        }
+      }
+      return false;
+    };
+    unsigned SUSize = SelectUser.size();
+    unsigned BRIndex = Overlap(SelectUser, BaseRegSelect.getNode());
+    if ((((BRIndex != 0) || !isZero(InitialValSelect)) &&
+         (BRIndex != SUSize)) ||
+        (...
[truncated]

@ppetrovic98
Copy link
Author

Adding the same reviewers as on the original Phabricator discussion, apologies if I included the wrong user:
@anmolparalkar-nxp
@RKSimon
@NicolaLancellotti
@peterwaller-arm
@david-arm
@c-rhodes
@paulwalker-arm
@sdesmalen-arm
@bogner
@fhahn

@ppetrovic98
Copy link
Author

Friendly ping (again, apologies if I included a wrong user as a reviewer):
@anmolparalkar-nxp
@RKSimon
@NicolaLancellotti
@peterwaller-arm
@david-arm
@c-rhodes
@paulwalker-arm
@sdesmalen-arm
@bogner
@fhahn

@DavidSpickett
Copy link
Collaborator

DavidSpickett commented Apr 10, 2025

I was going to say no need to @ given that anyone on review will see a normal comment too, but of course they aren't. You probably don't have the permissions to add reviewers yet, so I've done so for you.

Please remove yourself from review if you're no longer interested.

@ppetrovic98
Copy link
Author

I was going to say no need to @ given that anyone on review will see a normal comment too, but of course they aren't. You probably don't have the permissions to add reviewers yet, so I've done so for you.

Please remove yourself from review if you're no longer interested.

Thank you, David!
And yes, I do not have the permissions to add reviewers.

@anmolparalkar-nxp
Copy link
Contributor

@ppetrovic98 just getting in synch, please bear with me; let me do a test drive and write back. Thanks.

@anmolparalkar-nxp
Copy link
Contributor

@ppetrovic98 In combiner-select-seq.ll, the '-stop-after=amdgpu-isel' crashes llc:
LLVM ERROR: "amdgpu-isel" pass is not registered.
That would change to: -stop-after=aarch64-isel
Here is the re-spin: combiner-select-seq.txt
PS: It would not accept the .ll format (hence used .txt); kindly rename, thanks. This test now passes. Please feel free to incorporate the respun combiner-select-seq.ll into your patch, thanks.
I also verified that the llvm-lit run has no regressions.
My llvm sources are at today's f1dd6b3 and configured aarch64-none-elf
Thank you for reviving this work, lgtm.

@ppetrovic98 ppetrovic98 force-pushed the combiner-select-seq branch from c4ca151 to 0fb3d51 Compare April 16, 2025 13:30
@ppetrovic98
Copy link
Author

@anmolparalkar-nxp I updated the test and rebased onto latest LLVM main. Thanks for the feedback!

Copy link
Contributor

@anmolparalkar-nxp anmolparalkar-nxp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall.


; PRE-LABEL: name: sum
; PRE: %0:gpr64common = COPY $x0
; PRE: %1:gpr64common = nuw ADDXri %0, 56, 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and following, change to: PRE-NEXT

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify, do you mean to change all the PRE/PST directives starting from this line or just change these two lines?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From line 134 up to and including line 216, the PRE's should become PRE-NEXT's.


; PST-LABEL: name: sum
; PST: %0:gpr64common = COPY $x0
; PST: %1:gpr64 = SUBSXri %0, 0, 0, implicit-def $nzcv
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and following, change to: PST-NEXT

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From line 220 up to and including line 289, the PST's should become PST-NEXT's

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the testcase, thank you!

Copy link
Contributor

@anmolparalkar-nxp anmolparalkar-nxp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sorry, I did not realize, I had missed the {PRE, PST}-NEXT's in the test; kindly add. LGTM, otherwise. Thanks.


; PST-LABEL: name: sum
; PST: %0:gpr64common = COPY $x0
; PST: %1:gpr64 = SUBSXri %0, 0, 0, implicit-def $nzcv
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From line 220 up to and including line 289, the PST's should become PST-NEXT's

Minimize condition-code lifetime for select sequences over constants in arithmetic progressions.
It reduces the number of comparisons and shortens the condition code lifetime, improving performance.

Original patch by Anmol Paralkar (@anmolparalkar-nxp), initially proposed on Phabricator as D136047.
Modified and updated to work with the latest LLVM version. The test case has been updated to match the current output format.
@ppetrovic98 ppetrovic98 force-pushed the combiner-select-seq branch from 0fb3d51 to 8420bb7 Compare April 17, 2025 19:51
Comment on lines +152 to +156
static cl::opt<bool> SelectSeq(
"combiner-select-seq", cl::Hidden, cl::init(false),
cl::desc("DAG combiner minimize condition-code lifetime for select "
"sequences over constants in arithmetic progressions"),
cl::ZeroOrMore);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the kind of flags that will not be used, can you just do this unconditionally or based on some existing set of target preference hooks? e.g. hasMultipleConditionRegisters, shouldNormalizeToSelectSequence

CandidateSetcc.insert(Op0);
}

auto ProcessSetcc = [this](SDValue N) -> bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nested lambdas, move to proper utility functions?

auto ProcessSetcc = [this](SDValue N) -> bool {
bool DAGModified = false;
assert(N.getOpcode() == ISD::SETCC);
LLVM_DEBUG(dbgs() << "Setcc: "; N.dump());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

operator << works on SDValues, don't need separate dump statement

Comment on lines +1924 to +1925
if (!CandidateSetcc.contains(Op0))
CandidateSetcc.insert(Op0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double map lookup

Comment on lines +1915 to +1918
// Run through the DAG, looking for selects that use setcc,
// collect the setcc operand if we have not collected it already:
SmallSet<SDValue, 16> CandidateSetcc;
for (SDNode &Node : DAG.allnodes()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably should not be scanning through the entire DAG, ever really. If you need longer context maybe move to codegenprepare?

(V.getOpcode() == ISD::LOAD);
};
unsigned l, u;
if (OpSeq[0].getOpcode() == ISD::ADD &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isAddLike? Should catch the disjoint or case

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:AArch64 llvm:SelectionDAG SelectionDAGISel as well
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants