-
Notifications
You must be signed in to change notification settings - Fork 13.3k
[DAGCombiner] Option --combiner-select-seq #134813
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
@llvm/pr-subscribers-llvm-selectiondag @llvm/pr-subscribers-backend-aarch64 Author: None (ppetrovic98) ChangesThis PR reintroduces the patch initially proposed by Anmol Paralkar (@anmolparalkar-nxp ) on Phabricator as D136047. The original patch aimed to minimize the condition-code lifetime for select sequences where constants form an arithmetic progression, improving performance by reducing the number of comparisons. Modifications:Updated the patch to work with the latest LLVM version. Reopening the discussion:The original patch received a comment from @RKSimon suggesting that the transformation might be more suitable in CodeGenPrepare rather than DAGCombiner. Since the discussion was not concluded, I would like to reopen the topic for community input. Credits:Original author: Anmol P. Paralkar (@anmolparalkar-nxp ) Patch is 37.17 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/134813.diff 2 Files Affected:
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 38376de5783ae..6a21579995fd3 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -149,6 +149,18 @@ static cl::opt<bool> EnableShrinkLoadReplaceStoreWithStore(
cl::desc("DAG combiner enable load/<replace bytes>/store with "
"a narrower store"));
+static cl::opt<bool> SelectSeq(
+ "combiner-select-seq", cl::Hidden, cl::init(false),
+ cl::desc("DAG combiner minimize condition-code lifetime for select "
+ "sequences over constants in arithmetic progressions"),
+ cl::ZeroOrMore);
+
+static cl::opt<unsigned> SelectSeqMinCostBenefit(
+ "combiner-select-seq-min-cost-benefit", cl::Hidden, cl::init(1),
+ cl::desc("Transform only when the cost benefit (instruction count to be "
+ "reduced by) is at-least as specified; default: 1)"),
+ cl::ZeroOrMore);
+
namespace {
class DAGCombiner {
@@ -871,6 +883,10 @@ namespace {
void ExtendSetCCUses(const SmallVectorImpl<SDNode *> &SetCCs,
SDValue OrigLoad, SDValue ExtLoad,
ISD::NodeType ExtType);
+
+ /// Rewrite sequences of selects minimizing dependency on the condition code
+ /// register when possible.
+ bool SelectSeqMinCCLifetime(void);
};
/// This class is a DAGUpdateListener that removes any deleted
@@ -1725,11 +1741,570 @@ bool DAGCombiner::recursivelyDeleteUnusedNodes(SDNode *N) {
return true;
}
+////////////////////////////////////////////////////////////////////////////////
+//
+// --combiner-select-seq:
+// ======================
+//
+// Summary:
+// --------
+//
+// We deal with the special case of a sequence of select instructions that use
+// the condition code register to decide between corresponding elements of two
+// sequences of constants that are each, in arithmetic progression. Typically,
+// the arguments to the selects would form a sequence of accesses to fields of
+// a struct or the elements of an array. Instead of having the CC tied up across
+// the expanse of the sequence for the pointwise selection of the constants, we
+// transform the code to instead, decide upfront between the defining constructs
+// of the two sequences of constants and generating the selected sequence
+// elements via a corresponding sequence of add instructions.
+//
+// Introduction:
+// -------------
+//
+// Consider a sequence of selects "S", consuming a setcc such as:
+//
+// cond = setcc ...
+// ...
+// S[0]: reg[0] = select cond t[0] f[0]
+// ...
+// S[n-1]: reg[n-1] = select cond t[n-1] f[n-1]
+// ...
+//
+// Two sequences arise from the operands of each select:
+//
+// t[0], t[1], ..., t[n-1]
+// f[0], f[1], ..., f[n-1]
+//
+// Denote either sequence by (say) "X". (PS: Read ':=' as "is of the form")
+//
+// If,
+//
+// X[i] := offset[i] (forall i : i <- [0, n))
+//
+// or,
+//
+// X[i] := reg[i] (forall i : i <- [0, n))
+// where:
+// reg[0] := base-register
+// | add base-register, offset[0]
+// reg[i] := add base-register, offset[i] (forall i : i <- (0, n))
+// where:
+// base-register := BaseRegDef <>, Register:(i32|i64) %<>
+//
+// where:
+// offset[] := Constant:i64<>
+// | Constant:i32<>
+// BaseRegDef := CopyFromReg | Load
+//
+// Define:
+//
+// Offset(X)[0] = Constant:(<i32>|<i64>)<0>
+// if X[0] := (reg[0] := base-register)
+// Offset(X)[i] = offset[i]
+// if X[i] := offset[i] (forall i : i <- [0, n))
+// Offset(X)[i] = offset[i]
+// if X[i] := add base-register, offset[i] (forall i : i <- [0, n))
+//
+// Now, (for: n > 1) if Offset(X) is an arithmetic progression, i.e:
+//
+// Offset(X)[i] := InitialValue(X) (if: i == 0)
+// | Offset(X)[i-1] + Delta(X) (forall i : i <- (0, n))
+// where:
+// InitialValue(X) = k (k is an arbitrary integer constant)
+// Delta(X) = (Offset(X)[1] - Offset(X)[0])
+//
+// Further define:
+// BaseReg(X) = Constant:(<i32>|<i64>)<0>
+// if X[0] := offset[0]
+// BaseReg(X) = base-register
+// if X[0] := base-register
+// | add base-register, offset[0]
+//
+// PS: In practice, we also accept the reverse-form:
+//
+// reg[n-1] := base-register
+// | add base-register, offset[n-1]
+// reg[i] := add base-register, offset[i] (forall i : i <- [0, n))
+// where:
+// base-register := BaseRegDef <>, Register:(i32<>|i64) %<>
+//
+// However, the essential idea remains as above.
+//
+// Proposition:
+// ------------
+//
+// Then, the sequence of selects "S", can be rewritten as sequence "S'", where
+// a choice is made between the two sequences via selects on the base-register,
+// the initial-value and the constant-difference (delta) between the successive
+// elements - i.e. the constructs that define such a sequence, and then these
+// are used to generate each access via a sequence of adds:
+//
+// cond = setcc ...
+// # Insert: 0
+// BaseReg(S) = select cond BaseReg(t) BaseReg(f)
+// # Insert: 1
+// InitialValue(S) = select cond InitialValue(t) InitialValue(f)
+// # Insert: 2
+// Delta(S) = select cond Delta(t) Delta(f)
+// ...
+// # Rewrite: 0
+// S'[0]: reg[0] = add BaseReg(S) InitialValue(S)
+// ...
+// # Rewrite: i
+// S'[i]: reg[i] = add reg[i-1] Delta(S)
+// ...
+// # Rewrite: n-1
+// S'[n-1]: reg[n-1] = add reg[n-2] Delta(S)
+// ...
+//
+// Conclusion:
+// -----------
+//
+// The above code transformation has two effects:
+//
+// a. Minimization of the setcc lifetime.
+//
+// The Rewrites: [0, n) do not have a dependency on the setcc. This is the
+// primary motivation for performing this transformation; see:
+//
+// [AArch64] Extremely slow code generation for series of function
+// calls/addition #50491
+// a.k.a: https://bugs.llvm.org/show_bug.cgi?id=51147
+//
+// As the length of "S" grows, the lifetime of the setcc grows, creating an
+// interference with all the other computations in the DAG, thus causing
+// long build times. Ths transformation helps keep the setcc lifetime minimal
+// in this case.
+//
+// b. Code size reduction.
+//
+// Also, while we have (upto) three new selects (Inserts [0-2]), we
+// eliminate one select and (upto) two adds with each rewrite [0, n) (which
+// brings in an add) therefore, reducing overall code size, and potentially
+// improving the runtime performance of the code.
+//
+// NewCodeSize = OldCodeSize - (3 * n) + (3 + n) (Best case).
+//
+// Notes:
+// ------
+//
+// As a future extension, extend the transformation to include sequences not
+// in arithmetic progression by creating two lookup tables for storing the
+// constant offsets for the [tf]-sequences, and selecting the appropriate
+// table once, based on the setcc value. Then, the offset to add to the base
+// register can be looked up in its entry in the appropriate table; the idea
+// being similar to the scheme above.
+//
+////////////////////////////////////////////////////////////////////////////////
+bool DAGCombiner::SelectSeqMinCCLifetime(void) {
+
+ LLVM_DEBUG(dbgs() << "SelectSeqMinCCLifetime:\n");
+
+ LLVM_DEBUG(dbgs() << "DAG PRE:\n");
+ for (SDNode &Node : DAG.allnodes())
+ LLVM_DEBUG(dbgs() << ""; Node.dump());
+
+ // Run through the DAG, looking for selects that use setcc,
+ // collect the setcc operand if we have not collected it already:
+ SmallSet<SDValue, 16> CandidateSetcc;
+ for (SDNode &Node : DAG.allnodes()) {
+ if (Node.getOpcode() != ISD::SELECT)
+ continue;
+ SDValue Op0 = Node.getOperand(0);
+ if (Op0.getOpcode() != ISD::SETCC)
+ continue;
+ if (!CandidateSetcc.contains(Op0))
+ CandidateSetcc.insert(Op0);
+ }
+
+ auto ProcessSetcc = [this](SDValue N) -> bool {
+ bool DAGModified = false;
+ assert(N.getOpcode() == ISD::SETCC);
+ LLVM_DEBUG(dbgs() << "Setcc: "; N.dump());
+ // SelectUser: All the select instructions that use this setcc.
+ // SelectUser lists the selects in the order they appear in the DAG i.e.
+ // going from top to down in the DAG, they appear in left to right order, 0
+ // onwards ... Ultimately, the accesses are generated using SelectUser.
+ SmallVector<SDNode *, 64> SelectUser;
+ SmallVector<SDValue, 64> TSeq; // Operand 1 of each SelectUser
+ SmallVector<SDValue, 64> FSeq; // Operand 2 of each SelectUser
+ SDNodeFlags AccessAddrFlags;
+ unsigned i = 0;
+ // Collect the 't' & 'f' operands:
+ for (SDNode *User : N->users()) {
+ // NOTE: The SDNode::use_iterator presents the use's in *reverse* order.
+ ++i;
+ if (User->getOpcode() != ISD::SELECT)
+ continue;
+ SDValue N1 = User->getOperand(1);
+ SDValue N2 = User->getOperand(2);
+ auto itU = SelectUser.begin();
+ SelectUser.insert(itU, User);
+ auto itO1 = TSeq.begin();
+ TSeq.insert(itO1, N1);
+ auto itO2 = FSeq.begin();
+ FSeq.insert(itO2, N2);
+ SDNodeFlags N1Flags = N1->getFlags();
+ SDNodeFlags N2Flags = N2->getFlags();
+ // Note the flags for later use ...
+ if (N1.getOpcode() == ISD::ADD)
+ AccessAddrFlags = N1Flags;
+ if (N2.getOpcode() == ISD::ADD)
+ AccessAddrFlags = N2Flags;
+ if (N1.getOpcode() != ISD::ADD || N2.getOpcode() != ISD::ADD)
+ continue;
+ // Both operands are ISD::ADD's Ascertain that either: Both operands have
+ // the same (NoSigned/NoUnsigned)Wrap flags or both operands have no
+ // (NoSigned/NoUnsigned)Wrap flags:
+ if (!((N1Flags.hasNoSignedWrap() && N2Flags.hasNoSignedWrap()) ||
+ (N1Flags.hasNoUnsignedWrap() && N2Flags.hasNoUnsignedWrap()) ||
+ ((!N1Flags.hasNoSignedWrap() && !N2Flags.hasNoSignedWrap()) &&
+ (!N1Flags.hasNoUnsignedWrap() && !N2Flags.hasNoUnsignedWrap())))) {
+ LLVM_DEBUG(
+ dbgs() << "Operand (Signed/Unsigned)flags mismatch; skip.\n");
+ return false;
+ }
+ }
+ LLVM_DEBUG(dbgs() << "User:\n");
+ for (unsigned i = 0; i < SelectUser.size(); ++i) {
+ LLVM_DEBUG(dbgs() << i << " "; SelectUser[i]->dump());
+ for (const SDValue &Op : SelectUser[i]->op_values())
+ LLVM_DEBUG(dbgs() << " "; Op.dump());
+ }
+ auto WellFormedAP = [this](const SmallVector<SDValue, 64> &OpSeq,
+ SDValue &BaseReg, int64_t &InitialVal,
+ int64_t &Delta, uint64_t &ADDCount) -> bool {
+ auto AscertainConstDiff = [](const SmallVector<int64_t, 64> &RawOffsets,
+ int64_t diff) -> bool {
+ bool WF = true;
+ for (unsigned i = 1; WF && (i < RawOffsets.size()); ++i)
+ WF &= (diff == (RawOffsets[i] - RawOffsets[i - 1]));
+ return WF;
+ };
+ if (OpSeq.size() < 2) {
+ LLVM_DEBUG(dbgs() << "Need at least two elements in an arithmetic "
+ "progression; skip.\n");
+ return false;
+ }
+ bool WF = true;
+ SmallVector<int64_t, 64> RawOffsets;
+ if (OpSeq[0].getOpcode() == ISD::Constant) {
+ for (auto V : OpSeq) {
+ WF &= ((V.getOpcode() == ISD::Constant));
+ if (WF) {
+ auto *C = dyn_cast<ConstantSDNode>(V);
+ if (C && C->getAPIntValue().getSignificantBits() <= 64)
+ RawOffsets.push_back(C->getSExtValue());
+ else {
+ LLVM_DEBUG(dbgs() << "Unable to obtain value; skip.\n");
+ return false;
+ }
+ } else {
+ LLVM_DEBUG(dbgs()
+ << "If the zeroeth element is a constant, so must be "
+ "each element in the sequence; skip.\n");
+ return false;
+ }
+ }
+ BaseReg = DAG.getConstant(0, SDLoc(OpSeq[0]), OpSeq[0].getValueType());
+ auto *C0 = dyn_cast<ConstantSDNode>(OpSeq[0]);
+ auto *C1 = dyn_cast<ConstantSDNode>(OpSeq[1]);
+ if ((C0 && C0->getAPIntValue().getSignificantBits() <= 64) &&
+ (C1 && C1->getAPIntValue().getSignificantBits() <= 64)) {
+ InitialVal = C0->getSExtValue();
+ Delta = (C1->getSExtValue()) - InitialVal;
+ WF &= AscertainConstDiff(RawOffsets, Delta);
+ return WF;
+ } else {
+ LLVM_DEBUG(dbgs() << "Unable to obtain value; skip.\n");
+ return false;
+ }
+ } else {
+ // Three cases arise:
+ // 0. All the elements are ISD::ADD's
+ // 1. The zeroeth element is a base register definition and the rest are
+ // ISD::ADD's
+ // 2. The OpSeq.size()-1'th element is a base register definition and
+ // the rest are ISD::ADD's
+ // l: inclusive lower bound
+ // u: non-inclusive upper bound
+ auto isBaseRegDef = [](const SDValue V) -> bool {
+ return (V.getOpcode() == ISD::CopyFromReg) ||
+ (V.getOpcode() == ISD::LOAD);
+ };
+ unsigned l, u;
+ if (OpSeq[0].getOpcode() == ISD::ADD &&
+ OpSeq[OpSeq.size() - 1].getOpcode() == ISD::ADD) {
+ l = 0;
+ u = OpSeq.size();
+ if (isBaseRegDef(OpSeq[0].getOperand(0))) {
+ BaseReg = OpSeq[0].getOperand(0);
+ } else {
+ LLVM_DEBUG(dbgs() << "Unable to get BaseReg; skip.\n");
+ return false;
+ }
+ } else if (isBaseRegDef(OpSeq[0])) {
+ l = 1;
+ u = OpSeq.size();
+ BaseReg = OpSeq[0];
+ } else if (isBaseRegDef(OpSeq[OpSeq.size() - 1])) {
+ l = 0;
+ u = OpSeq.size() - 1;
+ BaseReg = OpSeq[OpSeq.size() - 1];
+ } else {
+ LLVM_DEBUG(
+ dbgs()
+ << "Sequence not in (Add)+|(BaseRegDef)(Add)+|(Add)+(BaseRegDef) "
+ "form; skip.\n");
+ return false;
+ }
+ // Ascertain that the elements in OpSeq[l, u) are all ISD::ADD's in the
+ // form: BaseReg + (constant offset)
+ for (unsigned i = l; WF && (i < u); ++i) {
+ WF &= ((OpSeq[i].getOpcode() == ISD::ADD) &&
+ (OpSeq[i].getOperand(0) == BaseReg) &&
+ (OpSeq[i].getOperand(1).getOpcode() == ISD::Constant));
+ if (WF) {
+ ++ADDCount;
+ auto *C = dyn_cast<ConstantSDNode>(OpSeq[i].getOperand(1));
+ if (C && C->getAPIntValue().getSignificantBits() <= 64)
+ RawOffsets.push_back(C->getSExtValue());
+ else {
+ LLVM_DEBUG(dbgs() << "Unable to obtain value; skip.\n");
+ return false;
+ }
+ } else {
+ LLVM_DEBUG(dbgs() << "Sequence not in (Add = BaseReg + <constant "
+ "offset>)+ form; skip.\n");
+ return false;
+ }
+ }
+ if (!WF)
+ return WF;
+ if (isBaseRegDef(OpSeq[0])) {
+ InitialVal = 0;
+ // Element 1 is guaranteed to be ISD::ADD in the form:
+ // BaseReg + (constant offset)
+ auto *C = dyn_cast<ConstantSDNode>(OpSeq[1].getOperand(1));
+ if (C && C->getAPIntValue().getSignificantBits() <= 64)
+ Delta = (C->getSExtValue()) - InitialVal;
+ else {
+ LLVM_DEBUG(dbgs() << "Unable to obtain value; skip.\n");
+ return false;
+ }
+ } else { // Element 0 is an ISD::ADD
+ auto *C = dyn_cast<ConstantSDNode>(OpSeq[0].getOperand(1));
+ if (C && C->getAPIntValue().getSignificantBits() <= 64) {
+ InitialVal = C->getSExtValue();
+ if (OpSeq[1].getOpcode() == ISD::ADD) {
+ auto *C = dyn_cast<ConstantSDNode>(OpSeq[1].getOperand(1));
+ if (C && C->getAPIntValue().getSignificantBits() <= 64) {
+ Delta = (C->getSExtValue()) - InitialVal;
+ } else {
+ LLVM_DEBUG(dbgs() << "Unable to obtain value; skip.\n");
+ return false;
+ }
+ } else { // Element 1 is an BaseRegDef
+ Delta = 0 - InitialVal;
+ }
+ } else {
+ LLVM_DEBUG(dbgs() << "Unable to obtain value; skip.\n");
+ return false;
+ }
+ }
+ WF &= AscertainConstDiff(RawOffsets, Delta);
+ return WF;
+ }
+ };
+ uint64_t TSeqADDCount = 0;
+ SDValue TSeqBaseReg;
+ int64_t TSeqInitialVal = 0;
+ int64_t TSeqDelta = 0;
+ bool TSeqValid = WellFormedAP(TSeq, TSeqBaseReg, TSeqInitialVal, TSeqDelta,
+ TSeqADDCount);
+ uint64_t FSeqADDCount = 0;
+ SDValue FSeqBaseReg;
+ int64_t FSeqInitialVal = 0;
+ int64_t FSeqDelta = 0;
+ bool FSeqValid = WellFormedAP(FSeq, FSeqBaseReg, FSeqInitialVal, FSeqDelta,
+ FSeqADDCount);
+ if (!TSeqValid || !FSeqValid) {
+ LLVM_DEBUG(dbgs() << "Operands not well formed or not in aritmetic "
+ "progression; skip.\n");
+ return false;
+ }
+ EVT TSeqEltVT = TSeqBaseReg.getValueType();
+ EVT FSeqEltVT = FSeqBaseReg.getValueType();
+ EVT TSeqEltSTVT = TSeqEltVT.getScalarType();
+ EVT FSeqEltSTVT = FSeqEltVT.getScalarType();
+ SDValue TSeqInitialValReg = DAG.getConstant(
+ APInt(TSeqEltSTVT.getSizeInBits(), (uint64_t)TSeqInitialVal, true),
+ SDLoc(N), TSeqEltVT);
+ SDValue TSeqDeltaReg = DAG.getConstant(
+ APInt(TSeqEltSTVT.getSizeInBits(), (uint64_t)TSeqDelta, true), SDLoc(N),
+ TSeqEltVT);
+ SDValue FSeqInitialValReg = DAG.getConstant(
+ APInt(FSeqEltSTVT.getSizeInBits(), (uint64_t)FSeqInitialVal, true),
+ SDLoc(N), FSeqEltVT);
+ SDValue FSeqDeltaReg = DAG.getConstant(
+ APInt(FSeqEltSTVT.getSizeInBits(), (uint64_t)FSeqDelta, true), SDLoc(N),
+ FSeqEltVT);
+ // Sequence selector nodes:
+ // Construct a select for the base register:
+ SDValue BaseRegSelect = DAG.getSelect(
+ SDLoc(N),
+ TSeqBaseReg.getValueType(), // Note: We could use FSeqBaseReg as well.
+ N, TSeqBaseReg, FSeqBaseReg);
+ LLVM_DEBUG(dbgs() << "BaseReg: "; BaseRegSelect->dump());
+ // Construct a select for the initial value:
+ SDValue InitialValSelect = DAG.getSelect(
+ SDLoc(N),
+ TSeqInitialValReg
+ .getValueType(), // Note: We could use FSeqInitialValReg as well.
+ N, TSeqInitialValReg, FSeqInitialValReg);
+ LLVM_DEBUG(dbgs() << "InitialVal: "; InitialValSelect->dump());
+ // Construct a select for the delta value:
+ SDValue DeltaSelect =
+ DAG.getSelect(SDLoc(N), TSeqDeltaReg.getValueType(),
+ // Note: We could use FSeqDeltaReg as well.
+ N, TSeqDeltaReg, FSeqDeltaReg);
+ LLVM_DEBUG(dbgs() << "Delta: "; DeltaSelect->dump());
+ // Check if any of the sequence selector nodes correspond to the SelectUser
+ // nodes. If so, we need to bail out here itself as we will end-up creating
+ // a cycle in the generated access address nodes, after we substitute the
+ // the generated access addresss in the uses of the select nodes that we are
+ // eliminating. The exception to this is when the BaseReg is SelectUser[0]
+ // and the InitialValue is the constant 0; in this case, AccessAddr[0] will
+ // reduce to the BaseReg, so this is not an issue.
+ auto Overlap = [](const SmallVector<SDNode *, 64> NodeList,
+ const SDNode *Node) -> unsigned {
+ unsigned Index = 0;
+ for (auto N : NodeList)
+ if (N == Node)
+ return Index;
+ else
+ ++Index;
+ return Index;
+ };
+ auto isZero = [](SDValue V) -> bool {
+ if (V.getOpcode() == ISD::Constant) {
+ auto *C = dyn_cast<ConstantSDNode>(V);
+ if (C && C->getAPIntValue().getSignificantBits() <= 64)
+ return C->getSExtValue() == 0;
+ else {
+ LLVM_DEBUG(dbgs() << "Unable to obtain value; skip.\n");
+ return false;
+ }
+ }
+ return false;
+ };
+ unsigned SUSize = SelectUser.size();
+ unsigned BRIndex = Overlap(SelectUser, BaseRegSelect.getNode());
+ if ((((BRIndex != 0) || !isZero(InitialValSelect)) &&
+ (BRIndex != SUSize)) ||
+ (...
[truncated]
|
Adding the same reviewers as on the original Phabricator discussion, apologies if I included the wrong user: |
Friendly ping (again, apologies if I included a wrong user as a reviewer): |
I was going to say no need to @ given that anyone on review will see a normal comment too, but of course they aren't. You probably don't have the permissions to add reviewers yet, so I've done so for you. Please remove yourself from review if you're no longer interested. |
Thank you, David! |
@ppetrovic98 just getting in synch, please bear with me; let me do a test drive and write back. Thanks. |
@ppetrovic98 In combiner-select-seq.ll, the '-stop-after=amdgpu-isel' crashes llc: |
c4ca151
to
0fb3d51
Compare
@anmolparalkar-nxp I updated the test and rebased onto latest LLVM main. Thanks for the feedback! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall.
|
||
; PRE-LABEL: name: sum | ||
; PRE: %0:gpr64common = COPY $x0 | ||
; PRE: %1:gpr64common = nuw ADDXri %0, 56, 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and following, change to: PRE-NEXT
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to clarify, do you mean to change all the PRE/PST directives starting from this line or just change these two lines?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From line 134 up to and including line 216, the PRE's should become PRE-NEXT's.
|
||
; PST-LABEL: name: sum | ||
; PST: %0:gpr64common = COPY $x0 | ||
; PST: %1:gpr64 = SUBSXri %0, 0, 0, implicit-def $nzcv |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and following, change to: PST-NEXT
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From line 220 up to and including line 289, the PST's should become PST-NEXT's
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the testcase, thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, sorry, I did not realize, I had missed the {PRE, PST}-NEXT's in the test; kindly add. LGTM, otherwise. Thanks.
|
||
; PST-LABEL: name: sum | ||
; PST: %0:gpr64common = COPY $x0 | ||
; PST: %1:gpr64 = SUBSXri %0, 0, 0, implicit-def $nzcv |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From line 220 up to and including line 289, the PST's should become PST-NEXT's
Minimize condition-code lifetime for select sequences over constants in arithmetic progressions. It reduces the number of comparisons and shortens the condition code lifetime, improving performance. Original patch by Anmol Paralkar (@anmolparalkar-nxp), initially proposed on Phabricator as D136047. Modified and updated to work with the latest LLVM version. The test case has been updated to match the current output format.
0fb3d51
to
8420bb7
Compare
static cl::opt<bool> SelectSeq( | ||
"combiner-select-seq", cl::Hidden, cl::init(false), | ||
cl::desc("DAG combiner minimize condition-code lifetime for select " | ||
"sequences over constants in arithmetic progressions"), | ||
cl::ZeroOrMore); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are the kind of flags that will not be used, can you just do this unconditionally or based on some existing set of target preference hooks? e.g. hasMultipleConditionRegisters, shouldNormalizeToSelectSequence
CandidateSetcc.insert(Op0); | ||
} | ||
|
||
auto ProcessSetcc = [this](SDValue N) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nested lambdas, move to proper utility functions?
auto ProcessSetcc = [this](SDValue N) -> bool { | ||
bool DAGModified = false; | ||
assert(N.getOpcode() == ISD::SETCC); | ||
LLVM_DEBUG(dbgs() << "Setcc: "; N.dump()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
operator << works on SDValues, don't need separate dump statement
if (!CandidateSetcc.contains(Op0)) | ||
CandidateSetcc.insert(Op0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Double map lookup
// Run through the DAG, looking for selects that use setcc, | ||
// collect the setcc operand if we have not collected it already: | ||
SmallSet<SDValue, 16> CandidateSetcc; | ||
for (SDNode &Node : DAG.allnodes()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably should not be scanning through the entire DAG, ever really. If you need longer context maybe move to codegenprepare?
(V.getOpcode() == ISD::LOAD); | ||
}; | ||
unsigned l, u; | ||
if (OpSeq[0].getOpcode() == ISD::ADD && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isAddLike? Should catch the disjoint or case
This PR reintroduces the patch initially proposed by Anmol Paralkar (@anmolparalkar-nxp ) on Phabricator as D136047. The original patch aimed to minimize the condition-code lifetime for select sequences where constants form an arithmetic progression, improving performance by reducing the number of comparisons.
Modifications:
Updated the patch to work with the latest LLVM version.
Adapted the test case (combiner-select-seq.ll) to match the current output format from the LLVM main branch.
Tested the changes.
Reopening the discussion:
The original patch received a comment from @RKSimon suggesting that the transformation might be more suitable in CodeGenPrepare rather than DAGCombiner. Since the discussion was not concluded, I would like to reopen the topic for community input.
Credits:
Original author: Anmol P. Paralkar (@anmolparalkar-nxp )
Link to original Phabricator discussion: https://reviews.llvm.org/D136047
Link to original problem (bug): https://bugs.llvm.org/show_bug.cgi?id=51147