Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[VPlan] Use ResumePhi to create reduction resume phis. #110004

Merged
merged 12 commits into from
Oct 28, 2024

Conversation

fhahn
Copy link
Contributor

@fhahn fhahn commented Sep 25, 2024

Use VPInstruction::ResumePhi to create phi nodes for reduction resume values in the scalar preheader, similar to how ResumePhis are used for first-order recurrence resume values after 9a5a873.

This allows simplifying createAndCollectMergePhiForReduction to only collect reduction resume phis when vectorizing epilogue loops and adding extra incoming edges from the main vector loop. Updating phis for the epilogue vector loops requires special attention, because additional incoming values from the bypass blocks need to be added.

@llvmbot
Copy link
Member

llvmbot commented Sep 25, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Florian Hahn (fhahn)

Changes

Use VPInstruction::ResumePhi to create phi nodes for reduction resume values.

This allows simplifying createAndCollectMergePhiForReduction to only collect reduction resume phis when vectorizing epilogue loops and adding extra incoming edges from the main vector loop.


Full diff: https://github.com/llvm/llvm-project/pull/110004.diff

4 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+41-41)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vplan-vp-intrinsics-reduction.ll (+9)
  • (modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence-sink-replicate-region.ll (+2)
  • (modified) llvm/test/Transforms/LoopVectorize/vplan-printing.ll (+9)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index cac0b57fc69649..4f9e990e5483da 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -7422,8 +7422,9 @@ static void addRuntimeUnrollDisableMetaData(Loop *L) {
 }
 
 // Check if \p RedResult is a ComputeReductionResult instruction, and if it is
-// create a merge phi node for it and add it to \p ReductionResumeValues.
-static void createAndCollectMergePhiForReduction(
+// add it to \p ReductionResumeValues and add incoming values from the main
+// vector loop.
+static void updateAndCollectMergePhiForReductionForEpilogueVectorization(
     VPInstruction *RedResult,
     DenseMap<const RecurrenceDescriptor *, Value *> &ReductionResumeValues,
     VPTransformState &State, Loop *OrigLoop, BasicBlock *LoopMiddleBlock,
@@ -7432,15 +7433,24 @@ static void createAndCollectMergePhiForReduction(
       RedResult->getOpcode() != VPInstruction::ComputeReductionResult)
     return;
 
+  using namespace VPlanPatternMatch;
+  VPValue *ResumePhiVPV =
+      cast<VPInstruction>(*find_if(RedResult->users(), [](VPUser *U) {
+        return match(U, m_VPInstruction<VPInstruction::ResumePhi>(m_VPValue(),
+                                                                  m_VPValue()));
+      }));
+  auto *BCBlockPhi = cast<PHINode>(State.get(ResumePhiVPV, true));
   auto *PhiR = cast<VPReductionPHIRecipe>(RedResult->getOperand(0));
   const RecurrenceDescriptor &RdxDesc = PhiR->getRecurrenceDescriptor();
+  if (!VectorizingEpilogue) {
+    ReductionResumeValues[&RdxDesc] = BCBlockPhi;
+    return;
+  }
 
-  Value *FinalValue =
-      State.get(RedResult, VPIteration(0, VPLane::getFirstLane()));
   auto *ResumePhi =
       dyn_cast<PHINode>(PhiR->getStartValue()->getUnderlyingValue());
-  if (VectorizingEpilogue && RecurrenceDescriptor::isAnyOfRecurrenceKind(
-                                 RdxDesc.getRecurrenceKind())) {
+  if (RecurrenceDescriptor::isAnyOfRecurrenceKind(
+          RdxDesc.getRecurrenceKind())) {
     auto *Cmp = cast<ICmpInst>(PhiR->getStartValue()->getUnderlyingValue());
     assert(Cmp->getPredicate() == CmpInst::ICMP_NE);
     assert(Cmp->getOperand(1) == RdxDesc.getRecurrenceStartValue());
@@ -7450,42 +7460,15 @@ static void createAndCollectMergePhiForReduction(
          "when vectorizing the epilogue loop, we need a resume phi from main "
          "vector loop");
 
-  // TODO: bc.merge.rdx should not be created here, instead it should be
-  // modeled in VPlan.
   BasicBlock *LoopScalarPreHeader = OrigLoop->getLoopPreheader();
-  // Create a phi node that merges control-flow from the backedge-taken check
-  // block and the middle block.
-  auto *BCBlockPhi =
-      PHINode::Create(FinalValue->getType(), 2, "bc.merge.rdx",
-                      LoopScalarPreHeader->getTerminator()->getIterator());
-
   // If we are fixing reductions in the epilogue loop then we should already
   // have created a bc.merge.rdx Phi after the main vector body. Ensure that
   // we carry over the incoming values correctly.
   for (auto *Incoming : predecessors(LoopScalarPreHeader)) {
-    if (Incoming == LoopMiddleBlock)
-      BCBlockPhi->addIncoming(FinalValue, Incoming);
-    else if (ResumePhi && is_contained(ResumePhi->blocks(), Incoming))
-      BCBlockPhi->addIncoming(ResumePhi->getIncomingValueForBlock(Incoming),
-                              Incoming);
-    else
-      BCBlockPhi->addIncoming(RdxDesc.getRecurrenceStartValue(), Incoming);
+    if (ResumePhi && is_contained(ResumePhi->blocks(), Incoming))
+      BCBlockPhi->setIncomingValueForBlock(
+          Incoming, ResumePhi->getIncomingValueForBlock(Incoming));
   }
-
-  auto *OrigPhi = cast<PHINode>(PhiR->getUnderlyingValue());
-  // TODO: This fixup should instead be modeled in VPlan.
-  // Fix the scalar loop reduction variable with the incoming reduction sum
-  // from the vector body and from the backedge value.
-  int IncomingEdgeBlockIdx =
-      OrigPhi->getBasicBlockIndex(OrigLoop->getLoopLatch());
-  assert(IncomingEdgeBlockIdx >= 0 && "Invalid block index");
-  // Pick the other block.
-  int SelfEdgeBlockIdx = (IncomingEdgeBlockIdx ? 0 : 1);
-  OrigPhi->setIncomingValue(SelfEdgeBlockIdx, BCBlockPhi);
-  Instruction *LoopExitInst = RdxDesc.getLoopExitInstr();
-  OrigPhi->setIncomingValue(IncomingEdgeBlockIdx, LoopExitInst);
-
-  ReductionResumeValues[&RdxDesc] = BCBlockPhi;
 }
 
 std::pair<DenseMap<const SCEV *, Value *>,
@@ -7579,11 +7562,12 @@ LoopVectorizationPlanner::executePlan(
   DenseMap<const RecurrenceDescriptor *, Value *> ReductionResumeValues;
   auto *ExitVPBB =
       cast<VPBasicBlock>(BestVPlan.getVectorLoopRegion()->getSingleSuccessor());
-  for (VPRecipeBase &R : *ExitVPBB) {
-    createAndCollectMergePhiForReduction(
-        dyn_cast<VPInstruction>(&R), ReductionResumeValues, State, OrigLoop,
-        State.CFG.VPBB2IRBB[ExitVPBB], ExpandedSCEVs);
-  }
+  if (IsEpilogueVectorization)
+    for (VPRecipeBase &R : *ExitVPBB) {
+      updateAndCollectMergePhiForReductionForEpilogueVectorization(
+          dyn_cast<VPInstruction>(&R), ReductionResumeValues, State, OrigLoop,
+          State.CFG.VPBB2IRBB[ExitVPBB], ExpandedSCEVs);
+    }
 
   // 2.6. Maintain Loop Hints
   // Keep all loop hints from the original loop on the vector loop (we'll
@@ -9369,6 +9353,22 @@ void LoopVectorizationPlanner::adjustRecipesForReductions(
                                                                   m_VPValue()));
     });
 
+    VPBasicBlock *ScalarPHVPBB = nullptr;
+    if (MiddleVPBB->getNumSuccessors() == 2) {
+      // Order is strict: first is the exit block, second is the scalar
+      // preheader.
+      ScalarPHVPBB = cast<VPBasicBlock>(MiddleVPBB->getSuccessors()[1]);
+    } else {
+      ScalarPHVPBB = cast<VPBasicBlock>(MiddleVPBB->getSingleSuccessor());
+    }
+
+    VPBuilder ScalarPHBuilder(ScalarPHVPBB);
+    auto *ResumePhiRecipe = ScalarPHBuilder.createNaryOp(
+        VPInstruction::ResumePhi, {FinalReductionResult, PhiR->getStartValue()},
+        {}, "bc.merge.rdx");
+    auto *RedPhi = cast<PHINode>(PhiR->getUnderlyingInstr());
+    Plan->addLiveOut(RedPhi, ResumePhiRecipe);
+
     // Adjust AnyOf reductions; replace the reduction phi for the selected value
     // with a boolean reduction phi node to check if the condition is true in
     // any iteration. The final value is selected by the final
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/vplan-vp-intrinsics-reduction.ll b/llvm/test/Transforms/LoopVectorize/RISCV/vplan-vp-intrinsics-reduction.ll
index 90c209cf3f5186..6a435709aeb2b2 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/vplan-vp-intrinsics-reduction.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/vplan-vp-intrinsics-reduction.ll
@@ -65,7 +65,10 @@ define i32 @reduction(ptr %a, i64 %n, i32 %start) {
 ; IF-EVL-INLOOP-NEXT: No successors
 ; IF-EVL-INLOOP-EMPTY:
 ; IF-EVL-INLOOP-NEXT: scalar.ph:
+; IF-EVL-INLOOP-NEXT:   EMIT vp<[[RED_RESUME:%.+]]> = resume-phi vp<[[RDX]]>, ir<%start>
 ; IF-EVL-INLOOP-NEXT: No successors
+; IF-EVL-INLOOP-EMPTY:
+; IF-EVL-INLOOP-NEXT: Live-out i32 %rdx = vp<[[RED_RESUME]]>
 ; IF-EVL-INLOOP-NEXT: }
 ;
 
@@ -104,7 +107,10 @@ define i32 @reduction(ptr %a, i64 %n, i32 %start) {
 ; NO-VP-OUTLOOP-NEXT: No successors
 ; NO-VP-OUTLOOP-EMPTY:
 ; NO-VP-OUTLOOP-NEXT: scalar.ph:
+; NO-VP-OUTLOOP-NEXT:   EMIT vp<[[RED_RESUME:%.+]]> = resume-phi vp<[[RDX]]>, ir<%start>
 ; NO-VP-OUTLOOP-NEXT: No successors
+; NO-VP-OUTLOOP-EMPTY:
+; NO-VP-OUTLOOP-NEXT: Live-out i32 %rdx = vp<[[RED_RESUME]]>
 ; NO-VP-OUTLOOP-NEXT: }
 ;
 
@@ -143,7 +149,10 @@ define i32 @reduction(ptr %a, i64 %n, i32 %start) {
 ; NO-VP-INLOOP-NEXT: No successors
 ; NO-VP-INLOOP-EMPTY:
 ; NO-VP-INLOOP-NEXT: scalar.ph:
+; NO-VP-INLOOP-NEXT:   EMIT vp<[[RED_RESUME:%.+]]> = resume-phi vp<[[RDX]]>, ir<%start>
 ; NO-VP-INLOOP-NEXT: No successors
+; NO-VP-INLOOP-EMPTY:
+; NO-VP-INLOOP-NEXT: Live-out i32 %rdx = vp<[[RED_RESUME]]>
 ; NO-VP-INLOOP-NEXT: }
 ;
 entry:
diff --git a/llvm/test/Transforms/LoopVectorize/first-order-recurrence-sink-replicate-region.ll b/llvm/test/Transforms/LoopVectorize/first-order-recurrence-sink-replicate-region.ll
index 8e56614a2e3d5c..b05980bef1b38f 100644
--- a/llvm/test/Transforms/LoopVectorize/first-order-recurrence-sink-replicate-region.ll
+++ b/llvm/test/Transforms/LoopVectorize/first-order-recurrence-sink-replicate-region.ll
@@ -232,9 +232,11 @@ define i32 @sink_replicate_region_3_reduction(i32 %x, i8 %y, ptr %ptr) optsize {
 ; CHECK-EMPTY:
 ; CHECK-NEXT: scalar.ph
 ; CHECK-NEXT:   EMIT vp<[[RESUME_1_P:%.*]]> = resume-phi vp<[[RESUME_1]]>, ir<0>
+; CHECK-NEXT:   EMIT vp<[[RESUME_RED:%.+]]> = resume-phi vp<[[RED_RES]]>, ir<1234>
 ; CHECK-NEXT: No successors
 ; CHECK-EMPTY:
 ; CHECK-NEXT: Live-out i32 %recur = vp<[[RESUME_1_P]]>
+; CHECK-NEXT: Live-out i32 %and.red = vp<[[RESUME_RED]]>
 ; CHECK-NEXT: }
 ;
 entry:
diff --git a/llvm/test/Transforms/LoopVectorize/vplan-printing.ll b/llvm/test/Transforms/LoopVectorize/vplan-printing.ll
index 26974c2307065a..8b2597f7b1cbab 100644
--- a/llvm/test/Transforms/LoopVectorize/vplan-printing.ll
+++ b/llvm/test/Transforms/LoopVectorize/vplan-printing.ll
@@ -165,7 +165,10 @@ define float @print_reduction(i64 %n, ptr noalias %y) {
 ; CHECK-NEXT: No successors
 ; CHECK-EMPTY:
 ; CHECK-NEXT: scalar.ph
+; CHECK-NEXT:   EMIT vp<[[RED_RESUME:%.+]]> = resume-phi vp<[[RED_RES]]>, ir<0.000000e+00>
 ; CHECK-NEXT: No successors
+; CHECK-EMPTY:
+; CHECK-NEXT: Live-out float %red = vp<[[RED_RESUME]]>
 ; CHECK-NEXT: }
 ;
 entry:
@@ -220,7 +223,10 @@ define void @print_reduction_with_invariant_store(i64 %n, ptr noalias %y, ptr no
 ; CHECK-NEXT: No successors
 ; CHECK-EMPTY:
 ; CHECK-NEXT: scalar.ph
+; CHECK-NEXT:   EMIT vp<[[RED_RESUME:%.+]]> = resume-phi vp<[[RED_RES]]>, ir<0.000000e+00>
 ; CHECK-NEXT: No successors
+; CHECK-EMPTY:
+; CHECK-NEXT: Live-out float %red = vp<[[RED_RESUME]]>
 ; CHECK-NEXT: }
 ;
 entry:
@@ -447,7 +453,10 @@ define float @print_fmuladd_strict(ptr %a, ptr %b, i64 %n) {
 ; CHECK-NEXT: No successors
 ; CHECK-EMPTY:
 ; CHECK-NEXT: scalar.ph
+; CHECK-NEXT:   EMIT vp<[[RED_RESUME:%.+]]> = resume-phi vp<[[RED_RES]]>, ir<0.000000e+00>
 ; CHECK-NEXT: No successors
+; CHECK-EMPTY:
+; CHECK-NEXT: Live-out float %sum.07 = vp<[[RED_RESUME]]>
 ; CHECK-NEXT:}
 
 entry:

@fhahn fhahn force-pushed the vplan-resumephi-vpinst-for-reductions branch from 8f40a32 to 10cc688 Compare October 7, 2024 14:02
@fhahn
Copy link
Contributor Author

fhahn commented Oct 7, 2024

ping :)

Use VPInstruction::ResumePhi to create phi nodes for reduction resume
values.

This allows simplifying createAndCollectMergePhiForReduction to only
collect reduction resume phis when vectorizing epilogue loops and adding
extra incoming edges from the main vector loop.
@fhahn fhahn force-pushed the vplan-resumephi-vpinst-for-reductions branch from 10cc688 to d3614bc Compare October 10, 2024 16:10
fhahn added a commit to fhahn/llvm-project that referenced this pull request Oct 13, 2024
Use createDerivedIV to compute IV end values directly in VPlan, instead
of creating them up-front.

This allows updating IV users outside the loop as follow-up.

Depends on llvm#110004 and
llvm#109975.
fhahn added a commit to fhahn/llvm-project that referenced this pull request Oct 13, 2024
Model updating IV users directly in VPlan, replace fixupIVUsers.

Depends on llvm#110004,
llvm#109975 and
llvm#112145.
Copy link
Collaborator

@ayalz ayalz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to clarify various related aspects, such as where ResumePhi's already appear, if this patch is the first to populate the scalar pre-header with recipes, why handling the reduction of epilog loop requires special attention.

@@ -9411,6 +9395,22 @@ void LoopVectorizationPlanner::adjustRecipesForReductions(
});
FinalReductionResult->insertBefore(*MiddleVPBB, IP);

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the above TODO also be taken care of - refrain from creating ComputeReductionResult for in-loop reductions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, but may be better as follow-up?

// loop.
static void updateAndCollectMergePhiForReductionForEpilogueVectorization(
VPInstruction *RedResult, VPTransformState &State, Loop *OrigLoop,
BasicBlock *LoopMiddleBlock, bool VectorizingEpilogue) {
if (!RedResult ||
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (!RedResult ||
if (!VectorizingEpilogue || !RedResult ||

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

if (!RedResult ||
RedResult->getOpcode() != VPInstruction::ComputeReductionResult)
return;

using namespace VPlanPatternMatch;
VPValue *ResumePhiVPV =
cast<VPInstruction>(*find_if(RedResult->users(), [](VPUser *U) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: cast to VPInstruction but store as VPValue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed to convert from VPUser at the moment

if (!RedResult ||
RedResult->getOpcode() != VPInstruction::ComputeReductionResult)
return;

using namespace VPlanPatternMatch;
VPValue *ResumePhiVPV =
cast<VPInstruction>(*find_if(RedResult->users(), [](VPUser *U) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assert there's one ResumePhi user?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added, thanks!

auto *PhiR = cast<VPReductionPHIRecipe>(RedResult->getOperand(0));
const RecurrenceDescriptor &RdxDesc = PhiR->getRecurrenceDescriptor();
if (!VectorizingEpilogue)
return;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: assert((!VectorizingEpilogue || ResumePhi) below can assert ResumePhi alone.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

Incoming);
else
BCBlockPhi->addIncoming(RdxDesc.getRecurrenceStartValue(), Incoming);
if (ResumePhi && is_contained(ResumePhi->blocks(), Incoming))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (ResumePhi && is_contained(ResumePhi->blocks(), Incoming))
if (is_contained(ResumePhi->blocks(), Incoming))

ResumePhi is asserted above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropped thanks!

auto *BCBlockPhi =
PHINode::Create(FinalValue->getType(), 2, "bc.merge.rdx",
LoopScalarPreHeader->getTerminator()->getIterator());

// If we are fixing reductions in the epilogue loop then we should already
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this is no longer a question, we are fixing reductions in the epilogue loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!

bool VectorizingEpilogue) {
// create a merge phi node for it and add incoming values from the main vector
// loop.
static void updateAndCollectMergePhiForReductionForEpilogueVectorization(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"AndCollect" - this becomes obsolete w/o ReductionResumeValues?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed, thanks and clarified comment (merge phi isn't created any longer)

@@ -7467,23 +7467,31 @@ static void addRuntimeUnrollDisableMetaData(Loop *L) {
}

// Check if \p RedResult is a ComputeReductionResult instruction, and if it is
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Check if \p RedResult is a ComputeReductionResult instruction, and if it is
// If \p RedResult is a ComputeReductionResult when vectorizing the epilog loop,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!

BasicBlock *LoopScalarPreHeader = OrigLoop->getLoopPreheader();
// Create a phi node that merges control-flow from the backedge-taken check
// block and the middle block.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: what is "backedge-taken check block", abbreviated in "BCBlock"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not entirely sure, but I think this meant the iterations checks.

Copy link
Contributor Author

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to clarify various related aspects, such as where ResumePhi's already appear, if this patch is the first to populate the scalar pre-header with recipes, why handling the reduction of epilog loop requires special attention.

Thanks, updated the description. This follows 9a5a873 which added ResumePHIs for FOR resume values to the scalar preheader. Epilogue needs special attention, as it needs extra incoming values from the bypass blocks.

@@ -7467,23 +7467,31 @@ static void addRuntimeUnrollDisableMetaData(Loop *L) {
}

// Check if \p RedResult is a ComputeReductionResult instruction, and if it is
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!

bool VectorizingEpilogue) {
// create a merge phi node for it and add incoming values from the main vector
// loop.
static void updateAndCollectMergePhiForReductionForEpilogueVectorization(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed, thanks and clarified comment (merge phi isn't created any longer)

// loop.
static void updateAndCollectMergePhiForReductionForEpilogueVectorization(
VPInstruction *RedResult, VPTransformState &State, Loop *OrigLoop,
BasicBlock *LoopMiddleBlock, bool VectorizingEpilogue) {
if (!RedResult ||
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

if (!RedResult ||
RedResult->getOpcode() != VPInstruction::ComputeReductionResult)
return;

using namespace VPlanPatternMatch;
VPValue *ResumePhiVPV =
cast<VPInstruction>(*find_if(RedResult->users(), [](VPUser *U) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed to convert from VPUser at the moment

if (!RedResult ||
RedResult->getOpcode() != VPInstruction::ComputeReductionResult)
return;

using namespace VPlanPatternMatch;
VPValue *ResumePhiVPV =
cast<VPInstruction>(*find_if(RedResult->users(), [](VPUser *U) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added, thanks!

BasicBlock *LoopScalarPreHeader = OrigLoop->getLoopPreheader();
// Create a phi node that merges control-flow from the backedge-taken check
// block and the middle block.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not entirely sure, but I think this meant the iterations checks.

auto *BCBlockPhi =
PHINode::Create(FinalValue->getType(), 2, "bc.merge.rdx",
LoopScalarPreHeader->getTerminator()->getIterator());

// If we are fixing reductions in the epilogue loop then we should already
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!

Incoming);
else
BCBlockPhi->addIncoming(RdxDesc.getRecurrenceStartValue(), Incoming);
if (ResumePhi && is_contained(ResumePhi->blocks(), Incoming))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropped thanks!

for (VPRecipeBase &R : *ExitVPBB) {
updateAndCollectMergePhiForReductionForEpilogueVectorization(
dyn_cast<VPInstruction>(&R), State, OrigLoop,
State.CFG.VPBB2IRBB[ExitVPBB], ExpandedSCEVs);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ExpandedSCEVS indicates that we are vectorizing the epilogue vector loop and IsEpilogueVectorization is true if either main or epilogue loop is vectorized during epilogue vectorization . Added to check above, removed from function

@@ -9411,6 +9395,22 @@ void LoopVectorizationPlanner::adjustRecipesForReductions(
});
FinalReductionResult->insertBefore(*MiddleVPBB, IP);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, but may be better as follow-up?

fhahn added a commit to fhahn/llvm-project that referenced this pull request Oct 21, 2024
Use createDerivedIV to compute IV end values directly in VPlan, instead
of creating them up-front.

This allows updating IV users outside the loop as follow-up.

Depends on llvm#110004 and
llvm#109975.
@fhahn
Copy link
Contributor Author

fhahn commented Oct 24, 2024

ping :)

fhahn added a commit to fhahn/llvm-project that referenced this pull request Oct 24, 2024
Use createDerivedIV to compute IV end values directly in VPlan, instead
of creating them up-front.

This allows updating IV users outside the loop as follow-up.

Depends on llvm#110004 and
llvm#109975.
@ayalz
Copy link
Collaborator

ayalz commented Oct 25, 2024

Would be good to clarify various related aspects, such as where ResumePhi's already appear, if this patch is the first to populate the scalar pre-header with recipes, why handling the reduction of epilog loop requires special attention.

Thanks, updated the description. This follows 9a5a873 which added ResumePHIs for FOR resume values to the scalar preheader. Epilogue needs special attention, as it needs extra incoming values from the bypass blocks.

Excellent.
Trying to further clarify: when main loop was vectorized, it connects to scalar preheader phi incoming from bypasses, and from its middle block (so no further processing is needed when epilog isn't vectorized). When epilog loop is vectorized, it should add to the scalar preheader phi an incoming from its middle block?

Note that similar processing of epilog loop should apply to FORs as well (which were first to use ResumePhi recipes), if/when loops with FORs become subject to epilog vectorization - currently disabled by isCandidateForEpilogueVectorization().

Comment on lines 7513 to 7514
assert(ResumePhiVPV->getNumUsers() == 1 &&
"ResumePhi must have a single user");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant to assert that RedResult has exactly one user that is a ResumePhi (say using count_if), as we only deal with the first one. Overall one ResumePhi user is expected in scalar preheader and another VPIRInstruction user in exit block?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to use count_if, thanks!

Overall one ResumePhi user is expected in scalar preheader and another VPIRInstruction user in exit block?

Yep

if (IsEpilogueVectorization && ExpandedSCEVs)
for (VPRecipeBase &R : *ExitVPBB) {
updateMergePhiForReductionForEpilogueVectorization(
dyn_cast<VPInstruction>(&R), State, OrigLoop,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, simpler to pass R and have the caller match it, as we're unsure it's the relevant VPInstruction here anyway:

Suggested change
dyn_cast<VPInstruction>(&R), State, OrigLoop,
R, State, OrigLoop,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

dyn_cast<VPInstruction>(&R), State, OrigLoop,
State.CFG.VPBB2IRBB[ExitVPBB], ExpandedSCEVs);
}
if (IsEpilogueVectorization && ExpandedSCEVs)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be a better way to detect if the epilog loop is now the one being vectorized, than checking if ExpandedSCEVs isn't null.

Suggested change
if (IsEpilogueVectorization && ExpandedSCEVs)
if (ExpandedSCEVs) {
assert(IsEpilogueVectorization && "SCEVs expanded expected only when vectorizing the epilog loop");

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adjusted in 9648271 to only set the flag if the epilogue is vectorized.

}));
assert(ResumePhiVPV->getNumUsers() == 1 &&
"ResumePhi must have a single user");
auto *BCBlockPhi = cast<PHINode>(State.get(ResumePhiVPV, true));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better pick a name we're sure of, rather than "BCBlockPhi"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to EpiResumePhi (and ResumePhi -> MainResumePhi), and also moved the code closer to the uses

// If we are fixing reductions in the epilogue loop then we should already
// have created a bc.merge.rdx Phi after the main vector body. Ensure that
// we carry over the incoming values correctly.
// When fixing reductions in the epilogue loop then we should already have
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// When fixing reductions in the epilogue loop then we should already have
// When fixing reductions in the epilogue loop we should already have

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!

auto *Cmp = cast<ICmpInst>(PhiR->getStartValue()->getUnderlyingValue());
assert(Cmp->getPredicate() == CmpInst::ICMP_NE);
assert(Cmp->getOperand(1) == RdxDesc.getRecurrenceStartValue());
ResumePhi = cast<PHINode>(Cmp->getOperand(0));
}
assert((!VectorizingEpilogue || ResumePhi) &&
assert(ResumePhi &&
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps clearer to set
PHINode *ResumePhi to cast<PHINode>(PhiR->getStartValue()->getUnderlyingValue()) for reductions other than AnyOf, rather than speculatively dyn_cast it and then assert it's not null?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!

if (!RedResult ||
RedResult->getOpcode() != VPInstruction::ComputeReductionResult)
return;

using namespace VPlanPatternMatch;
VPValue *ResumePhiVPV =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: may as well define ResumePhiVPV to be a VPInstruction*?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

auto *PhiR = cast<VPReductionPHIRecipe>(RedResult->getOperand(0));
const RecurrenceDescriptor &RdxDesc = PhiR->getRecurrenceDescriptor();

Value *FinalValue = State.get(RedResult, VPLane(VPLane::getFirstLane()));
auto *ResumePhi =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's somewhat confusing to first locate a ResumePhi recipe in scalar preheader, and now to define a ResumePhi PHINode as the starting value of the reduction header phi (merging bypasses with main loop result), possibly going through a compare for AnyOf reductions (boolean reduction's start value?)... some explanation and/or more distinct names may be helpful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to MainResumePhi here.

else
BCBlockPhi->addIncoming(RdxDesc.getRecurrenceStartValue(), Incoming);
if (is_contained(ResumePhi->blocks(), Incoming))
BCBlockPhi->setIncomingValueForBlock(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth asserting what the incoming values are before (re)setting them?

// If \p R is a ComputeReductionResult when vectorizing the epilog loop,
// update the reduction's scalar PHI node by adding the incoming value from the
// main vector loop.
static void updateMergePhiForReductionForEpilogueVectorization(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
static void updateMergePhiForReductionForEpilogueVectorization(
static void fixReductionScalarResumeWhenVectorizingEpilog(

sounds better?
This case cannot be handled by recipes alone as it involves passing a value between recipes of distinct VPlans, hence requires "fixing".
There are several "Merge phis" involved, here we're dealing with the one in the scalar preheader feeding the value for the scalar loop to resume from.
When executing the VPlan of the epilog loop, this phi considered 2 values (ignorant of the main loop) - the one coming from the epilog loop via its middle block, and the original initial value for bypassing from entry to scalar loop. But there is a 3rd value to consider - computed by the vectorized main loop and bypassing the epilog loop. This value was already fed into the "scalar preheader" (actually vector-epilog-tripcount-check block) when executing the VPlan of the main loop (which needs no fixing). We now look for this value and feed it into the resumePhi of the scalar preheader, corresponding to the edge from this block to the scalar preheader.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!

Comment on lines 7596 to 7599
auto *EpiResumePhiVPI =
cast<VPInstruction>(*find_if(RedResult->users(), IsResumePhi));
assert(count_if(RedResult->users(), IsResumePhi) == 1 &&
"ResumePhi must have a single user");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
auto *EpiResumePhiVPI =
cast<VPInstruction>(*find_if(RedResult->users(), IsResumePhi));
assert(count_if(RedResult->users(), IsResumePhi) == 1 &&
"ResumePhi must have a single user");
assert(count_if(RedResult->users(), IsResumePhi) == 1 &&
"RedResult must have a single ResumePhi user");
auto *EpiResumePhiVPI =
cast<VPInstruction>(*find_if(RedResult->users(), IsResumePhi));

(better assert earlier)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!


VPBuilder ScalarPHBuilder(ScalarPHVPBB);
auto *ResumePhiRecipe = ScalarPHBuilder.createNaryOp(
VPInstruction::ResumePhi, {FinalReductionResult, PhiR->getStartValue()},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ResumePhi placed in the scalar preheader prepares for 2 values. When vectorizing an epilog loop, it should ideally prepare for 3: the ResumePhi after the main loop should feed the ResumePhi after the epilog loop (unless we're sure according to the trip count that if main and epilog are run they will take care of all iterations, w/o leaving any remainder for the scalar loop). But neither recipes nor blocks of the two VPlans can currently be connected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, unfortunately there is no way to determine if the plan will be used for epilogue vectorization at this point I think. Once we model it explicitly in VPlan, the update should be connected here

Comment on lines 9500 to 9507
VPBasicBlock *ScalarPHVPBB = nullptr;
if (MiddleVPBB->getNumSuccessors() == 2) {
// Order is strict: first is the exit block, second is the scalar
// preheader.
ScalarPHVPBB = cast<VPBasicBlock>(MiddleVPBB->getSuccessors()[1]);
} else {
ScalarPHVPBB = cast<VPBasicBlock>(MiddleVPBB->getSingleSuccessor());
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When vectorizing the main loop, to be followed by vectorizing the epilog loop, this successor of middle block is ... the vector-epilog-tripcount-check block, rather than the ScalarPH.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, once both plans (or at least the skeletons) are included in the VPlan

static void updateMergePhiForReductionForEpilogueVectorization(
VPRecipeBase *R, VPTransformState &State, Loop *OrigLoop,
BasicBlock *LoopMiddleBlock) {
auto *RedResult = dyn_cast<VPInstruction>(R);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
auto *RedResult = dyn_cast<VPInstruction>(R);
auto *EpiRedResult = dyn_cast<VPInstruction>(R);

would be good to keep Epi/Main names consistent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!

Comment on lines 7577 to 7582
PHINode *MainResumePhi;
if (RecurrenceDescriptor::isAnyOfRecurrenceKind(
RdxDesc.getRecurrenceKind())) {
auto *Cmp = cast<ICmpInst>(PhiR->getStartValue()->getUnderlyingValue());
assert(Cmp->getPredicate() == CmpInst::ICMP_NE);
assert(Cmp->getOperand(1) == RdxDesc.getRecurrenceStartValue());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
PHINode *MainResumePhi;
if (RecurrenceDescriptor::isAnyOfRecurrenceKind(
RdxDesc.getRecurrenceKind())) {
auto *Cmp = cast<ICmpInst>(PhiR->getStartValue()->getUnderlyingValue());
assert(Cmp->getPredicate() == CmpInst::ICMP_NE);
assert(Cmp->getOperand(1) == RdxDesc.getRecurrenceStartValue());
Value *MainResumeValue = EpiRedHeaderPhiR->getStartValue()->getUnderlyingValue();
if (RecurrenceDescriptor::isAnyOfRecurrenceKind(
RdxDesc.getRecurrenceKind())) {
auto *Cmp = cast<ICmpInst>(MainResumeValue);
assert(Cmp->getPredicate() == CmpInst::ICMP_NE && "AnyOf expected to start with ICMP_NE");
assert(Cmp->getOperand(1) == RdxDesc.getRecurrenceStartValue() && "AnyOf expected to start by comparing main resume value to original start value");
MainResumeValue = Cmp->getOperand(0);
}
auto *MainResumePhi = cast<PHINode>(MainResumeValue);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!

@@ -9511,6 +9497,22 @@ void LoopVectorizationPlanner::adjustRecipesForReductions(
});
FinalReductionResult->insertBefore(*MiddleVPBB, IP);

VPBasicBlock *ScalarPHVPBB = nullptr;
if (MiddleVPBB->getNumSuccessors() == 2) {
// Order is strict: first is the exit block, second is the scalar
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can simplify by retrieving the last successor, MiddleVPBB->getSuccessors()[MiddleVPBB->getNumSuccessors()-1].

In general, only Exit should be a VPIRBB, and/or should be recorded in VPlan, so can identify the ScalarPHVPBB as the first successor of middle-block that is not Exit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!

Comment on lines 7591 to 7597
using namespace VPlanPatternMatch;
auto IsResumePhi = [](VPUser *U) {
return match(
U, m_VPInstruction<VPInstruction::ResumePhi>(m_VPValue(), m_VPValue()));
};
auto *EpiResumePhiVPI =
cast<VPInstruction>(*find_if(RedResult->users(), IsResumePhi));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simpler to iterate over recipes R of scalar preheader, which should be ResumePhi's (of epilog), fed by ComputeResult, fed by HeaderPhi, fed by ResumePhi (of main)? Rather than iterating over recipes R of middle-block (of epilog), then find its ResumePhi in scalar preheader.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, the reduction result should only have a single user, so potentially fewer recipes to check than heckling scalar PH, which may contain multiple ResumePhis. (Also there's no convenient way to retrieve the scalar PH (will be available after #109975)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.
Perhaps, once scalarPH is easily retrievable, would be simpler to find_if RedResult's user having scalarPH as its parent. And then assert it's a ResumePhi.

dyn_cast<PHINode>(PhiR->getStartValue()->getUnderlyingValue());
if (VectorizingEpilogue && RecurrenceDescriptor::isAnyOfRecurrenceKind(
RdxDesc.getRecurrenceKind())) {
PHINode *MainResumePhi;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonder if Reduction MainResumePhi could be recorded similar to the way Induction resume value of main loop is recorded and retrieved for epilog loop to continue from. Instead of searching for it from epilog through main loop via ComputeResult->HeaderPhi->ResumePhi recipes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think recording separately would be more difficult the corresponding VPInstruction is created during VPlan construction/transformation and the phi node is only available later, via VPTransformState.

Incoming);
else
BCBlockPhi->addIncoming(RdxDesc.getRecurrenceStartValue(), Incoming);
if (is_contained(MainResumePhi->blocks(), Incoming)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we expect at-most one such Incoming, worth asserting?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!

Copy link
Collaborator

@ayalz ayalz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Adding last couple of minor nits.

// If we are fixing reductions in the epilogue loop then we should already
// have created a bc.merge.rdx Phi after the main vector body. Ensure that
// we carry over the incoming values correctly.
unsigned UpdateCnt = 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
unsigned UpdateCnt = 0;
bool Updated = false;

Comment on lines 7614 to 7616
EpiResumePhi->setIncomingValueForBlock(
Incoming, MainResumePhi->getIncomingValueForBlock(Incoming));
UpdateCnt++;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
EpiResumePhi->setIncomingValueForBlock(
Incoming, MainResumePhi->getIncomingValueForBlock(Incoming));
UpdateCnt++;
assert(!Updated && "Should update at most 1 incoming value");
EpiResumePhi->setIncomingValueForBlock(
Incoming, MainResumePhi->getIncomingValueForBlock(Incoming));
Updated = true;

Comment on lines 7619 to 7620
assert(UpdateCnt <= 1 && "Only should update at most 1 incoming value");
(void)UpdateCnt;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert(UpdateCnt <= 1 && "Only should update at most 1 incoming value");
(void)UpdateCnt;
(void)Updated;

"ResumePhi must have a single user");
auto *EpiResumePhiVPI =
cast<VPInstruction>(*find_if(EpiRedResult->users(), IsResumePhi));
auto *EpiResumePhi = cast<PHINode>(State.get(EpiResumePhiVPI, true));
BasicBlock *LoopScalarPreHeader = OrigLoop->getLoopPreheader();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
BasicBlock *LoopScalarPreHeader = OrigLoop->getLoopPreheader();
BasicBlock *LoopScalarPreHeader = EpiResumePhi->getParent();

and avoid passing OrigLoop to fixReductionScalarResumeWhenVectorizingEpilog()?

@fhahn fhahn merged commit 0d0abb3 into llvm:main Oct 28, 2024
8 checks passed
@fhahn fhahn deleted the vplan-resumephi-vpinst-for-reductions branch October 28, 2024 19:14
NoumanAmir657 pushed a commit to NoumanAmir657/llvm-project that referenced this pull request Nov 4, 2024
Use VPInstruction::ResumePhi to create phi nodes for reduction resume
values in the scalar preheader, similar to how ResumePhis are used for
first-order recurrence resume values after 9a5a873.

This allows simplifying createAndCollectMergePhiForReduction to only
collect reduction resume phis when vectorizing epilogue loops and adding
extra incoming edges from the main vector loop. Updating phis for the
epilogue vector loops requires special attention, because additional
incoming values from the bypass blocks need to be added.

PR: llvm#110004
fhahn added a commit to fhahn/llvm-project that referenced this pull request Nov 4, 2024
Use createDerivedIV to compute IV end values directly in VPlan, instead
of creating them up-front.

This allows updating IV users outside the loop as follow-up.

Depends on llvm#110004 and
llvm#109975.
fhahn added a commit to fhahn/llvm-project that referenced this pull request Nov 10, 2024
Use createDerivedIV to compute IV end values directly in VPlan, instead
of creating them up-front.

This allows updating IV users outside the loop as follow-up.

Depends on llvm#110004 and
llvm#109975.
fhahn added a commit to fhahn/llvm-project that referenced this pull request Nov 10, 2024
Use createDerivedIV to compute IV end values directly in VPlan, instead
of creating them up-front.

This allows updating IV users outside the loop as follow-up.

Depends on llvm#110004 and
llvm#109975.
fhahn added a commit to fhahn/llvm-project that referenced this pull request Nov 24, 2024
Use createDerivedIV to compute IV end values directly in VPlan, instead
of creating them up-front.

This allows updating IV users outside the loop as follow-up.

Depends on llvm#110004 and
llvm#109975.
fhahn added a commit to fhahn/llvm-project that referenced this pull request Dec 7, 2024
Use createDerivedIV to compute IV end values directly in VPlan, instead
of creating them up-front.

This allows updating IV users outside the loop as follow-up.

Depends on llvm#110004 and
llvm#109975.
fhahn added a commit to fhahn/llvm-project that referenced this pull request Dec 7, 2024
Use createDerivedIV to compute IV end values directly in VPlan, instead
of creating them up-front.

This allows updating IV users outside the loop as follow-up.

Depends on llvm#110004 and
llvm#109975.
fhahn added a commit to fhahn/llvm-project that referenced this pull request Dec 15, 2024
Model updating IV users directly in VPlan, replace fixupIVUsers.

Depends on llvm#110004,
llvm#109975 and
llvm#112145.
fhahn added a commit that referenced this pull request Dec 29, 2024
Use createDerivedIV to compute IV end values directly in VPlan, instead
of creating them up-front.

This allows updating IV users outside the loop as follow-up.

Depends on #110004 and
#109975.

PR: #112145
fhahn added a commit to fhahn/llvm-project that referenced this pull request Dec 29, 2024
Model updating IV users directly in VPlan, replace fixupIVUsers.

Depends on llvm#110004,
llvm#109975 and
llvm#112145.
sunshaoce pushed a commit to sunshaoce/llvm-public that referenced this pull request Dec 30, 2024
Use createDerivedIV to compute IV end values directly in VPlan, instead
of creating them up-front.

This allows updating IV users outside the loop as follow-up.

Depends on llvm#110004 and
llvm#109975.

PR: llvm#112145
fhahn added a commit to fhahn/llvm-project that referenced this pull request Jan 5, 2025
Model updating IV users directly in VPlan, replace fixupIVUsers.

Depends on llvm#110004,
llvm#109975 and
llvm#112145.
fhahn added a commit that referenced this pull request Jan 5, 2025
Update optimizeForVFAndUF to completely remove the vector loop region
when possible. At the moment, we cannot remove the region if it contains

* widened IVs: the recipe is needed to generate the step vector
* reductions: ComputeReductionResults requires the reduction phi recipe
for codegen.

Both cases can be addressed by more explicit modeling.

The patch also includes a number of updates to allow executing VPlans
without a vector loop region.

Depends on #110004
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants