Skip to content

[VPlan] Retain exit conditions and edges in initial VPlan (NFC). #137709

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
May 8, 2025

Conversation

fhahn
Copy link
Contributor

@fhahn fhahn commented Apr 28, 2025

Update initial VPlan construction to include exit conditions and edges.

The loop region is now first constructed without entry/exiting. Those are set after inserting the region in the CFG, to preserve the original predecessor/successor order of blocks.

For now, all early exits are disconnected before forming the regions, but a follow-up will update uncountable exit handling to also happen here. This is required to enable VPlan predication and remove the dependence any IR BBs (#128420).

@llvmbot
Copy link
Member

llvmbot commented Apr 28, 2025

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-vectorizers

Author: Florian Hahn (fhahn)

Changes

Update initial VPlan construction to include exit conditions and edges.

For now, all early exits are disconnected before forming the regions, but a follow-up will update uncountable exit handling to also happen here. This is required to enable VPlan predication and remove the dependence any IR BBs (#128420).

This includes updates in a few places to use
replaceSuccessor/replacePredecessor to preserve the order of predecessors and successors, to reduce the need of fixing up phi operand orderings. This unfortunately required making them public, not sure if there's a


Full diff: https://github.com/llvm/llvm-project/pull/137709.diff

4 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+11-1)
  • (modified) llvm/lib/Transforms/Vectorize/VPlan.h (+1-2)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp (+38-40)
  • (modified) llvm/test/Transforms/LoopVectorize/vplan-printing-outer-loop.ll (+5-1)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 4684378687ef6..bd043ae5e06e8 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -9167,6 +9167,7 @@ void LoopVectorizationPlanner::buildVPlansWithVPRecipes(ElementCount MinVF,
 // loop.
 static void addCanonicalIVRecipes(VPlan &Plan, Type *IdxTy, bool HasNUW,
                                   DebugLoc DL) {
+  using namespace VPlanPatternMatch;
   Value *StartIdx = ConstantInt::get(IdxTy, 0);
   auto *StartV = Plan.getOrAddLiveIn(StartIdx);
 
@@ -9176,7 +9177,16 @@ static void addCanonicalIVRecipes(VPlan &Plan, Type *IdxTy, bool HasNUW,
   VPBasicBlock *Header = TopRegion->getEntryBasicBlock();
   Header->insert(CanonicalIVPHI, Header->begin());
 
-  VPBuilder Builder(TopRegion->getExitingBasicBlock());
+  VPBasicBlock *LatchVPBB = TopRegion->getExitingBasicBlock();
+  // We are about to replace the branch to exit the region. Remove the original
+  // BranchOnCond, if there is any.
+  // TODO: Move canonical IV and BranchOnCount introduction to initial skeleton
+  // creation.
+  if (!LatchVPBB->empty() &&
+      match(&LatchVPBB->back(), m_BranchOnCond(m_VPValue())))
+    LatchVPBB->getTerminator()->eraseFromParent();
+
+  VPBuilder Builder(LatchVPBB);
   // Add a VPInstruction to increment the scalar canonical IV by VF * UF.
   auto *CanonicalIVIncrement = Builder.createOverflowingOp(
       Instruction::Add, {CanonicalIVPHI, &Plan.getVFxUF()}, {HasNUW, false}, DL,
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index afad73bcd3501..5d81bcdc4a21e 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -117,6 +117,7 @@ class VPBlockBase {
     Predecessors.erase(Pos);
   }
 
+public:
   /// Remove \p Successor from the successors of this block.
   void removeSuccessor(VPBlockBase *Successor) {
     auto Pos = find(Successors, Successor);
@@ -129,8 +130,6 @@ class VPBlockBase {
   void replacePredecessor(VPBlockBase *Old, VPBlockBase *New) {
     auto I = find(Predecessors, Old);
     assert(I != Predecessors.end());
-    assert(Old->getParent() == New->getParent() &&
-           "replaced predecessor must have the same parent");
     *I = New;
   }
 
diff --git a/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp b/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
index 58d63933fb724..af198bbf57df6 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
@@ -112,6 +112,9 @@ VPBasicBlock *PlainCFGBuilder::getOrCreateVPBB(BasicBlock *BB) {
     return VPBB;
   }
 
+  if (!TheLoop->contains(BB))
+    return Plan->getExitBlock(BB);
+
   // Create new VPBB.
   StringRef Name = BB->getName();
   LLVM_DEBUG(dbgs() << "Creating VPBasicBlock for " << Name << "\n");
@@ -145,14 +148,6 @@ bool PlainCFGBuilder::isExternalDef(Value *Val) {
     // Instruction definition is in outermost loop PH.
     return false;
 
-  // Check whether Instruction definition is in a loop exit.
-  SmallVector<BasicBlock *> ExitBlocks;
-  TheLoop->getExitBlocks(ExitBlocks);
-  if (is_contained(ExitBlocks, InstParent)) {
-    // Instruction definition is in outermost loop exit.
-    return false;
-  }
-
   // Check whether Instruction definition is in loop body.
   return !TheLoop->contains(Inst);
 }
@@ -201,11 +196,6 @@ void PlainCFGBuilder::createVPInstructionsForVPBB(VPBasicBlock *VPBB,
            "Instruction shouldn't have been visited.");
 
     if (auto *Br = dyn_cast<BranchInst>(Inst)) {
-      if (TheLoop->getLoopLatch() == BB ||
-          any_of(successors(BB),
-                 [this](BasicBlock *Succ) { return !TheLoop->contains(Succ); }))
-        continue;
-
       // Conditional branch instruction are represented using BranchOnCond
       // recipes.
       if (Br->isConditional()) {
@@ -295,7 +285,6 @@ std::unique_ptr<VPlan> PlainCFGBuilder::buildPlainCFG(
   for (BasicBlock *BB : RPO) {
     // Create or retrieve the VPBasicBlock for this BB.
     VPBasicBlock *VPBB = getOrCreateVPBB(BB);
-    Loop *LoopForBB = LI->getLoopFor(BB);
     // Set VPBB predecessors in the same order as they are in the incoming BB.
     setVPBBPredsFromBB(VPBB, BB);
 
@@ -326,24 +315,12 @@ std::unique_ptr<VPlan> PlainCFGBuilder::buildPlainCFG(
     BasicBlock *IRSucc1 = BI->getSuccessor(1);
     VPBasicBlock *Successor0 = getOrCreateVPBB(IRSucc0);
     VPBasicBlock *Successor1 = getOrCreateVPBB(IRSucc1);
-
-    // Don't connect any blocks outside the current loop except the latches for
-    // inner loops.
-    // TODO: Also connect exit blocks during initial VPlan construction.
-    if (LoopForBB == TheLoop || BB != LoopForBB->getLoopLatch()) {
-      if (!LoopForBB->contains(IRSucc0)) {
-        VPBB->setOneSuccessor(Successor1);
-        continue;
-      }
-      if (!LoopForBB->contains(IRSucc1)) {
-        VPBB->setOneSuccessor(Successor0);
-        continue;
-      }
-    }
-
     VPBB->setTwoSuccessors(Successor0, Successor1);
   }
 
+  for (auto *EB : Plan->getExitBlocks())
+    setVPBBPredsFromBB(EB, EB->getIRBasicBlock());
+
   // 2. The whole CFG has been built at this point so all the input Values must
   // have a VPlan counterpart. Fix VPlan header phi by adding their
   // corresponding VPlan operands.
@@ -446,19 +423,21 @@ static void createLoopRegion(VPlan &Plan, VPBlockBase *HeaderVPB) {
   VPBlockBase *Succ = LatchVPBB->getSingleSuccessor();
   assert(LatchVPBB->getNumSuccessors() <= 1 &&
          "Latch has more than one successor");
-  if (Succ)
-    VPBlockUtils::disconnectBlocks(LatchVPBB, Succ);
+  LatchVPBB->removeSuccessor(Succ);
 
   auto *R = Plan.createVPRegionBlock(HeaderVPB, LatchVPBB, "",
                                      false /*isReplicator*/);
   // All VPBB's reachable shallowly from HeaderVPB belong to top level loop,
   // because VPlan is expected to end at top level latch disconnected above.
+  SmallPtrSet<VPBlockBase *, 2> ExitBlocks(Plan.getExitBlocks().begin(),
+                                           Plan.getExitBlocks().end());
   for (VPBlockBase *VPBB : vp_depth_first_shallow(HeaderVPB))
-    VPBB->setParent(R);
+    if (!ExitBlocks.contains(VPBB))
+      VPBB->setParent(R);
 
   VPBlockUtils::insertBlockAfter(R, PreheaderVPBB);
-  if (Succ)
-    VPBlockUtils::connectBlocks(R, Succ);
+  R->setOneSuccessor(Succ);
+  Succ->replacePredecessor(LatchVPBB, R);
 }
 
 void VPlanTransforms::prepareForVectorization(VPlan &Plan, Type *InductionTy,
@@ -476,8 +455,29 @@ void VPlanTransforms::prepareForVectorization(VPlan &Plan, Type *InductionTy,
   VPBlockUtils::insertBlockAfter(VecPreheader, Plan.getEntry());
 
   VPBasicBlock *MiddleVPBB = Plan.createVPBasicBlock("middle.block");
-  VPBlockUtils::connectBlocks(LatchVPB, MiddleVPBB);
-  LatchVPB->swapSuccessors();
+  VPBlockBase *LatchExitVPB = LatchVPB->getNumSuccessors() == 2
+                                  ? LatchVPB->getSuccessors()[0]
+                                  : nullptr;
+  if (LatchExitVPB) {
+    LatchVPB->getSuccessors()[0] = MiddleVPBB;
+    MiddleVPBB->setPredecessors({LatchVPB});
+    MiddleVPBB->setSuccessors({LatchExitVPB});
+    LatchExitVPB->replacePredecessor(LatchVPB, MiddleVPBB);
+  } else {
+    VPBlockUtils::connectBlocks(LatchVPB, MiddleVPBB);
+    LatchVPB->swapSuccessors();
+  }
+
+  // Disconnect all edges between exit blocks other than from the latch.
+  // TODO: Uncountable exit blocks should be handled here.
+  for (VPBlockBase *EB : to_vector(Plan.getExitBlocks())) {
+    for (VPBlockBase *Pred : to_vector(EB->getPredecessors())) {
+      if (Pred == MiddleVPBB)
+        continue;
+      cast<VPBasicBlock>(Pred)->getTerminator()->eraseFromParent();
+      VPBlockUtils::disconnectBlocks(Pred, EB);
+    }
+  }
 
   // Create SCEV and VPValue for the trip count.
   // We use the symbolic max backedge-taken-count, which works also when
@@ -503,8 +503,9 @@ void VPlanTransforms::prepareForVectorization(VPlan &Plan, Type *InductionTy,
   //    Thus if tail is to be folded, we know we don't need to run the
   //    remainder and we can set the condition to true.
   // 3) Otherwise, construct a runtime check.
-
   if (!RequiresScalarEpilogueCheck) {
+    if (LatchExitVPB)
+      VPBlockUtils::disconnectBlocks(MiddleVPBB, LatchExitVPB);
     VPBlockUtils::connectBlocks(MiddleVPBB, ScalarPH);
     // The exit blocks are unreachable, remove their recipes to make sure no
     // users remain that may pessimize transforms.
@@ -516,9 +517,6 @@ void VPlanTransforms::prepareForVectorization(VPlan &Plan, Type *InductionTy,
   }
 
   // The connection order corresponds to the operands of the conditional branch.
-  BasicBlock *IRExitBlock = TheLoop->getUniqueLatchExitBlock();
-  auto *VPExitBlock = Plan.getExitBlock(IRExitBlock);
-  VPBlockUtils::connectBlocks(MiddleVPBB, VPExitBlock);
   VPBlockUtils::connectBlocks(MiddleVPBB, ScalarPH);
 
   auto *ScalarLatchTerm = TheLoop->getLoopLatch()->getTerminator();
diff --git a/llvm/test/Transforms/LoopVectorize/vplan-printing-outer-loop.ll b/llvm/test/Transforms/LoopVectorize/vplan-printing-outer-loop.ll
index 91a5ea6b7fe36..fe845ae74cbee 100644
--- a/llvm/test/Transforms/LoopVectorize/vplan-printing-outer-loop.ll
+++ b/llvm/test/Transforms/LoopVectorize/vplan-printing-outer-loop.ll
@@ -31,7 +31,11 @@ define void @foo(i64 %n) {
 ; CHECK-NEXT: outer.latch:
 ; CHECK-NEXT:   EMIT ir<%outer.iv.next> = add ir<%outer.iv>, ir<1>
 ; CHECK-NEXT:   EMIT ir<%outer.ec> = icmp ir<%outer.iv.next>, ir<8>
-; CHECK-NEXT: Successor(s): outer.header
+; CHECK-NEXT:   EMIT branch-on-cond ir<%outer.ec>
+; CHECK-NEXT: Successor(s): ir-bb<exit>, outer.header
+; CHECK-EMPTY:
+; CHECK-NEXT: ir-bb<exit>:
+; CHECK-NEXT: No successors
 ; CHECK-NEXT: }
 entry:
   br label %outer.header

@fhahn fhahn force-pushed the vplan-retain-exits branch from c368b5d to e347f11 Compare May 3, 2025 12:08
fhahn added a commit to fhahn/llvm-project that referenced this pull request May 3, 2025
Move early-exit handling up front to original VPlan construction, before
introducing early exits.

This builds on llvm#137709, which
adds exiting edges to the original VPlan, instead of adding exit blocks
later.

This retains the exit conditions early, and means we can handle early
exits before forming regions, without the reliance on VPRecipeBuilder.

Once we retain all exits initially, handling early exits before region
construction ensures the regions are valid; otherwise we would leave
edges exiting the region from elsewhere than the latch.

Removing the reliance on VPRecipeBuilder removes the dependence on
mapping IR BBs to VPBBs and unblocks predication as VPlan transform:
llvm#128420.

Depends on llvm#137709.
Update initial VPlan construction to include exit conditions and
edges.

For now, all early exits are disconnected before forming the regions,
but a follow-up will update uncountable exit handling to also happen
here. This is required to enable VPlan predication and remove the
dependence any IR BBs (llvm#128420).

This includes updates in a few places to use
replaceSuccessor/replacePredecessor to preserve the order of predecessors
and successors, to reduce the need of fixing up phi operand orderings.
This unfortunately required making them public, not sure if there's a
@fhahn fhahn force-pushed the vplan-retain-exits branch from e347f11 to de0fc2d Compare May 5, 2025 17:10
fhahn added a commit to fhahn/llvm-project that referenced this pull request May 5, 2025
Move early-exit handling up front to original VPlan construction, before
introducing early exits.

This builds on llvm#137709, which
adds exiting edges to the original VPlan, instead of adding exit blocks
later.

This retains the exit conditions early, and means we can handle early
exits before forming regions, without the reliance on VPRecipeBuilder.

Once we retain all exits initially, handling early exits before region
construction ensures the regions are valid; otherwise we would leave
edges exiting the region from elsewhere than the latch.

Removing the reliance on VPRecipeBuilder removes the dependence on
mapping IR BBs to VPBBs and unblocks predication as VPlan transform:
llvm#128420.

Depends on llvm#137709.
Copy link
Collaborator

@ayalz ayalz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Raises some thoughts... (trying to address the incomplete "not sure if there's a" ending of the commit message)

Comment on lines 149 to 156
// Check whether Instruction definition is in a loop exit.
SmallVector<BasicBlock *> ExitBlocks;
TheLoop->getExitBlocks(ExitBlocks);
if (is_contained(ExitBlocks, InstParent)) {
// Instruction definition is in outermost loop exit.
return false;
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this simply redundant, independently, as such instructions which reside in ExitBlocks are not contained in TheLoop?

Should the above definition of external defs be updated?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, removed as NFC in 282af2d

for (VPBlockBase *VPBB : vp_depth_first_shallow(HeaderVPB))
VPBB->setParent(R);
if (!ExitBlocks.contains(VPBB))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can now reach exit blocks, contrary to above comment, via early exits?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, comment updated, thanks!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm2, createLoopRegion() is called after prepareForVectorization(), which according to the changes below should have removed all early-exit edges, so is this check (if VPBB is an exit block) needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not reachable in the latest version, removed thanks!


addCanonicalIVRecipes(Plan, cast<VPBasicBlock>(HeaderVPB),
cast<VPBasicBlock>(LatchVPB), InductionTy, IVDL);

// Disconnect all edges between exit blocks other than from the latch.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Disconnect all edges between exit blocks other than from the latch.
// Disconnect all edges to exit blocks other than from the middle block.

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks!

Comment on lines 116 to 118
if (!TheLoop->contains(BB))
return Plan->getExitBlock(BB);

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about populating BB2VPBB with Plan's exit blocks on PlainCFGBuilder's construction, to fold this check with the one above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks!

Comment on lines 425 to 426
assert(LatchVPBB->getNumSuccessors() <= 1 &&
"Latch has more than one successor");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert(LatchVPBB->getNumSuccessors() <= 1 &&
"Latch has more than one successor");
assert(Succ && "Latch expected to be left with a single successor");

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks!

for (VPBlockBase *VPBB : vp_depth_first_shallow(HeaderVPB))
VPBB->setParent(R);
if (!ExitBlocks.contains(VPBB))
VPBB->setParent(R);

VPBlockUtils::insertBlockAfter(R, PreheaderVPBB);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PlaceHolder introduced earlier may now be used instead as follows

Suggested change
VPBlockUtils::insertBlockAfter(R, PreheaderVPBB);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks

Comment on lines 493 to 500
VPBlockBase *LatchExitVPB = LatchVPB->getNumSuccessors() == 2
? LatchVPB->getSuccessors()[0]
: nullptr;
if (LatchExitVPB) {
LatchVPB->getSuccessors()[0] = MiddleVPBB;
MiddleVPBB->setPredecessors({LatchVPB});
MiddleVPBB->setSuccessors({LatchExitVPB});
LatchExitVPB->replacePredecessor(LatchVPB, MiddleVPBB);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
VPBlockBase *LatchExitVPB = LatchVPB->getNumSuccessors() == 2
? LatchVPB->getSuccessors()[0]
: nullptr;
if (LatchExitVPB) {
LatchVPB->getSuccessors()[0] = MiddleVPBB;
MiddleVPBB->setPredecessors({LatchVPB});
MiddleVPBB->setSuccessors({LatchExitVPB});
LatchExitVPB->replacePredecessor(LatchVPB, MiddleVPBB);
// Canonical LatchVPB has header block as last successor. If it has another
// successor, the latter is an exit block - insert middle block on its edge.
// Otherwise, add middle block as another successor retaining header as last.
if (LatchVPB->getNumSuccessors() == 2) {
VPBlockBase *LatchExitVPB = LatchVPB->getSuccessors()[0];
VPBlockUtils::insertOnEdge(LatchVPB, LatchExitVPB, MiddleVPBB);

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks!


addCanonicalIVRecipes(Plan, cast<VPBasicBlock>(HeaderVPB),
cast<VPBasicBlock>(LatchVPB), InductionTy, IVDL);

// Disconnect all edges between exit blocks other than from the latch.
// TODO: Uncountable exit blocks should be handled here.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are "handled", being listed in getExitBlocks; but should be handled differently?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, here for now we just simply drop the edges, i.e. leaving the plan in an incorrect/incomplete state, relying on the later transform to adjust the exit condition and introduce an extra middle block, looking up the early exit condition using original IR references.

if (!RequiresScalarEpilogueCheck) {
if (LatchExitVPB)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (LatchExitVPB)
if (auto *LatchExitVPB = MiddleVPBB->getSingleSuccessor())

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done thanks!

@@ -117,6 +117,7 @@ class VPBlockBase {
Predecessors.erase(Pos);
}

public:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully this low-level API can remain private, possibly with an addition to the high-level public API, see below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks!

fhahn added a commit that referenced this pull request May 6, 2025
Remove checking if the instruction is in the preheader or exit blocks.
Those checks are redundant and handled by checking if the instruction is
outside the loop below.

Split off as suggested from #137709.
fhahn added a commit that referenced this pull request May 6, 2025
Flip the region construction order to innermost first from outermost
first. This ensures we only set the final parent region for VPBBs once.

Split off from #137709.
fhahn added a commit to fhahn/llvm-project that referenced this pull request May 6, 2025
Move early-exit handling up front to original VPlan construction, before
introducing early exits.

This builds on llvm#137709, which
adds exiting edges to the original VPlan, instead of adding exit blocks
later.

This retains the exit conditions early, and means we can handle early
exits before forming regions, without the reliance on VPRecipeBuilder.

Once we retain all exits initially, handling early exits before region
construction ensures the regions are valid; otherwise we would leave
edges exiting the region from elsewhere than the latch.

Removing the reliance on VPRecipeBuilder removes the dependence on
mapping IR BBs to VPBBs and unblocks predication as VPlan transform:
llvm#128420.

Depends on llvm#137709.
Copy link
Collaborator

@ayalz ayalz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit message needs updating (and completion..)

TODO: Suspect mergeBlocksIntoPredecessors() might lose predecessor positions as it doesn't check if the empty block's successors have a single predecessor (though that may be the case currently; otherwise should probably not be considered atm). Supplying removeOnEdge(Block) which disconnects Block from its single Pred and single Succ, reconnecting them together while retaining pred and succ positions may help, but an empty block being merged into its predecessor may have multiple successors - rather than being on an edge.

any_of(successors(BB),
[this](BasicBlock *Succ) { return !TheLoop->contains(Succ); }))
continue;

// Conditional branch instruction are represented using BranchOnCond
// recipes.
if (Br->isConditional()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: seems a bit clearer to early continue?

Suggested change
if (Br->isConditional()) {
if (!Br->isConditional())
continue;

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, ignore, we continue also after handling a conditional branch.

// All VPBB's reachable shallowly from HeaderVPB belong to top level loop,
// because VPlan is expected to end at top level latch disconnected above.
// All VPBB's reachable shallowly from HeaderVPB belong to the current region,
// except the exit blocks reachable via non-latch exiting blocks,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// except the exit blocks reachable via non-latch exiting blocks,
// except the exit blocks reachable via non-latch exiting blocks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed thanks!

Comment on lines 431 to 436
VPBlockUtils::insertBlockAfter(R, PreheaderVPBB);
if (Succ)
VPBlockUtils::connectBlocks(R, Succ);
VPBlockUtils::insertOnEdge(PlaceHolder, Succ, R);

// Remove placeholder block.
VPBlockUtils::disconnectBlocks(R, PlaceHolder);
VPBlockUtils::disconnectBlocks(PlaceHolder, R);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inserting R first after Preheader (clearer to instead explicitly insert R on the edge from Preheader to PlaceHolder?) and then inserting R again on the edge between PlaceHolder and Succ seems a bit confusing, thereby creating bidirectional edges between R and PlaceHolder which are then removed. It's like inserting R both before PlaceHolder and after it, but both these insertions require R to be disconnected. In essence we want to replace PlaceHolder with R.
Another alternative is to have an empty region as a placeholder rather than an empty basic block, and then move/copy R's entry and exit blocks into it:

  auto *PlaceHolder = Plan.createVPRegionBlock("", false /*isReplicator*/);

followed by

  PlaceHolder.setEntry(R.getEntry());
  PlaceHolder.setExit(R.getExit());

essentially turning R to be the temporary block and PlaceHolder to be R, constructed outside in rather than inside out?

Copy link
Collaborator

@ayalz ayalz May 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps suffice to be a bit clearer by slightly reordering (would Exit be a better name for Succ?):

// Have R replace PlaceHolder as successor of Preheader.
BlockUtils::insertOnEdge(PreheaderBlock, PlaceHolder, R);
BlockUtils::disconnectBlocks(R, PlaceHolder);
// Have R replace PlaceHolder as predecessor of Exit.
BlockUtils::insertOnEdge(PlaceHolder, Exit, R);
BlockUtils::disconnectBlocks(PlaceHolder, R);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually there's no need to construct a placeholder region, we can simply create an empty main region first and then set entry/exiting after adjusting the CFG. Updated, thanks!

Comment on lines 503 to 505
// TODO: VPlans with early exits should be explicitly converted to a form only
// exiting via the latch here, including adjusting the exit condition, instead
// of simplify disconnecting the edges and adjusting the VPlan later.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// TODO: VPlans with early exits should be explicitly converted to a form only
// exiting via the latch here, including adjusting the exit condition, instead
// of simplify disconnecting the edges and adjusting the VPlan later.
// TODO: VPlans with early exits should be explicitly converted to a form
// exiting only via the latch here, including adjusting the exit condition,
// instead of simply disconnecting the edges and adjusting the VPlan later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks!

llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request May 7, 2025
…rnalDef (NFC).

Remove checking if the instruction is in the preheader or exit blocks.
Those checks are redundant and handled by checking if the instruction is
outside the loop below.

Split off as suggested from llvm/llvm-project#137709.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request May 7, 2025
Flip the region construction order to innermost first from outermost
first. This ensures we only set the final parent region for VPBBs once.

Split off from llvm/llvm-project#137709.
@fhahn
Copy link
Contributor Author

fhahn commented May 7, 2025

Commit message needs updating (and completion..)

Thanks, updated now.

TODO: Suspect mergeBlocksIntoPredecessors() might lose predecessor positions as it doesn't check if the empty block's successors have a single predecessor (though that may be the case currently; otherwise should probably not be considered atm). Supplying removeOnEdge(Block) which disconnects Block from its single Pred and single Succ, reconnecting them together while retaining pred and succ positions may help, but an empty block being merged into its predecessor may have multiple successors - rather than being on an edge.

Now there's no need to remove/merge, so I left the TODO out of the commit message for now. It will mangle the predecessor order of the successors, at the moment not used in places where successors could have phi-like recipes I think

@david-arm
Copy link
Contributor

Hi @fhahn, looks like there might be some genuine unit test failures on the Windows build?

VPBlockUtils::connectBlocks(LatchVPB, MiddleVPBB);
LatchVPB->swapSuccessors();
// Canonical LatchVPB has header block as last successor. If it has another
// successor, the latter is an exit block - insert middle block on its edge.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I'm a bit confused by the word "latter" here. Does "latter" refer to "another successor"? Perhaps it's easier just to say that if the latchvpb is not canonical the early exit block(s) come first, with the (canonical?) exit to the middle block being last? If I've understood the layout correctly.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LatchVPB is canonical - as a result of calling canonicalHeaderAndLatch() above. Being a canonical latch means LatchVPB has header block as its last successor, and this property is maintained. If LatchVPB has another successor, in addition to header, this other successor (appears first and) is an exit block. In this case middle block is inserted on the edge from LatchVPB to its first exit block successor. Should "the latter" be replaced with "this other successor"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to say

If it has another successor, this successor is an exit block

if (Pred == MiddleVPBB)
continue;
cast<VPBasicBlock>(Pred)->getTerminator()->eraseFromParent();
VPBlockUtils::disconnectBlocks(Pred, EB);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks a bit odd at first as it seems to be undoing what you did above with VPBlockUtils::insertOnEdge(LatchVPB, LatchExitVPB, MiddleVPBB);. I presume that's because the vector.early.exit VPBB sits between the latch block and the original IR early exit block?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The edge from MiddleVPBB to LatchExitVPB is retained here explicitly via the early-continue exclusion. The edges removed here are early-exits from non-latch Pred block to early.exit block. Block vector.early.exit is introduced by handleUncountableEarlyExit() which currently takes place later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Above with insertOnEdge we only handle the edge exiting from the latch. The exit block via the latch will now be connected to the middle block.

The loop here disconnects all early exits and they will be handled later: either by requiring at least one scalar iteration, nothing more needs to be done, or introducing the early exit control flow to go to the early exit via the additional middle block. For the latter case, the VPlan is now incomplete/incorrect.

To avoid this, we should directly handle the uncountable early exits here, which is done in #138393. This way, we do not need to rely on IR references in handleUncountableEarlyExit and the VPlan remains complete/correct throughout.

// order of blocks. Set region entry and exiting after both HeaderVPB and
// LatchVPBB have been disconnected from their predecessors/successors.
auto *R = Plan.createVPRegionBlock("", false /*isReplicator*/);

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change

VPBlockUtils::connectBlocks(LatchVPB, MiddleVPBB);
LatchVPB->swapSuccessors();
// Canonical LatchVPB has header block as last successor. If it has another
// successor, the latter is an exit block - insert middle block on its edge.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LatchVPB is canonical - as a result of calling canonicalHeaderAndLatch() above. Being a canonical latch means LatchVPB has header block as its last successor, and this property is maintained. If LatchVPB has another successor, in addition to header, this other successor (appears first and) is an exit block. In this case middle block is inserted on the edge from LatchVPB to its first exit block successor. Should "the latter" be replaced with "this other successor"?

@fhahn
Copy link
Contributor Author

fhahn commented May 7, 2025

Hi @fhahn, looks like there might be some genuine unit test failures on the Windows build?

A few C++ unit tests still needed updating, should be done now

GeorgeARM pushed a commit to GeorgeARM/llvm-project that referenced this pull request May 7, 2025
Remove checking if the instruction is in the preheader or exit blocks.
Those checks are redundant and handled by checking if the instruction is
outside the loop below.

Split off as suggested from llvm#137709.
GeorgeARM pushed a commit to GeorgeARM/llvm-project that referenced this pull request May 7, 2025
Flip the region construction order to innermost first from outermost
first. This ensures we only set the final parent region for VPBBs once.

Split off from llvm#137709.
// The canonical LatchVPB has the header block as last successor. If it has
// another successor, this successor is an exit block - insert middle block on
// its edge. Otherwise, add middle block as another successor retaining header
// as last.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, prepareForVectorization() is called after buildPlainCFG() (for both native and non-native paths), and with the above changes, buildPlainCFG() connects the latch to both header and exit blocks, i.e., should LatchVPB always have two successors now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can vectorize loops where the latch isn’t exiting, by requiring at least one iteration in the scalar loop. In that case, the original latch will
Have a single successor

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right, thanks. Perhaps worth a comment.

Copy link
Collaborator

@ayalz ayalz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine w/ me, adding some last final comments, thanks!
@david-arm - is this ok w/ you?

@@ -534,9 +547,6 @@ void VPlanTransforms::prepareForVectorization(VPlan &Plan, Type *InductionTy,
}

// The connection order corresponds to the operands of the conditional branch.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"// with middle block already connected to exit block."?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added thanks

@@ -302,6 +304,7 @@ compound=true
" EMIT store ir\<%res\>, ir\<%arr.idx\>\l" +
" EMIT ir\<%iv.next\> = add ir\<%iv\>, ir\<1\>\l" +
" EMIT ir\<%exitcond\> = icmp ir\<%iv.next\>, ir\<%N\>\l" +
" EMIT vp\<%3\> = not ir\<%exitcond\>\l" +
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should vp%3 be used, also above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, at this point we already added the canonical branch to control the loop; the old conditions will be cleaned up by DCE

// The canonical LatchVPB has the header block as last successor. If it has
// another successor, this successor is an exit block - insert middle block on
// its edge. Otherwise, add middle block as another successor retaining header
// as last.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right, thanks. Perhaps worth a comment.

for (VPBlockBase *VPBB : vp_depth_first_shallow(HeaderVPB))
VPBB->setParent(R);
if (!ExitBlocks.contains(VPBB))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm2, createLoopRegion() is called after prepareForVectorization(), which according to the changes below should have removed all early-exit edges, so is this check (if VPBB is an exit block) needed?

@@ -229,6 +230,7 @@ TEST_F(VPlanHCFGTest, testVPInstructionToVPRecipesInner) {
EXPECT_NE(nullptr, dyn_cast<VPInstruction>(&*Iter++));
EXPECT_NE(nullptr, dyn_cast<VPInstruction>(&*Iter++));
EXPECT_NE(nullptr, dyn_cast<VPInstruction>(&*Iter++));
EXPECT_NE(nullptr, dyn_cast<VPInstruction>(&*Iter++));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous case didn't check all instructions (that Iter reached VecBB->end()), so avoid checking the new last instruction, as done here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left as is for now, just carrying the curent behavior forward as this checks that we reach VecBB->end(). Might be something to be done separately? Or checking if we have the expected canonical instructions, which may make the test more brittle.

Comment on lines 426 to 427
SmallPtrSet<VPBlockBase *, 2> ExitBlocks(Plan.getExitBlocks().begin(),
Plan.getExitBlocks().end());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
SmallPtrSet<VPBlockBase *, 2> ExitBlocks(Plan.getExitBlocks().begin(),
Plan.getExitBlocks().end());

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed thanks

Comment on lines 426 to 427
SmallPtrSet<VPBlockBase *, 2> ExitBlocks(Plan.getExitBlocks().begin(),
Plan.getExitBlocks().end());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
SmallPtrSet<VPBlockBase *, 2> ExitBlocks(Plan.getExitBlocks().begin(),
Plan.getExitBlocks().end());

?

@fhahn fhahn merged commit 339dc95 into llvm:main May 8, 2025
8 of 10 checks passed
@fhahn fhahn deleted the vplan-retain-exits branch May 8, 2025 17:16
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request May 8, 2025
…(NFC). (#137709)

Update initial VPlan construction to include exit conditions and edges.

The loop region is now first constructed without entry/exiting. Those
are set after inserting the region in the CFG, to preserve the original
predecessor/successor order of blocks.

For now, all early exits are disconnected before forming the regions,
but a follow-up will update uncountable exit handling to also happen
here. This is required to enable VPlan predication and remove the
dependence any IR BBs
(llvm/llvm-project#128420).

PR: llvm/llvm-project#137709
fhahn added a commit to fhahn/llvm-project that referenced this pull request May 8, 2025
Move early-exit handling up front to original VPlan construction, before
introducing early exits.

This builds on llvm#137709, which
adds exiting edges to the original VPlan, instead of adding exit blocks
later.

This retains the exit conditions early, and means we can handle early
exits before forming regions, without the reliance on VPRecipeBuilder.

Once we retain all exits initially, handling early exits before region
construction ensures the regions are valid; otherwise we would leave
edges exiting the region from elsewhere than the latch.

Removing the reliance on VPRecipeBuilder removes the dependence on
mapping IR BBs to VPBBs and unblocks predication as VPlan transform:
llvm#128420.

Depends on llvm#137709.
fhahn added a commit that referenced this pull request May 12, 2025
Move early-exit handling up front to original VPlan construction, before
introducing early exits.

This builds on #137709, which
adds exiting edges to the original VPlan, instead of adding exit blocks
later.

This retains the exit conditions early, and means we can handle early
exits before forming regions, without the reliance on VPRecipeBuilder.

Once we retain all exits initially, handling early exits before region
construction ensures the regions are valid; otherwise we would leave
edges exiting the region from elsewhere than the latch.

Removing the reliance on VPRecipeBuilder removes the dependence on
mapping IR BBs to VPBBs and unblocks predication as VPlan transform:
#128420.

Depends on #137709 (included in
PR).

PR: #138393
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request May 12, 2025
…138393)

Move early-exit handling up front to original VPlan construction, before
introducing early exits.

This builds on llvm/llvm-project#137709, which
adds exiting edges to the original VPlan, instead of adding exit blocks
later.

This retains the exit conditions early, and means we can handle early
exits before forming regions, without the reliance on VPRecipeBuilder.

Once we retain all exits initially, handling early exits before region
construction ensures the regions are valid; otherwise we would leave
edges exiting the region from elsewhere than the latch.

Removing the reliance on VPRecipeBuilder removes the dependence on
mapping IR BBs to VPBBs and unblocks predication as VPlan transform:
llvm/llvm-project#128420.

Depends on llvm/llvm-project#137709 (included in
PR).

PR: llvm/llvm-project#138393
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants