-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[VPlan] Build initial VPlan 0 using HCFGBuilder for inner loops. (NFC) #124432
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-backend-risc-v @llvm/pr-subscribers-llvm-transforms Author: Florian Hahn (fhahn) ChangesUse HCFGBuilder to build an initial VPlan 0, which wraps all input instructions in VPInstructions and update tryToBuildVPlanWithVPRecipes to replace the VPInstructions with widened recipes. At the moment, widened recipes are created based on the underlying instruction of the VPInstruction. Masks are also still created based on the input IR basic blocks and the loop CFG is flattened in the main loop processing the VPInstructions. This patch also incldues support for Switch instructions in HCFGBuilder using just a VPInstruction with Instruction::Switch opcode. There are multiple follow-ups planned:
Full diff: https://github.com/llvm/llvm-project/pull/124432.diff 5 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 3a4f637f177e19..5d0458f473a78f 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -8299,7 +8299,7 @@ VPRecipeBuilder::tryToWidenMemory(Instruction *I, ArrayRef<VPValue *> Operands,
: GEPNoWrapFlags::none(),
I->getDebugLoc());
}
- Builder.getInsertBlock()->appendRecipe(VectorPtr);
+ VectorPtr->insertBefore(&*Builder.getInsertPoint());
Ptr = VectorPtr;
}
if (LoadInst *Load = dyn_cast<LoadInst>(I))
@@ -9206,6 +9206,7 @@ static void addExitUsersForFirstOrderRecurrences(
VPlanPtr
LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
+ using namespace llvm::VPlanPatternMatch;
SmallPtrSet<const InterleaveGroup<Instruction> *, 1> InterleaveGroups;
// ---------------------------------------------------------------------------
@@ -9229,6 +9230,10 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
PSE, RequiresScalarEpilogueCheck,
CM.foldTailByMasking(), OrigLoop);
+ // Build hierarchical CFG.
+ VPlanHCFGBuilder HCFGBuilder(OrigLoop, LI, *Plan);
+ HCFGBuilder.buildHierarchicalCFG();
+
// Don't use getDecisionAndClampRange here, because we don't know the UF
// so this function is better to be conservative, rather than to split
// it up into different VPlans.
@@ -9297,23 +9302,45 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
RecipeBuilder.collectScaledReductions(Range);
auto *MiddleVPBB = Plan->getMiddleBlock();
+ ReversePostOrderTraversal<VPBlockShallowTraversalWrapper<VPBlockBase *>> RPOT(
+ Plan->getVectorLoopRegion()->getEntry());
+
VPBasicBlock::iterator MBIP = MiddleVPBB->getFirstNonPhi();
- for (BasicBlock *BB : make_range(DFS.beginRPO(), DFS.endRPO())) {
- // Relevant instructions from basic block BB will be grouped into VPRecipe
- // ingredients and fill a new VPBasicBlock.
- if (VPBB != HeaderVPBB)
- VPBB->setName(BB->getName());
- Builder.setInsertPoint(VPBB);
+ VPBlockBase *PrevVPBB = nullptr;
+ for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(RPOT)) {
+ // Skip VPBBs not corresponding to any input IR basic blocks.
+ if (!HCFGBuilder.getIRBBForVPB(VPBB))
+ continue;
- if (VPBB == HeaderVPBB)
+ // Create mask based on the IR BB corresponding to VPBB.
+ // TODO: Predicate directly based on VPlan.
+ if (VPBB == HeaderVPBB) {
+ Builder.setInsertPoint(VPBB, VPBB->getFirstNonPhi());
RecipeBuilder.createHeaderMask();
- else if (NeedsMasks)
- RecipeBuilder.createBlockInMask(BB);
+ } else if (NeedsMasks) {
+ Builder.setInsertPoint(VPBB, VPBB->begin());
+ RecipeBuilder.createBlockInMask(HCFGBuilder.getIRBBForVPB(VPBB));
+ }
- // Introduce each ingredient into VPlan.
+ // Convert input VPInstructions to widened recipes.
// TODO: Model and preserve debug intrinsics in VPlan.
- for (Instruction &I : drop_end(BB->instructionsWithoutDebug(false))) {
- Instruction *Instr = &I;
+ for (VPRecipeBase &R : make_early_inc_range(*VPBB)) {
+ auto *SingleDef = dyn_cast<VPSingleDefRecipe>(&R);
+ if (!isa<VPWidenPHIRecipe>(&R) &&
+ (!isa<VPInstruction>(SingleDef) || !SingleDef->getUnderlyingValue()))
+ continue;
+
+ if (match(&R, m_BranchOnCond(m_VPValue())) ||
+ (isa<VPInstruction>(&R) &&
+ cast<VPInstruction>(&R)->getOpcode() == Instruction::Switch)) {
+ R.eraseFromParent();
+ break;
+ }
+
+ // TODO: Gradually replace uses of underlying instruction by analyses on
+ // VPlan.
+ Instruction *Instr = SingleDef->getUnderlyingInstr();
+ Builder.setInsertPoint(SingleDef);
SmallVector<VPValue *, 4> Operands;
auto *Phi = dyn_cast<PHINode>(Instr);
if (Phi && Phi->getParent() == HeaderBB) {
@@ -9328,15 +9355,18 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
// in the exit block, a uniform store recipe will be created for the final
// invariant store of the reduction.
StoreInst *SI;
- if ((SI = dyn_cast<StoreInst>(&I)) &&
+ if ((SI = dyn_cast<StoreInst>(Instr)) &&
Legal->isInvariantAddressOfReduction(SI->getPointerOperand())) {
// Only create recipe for the final invariant store of the reduction.
- if (!Legal->isInvariantStoreOfReduction(SI))
+ if (!Legal->isInvariantStoreOfReduction(SI)) {
+ R.eraseFromParent();
continue;
+ }
auto *Recipe = new VPReplicateRecipe(
SI, RecipeBuilder.mapToVPValues(Instr->operands()),
true /* IsUniform */);
Recipe->insertBefore(*MiddleVPBB, MBIP);
+ R.eraseFromParent();
continue;
}
@@ -9355,16 +9385,30 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
// after them)
// * Optimizing truncates to VPWidenIntOrFpInductionRecipe.
- assert((HeaderVPBB->getFirstNonPhi() == VPBB->end() ||
- CM.foldTailByMasking() || isa<TruncInst>(Instr)) &&
- "unexpected recipe needs moving");
Recipe->insertBefore(*HeaderVPBB, HeaderVPBB->getFirstNonPhi());
} else
- VPBB->appendRecipe(Recipe);
+ Recipe->insertBefore(&R);
+ if (Recipe->getNumDefinedValues() == 1)
+ SingleDef->replaceAllUsesWith(Recipe->getVPSingleValue());
+ else
+ assert(Recipe->getNumDefinedValues() == 0);
+ R.eraseFromParent();
}
- VPBlockUtils::insertBlockAfter(Plan->createVPBasicBlock(""), VPBB);
- VPBB = cast<VPBasicBlock>(VPBB->getSingleSuccessor());
+ // Flatten the CFG in the loop. Masks for blocks have already been generated
+ // and added to recipes as needed. To do so, first disconnect VPBB from its
+ // predecessors and successors, except the exiting block. Then connect VPBB
+ // to the previously visited VPBB.
+ for (auto *Succ : to_vector(VPBB->getSuccessors())) {
+ if (Succ == Plan->getVectorLoopRegion()->getExiting())
+ continue;
+ VPBlockUtils::disconnectBlocks(VPBB, Succ);
+ }
+ for (auto *Pred : to_vector(VPBB->getPredecessors()))
+ VPBlockUtils::disconnectBlocks(Pred, VPBB);
+ if (PrevVPBB)
+ VPBlockUtils::connectBlocks(PrevVPBB, VPBB);
+ PrevVPBB = VPBB;
}
// After here, VPBB should not be used.
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index 83c54a9b9c259c..67e3371923a9f2 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -579,9 +579,11 @@ static bool hasConditionalTerminator(const VPBasicBlock *VPBB) {
}
const VPRecipeBase *R = &VPBB->back();
- bool IsCondBranch = isa<VPBranchOnMaskRecipe>(R) ||
- match(R, m_BranchOnCond(m_VPValue())) ||
- match(R, m_BranchOnCount(m_VPValue(), m_VPValue()));
+ bool IsCondBranch =
+ isa<VPBranchOnMaskRecipe>(R) || match(R, m_BranchOnCond(m_VPValue())) ||
+ match(R, m_BranchOnCount(m_VPValue(), m_VPValue())) ||
+ (isa<VPInstruction>(R) &&
+ cast<VPInstruction>(R)->getOpcode() == Instruction::Switch);
(void)IsCondBranch;
if (VPBB->getNumSuccessors() >= 2 ||
diff --git a/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp b/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp
index 5a2e5d7cfee48d..399511ed441b4e 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp
@@ -75,7 +75,7 @@ class PlainCFGBuilder {
: TheLoop(Lp), LI(LI), Plan(P) {}
/// Build plain CFG for TheLoop and connects it to Plan's entry.
- void buildPlainCFG();
+ void buildPlainCFG(DenseMap<VPBlockBase *, BasicBlock *> &VPB2IRBB);
};
} // anonymous namespace
@@ -238,9 +238,9 @@ bool PlainCFGBuilder::isExternalDef(Value *Val) {
return false;
// Check whether Instruction definition is in the loop exit.
- BasicBlock *Exit = TheLoop->getUniqueExitBlock();
- assert(Exit && "Expected loop with single exit.");
- if (InstParent == Exit) {
+ SmallVector<BasicBlock *> ExitBlocks;
+ TheLoop->getExitBlocks(ExitBlocks);
+ if (is_contained(ExitBlocks, InstParent)) {
// Instruction definition is in outermost loop exit.
return false;
}
@@ -308,6 +308,14 @@ void PlainCFGBuilder::createVPInstructionsForVPBB(VPBasicBlock *VPBB,
continue;
}
+ if (auto *SI = dyn_cast<SwitchInst>(Inst)) {
+ SmallVector<VPValue *> Ops = {getOrCreateVPOperand(SI->getCondition())};
+ for (auto Case : SI->cases())
+ Ops.push_back(getOrCreateVPOperand(Case.getCaseValue()));
+ VPIRBuilder.createNaryOp(Instruction::Switch, Ops, Inst);
+ continue;
+ }
+
VPValue *NewVPV;
if (auto *Phi = dyn_cast<PHINode>(Inst)) {
// Phi node's operands may have not been visited at this point. We create
@@ -334,7 +342,8 @@ void PlainCFGBuilder::createVPInstructionsForVPBB(VPBasicBlock *VPBB,
}
// Main interface to build the plain CFG.
-void PlainCFGBuilder::buildPlainCFG() {
+void PlainCFGBuilder::buildPlainCFG(
+ DenseMap<VPBlockBase *, BasicBlock *> &VPB2IRBB) {
// 0. Reuse the top-level region, vector-preheader and exit VPBBs from the
// skeleton. These were created directly rather than via getOrCreateVPBB(),
// revisit them now to update BB2VPBB. Note that header/entry and
@@ -423,6 +432,14 @@ void PlainCFGBuilder::buildPlainCFG() {
// Set VPBB successors. We create empty VPBBs for successors if they don't
// exist already. Recipes will be created when the successor is visited
// during the RPO traversal.
+ if (auto *SI = dyn_cast<SwitchInst>(BB->getTerminator())) {
+ SmallVector<VPBlockBase *> Succs = {
+ getOrCreateVPBB(SI->getDefaultDest())};
+ for (auto Case : SI->cases())
+ Succs.push_back(getOrCreateVPBB(Case.getCaseSuccessor()));
+ VPBB->setSuccessors(Succs);
+ continue;
+ }
auto *BI = cast<BranchInst>(BB->getTerminator());
unsigned NumSuccs = succ_size(BB);
if (NumSuccs == 1) {
@@ -476,11 +493,14 @@ void PlainCFGBuilder::buildPlainCFG() {
// have a VPlan couterpart. Fix VPlan phi nodes by adding their corresponding
// VPlan operands.
fixPhiNodes();
+
+ for (const auto &[IRBB, VPB] : BB2VPBB)
+ VPB2IRBB[VPB] = IRBB;
}
void VPlanHCFGBuilder::buildPlainCFG() {
PlainCFGBuilder PCFGBuilder(TheLoop, LI, Plan);
- PCFGBuilder.buildPlainCFG();
+ PCFGBuilder.buildPlainCFG(VPB2IRBB);
}
// Public interface to build a H-CFG.
diff --git a/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.h b/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.h
index ad6e2ad90a9610..eac842c00d46a2 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.h
@@ -53,6 +53,10 @@ class VPlanHCFGBuilder {
// are introduced.
VPDominatorTree VPDomTree;
+ /// Map of create VP blocks to their input IR basic blocks, if they have been
+ /// created for a input IR basic block.
+ DenseMap<VPBlockBase *, BasicBlock *> VPB2IRBB;
+
/// Build plain CFG for TheLoop and connects it to Plan's entry.
void buildPlainCFG();
@@ -62,6 +66,12 @@ class VPlanHCFGBuilder {
/// Build H-CFG for TheLoop and update Plan accordingly.
void buildHierarchicalCFG();
+
+ /// Return the input IR BasicBlock corresponding to \p VPB. Returns nullptr if
+ /// there is no such corresponding block.
+ BasicBlock *getIRBBForVPB(const VPBlockBase *VPB) const {
+ return VPB2IRBB.lookup(VPB);
+ }
};
} // namespace llvm
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll b/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll
index f630f4f21e065f..8a32c71ed6ce1c 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll
@@ -46,7 +46,7 @@ define void @vector_reverse_i64(ptr nocapture noundef writeonly %A, ptr nocaptur
; CHECK-NEXT: LV: Found an estimated cost of 0 for VF vscale x 4 For instruction: br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit, !llvm.loop !0
; CHECK-NEXT: LV: Using user VF vscale x 4.
; CHECK-NEXT: LV: Loop does not require scalar epilogue
-; CHECK-NEXT: LV: Scalarizing: %i.0 = add nsw i32 %i.0.in8, -1
+; CHECK: LV: Scalarizing: %i.0 = add nsw i32 %i.0.in8, -1
; CHECK-NEXT: LV: Scalarizing: %idxprom = zext i32 %i.0 to i64
; CHECK-NEXT: LV: Scalarizing: %arrayidx = getelementptr inbounds i32, ptr %B, i64 %idxprom
; CHECK-NEXT: LV: Scalarizing: %arrayidx3 = getelementptr inbounds i32, ptr %A, i64 %idxprom
@@ -295,7 +295,7 @@ define void @vector_reverse_f32(ptr nocapture noundef writeonly %A, ptr nocaptur
; CHECK-NEXT: LV: Found an estimated cost of 0 for VF vscale x 4 For instruction: br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit, !llvm.loop !0
; CHECK-NEXT: LV: Using user VF vscale x 4.
; CHECK-NEXT: LV: Loop does not require scalar epilogue
-; CHECK-NEXT: LV: Scalarizing: %i.0 = add nsw i32 %i.0.in8, -1
+; CHECK: LV: Scalarizing: %i.0 = add nsw i32 %i.0.in8, -1
; CHECK-NEXT: LV: Scalarizing: %idxprom = zext i32 %i.0 to i64
; CHECK-NEXT: LV: Scalarizing: %arrayidx = getelementptr inbounds float, ptr %B, i64 %idxprom
; CHECK-NEXT: LV: Scalarizing: %arrayidx3 = getelementptr inbounds float, ptr %A, i64 %idxprom
|
@llvm/pr-subscribers-vectorizers Author: Florian Hahn (fhahn) ChangesUse HCFGBuilder to build an initial VPlan 0, which wraps all input instructions in VPInstructions and update tryToBuildVPlanWithVPRecipes to replace the VPInstructions with widened recipes. At the moment, widened recipes are created based on the underlying instruction of the VPInstruction. Masks are also still created based on the input IR basic blocks and the loop CFG is flattened in the main loop processing the VPInstructions. This patch also incldues support for Switch instructions in HCFGBuilder using just a VPInstruction with Instruction::Switch opcode. There are multiple follow-ups planned:
Full diff: https://github.com/llvm/llvm-project/pull/124432.diff 5 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 3a4f637f177e19..5d0458f473a78f 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -8299,7 +8299,7 @@ VPRecipeBuilder::tryToWidenMemory(Instruction *I, ArrayRef<VPValue *> Operands,
: GEPNoWrapFlags::none(),
I->getDebugLoc());
}
- Builder.getInsertBlock()->appendRecipe(VectorPtr);
+ VectorPtr->insertBefore(&*Builder.getInsertPoint());
Ptr = VectorPtr;
}
if (LoadInst *Load = dyn_cast<LoadInst>(I))
@@ -9206,6 +9206,7 @@ static void addExitUsersForFirstOrderRecurrences(
VPlanPtr
LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
+ using namespace llvm::VPlanPatternMatch;
SmallPtrSet<const InterleaveGroup<Instruction> *, 1> InterleaveGroups;
// ---------------------------------------------------------------------------
@@ -9229,6 +9230,10 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
PSE, RequiresScalarEpilogueCheck,
CM.foldTailByMasking(), OrigLoop);
+ // Build hierarchical CFG.
+ VPlanHCFGBuilder HCFGBuilder(OrigLoop, LI, *Plan);
+ HCFGBuilder.buildHierarchicalCFG();
+
// Don't use getDecisionAndClampRange here, because we don't know the UF
// so this function is better to be conservative, rather than to split
// it up into different VPlans.
@@ -9297,23 +9302,45 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
RecipeBuilder.collectScaledReductions(Range);
auto *MiddleVPBB = Plan->getMiddleBlock();
+ ReversePostOrderTraversal<VPBlockShallowTraversalWrapper<VPBlockBase *>> RPOT(
+ Plan->getVectorLoopRegion()->getEntry());
+
VPBasicBlock::iterator MBIP = MiddleVPBB->getFirstNonPhi();
- for (BasicBlock *BB : make_range(DFS.beginRPO(), DFS.endRPO())) {
- // Relevant instructions from basic block BB will be grouped into VPRecipe
- // ingredients and fill a new VPBasicBlock.
- if (VPBB != HeaderVPBB)
- VPBB->setName(BB->getName());
- Builder.setInsertPoint(VPBB);
+ VPBlockBase *PrevVPBB = nullptr;
+ for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(RPOT)) {
+ // Skip VPBBs not corresponding to any input IR basic blocks.
+ if (!HCFGBuilder.getIRBBForVPB(VPBB))
+ continue;
- if (VPBB == HeaderVPBB)
+ // Create mask based on the IR BB corresponding to VPBB.
+ // TODO: Predicate directly based on VPlan.
+ if (VPBB == HeaderVPBB) {
+ Builder.setInsertPoint(VPBB, VPBB->getFirstNonPhi());
RecipeBuilder.createHeaderMask();
- else if (NeedsMasks)
- RecipeBuilder.createBlockInMask(BB);
+ } else if (NeedsMasks) {
+ Builder.setInsertPoint(VPBB, VPBB->begin());
+ RecipeBuilder.createBlockInMask(HCFGBuilder.getIRBBForVPB(VPBB));
+ }
- // Introduce each ingredient into VPlan.
+ // Convert input VPInstructions to widened recipes.
// TODO: Model and preserve debug intrinsics in VPlan.
- for (Instruction &I : drop_end(BB->instructionsWithoutDebug(false))) {
- Instruction *Instr = &I;
+ for (VPRecipeBase &R : make_early_inc_range(*VPBB)) {
+ auto *SingleDef = dyn_cast<VPSingleDefRecipe>(&R);
+ if (!isa<VPWidenPHIRecipe>(&R) &&
+ (!isa<VPInstruction>(SingleDef) || !SingleDef->getUnderlyingValue()))
+ continue;
+
+ if (match(&R, m_BranchOnCond(m_VPValue())) ||
+ (isa<VPInstruction>(&R) &&
+ cast<VPInstruction>(&R)->getOpcode() == Instruction::Switch)) {
+ R.eraseFromParent();
+ break;
+ }
+
+ // TODO: Gradually replace uses of underlying instruction by analyses on
+ // VPlan.
+ Instruction *Instr = SingleDef->getUnderlyingInstr();
+ Builder.setInsertPoint(SingleDef);
SmallVector<VPValue *, 4> Operands;
auto *Phi = dyn_cast<PHINode>(Instr);
if (Phi && Phi->getParent() == HeaderBB) {
@@ -9328,15 +9355,18 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
// in the exit block, a uniform store recipe will be created for the final
// invariant store of the reduction.
StoreInst *SI;
- if ((SI = dyn_cast<StoreInst>(&I)) &&
+ if ((SI = dyn_cast<StoreInst>(Instr)) &&
Legal->isInvariantAddressOfReduction(SI->getPointerOperand())) {
// Only create recipe for the final invariant store of the reduction.
- if (!Legal->isInvariantStoreOfReduction(SI))
+ if (!Legal->isInvariantStoreOfReduction(SI)) {
+ R.eraseFromParent();
continue;
+ }
auto *Recipe = new VPReplicateRecipe(
SI, RecipeBuilder.mapToVPValues(Instr->operands()),
true /* IsUniform */);
Recipe->insertBefore(*MiddleVPBB, MBIP);
+ R.eraseFromParent();
continue;
}
@@ -9355,16 +9385,30 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
// after them)
// * Optimizing truncates to VPWidenIntOrFpInductionRecipe.
- assert((HeaderVPBB->getFirstNonPhi() == VPBB->end() ||
- CM.foldTailByMasking() || isa<TruncInst>(Instr)) &&
- "unexpected recipe needs moving");
Recipe->insertBefore(*HeaderVPBB, HeaderVPBB->getFirstNonPhi());
} else
- VPBB->appendRecipe(Recipe);
+ Recipe->insertBefore(&R);
+ if (Recipe->getNumDefinedValues() == 1)
+ SingleDef->replaceAllUsesWith(Recipe->getVPSingleValue());
+ else
+ assert(Recipe->getNumDefinedValues() == 0);
+ R.eraseFromParent();
}
- VPBlockUtils::insertBlockAfter(Plan->createVPBasicBlock(""), VPBB);
- VPBB = cast<VPBasicBlock>(VPBB->getSingleSuccessor());
+ // Flatten the CFG in the loop. Masks for blocks have already been generated
+ // and added to recipes as needed. To do so, first disconnect VPBB from its
+ // predecessors and successors, except the exiting block. Then connect VPBB
+ // to the previously visited VPBB.
+ for (auto *Succ : to_vector(VPBB->getSuccessors())) {
+ if (Succ == Plan->getVectorLoopRegion()->getExiting())
+ continue;
+ VPBlockUtils::disconnectBlocks(VPBB, Succ);
+ }
+ for (auto *Pred : to_vector(VPBB->getPredecessors()))
+ VPBlockUtils::disconnectBlocks(Pred, VPBB);
+ if (PrevVPBB)
+ VPBlockUtils::connectBlocks(PrevVPBB, VPBB);
+ PrevVPBB = VPBB;
}
// After here, VPBB should not be used.
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index 83c54a9b9c259c..67e3371923a9f2 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -579,9 +579,11 @@ static bool hasConditionalTerminator(const VPBasicBlock *VPBB) {
}
const VPRecipeBase *R = &VPBB->back();
- bool IsCondBranch = isa<VPBranchOnMaskRecipe>(R) ||
- match(R, m_BranchOnCond(m_VPValue())) ||
- match(R, m_BranchOnCount(m_VPValue(), m_VPValue()));
+ bool IsCondBranch =
+ isa<VPBranchOnMaskRecipe>(R) || match(R, m_BranchOnCond(m_VPValue())) ||
+ match(R, m_BranchOnCount(m_VPValue(), m_VPValue())) ||
+ (isa<VPInstruction>(R) &&
+ cast<VPInstruction>(R)->getOpcode() == Instruction::Switch);
(void)IsCondBranch;
if (VPBB->getNumSuccessors() >= 2 ||
diff --git a/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp b/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp
index 5a2e5d7cfee48d..399511ed441b4e 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp
@@ -75,7 +75,7 @@ class PlainCFGBuilder {
: TheLoop(Lp), LI(LI), Plan(P) {}
/// Build plain CFG for TheLoop and connects it to Plan's entry.
- void buildPlainCFG();
+ void buildPlainCFG(DenseMap<VPBlockBase *, BasicBlock *> &VPB2IRBB);
};
} // anonymous namespace
@@ -238,9 +238,9 @@ bool PlainCFGBuilder::isExternalDef(Value *Val) {
return false;
// Check whether Instruction definition is in the loop exit.
- BasicBlock *Exit = TheLoop->getUniqueExitBlock();
- assert(Exit && "Expected loop with single exit.");
- if (InstParent == Exit) {
+ SmallVector<BasicBlock *> ExitBlocks;
+ TheLoop->getExitBlocks(ExitBlocks);
+ if (is_contained(ExitBlocks, InstParent)) {
// Instruction definition is in outermost loop exit.
return false;
}
@@ -308,6 +308,14 @@ void PlainCFGBuilder::createVPInstructionsForVPBB(VPBasicBlock *VPBB,
continue;
}
+ if (auto *SI = dyn_cast<SwitchInst>(Inst)) {
+ SmallVector<VPValue *> Ops = {getOrCreateVPOperand(SI->getCondition())};
+ for (auto Case : SI->cases())
+ Ops.push_back(getOrCreateVPOperand(Case.getCaseValue()));
+ VPIRBuilder.createNaryOp(Instruction::Switch, Ops, Inst);
+ continue;
+ }
+
VPValue *NewVPV;
if (auto *Phi = dyn_cast<PHINode>(Inst)) {
// Phi node's operands may have not been visited at this point. We create
@@ -334,7 +342,8 @@ void PlainCFGBuilder::createVPInstructionsForVPBB(VPBasicBlock *VPBB,
}
// Main interface to build the plain CFG.
-void PlainCFGBuilder::buildPlainCFG() {
+void PlainCFGBuilder::buildPlainCFG(
+ DenseMap<VPBlockBase *, BasicBlock *> &VPB2IRBB) {
// 0. Reuse the top-level region, vector-preheader and exit VPBBs from the
// skeleton. These were created directly rather than via getOrCreateVPBB(),
// revisit them now to update BB2VPBB. Note that header/entry and
@@ -423,6 +432,14 @@ void PlainCFGBuilder::buildPlainCFG() {
// Set VPBB successors. We create empty VPBBs for successors if they don't
// exist already. Recipes will be created when the successor is visited
// during the RPO traversal.
+ if (auto *SI = dyn_cast<SwitchInst>(BB->getTerminator())) {
+ SmallVector<VPBlockBase *> Succs = {
+ getOrCreateVPBB(SI->getDefaultDest())};
+ for (auto Case : SI->cases())
+ Succs.push_back(getOrCreateVPBB(Case.getCaseSuccessor()));
+ VPBB->setSuccessors(Succs);
+ continue;
+ }
auto *BI = cast<BranchInst>(BB->getTerminator());
unsigned NumSuccs = succ_size(BB);
if (NumSuccs == 1) {
@@ -476,11 +493,14 @@ void PlainCFGBuilder::buildPlainCFG() {
// have a VPlan couterpart. Fix VPlan phi nodes by adding their corresponding
// VPlan operands.
fixPhiNodes();
+
+ for (const auto &[IRBB, VPB] : BB2VPBB)
+ VPB2IRBB[VPB] = IRBB;
}
void VPlanHCFGBuilder::buildPlainCFG() {
PlainCFGBuilder PCFGBuilder(TheLoop, LI, Plan);
- PCFGBuilder.buildPlainCFG();
+ PCFGBuilder.buildPlainCFG(VPB2IRBB);
}
// Public interface to build a H-CFG.
diff --git a/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.h b/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.h
index ad6e2ad90a9610..eac842c00d46a2 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.h
@@ -53,6 +53,10 @@ class VPlanHCFGBuilder {
// are introduced.
VPDominatorTree VPDomTree;
+ /// Map of create VP blocks to their input IR basic blocks, if they have been
+ /// created for a input IR basic block.
+ DenseMap<VPBlockBase *, BasicBlock *> VPB2IRBB;
+
/// Build plain CFG for TheLoop and connects it to Plan's entry.
void buildPlainCFG();
@@ -62,6 +66,12 @@ class VPlanHCFGBuilder {
/// Build H-CFG for TheLoop and update Plan accordingly.
void buildHierarchicalCFG();
+
+ /// Return the input IR BasicBlock corresponding to \p VPB. Returns nullptr if
+ /// there is no such corresponding block.
+ BasicBlock *getIRBBForVPB(const VPBlockBase *VPB) const {
+ return VPB2IRBB.lookup(VPB);
+ }
};
} // namespace llvm
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll b/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll
index f630f4f21e065f..8a32c71ed6ce1c 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll
@@ -46,7 +46,7 @@ define void @vector_reverse_i64(ptr nocapture noundef writeonly %A, ptr nocaptur
; CHECK-NEXT: LV: Found an estimated cost of 0 for VF vscale x 4 For instruction: br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit, !llvm.loop !0
; CHECK-NEXT: LV: Using user VF vscale x 4.
; CHECK-NEXT: LV: Loop does not require scalar epilogue
-; CHECK-NEXT: LV: Scalarizing: %i.0 = add nsw i32 %i.0.in8, -1
+; CHECK: LV: Scalarizing: %i.0 = add nsw i32 %i.0.in8, -1
; CHECK-NEXT: LV: Scalarizing: %idxprom = zext i32 %i.0 to i64
; CHECK-NEXT: LV: Scalarizing: %arrayidx = getelementptr inbounds i32, ptr %B, i64 %idxprom
; CHECK-NEXT: LV: Scalarizing: %arrayidx3 = getelementptr inbounds i32, ptr %A, i64 %idxprom
@@ -295,7 +295,7 @@ define void @vector_reverse_f32(ptr nocapture noundef writeonly %A, ptr nocaptur
; CHECK-NEXT: LV: Found an estimated cost of 0 for VF vscale x 4 For instruction: br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit, !llvm.loop !0
; CHECK-NEXT: LV: Using user VF vscale x 4.
; CHECK-NEXT: LV: Loop does not require scalar epilogue
-; CHECK-NEXT: LV: Scalarizing: %i.0 = add nsw i32 %i.0.in8, -1
+; CHECK: LV: Scalarizing: %i.0 = add nsw i32 %i.0.in8, -1
; CHECK-NEXT: LV: Scalarizing: %idxprom = zext i32 %i.0 to i64
; CHECK-NEXT: LV: Scalarizing: %arrayidx = getelementptr inbounds float, ptr %B, i64 %idxprom
; CHECK-NEXT: LV: Scalarizing: %arrayidx3 = getelementptr inbounds float, ptr %A, i64 %idxprom
|
Use HCFGBuilder to build an initial VPlan 0, which wraps all input instructions in VPInstructions and update tryToBuildVPlanWithVPRecipes to replace the VPInstructions with widened recipes. At the moment, widened recipes are created based on the underlying instruction of the VPInstruction. Masks are also still created based on the input IR basic blocks and the loop CFG is flattened in the main loop processing the VPInstructions. This patch also incldues support for Switch instructions in HCFGBuilder using just a VPInstruction with Instruction::Switch opcode. There are multiple follow-ups planned: * Use VPIRInstructions instead of VPInstructions in HCFGBuilder, * Perform predication on the VPlan directly, * Unify code constructing VPlan 0 to be shared by both inner and outer loop code paths.
ping :) |
This is a very nice step forward!
|
@@ -8299,7 +8299,7 @@ VPRecipeBuilder::tryToWidenMemory(Instruction *I, ArrayRef<VPValue *> Operands, | |||
: GEPNoWrapFlags::none(), | |||
I->getDebugLoc()); | |||
} | |||
Builder.getInsertBlock()->appendRecipe(VectorPtr); | |||
VectorPtr->insertBefore(&*Builder.getInsertPoint()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems awkward, would be better to have VPBuilder support
VectorPtr->insertBefore(&*Builder.getInsertPoint()); | |
Builder.insert(VectorPtr); |
as inspired by IRBuilderBase?
@@ -8299,7 +8299,7 @@ VPRecipeBuilder::tryToWidenMemory(Instruction *I, ArrayRef<VPValue *> Operands, | |||
: GEPNoWrapFlags::none(), | |||
I->getDebugLoc()); | |||
} | |||
Builder.getInsertBlock()->appendRecipe(VectorPtr); | |||
VectorPtr->insertBefore(&*Builder.getInsertPoint()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Insert at insert point rather than at the end of the insert block - is this change in/dependent of rest of patch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Split off to f8fa931, thanks
@@ -46,7 +46,7 @@ define void @vector_reverse_i64(ptr nocapture noundef writeonly %A, ptr nocaptur | |||
; CHECK-NEXT: LV: Found an estimated cost of 0 for VF vscale x 4 For instruction: br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit, !llvm.loop !0 | |||
; CHECK-NEXT: LV: Using user VF vscale x 4. | |||
; CHECK-NEXT: LV: Loop does not require scalar epilogue | |||
; CHECK-NEXT: LV: Scalarizing: %i.0 = add nsw i32 %i.0.in8, -1 | |||
; CHECK: LV: Scalarizing: %i.0 = add nsw i32 %i.0.in8, -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why/Is this line no longer NEXT? Same below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We now also include printing the VPlan after HCFG construction, should add a test to check that or drop the extra output?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, whatever you prefer.
// Build hierarchical CFG. | ||
VPlanHCFGBuilder HCFGBuilder(OrigLoop, LI, *Plan); | ||
HCFGBuilder.buildHierarchicalCFG(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps VPlanHCFGBuilder::buildingHierarchicalCFG()
which creates initial VPBB's inside loop region ( via PlainCFGBuilder::buildPlainCFG()
) should now be combined with VPlan::createInitialVPlan()
which creates initial VPBB's elsewhere?
Can be done as follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, maybe best to merge as follow-up
VPBasicBlock::iterator MBIP = MiddleVPBB->getFirstNonPhi(); | ||
for (BasicBlock *BB : make_range(DFS.beginRPO(), DFS.endRPO())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also remove
LoopBlocksDFS DFS(OrigLoop);
DFS.perform(LI);
above which become dead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done thanks
@@ -238,9 +238,9 @@ bool PlainCFGBuilder::isExternalDef(Value *Val) { | |||
return false; | |||
|
|||
// Check whether Instruction definition is in the loop exit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Check whether Instruction definition is in the loop exit. | |
// Check whether Instruction definition is in a loop exit. |
?
Is this notion of "External Def" still valid?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated. I think so as
if (auto *SI = dyn_cast<SwitchInst>(Inst)) { | ||
SmallVector<VPValue *> Ops = {getOrCreateVPOperand(SI->getCondition())}; | ||
for (auto Case : SI->cases()) | ||
Ops.push_back(getOrCreateVPOperand(Case.getCaseValue())); | ||
VPIRBuilder.createNaryOp(Instruction::Switch, Ops, Inst); | ||
continue; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Teaching PlainCFGBuilder about switch statements - does this imply that "native" can now vectorize outerloops with switches, and worth testing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The native path still won't vectorize switches, due to checks in legality I think. I can add a test case, thanks
(isa<VPInstruction>(&R) && | ||
cast<VPInstruction>(&R)->getOpcode() == Instruction::Switch)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can match via some m_Switch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately at the moment the matchers require a fixed number of operands; need to think about how to match without a fixed number of ops.
if (Recipe->getNumDefinedValues() == 1) | ||
SingleDef->replaceAllUsesWith(Recipe->getVPSingleValue()); | ||
else | ||
assert(Recipe->getNumDefinedValues() == 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert(Recipe->getNumDefinedValues() == 0); | |
assert(Recipe->getNumDefinedValues() == 0 && "Unexpected multidef recipe"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added thanks
SmallVector<BasicBlock *> ExitBlocks; | ||
TheLoop->getExitBlocks(ExitBlocks); | ||
if (is_contained(ExitBlocks, InstParent)) { | ||
// Instruction definition is in outermost loop exit. | ||
return false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No longer expecting a loop with single exit?
Teaching PlainCFGBuilder about multiple exits - does this imply that "native" can now vectorize such outerloops, and worth testing?
OTOH, PlainCFGBuilder::buildPlainCFG()
does handle multiple/non-unique exits(?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is needed for the inner loop multi-exit support. The native path still won't vectorize early exits, due to checks in legality I think. I can add a test case, thanks
Replacing a recipe with a live-in may not be correct in all cases, e.g. when replacing recipes involving header-phi recipes, like reductions. For now, only use SCEV to simplify live-ins. More powerful input simplification can be built in top of #124432 in the future. Fixes #119173. Fixes #125374. PR: #125436
Replacing a recipe with a live-in may not be correct in all cases, e.g. when replacing recipes involving header-phi recipes, like reductions. For now, only use SCEV to simplify live-ins. More powerful input simplification can be built in top of llvm/llvm-project#124432 in the future. Fixes llvm/llvm-project#119173. Fixes llvm/llvm-project#125374. PR: llvm/llvm-project#125436
Split off from #124432 as suggested. Adds VPBuilder::insert, inspired by IRBuilderBase.
…ointer (NFC). Split off from llvm/llvm-project#124432 as suggested. Adds VPBuilder::insert, inspired by IRBuilderBase.
Builder.setInsertPoint(VPBB); | ||
VPBlockBase *PrevVPBB = nullptr; | ||
for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(RPOT)) { | ||
// Skip VPBBs not corresponding to any input IR basic blocks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Skip VPBBs not corresponding to any input IR basic blocks. | |
// Handle VPBBs down to the latch. |
Latch also conceptually has a corresponding IRBB, with underlying Instructions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, thanks. Initially the latch in the IR is not the latch/exiting block in the VPlan; the exiting block is created as part of the skeleton.
|
||
// Convert input VPInstructions to widened recipes. | ||
for (VPRecipeBase &R : make_early_inc_range(*VPBB)) { | ||
auto *SingleDef = cast<VPSingleDefRecipe>(&R); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto *SingleDef = cast<VPSingleDefRecipe>(&R); | |
auto *SingleDef = cast<VPSingleDefRecipe>(&R); | |
auto *UnderlyingValue = SingleDef->getUnderlyingValue(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done thanks
auto *SingleDef = cast<VPSingleDefRecipe>(&R); | ||
// Skip recipes that do not need transforming, including canonical IV, | ||
// wide canonical IV and VPInstructions without underlying values. The | ||
// latter is added by masking. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// latter is added by masking. | |
// latter are added above for masking. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done thanks
if (isa<VPCanonicalIVPHIRecipe>(SingleDef) || | ||
isa<VPWidenCanonicalIVRecipe>(SingleDef) || | ||
(isa<VPInstruction>(&R) && !SingleDef->getUnderlyingValue())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: consistency
if (isa<VPCanonicalIVPHIRecipe>(SingleDef) || | |
isa<VPWidenCanonicalIVRecipe>(SingleDef) || | |
(isa<VPInstruction>(&R) && !SingleDef->getUnderlyingValue())) | |
if (isa<VPCanonicalIVPHIRecipe, VPWidenCanonicalIVRecipe>(&R) || | |
(isa<VPInstruction>(&R) && !UnderlyingValue)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done ,thanks
assert(isa<VPWidenPHIRecipe>(&R) || (isa<VPInstruction>(SingleDef) && | ||
SingleDef->getUnderlyingValue()) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: consistency
assert(isa<VPWidenPHIRecipe>(&R) || (isa<VPInstruction>(SingleDef) && | |
SingleDef->getUnderlyingValue()) && | |
assert(isa<VPWidenPHIRecipe, VPInstruction>(&R) && UnderlyingValue && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure why but VPWidenPHIRecipe, VPInstruction
cannot be used in a single isa<>
here. Will need to look deeper what's going on
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extra parens around the isa, since assert is a macro?
(void)IsCondBranch; | ||
(void)IsSwitch; | ||
|
||
if (VPBB->getNumSuccessors() >= 2 || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can check >2 case separately expecting IsSwitch only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done thanks
|
||
VPBlockUtils::insertBlockAfter(Plan->createVPBasicBlock(""), VPBB); | ||
VPBB = cast<VPBasicBlock>(VPBB->getSingleSuccessor()); | ||
Builder.insert(Recipe); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
Builder.insert(Recipe); | |
Builder.insert(Recipe); | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done thanks
CM.foldTailByMasking() || isa<TruncInst>(Instr)) && | ||
"unexpected recipe needs moving"); | ||
if (isa<VPWidenIntOrFpInductionRecipe>(Recipe) && isa<TruncInst>(Instr)) { | ||
// Optimized a truncate truncates to VPWidenIntOrFpInductionRecipe Move |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rephrase?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done thanks
for (VPRecipeBase &R : make_early_inc_range(*VPBB)) { | ||
auto *SingleDef = cast<VPSingleDefRecipe>(&R); | ||
// Skip recipes that do not need transforming, including canonical IV, | ||
// wide canonical IV and VPInstructions without underlying values. The |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a FIXME to stop relying on the underlying values of VPInstructions.
The "buildLoop" VPlan0 should use VPInstructions with flags and/or other recipes to best model a copy of the original scalar loop, in contrast to VPIRInstructions which model the original scalar loop itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks!
isa<VPWidenCanonicalIVRecipe>(SingleDef) || | ||
(isa<VPInstruction>(&R) && !SingleDef->getUnderlyingValue())) | ||
continue; | ||
assert(isa<VPWidenPHIRecipe>(&R) || (isa<VPInstruction>(SingleDef) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a FIXME to stop using VPWidenPHIRecipe in VPlan0, which models (a copy of) the original scalar loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done thanks
Replacing a recipe with a live-in may not be correct in all cases, e.g. when replacing recipes involving header-phi recipes, like reductions. For now, only use SCEV to simplify live-ins. More powerful input simplification can be built in top of llvm/llvm-project#124432 in the future. Fixes llvm/llvm-project#119173. Fixes llvm/llvm-project#125374. PR: llvm/llvm-project#125436 (cherry picked from commit 30f3752)
✅ With the latest revision this PR passed the C/C++ code formatter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good step forward along roadmap, adding several comments and suggestions.
if (VPBB == HeaderVPBB) | ||
// Create mask based on the IR BB corresponding to VPBB. | ||
// TODO: Predicate directly based on VPlan. | ||
Builder.setInsertPoint(VPBB, VPBB->begin()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So non-phi recipes that compute a block's mask must be introduced before the phi recipes of the block, because the latter is eventually replaced by blends that use the former? This calls for a FIXME note.
One alternative may be to sink the blends when they replace the phis.
Another is to introduce masking recipes along with replacing phi recipes with blends, as done currently in tryToBuildVPlanWithRecipes().
RecipeBuilder.createBlockInMask(BB); | ||
} else if (NeedsMasks) { | ||
Builder.setInsertPoint(VPBB, VPBB->begin()); | ||
RecipeBuilder.createBlockInMask(HCFGBuilder.getIRBBForVPB(VPBB)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, let's go with HCFGBuilder.getIRBBForVPB(), for now ...
Following our roadmap, HCFGBuilder would be the VPlan-to-VPlan transformation producing "buildLoop" from "wrapInput", and tryToBuildVPlanWithRecipes would follow as a subsequent VPlan-to-VPlan transformation(s). Obtaining this VPBB-to-IRBB mapping from HCFGBuilder should be removed in one of two ways (or more):
- If the buildLoop VPlan is to maintain its correspondence with IR basic blocks then a new type of VPBB can be introduced - one providing getIRBasicBlock() similar to VPIRBasicBlock but having an execute() similar to VPBasicBlock - that generates new basic blocks, or deemed abstract/unreachable - to be materialized into a concreate VPBB prior to codegen.
- If, OTOH, getIRBBForVPB() is meant only as a temporary solution to support createBlockInMask() until the latter is upgraded to work directly on VPBB's, then a FIXME should be added.
(Recipes are created for internal conditional branches but not for unconditional branches (including early exits) - which imply a single successor. An empty block also implies a single predecessor due to lack of phi recipes. Figuring out the BasicBlock from the single predecessor and/or successor of an empty VPBB seems cumbersome; getIRBBForVPB() could record the mapping for empty VPBB's only; simplest to probably to record all VPBB's in loop, for now.)
RecipeBuilder.createHeaderMask(); | ||
else if (NeedsMasks) | ||
RecipeBuilder.createBlockInMask(BB); | ||
} else if (NeedsMasks) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the header mask also be created only if NeedsMasks? (Seems independent)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently createHeaderMask
itself checks if tail folding is enabled, and only then creates the mask
auto *UnderlyingValue = SingleDef->getUnderlyingValue(); | ||
// Skip recipes that do not need transforming, including canonical IV, | ||
// wide canonical IV and VPInstructions without underlying values. The | ||
// latter are added above by masking. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// latter are added above by masking. | |
// latter are added above for masking. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done thanks
if (isa<VPCanonicalIVPHIRecipe, VPWidenCanonicalIVRecipe>(SingleDef) || | ||
(isa<VPInstruction>(&R) && !UnderlyingValue)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it important for some isa's to use SingleDef and others &R?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really, updated to use &R for consistency and more compact, thanks
if (match(&R, m_BranchOnCond(m_VPValue())) || | ||
(isa<VPInstruction>(&R) && | ||
cast<VPInstruction>(&R)->getOpcode() == Instruction::Switch)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then perhaps more consistent / simpler to do:
if (match(&R, m_BranchOnCond(m_VPValue())) || | |
(isa<VPInstruction>(&R) && | |
cast<VPInstruction>(&R)->getOpcode() == Instruction::Switch)) { | |
if (isa<VPInstruction>(&R) && | |
(cast<VPInstruction>(&R)->getOpcode() == VPInstruction::BranchOnCond || | |
(cast<VPInstruction>(&R)->getOpcode() == Instruction::Switch)) { |
(once VPWidenPHIRecipe is turned into a VPInstruction this would be simpler)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done thanks
|
||
// Flatten the CFG in the loop. Masks for blocks have already been generated | ||
// and added to recipes as needed. To do so, first disconnect VPBB from its | ||
// successors. Then connect VPBB to the previously visited$ VPBB. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// successors. Then connect VPBB to the previously visited$ VPBB. | |
// successors. Then connect VPBB to the previously visited VPBB. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed, thanks
@@ -46,7 +46,7 @@ define void @vector_reverse_i64(ptr nocapture noundef writeonly %A, ptr nocaptur | |||
; CHECK-NEXT: LV: Found an estimated cost of 0 for VF vscale x 4 For instruction: br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit, !llvm.loop !0 | |||
; CHECK-NEXT: LV: Using user VF vscale x 4. | |||
; CHECK-NEXT: LV: Loop does not require scalar epilogue | |||
; CHECK-NEXT: LV: Scalarizing: %i.0 = add nsw i32 %i.0.in8, -1 | |||
; CHECK: LV: Scalarizing: %i.0 = add nsw i32 %i.0.in8, -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, whatever you prefer.
assert(!HCFGBuilder.getIRBBForVPB(VPBB) && | ||
"the latch block shouldn't have a corresponding IRBB"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... We always have a latch VPBB, which isn't associated with an IRBB, which is what this covered
Initially the latch in the IR is not the latch/exiting block in the VPlan; the exiting block is created as part of the skeleton.
Thanks for clarifying. Should this discrepancy be fixed, perhaps as part of integrating HCFGBuilder with skeleton creation, to form "buildLoop" VPlan whose loop VPBB's are in full 1-1 correspondence with the original IRBB's of the loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, currently they are quite intertwined already, but once we use it in the inner loop path I'll continue to further refactor/improve on the current structure
Replacing a recipe with a live-in may not be correct in all cases, e.g. when replacing recipes involving header-phi recipes, like reductions. For now, only use SCEV to simplify live-ins. More powerful input simplification can be built in top of llvm#124432 in the future. Fixes llvm#119173. Fixes llvm#125374. PR: llvm#125436
Split off from llvm#124432 as suggested. Adds VPBuilder::insert, inspired by IRBuilderBase.
Replacing a recipe with a live-in may not be correct in all cases, e.g. when replacing recipes involving header-phi recipes, like reductions. For now, only use SCEV to simplify live-ins. More powerful input simplification can be built in top of llvm#124432 in the future. Fixes llvm#119173. Fixes llvm#125374. PR: llvm#125436 (cherry picked from commit 30f3752)
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/59/builds/12949 Here is the relevant piece of the build log for the reference
|
…loops. (NFC) (#124432) Use HCFGBuilder to build an initial VPlan 0, which wraps all input instructions in VPInstructions and update tryToBuildVPlanWithVPRecipes to replace the VPInstructions with widened recipes. At the moment, widened recipes are created based on the underlying instruction of the VPInstruction. Masks are also still created based on the input IR basic blocks and the loop CFG is flattened in the main loop processing the VPInstructions. This patch also incldues support for Switch instructions in HCFGBuilder using just a VPInstruction with Instruction::Switch opcode. There are multiple follow-ups planned: * Perform predication on the VPlan directly, * Unify code constructing VPlan 0 to be shared by both inner and outer loop code paths. * Construct VPlan 0 once, clone subsequent ones for VFs PR: llvm/llvm-project#124432
llvm#124432) Use HCFGBuilder to build an initial VPlan 0, which wraps all input instructions in VPInstructions and update tryToBuildVPlanWithVPRecipes to replace the VPInstructions with widened recipes. At the moment, widened recipes are created based on the underlying instruction of the VPInstruction. Masks are also still created based on the input IR basic blocks and the loop CFG is flattened in the main loop processing the VPInstructions. This patch also incldues support for Switch instructions in HCFGBuilder using just a VPInstruction with Instruction::Switch opcode. There are multiple follow-ups planned: * Perform predication on the VPlan directly, * Unify code constructing VPlan 0 to be shared by both inner and outer loop code paths. * Construct VPlan 0 once, clone subsequent ones for VFs PR: llvm#124432
I'm getting an assertion failure after this commit when using llvm on Linux and Windows. I can create a reproducer if that's useful.
|
Hi @fhahn |
@mikaelholmen it should be fixed in a96444a. The assertion was over-eager and the related code actually unused. @Zentrik it would be great if you could check if the issue is fixed. If not, would you be able to share a reproducer. |
Yep, thanks! |
@fhahn: Hm, seems like I was too quick there with my verification :/
|
@mikaelholmen the second issue should be fixed by 404af37 |
Yep now it works. Thanks! |
Use operands from VPInstructions directly during recipe creation. Follow-up as discussed and planned after #124432.
…(NFC). Use operands from VPInstructions directly during recipe creation. Follow-up as discussed and planned after llvm/llvm-project#124432.
Removes unneeded code after #124432.
Removes unneeded code after llvm/llvm-project#124432.
Use HCFGBuilder to build an initial VPlan 0, which wraps all input instructions in VPInstructions and update tryToBuildVPlanWithVPRecipes to replace the VPInstructions with widened recipes.
At the moment, widened recipes are created based on the underlying instruction of the VPInstruction. Masks are also still created based on the input IR basic blocks and the loop CFG is flattened in the main loop processing the VPInstructions.
This patch also incldues support for Switch instructions in HCFGBuilder using just a VPInstruction with Instruction::Switch opcode.
There are multiple follow-ups planned: