Skip to content

[VPlan] Build initial VPlan 0 using HCFGBuilder for inner loops. (NFC) #124432

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Feb 18, 2025

Conversation

fhahn
Copy link
Contributor

@fhahn fhahn commented Jan 25, 2025

Use HCFGBuilder to build an initial VPlan 0, which wraps all input instructions in VPInstructions and update tryToBuildVPlanWithVPRecipes to replace the VPInstructions with widened recipes.

At the moment, widened recipes are created based on the underlying instruction of the VPInstruction. Masks are also still created based on the input IR basic blocks and the loop CFG is flattened in the main loop processing the VPInstructions.

This patch also incldues support for Switch instructions in HCFGBuilder using just a VPInstruction with Instruction::Switch opcode.

There are multiple follow-ups planned:

  • Use VPIRInstructions instead of VPInstructions in HCFGBuilder,
  • Perform predication on the VPlan directly,
  • Unify code constructing VPlan 0 to be shared by both inner and outer loop code paths.
  • Construct VPlan 0 once, clone subsequent ones for VFs

@llvmbot
Copy link
Member

llvmbot commented Jan 25, 2025

@llvm/pr-subscribers-backend-risc-v

@llvm/pr-subscribers-llvm-transforms

Author: Florian Hahn (fhahn)

Changes

Use HCFGBuilder to build an initial VPlan 0, which wraps all input instructions in VPInstructions and update tryToBuildVPlanWithVPRecipes to replace the VPInstructions with widened recipes.

At the moment, widened recipes are created based on the underlying instruction of the VPInstruction. Masks are also still created based on the input IR basic blocks and the loop CFG is flattened in the main loop processing the VPInstructions.

This patch also incldues support for Switch instructions in HCFGBuilder using just a VPInstruction with Instruction::Switch opcode.

There are multiple follow-ups planned:

  • Use VPIRInstructions instead of VPInstructions in HCFGBuilder,
  • Perform predication on the VPlan directly,
  • Unify code constructing VPlan 0 to be shared by both inner and outer loop code paths.
  • Construct VPlan 0 once, clone subsequent ones for VFs

Full diff: https://github.com/llvm/llvm-project/pull/124432.diff

5 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+65-21)
  • (modified) llvm/lib/Transforms/Vectorize/VPlan.cpp (+5-3)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp (+26-6)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.h (+10)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll (+2-2)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 3a4f637f177e19..5d0458f473a78f 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -8299,7 +8299,7 @@ VPRecipeBuilder::tryToWidenMemory(Instruction *I, ArrayRef<VPValue *> Operands,
                                                 : GEPNoWrapFlags::none(),
                                             I->getDebugLoc());
     }
-    Builder.getInsertBlock()->appendRecipe(VectorPtr);
+    VectorPtr->insertBefore(&*Builder.getInsertPoint());
     Ptr = VectorPtr;
   }
   if (LoadInst *Load = dyn_cast<LoadInst>(I))
@@ -9206,6 +9206,7 @@ static void addExitUsersForFirstOrderRecurrences(
 VPlanPtr
 LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
 
+  using namespace llvm::VPlanPatternMatch;
   SmallPtrSet<const InterleaveGroup<Instruction> *, 1> InterleaveGroups;
 
   // ---------------------------------------------------------------------------
@@ -9229,6 +9230,10 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
                                             PSE, RequiresScalarEpilogueCheck,
                                             CM.foldTailByMasking(), OrigLoop);
 
+  // Build hierarchical CFG.
+  VPlanHCFGBuilder HCFGBuilder(OrigLoop, LI, *Plan);
+  HCFGBuilder.buildHierarchicalCFG();
+
   // Don't use getDecisionAndClampRange here, because we don't know the UF
   // so this function is better to be conservative, rather than to split
   // it up into different VPlans.
@@ -9297,23 +9302,45 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
   RecipeBuilder.collectScaledReductions(Range);
 
   auto *MiddleVPBB = Plan->getMiddleBlock();
+  ReversePostOrderTraversal<VPBlockShallowTraversalWrapper<VPBlockBase *>> RPOT(
+      Plan->getVectorLoopRegion()->getEntry());
+
   VPBasicBlock::iterator MBIP = MiddleVPBB->getFirstNonPhi();
-  for (BasicBlock *BB : make_range(DFS.beginRPO(), DFS.endRPO())) {
-    // Relevant instructions from basic block BB will be grouped into VPRecipe
-    // ingredients and fill a new VPBasicBlock.
-    if (VPBB != HeaderVPBB)
-      VPBB->setName(BB->getName());
-    Builder.setInsertPoint(VPBB);
+  VPBlockBase *PrevVPBB = nullptr;
+  for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(RPOT)) {
+    // Skip VPBBs not corresponding to any input IR basic blocks.
+    if (!HCFGBuilder.getIRBBForVPB(VPBB))
+      continue;
 
-    if (VPBB == HeaderVPBB)
+    // Create mask based on the IR BB corresponding to VPBB.
+    // TODO: Predicate directly based on VPlan.
+    if (VPBB == HeaderVPBB) {
+      Builder.setInsertPoint(VPBB, VPBB->getFirstNonPhi());
       RecipeBuilder.createHeaderMask();
-    else if (NeedsMasks)
-      RecipeBuilder.createBlockInMask(BB);
+    } else if (NeedsMasks) {
+      Builder.setInsertPoint(VPBB, VPBB->begin());
+      RecipeBuilder.createBlockInMask(HCFGBuilder.getIRBBForVPB(VPBB));
+    }
 
-    // Introduce each ingredient into VPlan.
+    // Convert input VPInstructions to widened recipes.
     // TODO: Model and preserve debug intrinsics in VPlan.
-    for (Instruction &I : drop_end(BB->instructionsWithoutDebug(false))) {
-      Instruction *Instr = &I;
+    for (VPRecipeBase &R : make_early_inc_range(*VPBB)) {
+      auto *SingleDef = dyn_cast<VPSingleDefRecipe>(&R);
+      if (!isa<VPWidenPHIRecipe>(&R) &&
+          (!isa<VPInstruction>(SingleDef) || !SingleDef->getUnderlyingValue()))
+        continue;
+
+      if (match(&R, m_BranchOnCond(m_VPValue())) ||
+          (isa<VPInstruction>(&R) &&
+           cast<VPInstruction>(&R)->getOpcode() == Instruction::Switch)) {
+        R.eraseFromParent();
+        break;
+      }
+
+      // TODO: Gradually replace uses of underlying instruction by analyses on
+      // VPlan.
+      Instruction *Instr = SingleDef->getUnderlyingInstr();
+      Builder.setInsertPoint(SingleDef);
       SmallVector<VPValue *, 4> Operands;
       auto *Phi = dyn_cast<PHINode>(Instr);
       if (Phi && Phi->getParent() == HeaderBB) {
@@ -9328,15 +9355,18 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
       // in the exit block, a uniform store recipe will be created for the final
       // invariant store of the reduction.
       StoreInst *SI;
-      if ((SI = dyn_cast<StoreInst>(&I)) &&
+      if ((SI = dyn_cast<StoreInst>(Instr)) &&
           Legal->isInvariantAddressOfReduction(SI->getPointerOperand())) {
         // Only create recipe for the final invariant store of the reduction.
-        if (!Legal->isInvariantStoreOfReduction(SI))
+        if (!Legal->isInvariantStoreOfReduction(SI)) {
+          R.eraseFromParent();
           continue;
+        }
         auto *Recipe = new VPReplicateRecipe(
             SI, RecipeBuilder.mapToVPValues(Instr->operands()),
             true /* IsUniform */);
         Recipe->insertBefore(*MiddleVPBB, MBIP);
+        R.eraseFromParent();
         continue;
       }
 
@@ -9355,16 +9385,30 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
         // after them)
         // * Optimizing truncates to VPWidenIntOrFpInductionRecipe.
 
-        assert((HeaderVPBB->getFirstNonPhi() == VPBB->end() ||
-                CM.foldTailByMasking() || isa<TruncInst>(Instr)) &&
-               "unexpected recipe needs moving");
         Recipe->insertBefore(*HeaderVPBB, HeaderVPBB->getFirstNonPhi());
       } else
-        VPBB->appendRecipe(Recipe);
+        Recipe->insertBefore(&R);
+      if (Recipe->getNumDefinedValues() == 1)
+        SingleDef->replaceAllUsesWith(Recipe->getVPSingleValue());
+      else
+        assert(Recipe->getNumDefinedValues() == 0);
+      R.eraseFromParent();
     }
 
-    VPBlockUtils::insertBlockAfter(Plan->createVPBasicBlock(""), VPBB);
-    VPBB = cast<VPBasicBlock>(VPBB->getSingleSuccessor());
+    // Flatten the CFG in the loop. Masks for blocks have already been generated
+    // and added to recipes as needed. To do so, first disconnect VPBB from its
+    // predecessors and successors, except the exiting block. Then connect VPBB
+    // to the previously visited VPBB.
+    for (auto *Succ : to_vector(VPBB->getSuccessors())) {
+      if (Succ == Plan->getVectorLoopRegion()->getExiting())
+        continue;
+      VPBlockUtils::disconnectBlocks(VPBB, Succ);
+    }
+    for (auto *Pred : to_vector(VPBB->getPredecessors()))
+      VPBlockUtils::disconnectBlocks(Pred, VPBB);
+    if (PrevVPBB)
+      VPBlockUtils::connectBlocks(PrevVPBB, VPBB);
+    PrevVPBB = VPBB;
   }
 
   // After here, VPBB should not be used.
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index 83c54a9b9c259c..67e3371923a9f2 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -579,9 +579,11 @@ static bool hasConditionalTerminator(const VPBasicBlock *VPBB) {
   }
 
   const VPRecipeBase *R = &VPBB->back();
-  bool IsCondBranch = isa<VPBranchOnMaskRecipe>(R) ||
-                      match(R, m_BranchOnCond(m_VPValue())) ||
-                      match(R, m_BranchOnCount(m_VPValue(), m_VPValue()));
+  bool IsCondBranch =
+      isa<VPBranchOnMaskRecipe>(R) || match(R, m_BranchOnCond(m_VPValue())) ||
+      match(R, m_BranchOnCount(m_VPValue(), m_VPValue())) ||
+      (isa<VPInstruction>(R) &&
+       cast<VPInstruction>(R)->getOpcode() == Instruction::Switch);
   (void)IsCondBranch;
 
   if (VPBB->getNumSuccessors() >= 2 ||
diff --git a/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp b/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp
index 5a2e5d7cfee48d..399511ed441b4e 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp
@@ -75,7 +75,7 @@ class PlainCFGBuilder {
       : TheLoop(Lp), LI(LI), Plan(P) {}
 
   /// Build plain CFG for TheLoop  and connects it to Plan's entry.
-  void buildPlainCFG();
+  void buildPlainCFG(DenseMap<VPBlockBase *, BasicBlock *> &VPB2IRBB);
 };
 } // anonymous namespace
 
@@ -238,9 +238,9 @@ bool PlainCFGBuilder::isExternalDef(Value *Val) {
     return false;
 
   // Check whether Instruction definition is in the loop exit.
-  BasicBlock *Exit = TheLoop->getUniqueExitBlock();
-  assert(Exit && "Expected loop with single exit.");
-  if (InstParent == Exit) {
+  SmallVector<BasicBlock *> ExitBlocks;
+  TheLoop->getExitBlocks(ExitBlocks);
+  if (is_contained(ExitBlocks, InstParent)) {
     // Instruction definition is in outermost loop exit.
     return false;
   }
@@ -308,6 +308,14 @@ void PlainCFGBuilder::createVPInstructionsForVPBB(VPBasicBlock *VPBB,
       continue;
     }
 
+    if (auto *SI = dyn_cast<SwitchInst>(Inst)) {
+      SmallVector<VPValue *> Ops = {getOrCreateVPOperand(SI->getCondition())};
+      for (auto Case : SI->cases())
+        Ops.push_back(getOrCreateVPOperand(Case.getCaseValue()));
+      VPIRBuilder.createNaryOp(Instruction::Switch, Ops, Inst);
+      continue;
+    }
+
     VPValue *NewVPV;
     if (auto *Phi = dyn_cast<PHINode>(Inst)) {
       // Phi node's operands may have not been visited at this point. We create
@@ -334,7 +342,8 @@ void PlainCFGBuilder::createVPInstructionsForVPBB(VPBasicBlock *VPBB,
 }
 
 // Main interface to build the plain CFG.
-void PlainCFGBuilder::buildPlainCFG() {
+void PlainCFGBuilder::buildPlainCFG(
+    DenseMap<VPBlockBase *, BasicBlock *> &VPB2IRBB) {
   // 0. Reuse the top-level region, vector-preheader and exit VPBBs from the
   // skeleton. These were created directly rather than via getOrCreateVPBB(),
   // revisit them now to update BB2VPBB. Note that header/entry and
@@ -423,6 +432,14 @@ void PlainCFGBuilder::buildPlainCFG() {
     // Set VPBB successors. We create empty VPBBs for successors if they don't
     // exist already. Recipes will be created when the successor is visited
     // during the RPO traversal.
+    if (auto *SI = dyn_cast<SwitchInst>(BB->getTerminator())) {
+      SmallVector<VPBlockBase *> Succs = {
+          getOrCreateVPBB(SI->getDefaultDest())};
+      for (auto Case : SI->cases())
+        Succs.push_back(getOrCreateVPBB(Case.getCaseSuccessor()));
+      VPBB->setSuccessors(Succs);
+      continue;
+    }
     auto *BI = cast<BranchInst>(BB->getTerminator());
     unsigned NumSuccs = succ_size(BB);
     if (NumSuccs == 1) {
@@ -476,11 +493,14 @@ void PlainCFGBuilder::buildPlainCFG() {
   // have a VPlan couterpart. Fix VPlan phi nodes by adding their corresponding
   // VPlan operands.
   fixPhiNodes();
+
+  for (const auto &[IRBB, VPB] : BB2VPBB)
+    VPB2IRBB[VPB] = IRBB;
 }
 
 void VPlanHCFGBuilder::buildPlainCFG() {
   PlainCFGBuilder PCFGBuilder(TheLoop, LI, Plan);
-  PCFGBuilder.buildPlainCFG();
+  PCFGBuilder.buildPlainCFG(VPB2IRBB);
 }
 
 // Public interface to build a H-CFG.
diff --git a/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.h b/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.h
index ad6e2ad90a9610..eac842c00d46a2 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.h
@@ -53,6 +53,10 @@ class VPlanHCFGBuilder {
   // are introduced.
   VPDominatorTree VPDomTree;
 
+  /// Map of create VP blocks to their input IR basic blocks, if they have been
+  /// created for a input IR basic block.
+  DenseMap<VPBlockBase *, BasicBlock *> VPB2IRBB;
+
   /// Build plain CFG for TheLoop and connects it to Plan's entry.
   void buildPlainCFG();
 
@@ -62,6 +66,12 @@ class VPlanHCFGBuilder {
 
   /// Build H-CFG for TheLoop and update Plan accordingly.
   void buildHierarchicalCFG();
+
+  /// Return the input IR BasicBlock corresponding to \p VPB. Returns nullptr if
+  /// there is no such corresponding block.
+  BasicBlock *getIRBBForVPB(const VPBlockBase *VPB) const {
+    return VPB2IRBB.lookup(VPB);
+  }
 };
 } // namespace llvm
 
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll b/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll
index f630f4f21e065f..8a32c71ed6ce1c 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll
@@ -46,7 +46,7 @@ define void @vector_reverse_i64(ptr nocapture noundef writeonly %A, ptr nocaptur
 ; CHECK-NEXT:  LV: Found an estimated cost of 0 for VF vscale x 4 For instruction: br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit, !llvm.loop !0
 ; CHECK-NEXT:  LV: Using user VF vscale x 4.
 ; CHECK-NEXT:  LV: Loop does not require scalar epilogue
-; CHECK-NEXT:  LV: Scalarizing: %i.0 = add nsw i32 %i.0.in8, -1
+; CHECK:       LV: Scalarizing: %i.0 = add nsw i32 %i.0.in8, -1
 ; CHECK-NEXT:  LV: Scalarizing: %idxprom = zext i32 %i.0 to i64
 ; CHECK-NEXT:  LV: Scalarizing: %arrayidx = getelementptr inbounds i32, ptr %B, i64 %idxprom
 ; CHECK-NEXT:  LV: Scalarizing: %arrayidx3 = getelementptr inbounds i32, ptr %A, i64 %idxprom
@@ -295,7 +295,7 @@ define void @vector_reverse_f32(ptr nocapture noundef writeonly %A, ptr nocaptur
 ; CHECK-NEXT:  LV: Found an estimated cost of 0 for VF vscale x 4 For instruction: br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit, !llvm.loop !0
 ; CHECK-NEXT:  LV: Using user VF vscale x 4.
 ; CHECK-NEXT:  LV: Loop does not require scalar epilogue
-; CHECK-NEXT:  LV: Scalarizing: %i.0 = add nsw i32 %i.0.in8, -1
+; CHECK:       LV: Scalarizing: %i.0 = add nsw i32 %i.0.in8, -1
 ; CHECK-NEXT:  LV: Scalarizing: %idxprom = zext i32 %i.0 to i64
 ; CHECK-NEXT:  LV: Scalarizing: %arrayidx = getelementptr inbounds float, ptr %B, i64 %idxprom
 ; CHECK-NEXT:  LV: Scalarizing: %arrayidx3 = getelementptr inbounds float, ptr %A, i64 %idxprom

@llvmbot
Copy link
Member

llvmbot commented Jan 25, 2025

@llvm/pr-subscribers-vectorizers

Author: Florian Hahn (fhahn)

Changes

Use HCFGBuilder to build an initial VPlan 0, which wraps all input instructions in VPInstructions and update tryToBuildVPlanWithVPRecipes to replace the VPInstructions with widened recipes.

At the moment, widened recipes are created based on the underlying instruction of the VPInstruction. Masks are also still created based on the input IR basic blocks and the loop CFG is flattened in the main loop processing the VPInstructions.

This patch also incldues support for Switch instructions in HCFGBuilder using just a VPInstruction with Instruction::Switch opcode.

There are multiple follow-ups planned:

  • Use VPIRInstructions instead of VPInstructions in HCFGBuilder,
  • Perform predication on the VPlan directly,
  • Unify code constructing VPlan 0 to be shared by both inner and outer loop code paths.
  • Construct VPlan 0 once, clone subsequent ones for VFs

Full diff: https://github.com/llvm/llvm-project/pull/124432.diff

5 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+65-21)
  • (modified) llvm/lib/Transforms/Vectorize/VPlan.cpp (+5-3)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp (+26-6)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.h (+10)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll (+2-2)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 3a4f637f177e19..5d0458f473a78f 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -8299,7 +8299,7 @@ VPRecipeBuilder::tryToWidenMemory(Instruction *I, ArrayRef<VPValue *> Operands,
                                                 : GEPNoWrapFlags::none(),
                                             I->getDebugLoc());
     }
-    Builder.getInsertBlock()->appendRecipe(VectorPtr);
+    VectorPtr->insertBefore(&*Builder.getInsertPoint());
     Ptr = VectorPtr;
   }
   if (LoadInst *Load = dyn_cast<LoadInst>(I))
@@ -9206,6 +9206,7 @@ static void addExitUsersForFirstOrderRecurrences(
 VPlanPtr
 LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
 
+  using namespace llvm::VPlanPatternMatch;
   SmallPtrSet<const InterleaveGroup<Instruction> *, 1> InterleaveGroups;
 
   // ---------------------------------------------------------------------------
@@ -9229,6 +9230,10 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
                                             PSE, RequiresScalarEpilogueCheck,
                                             CM.foldTailByMasking(), OrigLoop);
 
+  // Build hierarchical CFG.
+  VPlanHCFGBuilder HCFGBuilder(OrigLoop, LI, *Plan);
+  HCFGBuilder.buildHierarchicalCFG();
+
   // Don't use getDecisionAndClampRange here, because we don't know the UF
   // so this function is better to be conservative, rather than to split
   // it up into different VPlans.
@@ -9297,23 +9302,45 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
   RecipeBuilder.collectScaledReductions(Range);
 
   auto *MiddleVPBB = Plan->getMiddleBlock();
+  ReversePostOrderTraversal<VPBlockShallowTraversalWrapper<VPBlockBase *>> RPOT(
+      Plan->getVectorLoopRegion()->getEntry());
+
   VPBasicBlock::iterator MBIP = MiddleVPBB->getFirstNonPhi();
-  for (BasicBlock *BB : make_range(DFS.beginRPO(), DFS.endRPO())) {
-    // Relevant instructions from basic block BB will be grouped into VPRecipe
-    // ingredients and fill a new VPBasicBlock.
-    if (VPBB != HeaderVPBB)
-      VPBB->setName(BB->getName());
-    Builder.setInsertPoint(VPBB);
+  VPBlockBase *PrevVPBB = nullptr;
+  for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(RPOT)) {
+    // Skip VPBBs not corresponding to any input IR basic blocks.
+    if (!HCFGBuilder.getIRBBForVPB(VPBB))
+      continue;
 
-    if (VPBB == HeaderVPBB)
+    // Create mask based on the IR BB corresponding to VPBB.
+    // TODO: Predicate directly based on VPlan.
+    if (VPBB == HeaderVPBB) {
+      Builder.setInsertPoint(VPBB, VPBB->getFirstNonPhi());
       RecipeBuilder.createHeaderMask();
-    else if (NeedsMasks)
-      RecipeBuilder.createBlockInMask(BB);
+    } else if (NeedsMasks) {
+      Builder.setInsertPoint(VPBB, VPBB->begin());
+      RecipeBuilder.createBlockInMask(HCFGBuilder.getIRBBForVPB(VPBB));
+    }
 
-    // Introduce each ingredient into VPlan.
+    // Convert input VPInstructions to widened recipes.
     // TODO: Model and preserve debug intrinsics in VPlan.
-    for (Instruction &I : drop_end(BB->instructionsWithoutDebug(false))) {
-      Instruction *Instr = &I;
+    for (VPRecipeBase &R : make_early_inc_range(*VPBB)) {
+      auto *SingleDef = dyn_cast<VPSingleDefRecipe>(&R);
+      if (!isa<VPWidenPHIRecipe>(&R) &&
+          (!isa<VPInstruction>(SingleDef) || !SingleDef->getUnderlyingValue()))
+        continue;
+
+      if (match(&R, m_BranchOnCond(m_VPValue())) ||
+          (isa<VPInstruction>(&R) &&
+           cast<VPInstruction>(&R)->getOpcode() == Instruction::Switch)) {
+        R.eraseFromParent();
+        break;
+      }
+
+      // TODO: Gradually replace uses of underlying instruction by analyses on
+      // VPlan.
+      Instruction *Instr = SingleDef->getUnderlyingInstr();
+      Builder.setInsertPoint(SingleDef);
       SmallVector<VPValue *, 4> Operands;
       auto *Phi = dyn_cast<PHINode>(Instr);
       if (Phi && Phi->getParent() == HeaderBB) {
@@ -9328,15 +9355,18 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
       // in the exit block, a uniform store recipe will be created for the final
       // invariant store of the reduction.
       StoreInst *SI;
-      if ((SI = dyn_cast<StoreInst>(&I)) &&
+      if ((SI = dyn_cast<StoreInst>(Instr)) &&
           Legal->isInvariantAddressOfReduction(SI->getPointerOperand())) {
         // Only create recipe for the final invariant store of the reduction.
-        if (!Legal->isInvariantStoreOfReduction(SI))
+        if (!Legal->isInvariantStoreOfReduction(SI)) {
+          R.eraseFromParent();
           continue;
+        }
         auto *Recipe = new VPReplicateRecipe(
             SI, RecipeBuilder.mapToVPValues(Instr->operands()),
             true /* IsUniform */);
         Recipe->insertBefore(*MiddleVPBB, MBIP);
+        R.eraseFromParent();
         continue;
       }
 
@@ -9355,16 +9385,30 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
         // after them)
         // * Optimizing truncates to VPWidenIntOrFpInductionRecipe.
 
-        assert((HeaderVPBB->getFirstNonPhi() == VPBB->end() ||
-                CM.foldTailByMasking() || isa<TruncInst>(Instr)) &&
-               "unexpected recipe needs moving");
         Recipe->insertBefore(*HeaderVPBB, HeaderVPBB->getFirstNonPhi());
       } else
-        VPBB->appendRecipe(Recipe);
+        Recipe->insertBefore(&R);
+      if (Recipe->getNumDefinedValues() == 1)
+        SingleDef->replaceAllUsesWith(Recipe->getVPSingleValue());
+      else
+        assert(Recipe->getNumDefinedValues() == 0);
+      R.eraseFromParent();
     }
 
-    VPBlockUtils::insertBlockAfter(Plan->createVPBasicBlock(""), VPBB);
-    VPBB = cast<VPBasicBlock>(VPBB->getSingleSuccessor());
+    // Flatten the CFG in the loop. Masks for blocks have already been generated
+    // and added to recipes as needed. To do so, first disconnect VPBB from its
+    // predecessors and successors, except the exiting block. Then connect VPBB
+    // to the previously visited VPBB.
+    for (auto *Succ : to_vector(VPBB->getSuccessors())) {
+      if (Succ == Plan->getVectorLoopRegion()->getExiting())
+        continue;
+      VPBlockUtils::disconnectBlocks(VPBB, Succ);
+    }
+    for (auto *Pred : to_vector(VPBB->getPredecessors()))
+      VPBlockUtils::disconnectBlocks(Pred, VPBB);
+    if (PrevVPBB)
+      VPBlockUtils::connectBlocks(PrevVPBB, VPBB);
+    PrevVPBB = VPBB;
   }
 
   // After here, VPBB should not be used.
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index 83c54a9b9c259c..67e3371923a9f2 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -579,9 +579,11 @@ static bool hasConditionalTerminator(const VPBasicBlock *VPBB) {
   }
 
   const VPRecipeBase *R = &VPBB->back();
-  bool IsCondBranch = isa<VPBranchOnMaskRecipe>(R) ||
-                      match(R, m_BranchOnCond(m_VPValue())) ||
-                      match(R, m_BranchOnCount(m_VPValue(), m_VPValue()));
+  bool IsCondBranch =
+      isa<VPBranchOnMaskRecipe>(R) || match(R, m_BranchOnCond(m_VPValue())) ||
+      match(R, m_BranchOnCount(m_VPValue(), m_VPValue())) ||
+      (isa<VPInstruction>(R) &&
+       cast<VPInstruction>(R)->getOpcode() == Instruction::Switch);
   (void)IsCondBranch;
 
   if (VPBB->getNumSuccessors() >= 2 ||
diff --git a/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp b/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp
index 5a2e5d7cfee48d..399511ed441b4e 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp
@@ -75,7 +75,7 @@ class PlainCFGBuilder {
       : TheLoop(Lp), LI(LI), Plan(P) {}
 
   /// Build plain CFG for TheLoop  and connects it to Plan's entry.
-  void buildPlainCFG();
+  void buildPlainCFG(DenseMap<VPBlockBase *, BasicBlock *> &VPB2IRBB);
 };
 } // anonymous namespace
 
@@ -238,9 +238,9 @@ bool PlainCFGBuilder::isExternalDef(Value *Val) {
     return false;
 
   // Check whether Instruction definition is in the loop exit.
-  BasicBlock *Exit = TheLoop->getUniqueExitBlock();
-  assert(Exit && "Expected loop with single exit.");
-  if (InstParent == Exit) {
+  SmallVector<BasicBlock *> ExitBlocks;
+  TheLoop->getExitBlocks(ExitBlocks);
+  if (is_contained(ExitBlocks, InstParent)) {
     // Instruction definition is in outermost loop exit.
     return false;
   }
@@ -308,6 +308,14 @@ void PlainCFGBuilder::createVPInstructionsForVPBB(VPBasicBlock *VPBB,
       continue;
     }
 
+    if (auto *SI = dyn_cast<SwitchInst>(Inst)) {
+      SmallVector<VPValue *> Ops = {getOrCreateVPOperand(SI->getCondition())};
+      for (auto Case : SI->cases())
+        Ops.push_back(getOrCreateVPOperand(Case.getCaseValue()));
+      VPIRBuilder.createNaryOp(Instruction::Switch, Ops, Inst);
+      continue;
+    }
+
     VPValue *NewVPV;
     if (auto *Phi = dyn_cast<PHINode>(Inst)) {
       // Phi node's operands may have not been visited at this point. We create
@@ -334,7 +342,8 @@ void PlainCFGBuilder::createVPInstructionsForVPBB(VPBasicBlock *VPBB,
 }
 
 // Main interface to build the plain CFG.
-void PlainCFGBuilder::buildPlainCFG() {
+void PlainCFGBuilder::buildPlainCFG(
+    DenseMap<VPBlockBase *, BasicBlock *> &VPB2IRBB) {
   // 0. Reuse the top-level region, vector-preheader and exit VPBBs from the
   // skeleton. These were created directly rather than via getOrCreateVPBB(),
   // revisit them now to update BB2VPBB. Note that header/entry and
@@ -423,6 +432,14 @@ void PlainCFGBuilder::buildPlainCFG() {
     // Set VPBB successors. We create empty VPBBs for successors if they don't
     // exist already. Recipes will be created when the successor is visited
     // during the RPO traversal.
+    if (auto *SI = dyn_cast<SwitchInst>(BB->getTerminator())) {
+      SmallVector<VPBlockBase *> Succs = {
+          getOrCreateVPBB(SI->getDefaultDest())};
+      for (auto Case : SI->cases())
+        Succs.push_back(getOrCreateVPBB(Case.getCaseSuccessor()));
+      VPBB->setSuccessors(Succs);
+      continue;
+    }
     auto *BI = cast<BranchInst>(BB->getTerminator());
     unsigned NumSuccs = succ_size(BB);
     if (NumSuccs == 1) {
@@ -476,11 +493,14 @@ void PlainCFGBuilder::buildPlainCFG() {
   // have a VPlan couterpart. Fix VPlan phi nodes by adding their corresponding
   // VPlan operands.
   fixPhiNodes();
+
+  for (const auto &[IRBB, VPB] : BB2VPBB)
+    VPB2IRBB[VPB] = IRBB;
 }
 
 void VPlanHCFGBuilder::buildPlainCFG() {
   PlainCFGBuilder PCFGBuilder(TheLoop, LI, Plan);
-  PCFGBuilder.buildPlainCFG();
+  PCFGBuilder.buildPlainCFG(VPB2IRBB);
 }
 
 // Public interface to build a H-CFG.
diff --git a/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.h b/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.h
index ad6e2ad90a9610..eac842c00d46a2 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.h
@@ -53,6 +53,10 @@ class VPlanHCFGBuilder {
   // are introduced.
   VPDominatorTree VPDomTree;
 
+  /// Map of create VP blocks to their input IR basic blocks, if they have been
+  /// created for a input IR basic block.
+  DenseMap<VPBlockBase *, BasicBlock *> VPB2IRBB;
+
   /// Build plain CFG for TheLoop and connects it to Plan's entry.
   void buildPlainCFG();
 
@@ -62,6 +66,12 @@ class VPlanHCFGBuilder {
 
   /// Build H-CFG for TheLoop and update Plan accordingly.
   void buildHierarchicalCFG();
+
+  /// Return the input IR BasicBlock corresponding to \p VPB. Returns nullptr if
+  /// there is no such corresponding block.
+  BasicBlock *getIRBBForVPB(const VPBlockBase *VPB) const {
+    return VPB2IRBB.lookup(VPB);
+  }
 };
 } // namespace llvm
 
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll b/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll
index f630f4f21e065f..8a32c71ed6ce1c 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll
@@ -46,7 +46,7 @@ define void @vector_reverse_i64(ptr nocapture noundef writeonly %A, ptr nocaptur
 ; CHECK-NEXT:  LV: Found an estimated cost of 0 for VF vscale x 4 For instruction: br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit, !llvm.loop !0
 ; CHECK-NEXT:  LV: Using user VF vscale x 4.
 ; CHECK-NEXT:  LV: Loop does not require scalar epilogue
-; CHECK-NEXT:  LV: Scalarizing: %i.0 = add nsw i32 %i.0.in8, -1
+; CHECK:       LV: Scalarizing: %i.0 = add nsw i32 %i.0.in8, -1
 ; CHECK-NEXT:  LV: Scalarizing: %idxprom = zext i32 %i.0 to i64
 ; CHECK-NEXT:  LV: Scalarizing: %arrayidx = getelementptr inbounds i32, ptr %B, i64 %idxprom
 ; CHECK-NEXT:  LV: Scalarizing: %arrayidx3 = getelementptr inbounds i32, ptr %A, i64 %idxprom
@@ -295,7 +295,7 @@ define void @vector_reverse_f32(ptr nocapture noundef writeonly %A, ptr nocaptur
 ; CHECK-NEXT:  LV: Found an estimated cost of 0 for VF vscale x 4 For instruction: br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit, !llvm.loop !0
 ; CHECK-NEXT:  LV: Using user VF vscale x 4.
 ; CHECK-NEXT:  LV: Loop does not require scalar epilogue
-; CHECK-NEXT:  LV: Scalarizing: %i.0 = add nsw i32 %i.0.in8, -1
+; CHECK:       LV: Scalarizing: %i.0 = add nsw i32 %i.0.in8, -1
 ; CHECK-NEXT:  LV: Scalarizing: %idxprom = zext i32 %i.0 to i64
 ; CHECK-NEXT:  LV: Scalarizing: %arrayidx = getelementptr inbounds float, ptr %B, i64 %idxprom
 ; CHECK-NEXT:  LV: Scalarizing: %arrayidx3 = getelementptr inbounds float, ptr %A, i64 %idxprom

Use HCFGBuilder to build an initial VPlan 0, which wraps all input
instructions in VPInstructions and update tryToBuildVPlanWithVPRecipes
to replace the VPInstructions with widened recipes.

At the moment, widened recipes are created based on the underlying
instruction of the VPInstruction. Masks are also still created based on
the input IR basic blocks and the loop CFG is flattened in the main
loop processing the VPInstructions.

This patch also incldues support for Switch instructions in HCFGBuilder
using just a VPInstruction with Instruction::Switch opcode.

There are multiple follow-ups planned:
 * Use VPIRInstructions instead of VPInstructions in HCFGBuilder,
 * Perform predication on the VPlan directly,
 * Unify code constructing VPlan 0 to be shared by both inner and outer
   loop code paths.
@fhahn
Copy link
Contributor Author

fhahn commented Feb 1, 2025

ping :)

@ayalz
Copy link
Collaborator

ayalz commented Feb 3, 2025

Use HCFGBuilder to build an initial VPlan 0, which wraps all input instructions in VPInstructions and update tryToBuildVPlanWithVPRecipes to replace the VPInstructions with widened recipes.

At the moment, widened recipes are created based on the underlying instruction of the VPInstruction. Masks are also still created based on the input IR basic blocks and the loop CFG is flattened in the main loop processing the VPInstructions.

This patch also incldues support for Switch instructions in HCFGBuilder using just a VPInstruction with Instruction::Switch opcode.

There are multiple follow-ups planned:

  • Use VPIRInstructions instead of VPInstructions in HCFGBuilder,
  • Perform predication on the VPlan directly,
  • Unify code constructing VPlan 0 to be shared by both inner and outer loop code paths.
  • Construct VPlan 0 once, clone subsequent ones for VFs

This is a very nice step forward!
Trying to align with the roadmap sketched in https://llvm.org/devmtg/2023-10/slides/techtalks/Hahn-VPlan-StatusUpdateAndRoadmap.pdf#page=37 and its "Direction 1: Refactor VPlan Creation", the VPlan built by HCFGBuilder seems to correspond to the 2nd VPlan of "buildLoop". Several aspects deserves fixing for this VPlan to represent a scalar version of the to-be-vectorized loop (i.e., for UF=VF=1) more consistently:

  • Phi's, as all other recipes of this VPlan, are scalar. However, this VPlan uses VPWidenPhiRecipes which represent vectorized and/or unrolled phi's.
  • VPInstructions should rely on their opcode, operands and flags only. However, this VPlan uses VPInstructions which rely on their underlying Instructions, disregarding their flags.
    Note that, in contrast, VPIRInstructions, which are associated with the 1st VPlan of "wrapInput", represent the immutable Instructions in-place.

@@ -8299,7 +8299,7 @@ VPRecipeBuilder::tryToWidenMemory(Instruction *I, ArrayRef<VPValue *> Operands,
: GEPNoWrapFlags::none(),
I->getDebugLoc());
}
Builder.getInsertBlock()->appendRecipe(VectorPtr);
VectorPtr->insertBefore(&*Builder.getInsertPoint());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems awkward, would be better to have VPBuilder support

Suggested change
VectorPtr->insertBefore(&*Builder.getInsertPoint());
Builder.insert(VectorPtr);

as inspired by IRBuilderBase?

@@ -8299,7 +8299,7 @@ VPRecipeBuilder::tryToWidenMemory(Instruction *I, ArrayRef<VPValue *> Operands,
: GEPNoWrapFlags::none(),
I->getDebugLoc());
}
Builder.getInsertBlock()->appendRecipe(VectorPtr);
VectorPtr->insertBefore(&*Builder.getInsertPoint());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Insert at insert point rather than at the end of the insert block - is this change in/dependent of rest of patch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Split off to f8fa931, thanks

@@ -46,7 +46,7 @@ define void @vector_reverse_i64(ptr nocapture noundef writeonly %A, ptr nocaptur
; CHECK-NEXT: LV: Found an estimated cost of 0 for VF vscale x 4 For instruction: br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit, !llvm.loop !0
; CHECK-NEXT: LV: Using user VF vscale x 4.
; CHECK-NEXT: LV: Loop does not require scalar epilogue
; CHECK-NEXT: LV: Scalarizing: %i.0 = add nsw i32 %i.0.in8, -1
; CHECK: LV: Scalarizing: %i.0 = add nsw i32 %i.0.in8, -1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why/Is this line no longer NEXT? Same below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now also include printing the VPlan after HCFG construction, should add a test to check that or drop the extra output?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, whatever you prefer.

Comment on lines +9233 to +9250
// Build hierarchical CFG.
VPlanHCFGBuilder HCFGBuilder(OrigLoop, LI, *Plan);
HCFGBuilder.buildHierarchicalCFG();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps VPlanHCFGBuilder::buildingHierarchicalCFG() which creates initial VPBB's inside loop region ( via PlainCFGBuilder::buildPlainCFG()) should now be combined with VPlan::createInitialVPlan() which creates initial VPBB's elsewhere?
Can be done as follow-up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, maybe best to merge as follow-up

VPBasicBlock::iterator MBIP = MiddleVPBB->getFirstNonPhi();
for (BasicBlock *BB : make_range(DFS.beginRPO(), DFS.endRPO())) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also remove

  LoopBlocksDFS DFS(OrigLoop);
  DFS.perform(LI);

above which become dead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks

@@ -238,9 +238,9 @@ bool PlainCFGBuilder::isExternalDef(Value *Val) {
return false;

// Check whether Instruction definition is in the loop exit.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Check whether Instruction definition is in the loop exit.
// Check whether Instruction definition is in a loop exit.

?

Is this notion of "External Def" still valid?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. I think so as

Comment on lines +311 to +318
if (auto *SI = dyn_cast<SwitchInst>(Inst)) {
SmallVector<VPValue *> Ops = {getOrCreateVPOperand(SI->getCondition())};
for (auto Case : SI->cases())
Ops.push_back(getOrCreateVPOperand(Case.getCaseValue()));
VPIRBuilder.createNaryOp(Instruction::Switch, Ops, Inst);
continue;
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Teaching PlainCFGBuilder about switch statements - does this imply that "native" can now vectorize outerloops with switches, and worth testing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The native path still won't vectorize switches, due to checks in legality I think. I can add a test case, thanks

Comment on lines 9349 to 9350
(isa<VPInstruction>(&R) &&
cast<VPInstruction>(&R)->getOpcode() == Instruction::Switch)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can match via some m_Switch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately at the moment the matchers require a fixed number of operands; need to think about how to match without a fixed number of ops.

if (Recipe->getNumDefinedValues() == 1)
SingleDef->replaceAllUsesWith(Recipe->getVPSingleValue());
else
assert(Recipe->getNumDefinedValues() == 0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert(Recipe->getNumDefinedValues() == 0);
assert(Recipe->getNumDefinedValues() == 0 && "Unexpected multidef recipe");

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added thanks

Comment on lines +241 to 246
SmallVector<BasicBlock *> ExitBlocks;
TheLoop->getExitBlocks(ExitBlocks);
if (is_contained(ExitBlocks, InstParent)) {
// Instruction definition is in outermost loop exit.
return false;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No longer expecting a loop with single exit?
Teaching PlainCFGBuilder about multiple exits - does this imply that "native" can now vectorize such outerloops, and worth testing?
OTOH, PlainCFGBuilder::buildPlainCFG() does handle multiple/non-unique exits(?).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is needed for the inner loop multi-exit support. The native path still won't vectorize early exits, due to checks in legality I think. I can add a test case, thanks

fhahn added a commit that referenced this pull request Feb 3, 2025
Replacing a recipe with a live-in may not be correct in all cases,
e.g. when replacing recipes involving header-phi recipes, like
reductions.

For now, only use SCEV to simplify live-ins.

More powerful input simplification can be built in top of
#124432 in the future.


Fixes #119173.
Fixes #125374.

PR: #125436
github-actions bot pushed a commit to arm/arm-toolchain that referenced this pull request Feb 3, 2025
Replacing a recipe with a live-in may not be correct in all cases,
e.g. when replacing recipes involving header-phi recipes, like
reductions.

For now, only use SCEV to simplify live-ins.

More powerful input simplification can be built in top of
llvm/llvm-project#124432 in the future.

Fixes llvm/llvm-project#119173.
Fixes llvm/llvm-project#125374.

PR: llvm/llvm-project#125436
fhahn added a commit that referenced this pull request Feb 3, 2025
Split off from #124432 as
suggested. Adds VPBuilder::insert, inspired by IRBuilderBase.
github-actions bot pushed a commit to arm/arm-toolchain that referenced this pull request Feb 3, 2025
…ointer (NFC).

Split off from llvm/llvm-project#124432 as
suggested. Adds VPBuilder::insert, inspired by IRBuilderBase.
Builder.setInsertPoint(VPBB);
VPBlockBase *PrevVPBB = nullptr;
for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(RPOT)) {
// Skip VPBBs not corresponding to any input IR basic blocks.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Skip VPBBs not corresponding to any input IR basic blocks.
// Handle VPBBs down to the latch.

Latch also conceptually has a corresponding IRBB, with underlying Instructions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks. Initially the latch in the IR is not the latch/exiting block in the VPlan; the exiting block is created as part of the skeleton.


// Convert input VPInstructions to widened recipes.
for (VPRecipeBase &R : make_early_inc_range(*VPBB)) {
auto *SingleDef = cast<VPSingleDefRecipe>(&R);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
auto *SingleDef = cast<VPSingleDefRecipe>(&R);
auto *SingleDef = cast<VPSingleDefRecipe>(&R);
auto *UnderlyingValue = SingleDef->getUnderlyingValue();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks

auto *SingleDef = cast<VPSingleDefRecipe>(&R);
// Skip recipes that do not need transforming, including canonical IV,
// wide canonical IV and VPInstructions without underlying values. The
// latter is added by masking.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// latter is added by masking.
// latter are added above for masking.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks

Comment on lines 9366 to 9368
if (isa<VPCanonicalIVPHIRecipe>(SingleDef) ||
isa<VPWidenCanonicalIVRecipe>(SingleDef) ||
(isa<VPInstruction>(&R) && !SingleDef->getUnderlyingValue()))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: consistency

Suggested change
if (isa<VPCanonicalIVPHIRecipe>(SingleDef) ||
isa<VPWidenCanonicalIVRecipe>(SingleDef) ||
(isa<VPInstruction>(&R) && !SingleDef->getUnderlyingValue()))
if (isa<VPCanonicalIVPHIRecipe, VPWidenCanonicalIVRecipe>(&R) ||
(isa<VPInstruction>(&R) && !UnderlyingValue))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done ,thanks

Comment on lines 9370 to 9371
assert(isa<VPWidenPHIRecipe>(&R) || (isa<VPInstruction>(SingleDef) &&
SingleDef->getUnderlyingValue()) &&
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: consistency

Suggested change
assert(isa<VPWidenPHIRecipe>(&R) || (isa<VPInstruction>(SingleDef) &&
SingleDef->getUnderlyingValue()) &&
assert(isa<VPWidenPHIRecipe, VPInstruction>(&R) && UnderlyingValue &&

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure why but VPWidenPHIRecipe, VPInstruction cannot be used in a single isa<> here. Will need to look deeper what's going on

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra parens around the isa, since assert is a macro?

(void)IsCondBranch;
(void)IsSwitch;

if (VPBB->getNumSuccessors() >= 2 ||
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can check >2 case separately expecting IsSwitch only.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks


VPBlockUtils::insertBlockAfter(Plan->createVPBasicBlock(""), VPBB);
VPBB = cast<VPBasicBlock>(VPBB->getSingleSuccessor());
Builder.insert(Recipe);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
Builder.insert(Recipe);
Builder.insert(Recipe);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks

CM.foldTailByMasking() || isa<TruncInst>(Instr)) &&
"unexpected recipe needs moving");
if (isa<VPWidenIntOrFpInductionRecipe>(Recipe) && isa<TruncInst>(Instr)) {
// Optimized a truncate truncates to VPWidenIntOrFpInductionRecipe Move
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rephrase?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks

for (VPRecipeBase &R : make_early_inc_range(*VPBB)) {
auto *SingleDef = cast<VPSingleDefRecipe>(&R);
// Skip recipes that do not need transforming, including canonical IV,
// wide canonical IV and VPInstructions without underlying values. The
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a FIXME to stop relying on the underlying values of VPInstructions.
The "buildLoop" VPlan0 should use VPInstructions with flags and/or other recipes to best model a copy of the original scalar loop, in contrast to VPIRInstructions which model the original scalar loop itself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

isa<VPWidenCanonicalIVRecipe>(SingleDef) ||
(isa<VPInstruction>(&R) && !SingleDef->getUnderlyingValue()))
continue;
assert(isa<VPWidenPHIRecipe>(&R) || (isa<VPInstruction>(SingleDef) &&
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a FIXME to stop using VPWidenPHIRecipe in VPlan0, which models (a copy of) the original scalar loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks

github-actions bot pushed a commit to arm/arm-toolchain that referenced this pull request Feb 7, 2025
Replacing a recipe with a live-in may not be correct in all cases,
e.g. when replacing recipes involving header-phi recipes, like
reductions.

For now, only use SCEV to simplify live-ins.

More powerful input simplification can be built in top of
llvm/llvm-project#124432 in the future.

Fixes llvm/llvm-project#119173.
Fixes llvm/llvm-project#125374.

PR: llvm/llvm-project#125436
(cherry picked from commit 30f3752)
Copy link

github-actions bot commented Feb 8, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Collaborator

@ayalz ayalz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good step forward along roadmap, adding several comments and suggestions.

if (VPBB == HeaderVPBB)
// Create mask based on the IR BB corresponding to VPBB.
// TODO: Predicate directly based on VPlan.
Builder.setInsertPoint(VPBB, VPBB->begin());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So non-phi recipes that compute a block's mask must be introduced before the phi recipes of the block, because the latter is eventually replaced by blends that use the former? This calls for a FIXME note.
One alternative may be to sink the blends when they replace the phis.
Another is to introduce masking recipes along with replacing phi recipes with blends, as done currently in tryToBuildVPlanWithRecipes().

RecipeBuilder.createBlockInMask(BB);
} else if (NeedsMasks) {
Builder.setInsertPoint(VPBB, VPBB->begin());
RecipeBuilder.createBlockInMask(HCFGBuilder.getIRBBForVPB(VPBB));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, let's go with HCFGBuilder.getIRBBForVPB(), for now ...
Following our roadmap, HCFGBuilder would be the VPlan-to-VPlan transformation producing "buildLoop" from "wrapInput", and tryToBuildVPlanWithRecipes would follow as a subsequent VPlan-to-VPlan transformation(s). Obtaining this VPBB-to-IRBB mapping from HCFGBuilder should be removed in one of two ways (or more):

  1. If the buildLoop VPlan is to maintain its correspondence with IR basic blocks then a new type of VPBB can be introduced - one providing getIRBasicBlock() similar to VPIRBasicBlock but having an execute() similar to VPBasicBlock - that generates new basic blocks, or deemed abstract/unreachable - to be materialized into a concreate VPBB prior to codegen.
  2. If, OTOH, getIRBBForVPB() is meant only as a temporary solution to support createBlockInMask() until the latter is upgraded to work directly on VPBB's, then a FIXME should be added.

(Recipes are created for internal conditional branches but not for unconditional branches (including early exits) - which imply a single successor. An empty block also implies a single predecessor due to lack of phi recipes. Figuring out the BasicBlock from the single predecessor and/or successor of an empty VPBB seems cumbersome; getIRBBForVPB() could record the mapping for empty VPBB's only; simplest to probably to record all VPBB's in loop, for now.)

RecipeBuilder.createHeaderMask();
else if (NeedsMasks)
RecipeBuilder.createBlockInMask(BB);
} else if (NeedsMasks) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the header mask also be created only if NeedsMasks? (Seems independent)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently createHeaderMask itself checks if tail folding is enabled, and only then creates the mask

auto *UnderlyingValue = SingleDef->getUnderlyingValue();
// Skip recipes that do not need transforming, including canonical IV,
// wide canonical IV and VPInstructions without underlying values. The
// latter are added above by masking.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// latter are added above by masking.
// latter are added above for masking.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done thanks

Comment on lines 9367 to 9368
if (isa<VPCanonicalIVPHIRecipe, VPWidenCanonicalIVRecipe>(SingleDef) ||
(isa<VPInstruction>(&R) && !UnderlyingValue))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it important for some isa's to use SingleDef and others &R?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, updated to use &R for consistency and more compact, thanks

Comment on lines 9377 to 9379
if (match(&R, m_BranchOnCond(m_VPValue())) ||
(isa<VPInstruction>(&R) &&
cast<VPInstruction>(&R)->getOpcode() == Instruction::Switch)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then perhaps more consistent / simpler to do:

Suggested change
if (match(&R, m_BranchOnCond(m_VPValue())) ||
(isa<VPInstruction>(&R) &&
cast<VPInstruction>(&R)->getOpcode() == Instruction::Switch)) {
if (isa<VPInstruction>(&R) &&
(cast<VPInstruction>(&R)->getOpcode() == VPInstruction::BranchOnCond ||
(cast<VPInstruction>(&R)->getOpcode() == Instruction::Switch)) {

(once VPWidenPHIRecipe is turned into a VPInstruction this would be simpler)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks


// Flatten the CFG in the loop. Masks for blocks have already been generated
// and added to recipes as needed. To do so, first disconnect VPBB from its
// successors. Then connect VPBB to the previously visited$ VPBB.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// successors. Then connect VPBB to the previously visited$ VPBB.
// successors. Then connect VPBB to the previously visited VPBB.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed, thanks

@@ -46,7 +46,7 @@ define void @vector_reverse_i64(ptr nocapture noundef writeonly %A, ptr nocaptur
; CHECK-NEXT: LV: Found an estimated cost of 0 for VF vscale x 4 For instruction: br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit, !llvm.loop !0
; CHECK-NEXT: LV: Using user VF vscale x 4.
; CHECK-NEXT: LV: Loop does not require scalar epilogue
; CHECK-NEXT: LV: Scalarizing: %i.0 = add nsw i32 %i.0.in8, -1
; CHECK: LV: Scalarizing: %i.0 = add nsw i32 %i.0.in8, -1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, whatever you prefer.

Comment on lines +9342 to +9343
assert(!HCFGBuilder.getIRBBForVPB(VPBB) &&
"the latch block shouldn't have a corresponding IRBB");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... We always have a latch VPBB, which isn't associated with an IRBB, which is what this covered

Initially the latch in the IR is not the latch/exiting block in the VPlan; the exiting block is created as part of the skeleton.

Thanks for clarifying. Should this discrepancy be fixed, perhaps as part of integrating HCFGBuilder with skeleton creation, to form "buildLoop" VPlan whose loop VPBB's are in full 1-1 correspondence with the original IRBB's of the loop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, currently they are quite intertwined already, but once we use it in the inner loop path I'll continue to further refactor/improve on the current structure

Icohedron pushed a commit to Icohedron/llvm-project that referenced this pull request Feb 11, 2025
Replacing a recipe with a live-in may not be correct in all cases,
e.g. when replacing recipes involving header-phi recipes, like
reductions.

For now, only use SCEV to simplify live-ins.

More powerful input simplification can be built in top of
llvm#124432 in the future.


Fixes llvm#119173.
Fixes llvm#125374.

PR: llvm#125436
Icohedron pushed a commit to Icohedron/llvm-project that referenced this pull request Feb 11, 2025
Split off from llvm#124432 as
suggested. Adds VPBuilder::insert, inspired by IRBuilderBase.
clayne pushed a commit to clayne/llvm-project that referenced this pull request Feb 12, 2025
Replacing a recipe with a live-in may not be correct in all cases,
e.g. when replacing recipes involving header-phi recipes, like
reductions.

For now, only use SCEV to simplify live-ins.

More powerful input simplification can be built in top of
llvm#124432 in the future.

Fixes llvm#119173.
Fixes llvm#125374.

PR: llvm#125436
(cherry picked from commit 30f3752)
@fhahn fhahn merged commit 38376de into llvm:main Feb 18, 2025
8 checks passed
@fhahn fhahn deleted the vplan0 branch February 18, 2025 15:12
@llvm-ci
Copy link
Collaborator

llvm-ci commented Feb 18, 2025

LLVM Buildbot has detected a new failure on builder lldb-aarch64-ubuntu running on linaro-lldb-aarch64-ubuntu while building llvm at step 6 "test".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/59/builds/12949

Here is the relevant piece of the build log for the reference
Step 6 (test) failure: build (failure)
...
PASS: lldb-unit :: ValueObject/./LLDBValueObjectTests/6/11 (2082 of 2091)
PASS: lldb-unit :: ValueObject/./LLDBValueObjectTests/7/11 (2083 of 2091)
PASS: lldb-unit :: ValueObject/./LLDBValueObjectTests/8/11 (2084 of 2091)
PASS: lldb-unit :: ValueObject/./LLDBValueObjectTests/9/11 (2085 of 2091)
PASS: lldb-unit :: tools/lldb-server/tests/./LLDBServerTests/0/2 (2086 of 2091)
PASS: lldb-unit :: tools/lldb-server/tests/./LLDBServerTests/1/2 (2087 of 2091)
PASS: lldb-unit :: Target/./TargetTests/11/14 (2088 of 2091)
PASS: lldb-unit :: Host/./HostTests/2/13 (2089 of 2091)
PASS: lldb-unit :: Process/gdb-remote/./ProcessGdbRemoteTests/8/9 (2090 of 2091)
UNRESOLVED: lldb-api :: tools/lldb-server/TestLldbGdbServer.py (2091 of 2091)
******************** TEST 'lldb-api :: tools/lldb-server/TestLldbGdbServer.py' FAILED ********************
Script:
--
/usr/bin/python3.10 /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/test/API/dotest.py -u CXXFLAGS -u CFLAGS --env LLVM_LIBS_DIR=/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./lib --env LLVM_INCLUDE_DIR=/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/include --env LLVM_TOOLS_DIR=/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin --arch aarch64 --build-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex --lldb-module-cache-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex/module-cache-lldb/lldb-api --clang-module-cache-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex/module-cache-clang/lldb-api --executable /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin/lldb --compiler /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin/clang --dsymutil /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin/dsymutil --make /usr/bin/gmake --llvm-tools-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin --lldb-obj-root /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/tools/lldb --lldb-libs-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./lib /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/test/API/tools/lldb-server -p TestLldbGdbServer.py
--
Exit Code: 1

Command Output (stdout):
--
lldb version 21.0.0git (https://github.com/llvm/llvm-project.git revision 38376dee92224c6657ef6d88413bfc77f4441268)
  clang revision 38376dee92224c6657ef6d88413bfc77f4441268
  llvm revision 38376dee92224c6657ef6d88413bfc77f4441268
Skipping the following test categories: ['libc++', 'dsym', 'gmodules', 'debugserver', 'objc']

--
Command Output (stderr):
--
UNSUPPORTED: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test_Hc_then_Csignal_signals_correct_thread_launch_debugserver (TestLldbGdbServer.LldbGdbServerTestCase) (test case does not fall in any category of interest for this run) 
PASS: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test_Hc_then_Csignal_signals_correct_thread_launch_llgs (TestLldbGdbServer.LldbGdbServerTestCase)
PASS: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test_Hg_fails_on_another_pid_llgs (TestLldbGdbServer.LldbGdbServerTestCase)
PASS: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test_Hg_fails_on_minus_one_pid_llgs (TestLldbGdbServer.LldbGdbServerTestCase)
PASS: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test_Hg_fails_on_zero_pid_llgs (TestLldbGdbServer.LldbGdbServerTestCase)
UNSUPPORTED: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test_Hg_switches_to_3_threads_launch_debugserver (TestLldbGdbServer.LldbGdbServerTestCase) (test case does not fall in any category of interest for this run) 
PASS: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test_Hg_switches_to_3_threads_launch_llgs (TestLldbGdbServer.LldbGdbServerTestCase)
UNSUPPORTED: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test_P_and_p_thread_suffix_work_debugserver (TestLldbGdbServer.LldbGdbServerTestCase) (test case does not fall in any category of interest for this run) 
PASS: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test_P_and_p_thread_suffix_work_llgs (TestLldbGdbServer.LldbGdbServerTestCase)
UNSUPPORTED: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test_P_writes_all_gpr_registers_debugserver (TestLldbGdbServer.LldbGdbServerTestCase) (test case does not fall in any category of interest for this run) 
PASS: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test_P_writes_all_gpr_registers_llgs (TestLldbGdbServer.LldbGdbServerTestCase)
UNSUPPORTED: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test_attach_commandline_continue_app_exits_debugserver (TestLldbGdbServer.LldbGdbServerTestCase) (test case does not fall in any category of interest for this run) 
Program aborted due to an unhandled Error:
Operation not permitted
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/lldb-server gdbserver --attach=2972578 --reverse-connect [127.0.0.1]:48229
 #0 0x0000aaaae59e3560 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/lldb-server+0xb33560)
 #1 0x0000aaaae59e158c llvm::sys::RunSignalHandlers() (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/lldb-server+0xb3158c)
 #2 0x0000aaaae59e3c70 SignalHandler(int, siginfo_t*, void*) Signals.cpp:0:0
 #3 0x0000ffff9aea57dc (linux-vdso.so.1+0x7dc)
 #4 0x0000ffff9a6af200 __pthread_kill_implementation ./nptl/pthread_kill.c:44:76

github-actions bot pushed a commit to arm/arm-toolchain that referenced this pull request Feb 18, 2025
…loops. (NFC) (#124432)

Use HCFGBuilder to build an initial VPlan 0, which wraps all input
instructions in VPInstructions and update tryToBuildVPlanWithVPRecipes
to replace the VPInstructions with widened recipes.

At the moment, widened recipes are created based on the underlying
instruction of the VPInstruction. Masks are also still created based on
the input IR basic blocks and the loop CFG is flattened in the main loop
processing the VPInstructions.

This patch also incldues support for Switch instructions in HCFGBuilder
using just a VPInstruction with Instruction::Switch opcode.

There are multiple follow-ups planned:
 * Perform predication on the VPlan directly,
* Unify code constructing VPlan 0 to be shared by both inner and outer
loop code paths.
 * Construct VPlan 0 once, clone subsequent ones for VFs

PR: llvm/llvm-project#124432
wldfngrs pushed a commit to wldfngrs/llvm-project that referenced this pull request Feb 19, 2025
llvm#124432)

Use HCFGBuilder to build an initial VPlan 0, which wraps all input
instructions in VPInstructions and update tryToBuildVPlanWithVPRecipes
to replace the VPInstructions with widened recipes.

At the moment, widened recipes are created based on the underlying
instruction of the VPInstruction. Masks are also still created based on
the input IR basic blocks and the loop CFG is flattened in the main loop
processing the VPInstructions.

This patch also incldues support for Switch instructions in HCFGBuilder
using just a VPInstruction with Instruction::Switch opcode.

There are multiple follow-ups planned:
 * Perform predication on the VPlan directly,
* Unify code constructing VPlan 0 to be shared by both inner and outer
loop code paths.
 * Construct VPlan 0 once, clone subsequent ones for VFs

PR: llvm#124432
@Zentrik
Copy link
Contributor

Zentrik commented Feb 19, 2025

I'm getting an assertion failure after this commit when using llvm on Linux and Windows. I can create a reproducer if that's useful.

Assertion failed: i < getNumSuccessors() && "Successor # out of range for Branch!", file /workspace/srcdir/llvm-project/llvm/include/llvm/IR/Instructions.h, line 3117

[5772] signal 22: SIGABRT
in expression starting at none:0
crt_sig_handler at D:/a/llvm_julia_tester/llvm_julia_tester/julia/src\signals-win.c:104
raise at C:\Windows\System32\msvcrt.dll (unknown line)
abort at C:\Windows\System32\msvcrt.dll (unknown line)
assert at C:\Windows\System32\msvcrt.dll (unknown line)
_ZN12_GLOBAL__N_115PlainCFGBuilder13buildPlainCFGERN4llvm8DenseMapIPNS1_11VPBlockBaseEPNS1_10BasicBlockENS1_12DenseMapInfoIS4_vEENS1_6detail12DenseMapPairIS4_S6_EEEE at D:\a\llvm_julia_tester\llvm_julia_tester\julia\usr\bin\libLLVM-21jl.dll (unknown line)
_ZN4llvm16VPlanHCFGBuilder13buildPlainCFGEv at D:\a\llvm_julia_tester\llvm_julia_tester\julia\usr\bin\libLLVM-21jl.dll (unknown line)
_ZN4llvm16VPlanHCFGBuilder20buildHierarchicalCFGEv at D:\a\llvm_julia_tester\llvm_julia_tester\julia\usr\bin\libLLVM-21jl.dll (unknown line)
_ZN4llvm24LoopVectorizationPlanner24buildVPlansWithVPRecipesENS_12ElementCountES1_ at D:\a\llvm_julia_tester\llvm_julia_tester\julia\usr\bin\libLLVM-21jl.dll (unknown line)
_ZN4llvm24LoopVectorizationPlanner4planENS_12ElementCountEj at D:\a\llvm_julia_tester\llvm_julia_tester\julia\usr\bin\libLLVM-21jl.dll (unknown line)
_ZNK4llvm8LoopBaseINS_10BasicBlockENS_4LoopEE14getLoopLatchesERNS_15SmallVectorImplIPS1_EE at D:\a\llvm_julia_tester\llvm_julia_tester\julia\usr\bin\libLLVM-21jl.dll (unknown line)
_ZN4llvm28getOptionalBoolLoopAttributeEPKNS_4LoopENS_9StringRefE at D:\a\llvm_julia_tester\llvm_julia_tester\julia\usr\bin\libLLVM-21jl.dll (unknown line)
_ZN4llvm17LoopVectorizePass7runImplERNS_8FunctionE at D:\a\llvm_julia_tester\llvm_julia_tester\julia\usr\bin\libLLVM-21jl.dll (unknown line)
_ZN4llvm17LoopVectorizePass3runERNS_8FunctionERNS_15AnalysisManagerIS1_JEEE at D:\a\llvm_julia_tester\llvm_julia_tester\julia\usr\bin\libLLVM-21jl.dll (unknown line)

@mikaelholmen
Copy link
Collaborator

I'm getting an assertion failure after this commit when using llvm on Linux and Windows. I can create a reproducer if that's useful.

Assertion failed: i < getNumSuccessors() && "Successor # out of range for Branch!", file /workspace/srcdir/llvm-project/llvm/include/llvm/IR/Instructions.h, line 3117

[5772] signal 22: SIGABRT
in expression starting at none:0
crt_sig_handler at D:/a/llvm_julia_tester/llvm_julia_tester/julia/src\signals-win.c:104
raise at C:\Windows\System32\msvcrt.dll (unknown line)
abort at C:\Windows\System32\msvcrt.dll (unknown line)
assert at C:\Windows\System32\msvcrt.dll (unknown line)
_ZN12_GLOBAL__N_115PlainCFGBuilder13buildPlainCFGERN4llvm8DenseMapIPNS1_11VPBlockBaseEPNS1_10BasicBlockENS1_12DenseMapInfoIS4_vEENS1_6detail12DenseMapPairIS4_S6_EEEE at D:\a\llvm_julia_tester\llvm_julia_tester\julia\usr\bin\libLLVM-21jl.dll (unknown line)
_ZN4llvm16VPlanHCFGBuilder13buildPlainCFGEv at D:\a\llvm_julia_tester\llvm_julia_tester\julia\usr\bin\libLLVM-21jl.dll (unknown line)
_ZN4llvm16VPlanHCFGBuilder20buildHierarchicalCFGEv at D:\a\llvm_julia_tester\llvm_julia_tester\julia\usr\bin\libLLVM-21jl.dll (unknown line)
_ZN4llvm24LoopVectorizationPlanner24buildVPlansWithVPRecipesENS_12ElementCountES1_ at D:\a\llvm_julia_tester\llvm_julia_tester\julia\usr\bin\libLLVM-21jl.dll (unknown line)
_ZN4llvm24LoopVectorizationPlanner4planENS_12ElementCountEj at D:\a\llvm_julia_tester\llvm_julia_tester\julia\usr\bin\libLLVM-21jl.dll (unknown line)
_ZNK4llvm8LoopBaseINS_10BasicBlockENS_4LoopEE14getLoopLatchesERNS_15SmallVectorImplIPS1_EE at D:\a\llvm_julia_tester\llvm_julia_tester\julia\usr\bin\libLLVM-21jl.dll (unknown line)
_ZN4llvm28getOptionalBoolLoopAttributeEPKNS_4LoopENS_9StringRefE at D:\a\llvm_julia_tester\llvm_julia_tester\julia\usr\bin\libLLVM-21jl.dll (unknown line)
_ZN4llvm17LoopVectorizePass7runImplERNS_8FunctionE at D:\a\llvm_julia_tester\llvm_julia_tester\julia\usr\bin\libLLVM-21jl.dll (unknown line)
_ZN4llvm17LoopVectorizePass3runERNS_8FunctionERNS_15AnalysisManagerIS1_JEEE at D:\a\llvm_julia_tester\llvm_julia_tester\julia\usr\bin\libLLVM-21jl.dll (unknown line)

Hi @fhahn
We see that crash as well.
Reproduce with
opt -passes=loop-vectorize bbi-104312.ll -o /dev/null
bbi-104312.ll.gz

@fhahn
Copy link
Contributor Author

fhahn commented Feb 19, 2025

@mikaelholmen it should be fixed in a96444a. The assertion was over-eager and the related code actually unused.

@Zentrik it would be great if you could check if the issue is fixed. If not, would you be able to share a reproducer.

@mikaelholmen
Copy link
Collaborator

@mikaelholmen it should be fixed in a96444a. The assertion was over-eager and the related code actually unused.

Yep, thanks!

@mikaelholmen
Copy link
Collaborator

@mikaelholmen it should be fixed in a96444a. The assertion was over-eager and the related code actually unused.

Yep, thanks!

@fhahn: Hm, seems like I was too quick there with my verification :/
After the a96444a fix
opt -passes=loop-vectorize bbi-104312.ll -o /dev/null
crashes like this instead?

opt: ../lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp:445: void (anonymous namespace)::PlainCFGBuilder::buildPlainCFG(DenseMap<VPBlockBase *, BasicBlock *> &): Assertion `IRDef2VPValue.contains(BI->getCondition()) && "Missing condition bit in IRDef2VPValue!"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: build-all/bin/opt -passes=loop-vectorize bbi-104312.ll -o /dev/null
1.	Running pass "function(loop-vectorize<no-interleave-forced-only;no-vectorize-forced-only;>)" on module "bbi-104312.ll"
2.	Running pass "loop-vectorize<no-interleave-forced-only;no-vectorize-forced-only;>" on function "main"
 #0 0x00005583f8bc71a6 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (build-all/bin/opt+0x464e1a6)
 #1 0x00005583f8bc4bee llvm::sys::RunSignalHandlers() (build-all/bin/opt+0x464bbee)
 #2 0x00005583f8bc7a29 SignalHandler(int, siginfo_t*, void*) Signals.cpp:0:0
 #3 0x00007f49ef09bd10 __restore_rt (/lib64/libpthread.so.0+0x12d10)
 #4 0x00007f49eca3b52f raise (/lib64/libc.so.6+0x4e52f)
 #5 0x00007f49eca0ee65 abort (/lib64/libc.so.6+0x21e65)
 #6 0x00007f49eca0ed39 _nl_load_domain.cold.0 (/lib64/libc.so.6+0x21d39)
 #7 0x00007f49eca33e86 (/lib64/libc.so.6+0x46e86)
 #8 0x00005583fa23e345 (anonymous namespace)::PlainCFGBuilder::buildPlainCFG(llvm::DenseMap<llvm::VPBlockBase*, llvm::BasicBlock*, llvm::DenseMapInfo<llvm::VPBlockBase*, void>, llvm::detail::DenseMapPair<llvm::VPBlockBase*, llvm::BasicBlock*>>&) VPlanHCFGBuilder.cpp:0:0
 #9 0x00005583fa23b62d llvm::VPlanHCFGBuilder::buildPlainCFG() (build-all/bin/opt+0x5cc262d)
#10 0x00005583fa23e8af llvm::VPlanHCFGBuilder::buildHierarchicalCFG() (build-all/bin/opt+0x5cc58af)
#11 0x00005583fa18f5a1 llvm::LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(llvm::VFRange&) (build-all/bin/opt+0x5c165a1)
#12 0x00005583fa181813 llvm::LoopVectorizationPlanner::buildVPlansWithVPRecipes(llvm::ElementCount, llvm::ElementCount) (build-all/bin/opt+0x5c08813)
#13 0x00005583fa1812e0 llvm::LoopVectorizationPlanner::plan(llvm::ElementCount, unsigned int) (build-all/bin/opt+0x5c082e0)
#14 0x00005583fa199636 llvm::LoopVectorizePass::processLoop(llvm::Loop*) (build-all/bin/opt+0x5c20636)
#15 0x00005583fa1a074b llvm::LoopVectorizePass::runImpl(llvm::Function&) (build-all/bin/opt+0x5c2774b)
#16 0x00005583fa1a1006 llvm::LoopVectorizePass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (build-all/bin/opt+0x5c28006)
#17 0x00005583fa04adad llvm::detail::PassModel<llvm::Function, llvm::LoopVectorizePass, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) PassBuilderPipelines.cpp:0:0
#18 0x00005583f8de9567 llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (build-all/bin/opt+0x4870567)
#19 0x00005583fa04398d llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) PassBuilderPipelines.cpp:0:0
#20 0x00005583f8dee13e llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (build-all/bin/opt+0x487513e)
#21 0x00005583fa03a07d llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) PassBuilderPipelines.cpp:0:0
#22 0x00005583f8de8257 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (build-all/bin/opt+0x486f257)
#23 0x00005583f9fc53ec llvm::runPassPipeline(llvm::StringRef, llvm::Module&, llvm::TargetMachine*, llvm::TargetLibraryInfoImpl*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::StringRef, llvm::ArrayRef<llvm::PassPlugin>, llvm::ArrayRef<std::function<void (llvm::PassBuilder&)>>, llvm::opt_tool::OutputKind, llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool, bool, bool) (build-all/bin/opt+0x5a4c3ec)
#24 0x00005583f8b89d92 optMain (build-all/bin/opt+0x4610d92)
#25 0x00007f49eca277e5 __libc_start_main (/lib64/libc.so.6+0x3a7e5)
#26 0x00005583f8b879ae _start (build-all/bin/opt+0x460e9ae)
Abort (core dumped)

Tried this both on a96444a and latest trunk 8b58cb8.

@fhahn
Copy link
Contributor Author

fhahn commented Feb 20, 2025

@mikaelholmen the second issue should be fixed by 404af37

@mikaelholmen
Copy link
Collaborator

@mikaelholmen the second issue should be fixed by 404af37

Yep now it works. Thanks!

fhahn added a commit that referenced this pull request Feb 22, 2025
Use operands from VPInstructions directly during recipe creation.

Follow-up as discussed and planned after
#124432.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Feb 22, 2025
…(NFC).

Use operands from VPInstructions directly during recipe creation.

Follow-up as discussed and planned after
llvm/llvm-project#124432.
fhahn added a commit that referenced this pull request Feb 23, 2025
Removes unneeded code after #124432.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Feb 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants