Skip to content

[VPlan] Add VPExpressionRecipe, replacing extended reduction recipes. #144281

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 22 commits into from
Jul 1, 2025

Conversation

fhahn
Copy link
Contributor

@fhahn fhahn commented Jun 15, 2025

This patch adds a new recipe to combine multiple recipes into an 'expression' recipe, which should be considered as single entity for cost-modeling and transforms. The recipe needs to be 'unbundled', i.e. replaced by its individual recipes before execute.

This subsumes VPExtendedReductionRecipe and
VPMulAccumulateReductionRecipe and should make it easier to extend to include more types of bundled patterns, like e.g. extends folded into loads or various arithmetic instructions, if supported by the target.

It allows avoiding re-creating the original recipes when converting to concrete recipes, together with removing the need to record various information. The current version of the patch still retains the original printing matching VPExtendedReductionRecipe and VPMulAccumulateReductionRecipe, but this specialized print could be replaced with printing the bundled recipes directly.

@llvmbot
Copy link
Member

llvmbot commented Jun 15, 2025

@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-llvm-transforms

Author: Florian Hahn (fhahn)

Changes

This patch adds a new recipe to combine multiple recipes into a 'bundle' recipe, which should be considered as single entity for cost-modeling and transforms. The recipe needs to be 'unbundled', i.e. replaced by its individual recipes before execute.

This subsumes VPExtendedReductionRecipe and
VPMulAccumulateReductionRecipe and should make it easier to extend to include more types of bundled patterns, like e.g. extends folded into loads or various arithmetic instructions, if supported by the target.

It allows avoiding re-creating the original recipes when converting to concrete recipes, together with removing the need to record various information. The current version of the patch still retains the original printing matching VPExtendedReductionRecipe and VPMulAccumulateReductionRecipe, but this specialized print could be replaced with printing the bundled recipes directly.

Currently the unbundle implementation is a bit more complicated than necessary, as we need to fold the extends across ops to match the current behavior, but there's quite possibly a better place to do so.


Patch is 45.99 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/144281.diff

7 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/VPlan.h (+120-215)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp (+3-2)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+129-75)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+28-105)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanValue.h (+3-2)
  • (modified) llvm/test/Transforms/LoopVectorize/ARM/mve-reductions.ll (+14-13)
  • (modified) llvm/test/Transforms/LoopVectorize/vplan-printing-reductions.ll (+6-6)
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index 5a3c4a514a5dd..256706deb0977 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -525,14 +525,13 @@ class VPSingleDefRecipe : public VPRecipeBase, public VPValue {
 
   static inline bool classof(const VPRecipeBase *R) {
     switch (R->getVPDefID()) {
+    case VPRecipeBase::VPBundleSC:
     case VPRecipeBase::VPDerivedIVSC:
     case VPRecipeBase::VPEVLBasedIVPHISC:
     case VPRecipeBase::VPExpandSCEVSC:
     case VPRecipeBase::VPInstructionSC:
     case VPRecipeBase::VPReductionEVLSC:
     case VPRecipeBase::VPReductionSC:
-    case VPRecipeBase::VPMulAccumulateReductionSC:
-    case VPRecipeBase::VPExtendedReductionSC:
     case VPRecipeBase::VPReplicateSC:
     case VPRecipeBase::VPScalarIVStepsSC:
     case VPRecipeBase::VPVectorPointerSC:
@@ -852,9 +851,7 @@ struct VPRecipeWithIRFlags : public VPSingleDefRecipe, public VPIRFlags {
            R->getVPDefID() == VPRecipeBase::VPReductionEVLSC ||
            R->getVPDefID() == VPRecipeBase::VPReplicateSC ||
            R->getVPDefID() == VPRecipeBase::VPVectorEndPointerSC ||
-           R->getVPDefID() == VPRecipeBase::VPVectorPointerSC ||
-           R->getVPDefID() == VPRecipeBase::VPExtendedReductionSC ||
-           R->getVPDefID() == VPRecipeBase::VPMulAccumulateReductionSC;
+           R->getVPDefID() == VPRecipeBase::VPVectorPointerSC;
   }
 
   static inline bool classof(const VPUser *U) {
@@ -2431,29 +2428,6 @@ class VPReductionRecipe : public VPRecipeWithIRFlags {
     }
     setUnderlyingValue(I);
   }
-
-  /// For VPExtendedReductionRecipe.
-  /// Note that the debug location is from the extend.
-  VPReductionRecipe(const unsigned char SC, const RecurKind RdxKind,
-                    ArrayRef<VPValue *> Operands, VPValue *CondOp,
-                    bool IsOrdered, DebugLoc DL)
-      : VPRecipeWithIRFlags(SC, Operands, DL), RdxKind(RdxKind),
-        IsOrdered(IsOrdered), IsConditional(CondOp) {
-    if (CondOp)
-      addOperand(CondOp);
-  }
-
-  /// For VPMulAccumulateReductionRecipe.
-  /// Note that the NUW/NSW flags and the debug location are from the Mul.
-  VPReductionRecipe(const unsigned char SC, const RecurKind RdxKind,
-                    ArrayRef<VPValue *> Operands, VPValue *CondOp,
-                    bool IsOrdered, WrapFlagsTy WrapFlags, DebugLoc DL)
-      : VPRecipeWithIRFlags(SC, Operands, WrapFlags, DL), RdxKind(RdxKind),
-        IsOrdered(IsOrdered), IsConditional(CondOp) {
-    if (CondOp)
-      addOperand(CondOp);
-  }
-
 public:
   VPReductionRecipe(RecurKind RdxKind, FastMathFlags FMFs, Instruction *I,
                     VPValue *ChainOp, VPValue *VecOp, VPValue *CondOp,
@@ -2479,9 +2453,7 @@ class VPReductionRecipe : public VPRecipeWithIRFlags {
 
   static inline bool classof(const VPRecipeBase *R) {
     return R->getVPDefID() == VPRecipeBase::VPReductionSC ||
-           R->getVPDefID() == VPRecipeBase::VPReductionEVLSC ||
-           R->getVPDefID() == VPRecipeBase::VPExtendedReductionSC ||
-           R->getVPDefID() == VPRecipeBase::VPMulAccumulateReductionSC;
+           R->getVPDefID() == VPRecipeBase::VPReductionEVLSC;
   }
 
   static inline bool classof(const VPUser *U) {
@@ -2620,190 +2592,6 @@ class VPReductionEVLRecipe : public VPReductionRecipe {
   }
 };
 
-/// A recipe to represent inloop extended reduction operations, performing a
-/// reduction on a extended vector operand into a scalar value, and adding the
-/// result to a chain. This recipe is abstract and needs to be lowered to
-/// concrete recipes before codegen. The operands are {ChainOp, VecOp,
-/// [Condition]}.
-class VPExtendedReductionRecipe : public VPReductionRecipe {
-  /// Opcode of the extend for VecOp.
-  Instruction::CastOps ExtOp;
-
-  /// The scalar type after extending.
-  Type *ResultTy;
-
-  /// For cloning VPExtendedReductionRecipe.
-  VPExtendedReductionRecipe(VPExtendedReductionRecipe *ExtRed)
-      : VPReductionRecipe(
-            VPDef::VPExtendedReductionSC, ExtRed->getRecurrenceKind(),
-            {ExtRed->getChainOp(), ExtRed->getVecOp()}, ExtRed->getCondOp(),
-            ExtRed->isOrdered(), ExtRed->getDebugLoc()),
-        ExtOp(ExtRed->getExtOpcode()), ResultTy(ExtRed->getResultType()) {
-    transferFlags(*ExtRed);
-    setUnderlyingValue(ExtRed->getUnderlyingValue());
-  }
-
-public:
-  VPExtendedReductionRecipe(VPReductionRecipe *R, VPWidenCastRecipe *Ext)
-      : VPReductionRecipe(VPDef::VPExtendedReductionSC, R->getRecurrenceKind(),
-                          {R->getChainOp(), Ext->getOperand(0)}, R->getCondOp(),
-                          R->isOrdered(), Ext->getDebugLoc()),
-        ExtOp(Ext->getOpcode()), ResultTy(Ext->getResultType()) {
-    assert((ExtOp == Instruction::CastOps::ZExt ||
-            ExtOp == Instruction::CastOps::SExt) &&
-           "VPExtendedReductionRecipe only supports zext and sext.");
-
-    transferFlags(*Ext);
-    setUnderlyingValue(R->getUnderlyingValue());
-  }
-
-  ~VPExtendedReductionRecipe() override = default;
-
-  VPExtendedReductionRecipe *clone() override {
-    return new VPExtendedReductionRecipe(this);
-  }
-
-  VP_CLASSOF_IMPL(VPDef::VPExtendedReductionSC);
-
-  void execute(VPTransformState &State) override {
-    llvm_unreachable("VPExtendedReductionRecipe should be transform to "
-                     "VPExtendedRecipe + VPReductionRecipe before execution.");
-  };
-
-  /// Return the cost of VPExtendedReductionRecipe.
-  InstructionCost computeCost(ElementCount VF,
-                              VPCostContext &Ctx) const override;
-
-#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
-  /// Print the recipe.
-  void print(raw_ostream &O, const Twine &Indent,
-             VPSlotTracker &SlotTracker) const override;
-#endif
-
-  /// The scalar type after extending.
-  Type *getResultType() const { return ResultTy; }
-
-  /// Is the extend ZExt?
-  bool isZExt() const { return getExtOpcode() == Instruction::ZExt; }
-
-  /// Get the opcode of the extend for VecOp.
-  Instruction::CastOps getExtOpcode() const { return ExtOp; }
-};
-
-/// A recipe to represent inloop MulAccumulateReduction operations,  multiplying
-/// the vector operands (which may be extended), performing a reduction.add on
-/// the result, and adding the scalar result to a chain. This recipe is abstract
-/// and needs to be lowered to concrete recipes before codegen. The operands are
-/// {ChainOp, VecOp1, VecOp2, [Condition]}.
-class VPMulAccumulateReductionRecipe : public VPReductionRecipe {
-  /// Opcode of the extend for VecOp1 and VecOp2.
-  Instruction::CastOps ExtOp;
-
-  /// Non-neg flag of the extend recipe.
-  bool IsNonNeg = false;
-
-  /// The scalar type after extending.
-  Type *ResultTy = nullptr;
-
-  /// For cloning VPMulAccumulateReductionRecipe.
-  VPMulAccumulateReductionRecipe(VPMulAccumulateReductionRecipe *MulAcc)
-      : VPReductionRecipe(
-            VPDef::VPMulAccumulateReductionSC, MulAcc->getRecurrenceKind(),
-            {MulAcc->getChainOp(), MulAcc->getVecOp0(), MulAcc->getVecOp1()},
-            MulAcc->getCondOp(), MulAcc->isOrdered(),
-            WrapFlagsTy(MulAcc->hasNoUnsignedWrap(), MulAcc->hasNoSignedWrap()),
-            MulAcc->getDebugLoc()),
-        ExtOp(MulAcc->getExtOpcode()), IsNonNeg(MulAcc->isNonNeg()),
-        ResultTy(MulAcc->getResultType()) {
-    transferFlags(*MulAcc);
-    setUnderlyingValue(MulAcc->getUnderlyingValue());
-  }
-
-public:
-  VPMulAccumulateReductionRecipe(VPReductionRecipe *R, VPWidenRecipe *Mul,
-                                 VPWidenCastRecipe *Ext0,
-                                 VPWidenCastRecipe *Ext1, Type *ResultTy)
-      : VPReductionRecipe(
-            VPDef::VPMulAccumulateReductionSC, R->getRecurrenceKind(),
-            {R->getChainOp(), Ext0->getOperand(0), Ext1->getOperand(0)},
-            R->getCondOp(), R->isOrdered(),
-            WrapFlagsTy(Mul->hasNoUnsignedWrap(), Mul->hasNoSignedWrap()),
-            R->getDebugLoc()),
-        ExtOp(Ext0->getOpcode()), ResultTy(ResultTy) {
-    assert(RecurrenceDescriptor::getOpcode(getRecurrenceKind()) ==
-               Instruction::Add &&
-           "The reduction instruction in MulAccumulateteReductionRecipe must "
-           "be Add");
-    assert((ExtOp == Instruction::CastOps::ZExt ||
-            ExtOp == Instruction::CastOps::SExt) &&
-           "VPMulAccumulateReductionRecipe only supports zext and sext.");
-    setUnderlyingValue(R->getUnderlyingValue());
-    // Only set the non-negative flag if the original recipe contains.
-    if (Ext0->hasNonNegFlag())
-      IsNonNeg = Ext0->isNonNeg();
-  }
-
-  VPMulAccumulateReductionRecipe(VPReductionRecipe *R, VPWidenRecipe *Mul,
-                                 Type *ResultTy)
-      : VPReductionRecipe(
-            VPDef::VPMulAccumulateReductionSC, R->getRecurrenceKind(),
-            {R->getChainOp(), Mul->getOperand(0), Mul->getOperand(1)},
-            R->getCondOp(), R->isOrdered(),
-            WrapFlagsTy(Mul->hasNoUnsignedWrap(), Mul->hasNoSignedWrap()),
-            R->getDebugLoc()),
-        ExtOp(Instruction::CastOps::CastOpsEnd), ResultTy(ResultTy) {
-    assert(RecurrenceDescriptor::getOpcode(getRecurrenceKind()) ==
-               Instruction::Add &&
-           "The reduction instruction in MulAccumulateReductionRecipe must be "
-           "Add");
-    setUnderlyingValue(R->getUnderlyingValue());
-  }
-
-  ~VPMulAccumulateReductionRecipe() override = default;
-
-  VPMulAccumulateReductionRecipe *clone() override {
-    return new VPMulAccumulateReductionRecipe(this);
-  }
-
-  VP_CLASSOF_IMPL(VPDef::VPMulAccumulateReductionSC);
-
-  void execute(VPTransformState &State) override {
-    llvm_unreachable("VPMulAccumulateReductionRecipe should transform to "
-                     "VPWidenCastRecipe + "
-                     "VPWidenRecipe + VPReductionRecipe before execution");
-  }
-
-  /// Return the cost of VPMulAccumulateReductionRecipe.
-  InstructionCost computeCost(ElementCount VF,
-                              VPCostContext &Ctx) const override;
-
-#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
-  /// Print the recipe.
-  void print(raw_ostream &O, const Twine &Indent,
-             VPSlotTracker &SlotTracker) const override;
-#endif
-
-  Type *getResultType() const { return ResultTy; }
-
-  /// The first vector value to be extended and reduced.
-  VPValue *getVecOp0() const { return getOperand(1); }
-
-  /// The second vector value to be extended and reduced.
-  VPValue *getVecOp1() const { return getOperand(2); }
-
-  /// Return true if this recipe contains extended operands.
-  bool isExtended() const { return ExtOp != Instruction::CastOps::CastOpsEnd; }
-
-  /// Return the opcode of the extends for the operands.
-  Instruction::CastOps getExtOpcode() const { return ExtOp; }
-
-  /// Return if the operands are zero-extended.
-  bool isZExt() const { return ExtOp == Instruction::CastOps::ZExt; }
-
-  /// Return true if the operand extends have the non-negative flag.
-  bool isNonNeg() const { return IsNonNeg; }
-};
-
 /// VPReplicateRecipe replicates a given instruction producing multiple scalar
 /// copies of the original scalar type, one per lane, instead of producing a
 /// single copy of widened type for all lanes. If the instruction is known to be
@@ -2922,6 +2710,123 @@ class VPBranchOnMaskRecipe : public VPRecipeBase {
   }
 };
 
+/// A recipe to combine multiple recipes into a 'bundle' recipe, which should be
+/// considered as single entity for cost-modeling and transforms. The recipe
+/// needs to be 'unbundled', i.e. replaced by its individual recipes before
+/// execute.
+class VPBundleRecipe : public VPSingleDefRecipe {
+  enum class BundleTypes {
+    ExtendedReduction,
+    MulAccumulateReduction,
+  };
+
+  /// Recipes bundled together in this VPBundleRecipe.
+  SmallVector<VPSingleDefRecipe *> BundledOps;
+
+  /// Temporary VPValues used for external operands of the bundle, i.e. operands
+  /// not defined by recipes in the bundle.
+  SmallVector<VPValue *> TmpValues;
+
+  /// Type of the bundle.
+  BundleTypes BundleType;
+
+  VPBundleRecipe(BundleTypes BundleType, ArrayRef<VPSingleDefRecipe *> ToBundle)
+      : VPSingleDefRecipe(VPDef::VPBundleSC, {}, {}), BundledOps(ToBundle),
+        BundleType(BundleType) {
+    // Bundle up the operand recipes.
+    SmallPtrSet<VPUser *, 4> BundledUsers;
+    for (auto *R : ToBundle)
+      BundledUsers.insert(R);
+
+    // Recipes in the bundle, expect the last one, must only be used inside the
+    // bundle. If there other external users, clone the recipes for the bundle.
+    for (const auto &[Idx, R] : enumerate(drop_end(ToBundle))) {
+      if (all_of(R->users(), [&BundledUsers](VPUser *U) {
+            return BundledUsers.contains(U);
+          })) {
+        if (R->getParent())
+          R->removeFromParent();
+        continue;
+      }
+      // There users external to the bundle. Clone the recipe for use in the
+      // bundle and update all its in-bundle users.
+      this->BundledOps[Idx] = R->clone();
+      BundledUsers.insert(this->BundledOps[Idx]);
+      R->replaceUsesWithIf(this->BundledOps[Idx],
+                           [&BundledUsers](VPUser &U, unsigned) {
+                             return BundledUsers.contains(&U);
+                           });
+    }
+    BundledOps.back()->removeFromParent();
+
+    // Internalize all external operands to the bundled operations. To do so,
+    // create new temporary VPValues for all operands not defined by recipe in
+    // the bundle. The original operands are added as operands of the
+    // VPBundleRecipe.
+    for (auto *R : this->BundledOps) {
+      for (const auto &[Idx, Op] : enumerate(R->operands())) {
+        auto *Def = Op->getDefiningRecipe();
+        if (Def && BundledUsers.contains(Def))
+          continue;
+        addOperand(Op);
+        TmpValues.push_back(new VPValue());
+        R->setOperand(Idx, TmpValues.back());
+      }
+    }
+  }
+
+public:
+  VPBundleRecipe(VPWidenCastRecipe *Ext, VPReductionRecipe *Red)
+      : VPBundleRecipe(BundleTypes::ExtendedReduction, {Ext, Red}) {}
+  VPBundleRecipe(VPWidenRecipe *Mul, VPReductionRecipe *Red)
+      : VPBundleRecipe(BundleTypes::MulAccumulateReduction, {Mul, Red}) {}
+  VPBundleRecipe(VPWidenCastRecipe *Ext0, VPWidenCastRecipe *Ext1,
+                 VPWidenRecipe *Mul, VPReductionRecipe *Red)
+      : VPBundleRecipe(BundleTypes::MulAccumulateReduction,
+                       {Ext0, Ext1, Mul, Red}) {}
+  VPBundleRecipe(VPWidenCastRecipe *Ext0, VPWidenCastRecipe *Ext1,
+                 VPWidenRecipe *Mul, VPWidenCastRecipe *Ext2,
+                 VPReductionRecipe *Red)
+      : VPBundleRecipe(BundleTypes::MulAccumulateReduction,
+                       {Ext0, Ext1, Mul, Ext2, Red}) {}
+
+  ~VPBundleRecipe() override {
+    SmallPtrSet<VPRecipeBase *, 4> Seen;
+    for (auto *R : reverse(BundledOps))
+      if (Seen.insert(R).second)
+        delete R;
+    for (VPValue *T : TmpValues)
+      delete T;
+  }
+
+  VP_CLASSOF_IMPL(VPDef::VPBundleSC)
+
+  VPBundleRecipe *clone() override {
+    return new VPBundleRecipe(BundleType, BundledOps);
+  }
+
+  /// Return the VPSingleDefRecipe producing the final result of the bundled
+  /// recipe.
+  VPSingleDefRecipe *getResultOp() const { return BundledOps.back(); }
+
+  void unbundle();
+
+  /// Generate the extraction of the appropriate bit from the block mask and the
+  /// conditional branch.
+  void execute(VPTransformState &State) override {
+    llvm_unreachable("recipe must be removed before execute");
+  }
+
+  InstructionCost computeCost(ElementCount VF,
+                              VPCostContext &Ctx) const override;
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+  /// Print the recipe.
+  void print(raw_ostream &O, const Twine &Indent,
+             VPSlotTracker &SlotTracker) const override;
+#endif
+};
+
 /// VPPredInstPHIRecipe is a recipe for generating the phi nodes needed when
 /// control converges back from a Branch-on-Mask. The phi nodes are needed in
 /// order to merge values that are set under such a branch and feed their uses.
diff --git a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
index 76da5b0314a8e..c8336e7b3f92c 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
@@ -267,6 +267,9 @@ Type *VPTypeAnalysis::inferScalarType(const VPValue *V) {
 
   Type *ResultTy =
       TypeSwitch<const VPRecipeBase *, Type *>(V->getDefiningRecipe())
+          .Case<VPBundleRecipe>([this](const auto *R) {
+            return inferScalarType(R->getOperand(R->getNumOperands() - 2));
+          })
           .Case<VPActiveLaneMaskPHIRecipe, VPCanonicalIVPHIRecipe,
                 VPFirstOrderRecurrencePHIRecipe, VPReductionPHIRecipe,
                 VPWidenPointerInductionRecipe, VPEVLBasedIVPHIRecipe>(
@@ -296,8 +299,6 @@ Type *VPTypeAnalysis::inferScalarType(const VPValue *V) {
             // TODO: Use info from interleave group.
             return V->getUnderlyingValue()->getType();
           })
-          .Case<VPExtendedReductionRecipe, VPMulAccumulateReductionRecipe>(
-              [](const auto *R) { return R->getResultType(); })
           .Case<VPExpandSCEVRecipe>([](const VPExpandSCEVRecipe *R) {
             return R->getSCEV()->getType();
           })
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index 3bdfa6724f691..1820e73bd7b59 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -72,8 +72,6 @@ bool VPRecipeBase::mayWriteToMemory() const {
   case VPBlendSC:
   case VPReductionEVLSC:
   case VPReductionSC:
-  case VPExtendedReductionSC:
-  case VPMulAccumulateReductionSC:
   case VPVectorPointerSC:
   case VPWidenCanonicalIVSC:
   case VPWidenCastSC:
@@ -121,8 +119,6 @@ bool VPRecipeBase::mayReadFromMemory() const {
   case VPBlendSC:
   case VPReductionEVLSC:
   case VPReductionSC:
-  case VPExtendedReductionSC:
-  case VPMulAccumulateReductionSC:
   case VPVectorPointerSC:
   case VPWidenCanonicalIVSC:
   case VPWidenCastSC:
@@ -160,8 +156,6 @@ bool VPRecipeBase::mayHaveSideEffects() const {
   case VPBlendSC:
   case VPReductionEVLSC:
   case VPReductionSC:
-  case VPExtendedReductionSC:
-  case VPMulAccumulateReductionSC:
   case VPScalarIVStepsSC:
   case VPVectorPointerSC:
   case VPWidenCanonicalIVSC:
@@ -2575,30 +2569,142 @@ InstructionCost VPReductionRecipe::computeCost(ElementCount VF,
                                             Ctx.CostKind);
 }
 
-InstructionCost
-VPExtendedReductionRecipe::computeCost(ElementCount VF,
-                                       VPCostContext &Ctx) const {
-  unsigned Opcode = RecurrenceDescriptor::getOpcode(getRecurrenceKind());
-  Type *RedTy = Ctx.Types.inferScalarType(this);
-  auto *SrcVecTy =
-      cast<VectorType>(toVectorTy(Ctx.Types.inferScalarType(getVecOp()), VF));
-  assert(RedTy->isIntegerTy() &&
-         "ExtendedReduction only support integer type currently.");
-  return Ctx.TTI.getExtendedReductionCost(Opcode, isZExt(), RedTy, SrcVecTy,
-                                          std::nullopt, Ctx.CostKind);
+void VPBundleRecipe::unbundle() {
+  for (auto *Op : BundledOps)
+    if (!Op->getParent())
+      Op->insertBefore(this);
+
+  for (const auto &[Idx, Op] : enumerate(operands()))
+    TmpValues[Idx]->replaceAllUsesWith(Op);
+
+  replaceAllUsesWith(getResultOp());
+
+  if (BundleType == BundleTypes::MulAccumulateReduction &&
+      BundledOps.size() == 5) {
+    // Note that we will drop the extend after mul which transforms
+    // reduce.add(ext(mul(ext, ext))) to reduce.add(mul(ext, ext)).
+    auto *Ext0 = cast<VPWidenCastRecipe>(BundledOps[0]);
+    auto *Ext1 = cast<VPWidenCastRecipe>(BundledOps[1]);
+    auto *Ext2 = cast<VPWidenCastRecipe>(BundledOps[3]);
+    auto *Op0 =
+        new VPWidenCastRecipe(Ext0->getOpcode(), Ext0->getOperand(0),
+                              Ext2->getResultType(), *Ext0, getDebugLoc());
+    Op0->insertBefore(Ext0);
+
+    VPSingleDefRecipe *Op1 = Op0;
+    if (Ext0 != Ext1) {
+      Op1 = new VPWidenCastRecipe(Ext1->getOpcode(), Ext1->getOperand(0),
+                                  Ext2->getResultType(), *Ext1, getDebugLoc());
+      Op1->insertBefore(Ext...
[truncated]

Copy link

github-actions bot commented Jun 15, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Comment on lines 270 to 271
.Case<VPBundleRecipe>([this](const auto *R) {
return inferScalarType(R->getOperand(R->getNumOperands() - 2));
Copy link
Contributor

@ElvisWang123 ElvisWang123 Jun 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the type here is not correct and this cause the cost of the extended-reduction changes.

The Reduction recipe is the last recipes so the operands in it will be last two (or three if optional) ops. But the vecOp is already defined in the bundle (at least for mul-accumulate reduction and extended-reduction). So the ChainOp which contains the reduction type will be the -1 (or -2) operand of the VPBundle recipe.

I think with this fix the cost of extended-reduction will be correct and this patch can be NFC.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, that was missed in the update, updated to check if there's a conditional reduction, thanks.

There's one change, where we don't re-use an existing sext in @mla_and_add_together_16_64

Copy link
Collaborator

@SamTebbs33 SamTebbs33 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a good idea.

for (auto *R : ToBundle)
BundledUsers.insert(R);

// Recipes in the bundle, expect the last one, must only be used inside the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

expect -> except

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed thanks

R->removeFromParent();
continue;
}
// There users external to the bundle. Clone the recipe for use in the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There -> The ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed thanks

}
// There users external to the bundle. Clone the recipe for use in the
// bundle and update all its in-bundle users.
this->BundledOps[Idx] = R->clone();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filling the array by the enumerated Idx will leave gaps in the array if we continue above. I think we'll want a test case for bundling recipes with external users.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BundleOps at this point contains all recipes from ToBundle, here we update existing entries with external users and replace them with a clone that is just used in a bundle.

There is existing test coverage in Transforms/LoopVectorize/ARM/mve-reductions.ll and Transforms/LoopVectorize/reduction-inloop.ll

// bundle and update all its in-bundle users.
this->BundledOps[Idx] = R->clone();
BundledUsers.insert(this->BundledOps[Idx]);
R->replaceUsesWithIf(this->BundledOps[Idx],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BundledOps[Idx] is used many times so a temporary variable for the clone would be useful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks

/// recipe.
VPSingleDefRecipe *getResultOp() const { return BundledOps.back(); }

void unbundle();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docs needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added thanks.

for (auto *R : ToBundle)
BundledUsers.insert(R);

// Recipes in the bundle, expect the last one, must only be used inside the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think this should be except the last one

// There users external to the bundle. Clone the recipe for use in the
// bundle and update all its in-bundle users.
this->BundledOps[Idx] = R->clone();
BundledUsers.insert(this->BundledOps[Idx]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this also keeping the original R in BundledUsers too so that you've got two copies from the same recipe? Do we need to delete the original?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The BundledUsers set is only used to check if the users are in the bundle or not. After cloning, we updated all users of the original to use the copy, so leaving it in here should be fine (if its operands are part of the bundle and had other users, then they themselves would already have been cloned)

VP_CLASSOF_IMPL(VPDef::VPBundleSC)

VPBundleRecipe *clone() override {
return new VPBundleRecipe(BundleType, BundledOps);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if we call this after calling unbundle?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't happen, I added an assert.

if (Ext0 != Ext1) {
Op1 = new VPWidenCastRecipe(Ext1->getOpcode(), Ext1->getOperand(0),
Ext2->getResultType(), *Ext1, getDebugLoc());
Op1->insertBefore(Ext0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be Op1->insertBefore(Ext1);?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated thanks. Ideally the whole transform here would be done separately, added a TODO

fhahn added 2 commits June 17, 2025 11:57
This patch adds a new recipe to combine multiple recipes into a 'bundle'
recipe, which should be considered as single entity for cost-modeling and
transforms. The recipe needs to be 'unbundled', i.e. replaced by its
individual recipes before execute.

This subsumes VPExtendedReductionRecipe and
VPMulAccumulateReductionRecipe and should make it easier to extend to
include more types of bundled patterns, like e.g. extends folded into
loads or various arithmetic instructions, if supported by the target.

It allows avoiding re-creating the original recipes when converting to
concrete recipes, together with removing the need to record various
information. The current version of the patch still retains the original
printing matching VPExtendedReductionRecipe and VPMulAccumulateReductionRecipe,
but this specialized print could be replaced with printing the bundled recipes
directly.

Currently the unbundle implementation is a bit more complicated than
necessary, as we need to fold the extends across ops to match the
current behavior, but there's quite possibly a better place to do so.
return BundledUsers.contains(&U);
});
}
BundledOps.back()->removeFromParent();
Copy link
Collaborator

@SamTebbs33 SamTebbs33 Jun 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While experimenting with rebasing #136173 on top of this, I've found that the new bundle recipe can't be used when interleaving. This line for example needs to be gated behind if (BundledOps.back().getParent()) since the second time the bundle is created for the same reduction recipe, it's already been removed and this causes an assertion failure. Would you mind seeing if you can get this recipe to work for interleaved reductions?

EDIT: I've done some more digging and it could actually be an issue specific to partial reductions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we were missing some test coverage for interleaving VPExtendedReductions/VPMulAccumulateReductions.

The issue was that we didn't clone the recipes in the bundle when cloning the VPBundleRecipe. Updated and now it should work fine with interleaving (tested in llvm/test/Transforms/LoopVectorize/ARM/mve-reductions-interleave.ll)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks for that.

Copy link
Contributor

@lukel97 lukel97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we want to consider VPWidenIntOrFpInductionRecipe a type of VPBundleRecipe? If so I presume we would still want it to also be a VPWidenPHIRecipe which has its own subclass, which would conflict with VPBundleSC. In which case we would need some form of multiple inheritance instead?

fhahn added a commit that referenced this pull request Jun 19, 2025
Add missing test coverage for interleaving with
VPExtendedReduction/VPMulAccumulateReduction recipes.

Adds missing test coverage in preparation for
#144281.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Jun 19, 2025
…te reductions.

Add missing test coverage for interleaving with
VPExtendedReduction/VPMulAccumulateReduction recipes.

Adds missing test coverage in preparation for
llvm/llvm-project#144281.
Copy link
Collaborator

@sdesmalen-arm sdesmalen-arm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a nice improvement and should make it easier to add new bundles in the future. You addressed the previous issue we found (as mentioned by @SamTebbs33) and I couldn't spot any more issues, so I'm happy to accept it.

It would also be nice for this to land so we can rebase 144908, which @SamTebbs33 told me would make some of the changes in that PR a bit simpler.

In any case, LGTM!

case BundleTypes::ExtendedReduction: {
unsigned Opcode = RecurrenceDescriptor::getOpcode(
cast<VPReductionRecipe>(BundledRecipes[1])->getRecurrenceKind());
return Ctx.TTI.getExtendedReductionCost(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not something to fix up in this PR, but I have noticed that the select that is inserted when the reduction is predicated, is not modelled. However, a VPSelect at the moment also returns a cost of 0 which seems wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, this could be improved further. For this patch, I tried to keep it as close to NFC as possible

auto *SrcVecTy = cast<VectorType>(
toVectorTy(Ctx.Types.inferScalarType(getOperand(0)), VF));
assert(RedTy->isIntegerTy() &&
"ExtendedReduction only support integer type currently.");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this say VPBundleRecipe rather than ExtendedReduction?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, should be fixed, thanks

Copy link
Collaborator

@ayalz ayalz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Post-approval review.

@@ -2917,6 +2706,108 @@ class VPBranchOnMaskRecipe : public VPRecipeBase {
}
};

/// A recipe to combine multiple recipes into a 'bundle' recipe, which should be
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// A recipe to combine multiple recipes into a 'bundle' recipe, which should be
/// A recipe to combine multiple 'bundled' recipes into one 'bundle' recipe, which should be

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The recipes aren't bundled until they're in the recipe so I don't think saying "multiple 'bundled' recipes" is correct. The original comment is correct in that the recipes are separate until they're bundled into this class.

@@ -2917,6 +2706,108 @@ class VPBranchOnMaskRecipe : public VPRecipeBase {
}
};

/// A recipe to combine multiple recipes into a 'bundle' recipe, which should be
/// considered as single entity for cost-modeling and transforms. The recipe
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// considered as single entity for cost-modeling and transforms. The recipe
/// considered a single entity for cost-modeling and transforms. The recipe

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks

@@ -2917,6 +2706,108 @@ class VPBranchOnMaskRecipe : public VPRecipeBase {
}
};

/// A recipe to combine multiple recipes into a 'bundle' recipe, which should be
/// considered as single entity for cost-modeling and transforms. The recipe
/// needs to be 'unbundled', i.e. replaced by its individual recipes before
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// needs to be 'unbundled', i.e. replaced by its individual recipes before
/// needs to be 'unbundled', i.e. replaced by its bundled recipes before

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks

/// A recipe to combine multiple recipes into a 'bundle' recipe, which should be
/// considered as single entity for cost-modeling and transforms. The recipe
/// needs to be 'unbundled', i.e. replaced by its individual recipes before
/// execute. The bundled recipes are completely connected from the def-use graph
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// execute. The bundled recipes are completely connected from the def-use graph
/// execute. The bundled recipes are completely disconnected from the def-use graph

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks

Comment on lines 2713 to 2715
/// outside the bundled recipes. Operands not defined by recipes in the bundle
/// are added as operands of the VPBundleRecipe and the users of the result
/// recipe must be updated to use the VPBundleRecipe.
Copy link
Collaborator

@ayalz ayalz Jun 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// outside the bundled recipes. Operands not defined by recipes in the bundle
/// are added as operands of the VPBundleRecipe and the users of the result
/// recipe must be updated to use the VPBundleRecipe.
/// of other, non-bundled recipes. Def-use edges between pairs of bundled recipes remain intact, whereas every edge between a bundled and a non-bundled recipe is elevated to connect the non-bundled recipe with the VPBundleRecipe itself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks

BundledRecipes.back()->removeFromParent();

// Internalize all external operands to the bundled operations. To do so,
// create new temporary VPValues for all operands not defined by recipe in
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// create new temporary VPValues for all operands not defined by recipe in
// create new temporary VPValues for all operands defined by a recipe outside

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated thanks.

// Internalize all external operands to the bundled operations. To do so,
// create new temporary VPValues for all operands not defined by recipe in
// the bundle. The original operands are added as operands of the
// VPBundleRecipe.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// VPBundleRecipe.
// VPBundleRecipe itself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated thanks

auto *Def = Op->getDefiningRecipe();
if (Def && BundledUsers.contains(Def))
continue;
if (Operands.empty())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (Operands.empty())
// If all Operands are already provided, use them.
if (Operands.empty())

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated thanks

continue;
if (Operands.empty())
addOperand(Op);
else
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is Operands really useful, simpler to always add Op?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was needed for cloning, remoed the operand and handled instead by updating the placeholder values with the original operands for the cloned recipes.

VPMulAccumulateReductionRecipe::computeCost(ElementCount VF,
void VPBundleRecipe::unbundle() {
for (auto *R : BundledRecipes)
if (!R->getParent())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can R have a parent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously yes, as the same recipe could be contained multiple times, De-duplicated in the latest version and removed code here, thanks

@fhahn fhahn changed the title [VPlan] Add VPBundleRecipe, replacing extended reduction recipes. [VPlan] Add VPSingleDefBundleRecipe, replacing extended reduction recipes. Jun 26, 2025
SamTebbs33 added a commit to SamTebbs33/llvm-project that referenced this pull request Jun 27, 2025
Partial reductions can easily be represented by the VPReductionRecipe
class by setting their scale factor to something greater than 1. This PR
merges the two together and gives VPReductionRecipe a VFScaleFactor so
that it can choose to generate the partial reduction intrinsic at
execute time.

Depends on llvm#144281
@@ -525,14 +525,13 @@ class VPSingleDefRecipe : public VPRecipeBase, public VPValue {

static inline bool classof(const VPRecipeBase *R) {
switch (R->getVPDefID()) {
case VPRecipeBase::VPBundleSC:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
case VPRecipeBase::VPBundleSC:
case VPRecipeBase::VPSingleDefBundleSC:

to be consistent, and place below following lex order.

But would VPExpressionRecipe be better - conveying the notion that (a) it produces a single value (b) out of multiple simpler components - which are potentially side-effect-free(?) and potentially (sub)expressions themselves(?). I.e., a bundle recipe can potentially be bundled inside another bundle recipe, unbundle() can be recursive. Worth commenting that this may be possible but that currently bundle recipes are restricted to flat/single-level bundling.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option is VPIdiomRecipe

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, updated to VPExpressionRecipe, which is also slightly more compact. Tried to update various places refering to bundle as appropriate, just keeping unbundle.

}

/// Return the VPValue to use to infer the result type of the recipe.
VPValue *getTypeVPValue() const {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
VPValue *getTypeVPValue() const {
VPValue *getOperandOfResultType() const {

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks

Comment on lines 2806 to 2807
unsigned OpIdx =
cast<VPReductionRecipe>(BundledRecipes.back())->isConditional() ? 2 : 1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last operand unless it's a mask in which case penultimate operand? Should the bundled recipe itself support isConditional()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure, as this is the only place where it is needed. Left as-is for now.

Comment on lines 2680 to 2681
return any_of(BundledRecipes,
[](VPSingleDefRecipe *R) { return R->mayHaveSideEffects(); });
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it ok to clone recipes in order to bundle them, if they have side effects, or may write to memory?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, I added an assert here and. the constructor to disallow recieps with side-effects, thanks

Comment on lines 2944 to 2946
if (IsMulAccValidAndClampRange(true, Mul, nullptr, nullptr, nullptr)) {
return new VPSingleDefBundleRecipe(Mul, Red);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (IsMulAccValidAndClampRange(true, Mul, nullptr, nullptr, nullptr)) {
return new VPSingleDefBundleRecipe(Mul, Red);
}
if (IsMulAccValidAndClampRange(true, Mul, nullptr, nullptr, nullptr))
return new VPSingleDefBundleRecipe(Mul, Red);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed, thanks

/// non-bundled recipe is elevated to connect the non-bundled recipe with the
/// VPSingleDefBundleRecipe itself.
class VPSingleDefBundleRecipe : public VPSingleDefRecipe {
/// Recipes bundled together in this VPSingleDefBundleRecipe.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Recipes bundled together in this VPSingleDefBundleRecipe.
/// Recipes bundled together in this VPSingleDefBundleRecipe.
/// Note that recipes may appear multiple times, e.g., when both extends of a ExtMulAccumulateReduction are the same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are unique in the latest version.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, good, worth noting?

/// Represent an inloop multiply-accumulate reduction, multiplying the
/// extended vector operands, performing a reduction.add on the result, and
/// adding the scalar result to a chain.
ExtMulAccumulateReduction,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ExtMulAccumulateReduction,
ExtMulAccReduction,

(also aligning with TTI API)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks

/// Represent an inloop multiply-accumulate reduction, multiplying the
/// vector operands, performing a reduction.add on the result, and adding
/// the scalar result to a chain.
MulAccumulateReduction,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
MulAccumulateReduction,
MulAccReduction,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks

: VPSingleDefBundleRecipe(BundleTypes::ExtendedReduction, {Ext, Red}) {}
VPSingleDefBundleRecipe(VPWidenRecipe *Mul, VPReductionRecipe *Red)
: VPSingleDefBundleRecipe(BundleTypes::MulAccumulateReduction,
{Mul, Red}) {}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth asserting that Red is an add reduction, aka accumulation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently VPReductionRecipe will always accumulate/add the result to a reduction chain.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then perhaps better rename it VPAccumulationRecipe, or VPInLoopAccRecipe, independently.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There will be a use case soon that won't be an in-loop reduction so it's probably worth keeping the more generally name.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, if VPReductionRecipe will still be confined to add reductions, in loop or not, better rename it VPAccumulationRecipe? And/or add asserts that expressions named and documented as doing in-loop accumulations use suitable reduction recipes.

VPSingleDefBundleRecipe(VPWidenCastRecipe *Ext0, VPWidenCastRecipe *Ext1,
VPWidenRecipe *Mul, VPReductionRecipe *Red)
: VPSingleDefBundleRecipe(BundleTypes::ExtMulAccumulateReduction,
{Ext0, Ext1, Mul, Red}) {}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth asserting that Red is an add reduction, aka accumulation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently VPReductionRecipe will always accumulate/add the result to a reduction chain.

@fhahn fhahn changed the title [VPlan] Add VPSingleDefBundleRecipe, replacing extended reduction recipes. [VPlan] Add VPExpressionRecipe, replacing extended reduction recipes. Jul 1, 2025
@fhahn fhahn merged commit 6b3d2b6 into llvm:main Jul 1, 2025
5 of 7 checks passed
@fhahn fhahn deleted the vpbundlerecipe branch July 1, 2025 19:44
@llvm-ci
Copy link
Collaborator

llvm-ci commented Jul 1, 2025

LLVM Buildbot has detected a new failure on builder clang-hip-vega20 running on hip-vega20-0 while building llvm at step 3 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/123/builds/22449

Here is the relevant piece of the build log for the reference
Step 3 (annotate) failure: '../llvm-zorg/zorg/buildbot/builders/annotated/hip-build.sh --jobs=' (failure)
...
[57/59] Linking CXX executable External/HIP/cmath-hip-6.3.0
[58/59] Building CXX object External/HIP/CMakeFiles/TheNextWeek-hip-6.3.0.dir/workload/ray-tracing/TheNextWeek/main.cc.o
[59/59] Linking CXX executable External/HIP/TheNextWeek-hip-6.3.0
+ build_step 'Testing HIP test-suite'
+ echo '@@@BUILD_STEP Testing HIP test-suite@@@'
@@@BUILD_STEP Testing HIP test-suite@@@
+ ninja check-hip-simple
[0/1] cd /home/botworker/bbot/clang-hip-vega20/botworker/clang-hip-vega20/test-suite-build/External/HIP && /home/botworker/bbot/clang-hip-vega20/botworker/clang-hip-vega20/llvm/bin/llvm-lit -sv array-hip-6.3.0.test empty-hip-6.3.0.test with-fopenmp-hip-6.3.0.test saxpy-hip-6.3.0.test memmove-hip-6.3.0.test split-kernel-args-hip-6.3.0.test builtin-logb-scalbn-hip-6.3.0.test TheNextWeek-hip-6.3.0.test algorithm-hip-6.3.0.test cmath-hip-6.3.0.test complex-hip-6.3.0.test math_h-hip-6.3.0.test new-hip-6.3.0.test blender.test
-- Testing: 14 tests, 14 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90
FAIL: test-suite :: External/HIP/blender.test (14 of 14)
******************** TEST 'test-suite :: External/HIP/blender.test' FAILED ********************

/home/botworker/bbot/clang-hip-vega20/botworker/clang-hip-vega20/test-suite-build/tools/timeit-target --timeout 7200 --limit-core 0 --limit-cpu 7200 --limit-file-size 209715200 --limit-rss-size 838860800 --append-exitstatus --redirect-output /home/botworker/bbot/clang-hip-vega20/botworker/clang-hip-vega20/test-suite-build/External/HIP/Output/blender.test.out --redirect-input /dev/null --summary /home/botworker/bbot/clang-hip-vega20/botworker/clang-hip-vega20/test-suite-build/External/HIP/Output/blender.test.time /bin/bash test_blender.sh
/bin/bash verify_blender.sh /home/botworker/bbot/clang-hip-vega20/botworker/clang-hip-vega20/test-suite-build/External/HIP/Output/blender.test.out
Begin Blender test.
TEST_SUITE_HIP_ROOT=/opt/botworker/llvm/External/hip
Render /opt/botworker/llvm/External/hip/Blender_Scenes/290skydemo_release.blend
Blender 4.1.1 (hash e1743a0317bc built 2024-04-15 23:47:45)
Read blend: "/opt/botworker/llvm/External/hip/Blender_Scenes/290skydemo_release.blend"
Could not open as Ogawa file from provided streams.
Unable to open /opt/botworker/llvm/External/hip/Blender_Scenes/290skydemo2_flags.abc
WARN (bke.modifier): source/blender/blenkernel/intern/modifier.cc:425 BKE_modifier_set_error: Object: "GEO-flag.002", Modifier: "MeshSequenceCache", Could not create reader for file //290skydemo2_flags.abc
WARN (bke.modifier): source/blender/blenkernel/intern/modifier.cc:425 BKE_modifier_set_error: Object: "GEO-flag.003", Modifier: "MeshSequenceCache", Could not create reader for file //290skydemo2_flags.abc
WARN (bke.modifier): source/blender/blenkernel/intern/modifier.cc:425 BKE_modifier_set_error: Object: "GEO-flag", Modifier: "MeshSequenceCache", Could not create reader for file //290skydemo2_flags.abc
WARN (bke.modifier): source/blender/blenkernel/intern/modifier.cc:425 BKE_modifier_set_error: Object: "GEO-flag.004", Modifier: "MeshSequenceCache", Could not create reader for file //290skydemo2_flags.abc
WARN (bke.modifier): source/blender/blenkernel/intern/modifier.cc:425 BKE_modifier_set_error: Object: "GEO-flag.001", Modifier: "MeshSequenceCache", Could not create reader for file //290skydemo2_flags.abc
Could not open as Ogawa file from provided streams.
Unable to open /opt/botworker/llvm/External/hip/Blender_Scenes/290skydemo2_flags.abc
WARN (bke.modifier): source/blender/blenkernel/intern/modifier.cc:425 BKE_modifier_set_error: Object: "GEO-flag.003", Modifier: "MeshSequenceCache", Could not create reader for file //290skydemo2_flags.abc
WARN (bke.modifier): source/blender/blenkernel/intern/modifier.cc:425 BKE_modifier_set_error: Object: "GEO-flag.002", Modifier: "MeshSequenceCache", Could not create reader for file //290skydemo2_flags.abc
WARN (bke.modifier): source/blender/blenkernel/intern/modifier.cc:425 BKE_modifier_set_error: Object: "GEO-flag", Modifier: "MeshSequenceCache", Could not create reader for file //290skydemo2_flags.abc
WARN (bke.modifier): source/blender/blenkernel/intern/modifier.cc:425 BKE_modifier_set_error: Object: "GEO-flag.004", Modifier: "MeshSequenceCache", Could not create reader for file //290skydemo2_flags.abc
WARN (bke.modifier): source/blender/blenkernel/intern/modifier.cc:425 BKE_modifier_set_error: Object: "GEO-flag.001", Modifier: "MeshSequenceCache", Could not create reader for file //290skydemo2_flags.abc
I0701 19:53:33.770823 3435281 device.cpp:39] HIPEW initialization succeeded
I0701 19:53:33.772928 3435281 device.cpp:45] Found HIPCC hipcc
I0701 19:53:33.834712 3435281 device.cpp:207] Device has compute preemption or is not used for display.
I0701 19:53:33.834729 3435281 device.cpp:211] Added device "" with id "HIP__0000:a3:00".
I0701 19:53:33.834798 3435281 device.cpp:568] Mapped host memory limit set to 536,444,985,344 bytes. (499.60G)
I0701 19:53:33.835031 3435281 device_impl.cpp:63] Using AVX2 CPU kernels.
Fra:1 Mem:524.00M (Peak 524.70M) | Time:00:00.56 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Eyepiece_rim
Fra:1 Mem:524.00M (Peak 524.70M) | Time:00:00.56 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Rivets.016
Fra:1 Mem:524.05M (Peak 524.70M) | Time:00:00.56 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Rivets.020
Fra:1 Mem:524.14M (Peak 524.70M) | Time:00:00.56 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Hoses.003
Fra:1 Mem:531.55M (Peak 531.55M) | Time:00:00.57 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Curve_Connectors.001
Fra:1 Mem:531.54M (Peak 531.55M) | Time:00:00.57 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Curve_Connectors.004
Fra:1 Mem:531.65M (Peak 531.65M) | Time:00:00.57 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Curve_Connectors.008
Fra:1 Mem:531.97M (Peak 531.97M) | Time:00:00.57 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Curve_Connectors.011
Fra:1 Mem:532.42M (Peak 532.42M) | Time:00:00.57 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Curve_Connectors.012
Step 12 (Testing HIP test-suite) failure: Testing HIP test-suite (failure)
@@@BUILD_STEP Testing HIP test-suite@@@
+ ninja check-hip-simple
[0/1] cd /home/botworker/bbot/clang-hip-vega20/botworker/clang-hip-vega20/test-suite-build/External/HIP && /home/botworker/bbot/clang-hip-vega20/botworker/clang-hip-vega20/llvm/bin/llvm-lit -sv array-hip-6.3.0.test empty-hip-6.3.0.test with-fopenmp-hip-6.3.0.test saxpy-hip-6.3.0.test memmove-hip-6.3.0.test split-kernel-args-hip-6.3.0.test builtin-logb-scalbn-hip-6.3.0.test TheNextWeek-hip-6.3.0.test algorithm-hip-6.3.0.test cmath-hip-6.3.0.test complex-hip-6.3.0.test math_h-hip-6.3.0.test new-hip-6.3.0.test blender.test
-- Testing: 14 tests, 14 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90
FAIL: test-suite :: External/HIP/blender.test (14 of 14)
******************** TEST 'test-suite :: External/HIP/blender.test' FAILED ********************

/home/botworker/bbot/clang-hip-vega20/botworker/clang-hip-vega20/test-suite-build/tools/timeit-target --timeout 7200 --limit-core 0 --limit-cpu 7200 --limit-file-size 209715200 --limit-rss-size 838860800 --append-exitstatus --redirect-output /home/botworker/bbot/clang-hip-vega20/botworker/clang-hip-vega20/test-suite-build/External/HIP/Output/blender.test.out --redirect-input /dev/null --summary /home/botworker/bbot/clang-hip-vega20/botworker/clang-hip-vega20/test-suite-build/External/HIP/Output/blender.test.time /bin/bash test_blender.sh
/bin/bash verify_blender.sh /home/botworker/bbot/clang-hip-vega20/botworker/clang-hip-vega20/test-suite-build/External/HIP/Output/blender.test.out
Begin Blender test.
TEST_SUITE_HIP_ROOT=/opt/botworker/llvm/External/hip
Render /opt/botworker/llvm/External/hip/Blender_Scenes/290skydemo_release.blend
Blender 4.1.1 (hash e1743a0317bc built 2024-04-15 23:47:45)
Read blend: "/opt/botworker/llvm/External/hip/Blender_Scenes/290skydemo_release.blend"
Could not open as Ogawa file from provided streams.
Unable to open /opt/botworker/llvm/External/hip/Blender_Scenes/290skydemo2_flags.abc
WARN (bke.modifier): source/blender/blenkernel/intern/modifier.cc:425 BKE_modifier_set_error: Object: "GEO-flag.002", Modifier: "MeshSequenceCache", Could not create reader for file //290skydemo2_flags.abc
WARN (bke.modifier): source/blender/blenkernel/intern/modifier.cc:425 BKE_modifier_set_error: Object: "GEO-flag.003", Modifier: "MeshSequenceCache", Could not create reader for file //290skydemo2_flags.abc
WARN (bke.modifier): source/blender/blenkernel/intern/modifier.cc:425 BKE_modifier_set_error: Object: "GEO-flag", Modifier: "MeshSequenceCache", Could not create reader for file //290skydemo2_flags.abc
WARN (bke.modifier): source/blender/blenkernel/intern/modifier.cc:425 BKE_modifier_set_error: Object: "GEO-flag.004", Modifier: "MeshSequenceCache", Could not create reader for file //290skydemo2_flags.abc
WARN (bke.modifier): source/blender/blenkernel/intern/modifier.cc:425 BKE_modifier_set_error: Object: "GEO-flag.001", Modifier: "MeshSequenceCache", Could not create reader for file //290skydemo2_flags.abc
Could not open as Ogawa file from provided streams.
Unable to open /opt/botworker/llvm/External/hip/Blender_Scenes/290skydemo2_flags.abc
WARN (bke.modifier): source/blender/blenkernel/intern/modifier.cc:425 BKE_modifier_set_error: Object: "GEO-flag.003", Modifier: "MeshSequenceCache", Could not create reader for file //290skydemo2_flags.abc
WARN (bke.modifier): source/blender/blenkernel/intern/modifier.cc:425 BKE_modifier_set_error: Object: "GEO-flag.002", Modifier: "MeshSequenceCache", Could not create reader for file //290skydemo2_flags.abc
WARN (bke.modifier): source/blender/blenkernel/intern/modifier.cc:425 BKE_modifier_set_error: Object: "GEO-flag", Modifier: "MeshSequenceCache", Could not create reader for file //290skydemo2_flags.abc
WARN (bke.modifier): source/blender/blenkernel/intern/modifier.cc:425 BKE_modifier_set_error: Object: "GEO-flag.004", Modifier: "MeshSequenceCache", Could not create reader for file //290skydemo2_flags.abc
WARN (bke.modifier): source/blender/blenkernel/intern/modifier.cc:425 BKE_modifier_set_error: Object: "GEO-flag.001", Modifier: "MeshSequenceCache", Could not create reader for file //290skydemo2_flags.abc
I0701 19:53:33.770823 3435281 device.cpp:39] HIPEW initialization succeeded
I0701 19:53:33.772928 3435281 device.cpp:45] Found HIPCC hipcc
I0701 19:53:33.834712 3435281 device.cpp:207] Device has compute preemption or is not used for display.
I0701 19:53:33.834729 3435281 device.cpp:211] Added device "" with id "HIP__0000:a3:00".
I0701 19:53:33.834798 3435281 device.cpp:568] Mapped host memory limit set to 536,444,985,344 bytes. (499.60G)
I0701 19:53:33.835031 3435281 device_impl.cpp:63] Using AVX2 CPU kernels.
Fra:1 Mem:524.00M (Peak 524.70M) | Time:00:00.56 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Eyepiece_rim
Fra:1 Mem:524.00M (Peak 524.70M) | Time:00:00.56 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Rivets.016
Fra:1 Mem:524.05M (Peak 524.70M) | Time:00:00.56 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Rivets.020
Fra:1 Mem:524.14M (Peak 524.70M) | Time:00:00.56 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Hoses.003
Fra:1 Mem:531.55M (Peak 531.55M) | Time:00:00.57 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Curve_Connectors.001
Fra:1 Mem:531.54M (Peak 531.55M) | Time:00:00.57 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Curve_Connectors.004
Fra:1 Mem:531.65M (Peak 531.65M) | Time:00:00.57 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Curve_Connectors.008
Fra:1 Mem:531.97M (Peak 531.97M) | Time:00:00.57 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Curve_Connectors.011
Fra:1 Mem:532.42M (Peak 532.42M) | Time:00:00.57 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Curve_Connectors.012
Fra:1 Mem:533.38M (Peak 533.39M) | Time:00:00.57 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | Cylinder.120
Fra:1 Mem:533.68M (Peak 533.68M) | Time:00:00.57 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Curve_Wires
Fra:1 Mem:534.48M (Peak 534.48M) | Time:00:00.57 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Another_weaponThingie
Fra:1 Mem:534.53M (Peak 534.53M) | Time:00:00.57 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Chest_Connector
Fra:1 Mem:537.54M (Peak 537.54M) | Time:00:00.57 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Curve_wires
Fra:1 Mem:538.29M (Peak 538.29M) | Time:00:00.57 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Weapon_thingie

llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Jul 1, 2025
…on recipes. (#144281)

This patch adds a new recipe to combine multiple recipes into an
'expression' recipe, which should be considered as single entity for
cost-modeling and transforms. The recipe needs to be 'decomposed', i.e.
replaced by its individual recipes before execute.

This subsumes VPExtendedReductionRecipe and
VPMulAccumulateReductionRecipe and should make it easier to extend to
include more types of bundled patterns, like e.g. extends folded into
loads or various arithmetic instructions, if supported by the target.

It allows avoiding re-creating the original recipes when converting to
concrete recipes, together with removing the need to record various
information. The current version of the patch still retains the original
printing matching VPExtendedReductionRecipe and
VPMulAccumulateReductionRecipe, but this specialized print could be
replaced with printing the bundled recipes directly.

PR: llvm/llvm-project#144281
Copy link
Collaborator

@ayalz ayalz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Post commit... (right window)

@@ -525,14 +525,13 @@ class VPSingleDefRecipe : public VPRecipeBase, public VPValue {

static inline bool classof(const VPRecipeBase *R) {
switch (R->getVPDefID()) {
case VPRecipeBase::VPExpressionSC:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs to be placed below in lex order.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should be fixed in the latest/landed version I think


enum class ExpressionTypes {
/// Represents an inloop extended reduction operation, performing a
/// reduction on a extended vector operand into a scalar value, and adding
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// reduction on a extended vector operand into a scalar value, and adding
/// reduction on an extended vector operand into a scalar value, and adding

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should be fixed in the latest/landed version I think

: VPSingleDefBundleRecipe(BundleTypes::ExtendedReduction, {Ext, Red}) {}
VPSingleDefBundleRecipe(VPWidenRecipe *Mul, VPReductionRecipe *Red)
: VPSingleDefBundleRecipe(BundleTypes::MulAccumulateReduction,
{Mul, Red}) {}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then perhaps better rename it VPAccumulationRecipe, or VPInLoopAccRecipe, independently.

/// non-bundled recipe is elevated to connect the non-bundled recipe with the
/// VPSingleDefBundleRecipe itself.
class VPSingleDefBundleRecipe : public VPSingleDefRecipe {
/// Recipes bundled together in this VPSingleDefBundleRecipe.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, good, worth noting?

ExpressionTypes ExpressionType,
ArrayRef<VPSingleDefRecipe *> ExpressionRecipes)
: VPSingleDefRecipe(VPDef::VPExpressionSC, {}, {}),
ExpressionRecipes(SetVector<VPSingleDefRecipe *>(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This SetVector takes care of uniquifying the expression recipes? Worth a note. Perhaps worth having VPExpressionRecipe hold its expression recipes in a SetVector?

/// Insert the recipes of the expression back into the VPlan, directly before
/// the current recipe. Leaves the expression recipe empty, which must be
/// removed before codegen.
void decompose();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another name may be dissolve - as in dissolveLoopRegions(), also conveying that they should then be removed.

Compare with expand as in expandVPWidenIntOrFpInduction() which also takes place during convertToConcreteRecipes() but involves placing new recipes in several distinct places (i.e., is not an expression, and the function does more than dissolve a recipe or region in place), and materialize as in materializeBroadcasts() which introduces new recipes rather than replace abstract ones with concrete ones (concretize() them?).

Comment on lines +2580 to +2583
// Maintain a copy of the expression recipes as a set of users.
SmallPtrSet<VPUser *, 4> ExpressionRecipesAsSetOfUsers;
for (auto *R : ExpressionRecipes)
ExpressionRecipesAsSetOfUsers.insert(R);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this copy still needed, if VPExpressionRecipe holds its expression recipes in a SetVector?

Comment on lines 2635 to +2637
Type *RedTy = Ctx.Types.inferScalarType(this);
auto *SrcVecTy =
cast<VectorType>(toVectorTy(Ctx.Types.inferScalarType(getVecOp()), VF));
auto *SrcVecTy = cast<VectorType>(
toVectorTy(Ctx.Types.inferScalarType(getOperand(0)), VF));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Independent) Better assert right after computing RedTy:

  auto *SrcVecTy = cast<VectorType>(
      toVectorTy(Ctx.Types.inferScalarType(getOperand(0)), VF));
  Type *RedTy = Ctx.Types.inferScalarType(this);

Comment on lines +2642 to +2643
unsigned Opcode = RecurrenceDescriptor::getOpcode(
cast<VPReductionRecipe>(ExpressionRecipes[1])->getRecurrenceKind());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
unsigned Opcode = RecurrenceDescriptor::getOpcode(
cast<VPReductionRecipe>(ExpressionRecipes[1])->getRecurrenceKind());
unsigned Opcode = RecurrenceDescriptor::getOpcode(
cast<VPReductionRecipe>(ExpressionRecipes[1])->getRecurrenceKind());
bool IsZExt = cast<VPWidenCastRecipe>(ExpressionRecipes.front())->getOpcode() == Instruction::ZExt;

return Ctx.TTI.getMulAccReductionCost(false, RedTy, SrcVecTy, Ctx.CostKind);

case ExpressionTypes::ExtMulAccReduction:
return Ctx.TTI.getMulAccReductionCost(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return Ctx.TTI.getMulAccReductionCost(
bool IsZExt = cast<VPWidenCastRecipe>(ExpressionRecipes.front())->getOpcode() == Instruction::ZExt;
return Ctx.TTI.getMulAccReductionCost(

Copy link
Contributor Author

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the additional comments, will follow-up tomorrow or on Thursday

: VPSingleDefBundleRecipe(BundleTypes::ExtendedReduction, {Ext, Red}) {}
VPSingleDefBundleRecipe(VPWidenRecipe *Mul, VPReductionRecipe *Red)
: VPSingleDefBundleRecipe(BundleTypes::MulAccumulateReduction,
{Mul, Red}) {}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There will be a use case soon that won't be an in-loop reduction so it's probably worth keeping the more generally name.

@Mel-Chen
Copy link
Contributor

Mel-Chen commented Jul 2, 2025

Got warning

llvm-project/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp: In member function ‘virtual llvm::InstructionCost llvm::VPExpressionRecipe::computeCost(llvm::ElementCount, llvm::VPCostContext&) const’:
llvm-project/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp:2667:1: warning: control reaches end of non-void function [-Wreturn-type]
 }
 ^

assert(!ExpressionRecipes.empty() && "Nothing to combine?");
assert(
none_of(ExpressionRecipes,
[](VPSingleDefRecipe *R) { return R->mayHaveSideEffects(); }) &&
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nor is R a VPExpressionRecipe?

@fhahn
Copy link
Contributor Author

fhahn commented Jul 3, 2025

Got warning

llvm-project/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp: In member function ‘virtual llvm::InstructionCost llvm::VPExpressionRecipe::computeCost(llvm::ElementCount, llvm::VPCostContext&) const’:
llvm-project/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp:2667:1: warning: control reaches end of non-void function [-Wreturn-type]
 }
 ^

Looks like @RKSimon fixed that in 651c520. Seems like some clang-based compilers don't surface that warning

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants