-
Notifications
You must be signed in to change notification settings - Fork 14.9k
[VPlan] Factor out logic to common compute costs to helper (NFCI). #153361
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-vectorizers @llvm/pr-subscribers-llvm-transforms Author: Florian Hahn (fhahn) ChangesA number of recipes compute costs for the same opcodes for scalars or vectors, depending on the recipe. Move the common logic out to a helper in VPRecipeWithIRFlags, that is then used by VPReplicateRecipe, VPWidenRecipe and VPInstruction. This makes it easier to cover all relevant opcodes, without duplication. Full diff: https://github.com/llvm/llvm-project/pull/153361.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index 11a7d8b339ae9..1d89d3ade648d 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -898,6 +898,11 @@ struct VPRecipeWithIRFlags : public VPSingleDefRecipe, public VPIRFlags {
}
void execute(VPTransformState &State) override = 0;
+
+ /// Compute the cost for this recipe using \p Opcode. If \p IsVector is true, compute the vector cost, otherwise compute the scalar cost for a single instance.
+std::optional<InstructionCost>
+getCommonCost(unsigned Opcode, ElementCount VF,
+ VPCostContext &Ctx, bool IsVector)const;
};
/// Helper to access the operand that contains the unroll part for this recipe
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index 23c10d2b25263..0bd094ecf7c99 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -940,28 +940,90 @@ Value *VPInstruction::generate(VPTransformState &State) {
}
}
+std::optional<InstructionCost>
+VPRecipeWithIRFlags::getCommonCost(unsigned Opcode, ElementCount VF,
+ VPCostContext &Ctx, bool IsVector) const {
+ Type *ScalarTy = Ctx.Types.inferScalarType(this);
+ Type *ResultTy = IsVector ? toVectorTy(ScalarTy, VF) : ScalarTy;
+ switch (Opcode) {
+ case Instruction::FNeg:
+ return Ctx.TTI.getArithmeticInstrCost(
+ Opcode, ResultTy, Ctx.CostKind,
+ {TargetTransformInfo::OK_AnyValue, TargetTransformInfo::OP_None},
+ {TargetTransformInfo::OK_AnyValue, TargetTransformInfo::OP_None});
+ case Instruction::UDiv:
+ case Instruction::SDiv:
+ case Instruction::SRem:
+ case Instruction::URem:
+ case Instruction::Add:
+ case Instruction::FAdd:
+ case Instruction::Sub:
+ case Instruction::FSub:
+ case Instruction::Mul:
+ case Instruction::FMul:
+ case Instruction::FDiv:
+ case Instruction::FRem:
+ case Instruction::Shl:
+ case Instruction::LShr:
+ case Instruction::AShr:
+ case Instruction::And:
+ case Instruction::Or:
+ case Instruction::Xor: {
+ VPValue *RHS = getOperand(1);
+ // Certain instructions can be cheaper to vectorize if they have a constant
+ // second vector operand. One example of this are shifts on x86.
+ TargetTransformInfo::OperandValueInfo RHSInfo = Ctx.getOperandInfo(RHS);
+
+ if (RHSInfo.Kind == TargetTransformInfo::OK_AnyValue &&
+ getOperand(1)->isDefinedOutsideLoopRegions())
+ RHSInfo.Kind = TargetTransformInfo::OK_UniformValue;
+ Instruction *CtxI =
+ dyn_cast_or_null<Instruction>(getUnderlyingValue());
+
+ SmallVector<const Value *, 4> Operands;
+ if (CtxI)
+ Operands.append(CtxI->value_op_begin(), CtxI->value_op_end());
+ return Ctx.TTI.getArithmeticInstrCost(
+ Opcode, ResultTy, Ctx.CostKind,
+ {TargetTransformInfo::OK_AnyValue, TargetTransformInfo::OP_None},
+ RHSInfo, Operands, CtxI, &Ctx.TLI);
+ }
+ case Instruction::Freeze:
+ // This opcode is unknown. Assume that it is the same as 'mul'.
+ return Ctx.TTI.getArithmeticInstrCost(Instruction::Mul, ResultTy,
+ Ctx.CostKind);
+ case Instruction::ExtractValue:
+ return Ctx.TTI.getInsertExtractValueCost(Instruction::ExtractValue,
+ Ctx.CostKind);
+ case Instruction::ICmp:
+ case Instruction::FCmp: {
+ Type *ScalarOpTy = Ctx.Types.inferScalarType(getOperand(0));
+ Type *OpTy = IsVector ? toVectorTy(ScalarOpTy, VF) : ScalarOpTy;
+ Instruction *CtxI =
+ dyn_cast_or_null<Instruction>(getUnderlyingValue());
+ return Ctx.TTI.getCmpSelInstrCost(
+ Opcode, OpTy, CmpInst::makeCmpResultType(OpTy), getPredicate(),
+ Ctx.CostKind, {TTI::OK_AnyValue, TTI::OP_None},
+ {TTI::OK_AnyValue, TTI::OP_None}, CtxI);
+ }
+ }
+ return std::nullopt;
+}
+
InstructionCost VPInstruction::computeCost(ElementCount VF,
VPCostContext &Ctx) const {
if (Instruction::isBinaryOp(getOpcode())) {
- Type *ResTy = Ctx.Types.inferScalarType(this);
- if (!vputils::onlyFirstLaneUsed(this))
- ResTy = toVectorTy(ResTy, VF);
-
- if (!getUnderlyingValue()) {
- switch (getOpcode()) {
- case Instruction::FMul:
- return Ctx.TTI.getArithmeticInstrCost(getOpcode(), ResTy, Ctx.CostKind);
- default:
- // TODO: Compute cost for VPInstructions without underlying values once
- // the legacy cost model has been retired.
- return 0;
- }
+ if (!getUnderlyingValue() && getOpcode() != Instruction::FMul) {
+ // TODO: Compute cost for VPInstructions without underlying values once
+ // the legacy cost model has been retired.
+ return 0;
}
assert(!doesGeneratePerAllLanes() &&
"Should only generate a vector value or single scalar, not scalars "
"for all lanes.");
- return Ctx.TTI.getArithmeticInstrCost(getOpcode(), ResTy, Ctx.CostKind);
+ return *getCommonCost(getOpcode(), VF, Ctx,
+ /*IsVector=*/!vputils::onlyFirstLaneUsed(this));
}
switch (getOpcode()) {
@@ -2025,20 +2087,13 @@ void VPWidenRecipe::execute(VPTransformState &State) {
InstructionCost VPWidenRecipe::computeCost(ElementCount VF,
VPCostContext &Ctx) const {
switch (Opcode) {
- case Instruction::FNeg: {
- Type *VectorTy = toVectorTy(Ctx.Types.inferScalarType(this), VF);
- return Ctx.TTI.getArithmeticInstrCost(
- Opcode, VectorTy, Ctx.CostKind,
- {TargetTransformInfo::OK_AnyValue, TargetTransformInfo::OP_None},
- {TargetTransformInfo::OK_AnyValue, TargetTransformInfo::OP_None});
- }
-
case Instruction::UDiv:
case Instruction::SDiv:
case Instruction::SRem:
case Instruction::URem:
// More complex computation, let the legacy cost-model handle this for now.
return Ctx.getLegacyCost(cast<Instruction>(getUnderlyingValue()), VF);
+ case Instruction::FNeg:
case Instruction::Add:
case Instruction::FAdd:
case Instruction::Sub:
@@ -2052,45 +2107,12 @@ InstructionCost VPWidenRecipe::computeCost(ElementCount VF,
case Instruction::AShr:
case Instruction::And:
case Instruction::Or:
- case Instruction::Xor: {
- VPValue *RHS = getOperand(1);
- // Certain instructions can be cheaper to vectorize if they have a constant
- // second vector operand. One example of this are shifts on x86.
- TargetTransformInfo::OperandValueInfo RHSInfo = Ctx.getOperandInfo(RHS);
-
- if (RHSInfo.Kind == TargetTransformInfo::OK_AnyValue &&
- getOperand(1)->isDefinedOutsideLoopRegions())
- RHSInfo.Kind = TargetTransformInfo::OK_UniformValue;
- Type *VectorTy = toVectorTy(Ctx.Types.inferScalarType(this), VF);
- Instruction *CtxI = dyn_cast_or_null<Instruction>(getUnderlyingValue());
-
- SmallVector<const Value *, 4> Operands;
- if (CtxI)
- Operands.append(CtxI->value_op_begin(), CtxI->value_op_end());
- return Ctx.TTI.getArithmeticInstrCost(
- Opcode, VectorTy, Ctx.CostKind,
- {TargetTransformInfo::OK_AnyValue, TargetTransformInfo::OP_None},
- RHSInfo, Operands, CtxI, &Ctx.TLI);
- }
- case Instruction::Freeze: {
- // This opcode is unknown. Assume that it is the same as 'mul'.
- Type *VectorTy = toVectorTy(Ctx.Types.inferScalarType(this), VF);
- return Ctx.TTI.getArithmeticInstrCost(Instruction::Mul, VectorTy,
- Ctx.CostKind);
- }
- case Instruction::ExtractValue: {
- return Ctx.TTI.getInsertExtractValueCost(Instruction::ExtractValue,
- Ctx.CostKind);
- }
+ case Instruction::Xor:
+ case Instruction::Freeze:
+ case Instruction::ExtractValue:
case Instruction::ICmp:
- case Instruction::FCmp: {
- Instruction *CtxI = dyn_cast_or_null<Instruction>(getUnderlyingValue());
- Type *VectorTy = toVectorTy(Ctx.Types.inferScalarType(getOperand(0)), VF);
- return Ctx.TTI.getCmpSelInstrCost(
- Opcode, VectorTy, CmpInst::makeCmpResultType(VectorTy), getPredicate(),
- Ctx.CostKind, {TTI::OK_AnyValue, TTI::OP_None},
- {TTI::OK_AnyValue, TTI::OP_None}, CtxI);
- }
+ case Instruction::FCmp:
+ return *getCommonCost(getOpcode(), VF, Ctx, /*IsVector=*/true);
default:
llvm_unreachable("Unsupported opcode for instruction");
}
@@ -2964,8 +2986,6 @@ InstructionCost VPReplicateRecipe::computeCost(ElementCount VF,
// transform, avoid computing their cost multiple times for now.
Ctx.SkipCostComputation.insert(UI);
- TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;
- Type *ResultTy = Ctx.Types.inferScalarType(this);
switch (UI->getOpcode()) {
case Instruction::GetElementPtr:
// We mark this instruction as zero-cost because the cost of GEPs in
@@ -2987,12 +3007,7 @@ InstructionCost VPReplicateRecipe::computeCost(ElementCount VF,
case Instruction::And:
case Instruction::Or:
case Instruction::Xor: {
- auto Op2Info = Ctx.getOperandInfo(getOperand(1));
- SmallVector<const Value *, 4> Operands(UI->operand_values());
- return Ctx.TTI.getArithmeticInstrCost(
- UI->getOpcode(), ResultTy, CostKind,
- {TargetTransformInfo::OK_AnyValue, TargetTransformInfo::OP_None},
- Op2Info, Operands, UI, &Ctx.TLI) *
+ return *getCommonCost(getOpcode(), VF, Ctx, /*IsVector=*/false) *
(isSingleScalar() ? 1 : VF.getFixedValue());
}
}
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
A number of recipes compute costs for the same opcodes for scalars or vectors, depending on the recipe. Move the common logic out to a helper in VPRecipeWithIRFlags, that is then used by VPReplicateRecipe, VPWidenRecipe and VPInstruction. This makes it easier to cover all relevant opcodes, without duplication.
0a029ea
to
59d2dbf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
Type *OpTy = IsVector ? toVectorTy(ScalarOpTy, VF) : ScalarOpTy; | ||
Instruction *CtxI = dyn_cast_or_null<Instruction>(getUnderlyingValue()); | ||
return Ctx.TTI.getCmpSelInstrCost( | ||
Opcode, OpTy, CmpInst::makeCmpResultType(OpTy), getPredicate(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we assert that result type matches the actual result type in vplan? I don't know if there is anything in vplan that fundamentally restricts icmp/fcmp to be consistent with CmpInst::makeCmpResultType(OpTy)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, the result type will be a vector of i1 if the operand type is a vector and otherwise a scalar i1. The number of lanes of the operand type must match the number of lanes of the result type, so CmpInst::makeCmpResultType(OpTy)
should always do the right thing. And the scalar type of FCmp/ICmp must also be i1.
@@ -898,6 +898,13 @@ struct VPRecipeWithIRFlags : public VPSingleDefRecipe, public VPIRFlags { | |||
} | |||
|
|||
void execute(VPTransformState &State) override = 0; | |||
|
|||
/// Compute the cost for this recipe using \p Opcode. If \p IsVector is true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Perhaps better to also mention the VF and CostKind, and also explain the apparent contradiction between \p IsVector and \p VF in some cases, i.e. something like
/// Compute the cost for this recipe using \p Opcode, the vector's
/// ElementCount \p VF and the cost kind specified by \p Ctx.CostKind.
/// If \p IsVector is true, ...
/// NOTE: \p IsVector may contradict \p VF in cases where only the
/// first lane of the vector is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the function to just take an VF argument, and set it to 1 if we compute the scalar cost.
VPRecipeWithIRFlags::getCommonCost(unsigned Opcode, ElementCount VF, | ||
VPCostContext &Ctx, bool IsVector) const { | ||
Type *ScalarTy = Ctx.Types.inferScalarType(this); | ||
Type *ResultTy = IsVector ? toVectorTy(ScalarTy, VF) : ScalarTy; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's worth adding a comment explaining the discrepancy between IsVector
and VF as it could be confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IsVector is gone now, thanks
return Ctx.TTI.getArithmeticInstrCost( | ||
Opcode, ResultTy, Ctx.CostKind, | ||
{TargetTransformInfo::OK_AnyValue, TargetTransformInfo::OP_None}, | ||
{TargetTransformInfo::OK_AnyValue, TargetTransformInfo::OP_None}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't you just do
return Ctx.TTI.getArithmeticInstrCost(Opcode, ResultTy, Ctx.CostKind);
given that it already has the default parameters that you want?
LLVM_ABI InstructionCost getArithmeticInstrCost(
unsigned Opcode, Type *Ty,
TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput,
TTI::OperandValueInfo Opd1Info = {TTI::OK_AnyValue, TTI::OP_None},
TTI::OperandValueInfo Opd2Info = {TTI::OK_AnyValue, TTI::OP_None},
ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr,
const TargetLibraryInfo *TLibInfo = nullptr) const;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, updated, thanks
// second vector operand. One example of this are shifts on x86. | ||
TargetTransformInfo::OperandValueInfo RHSInfo = Ctx.getOperandInfo(RHS); | ||
|
||
if (RHSInfo.Kind == TargetTransformInfo::OK_AnyValue && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the code here deviates from the code that was previously executed in VPReplicateRecipe::computeCost so I don't think this PR is NFC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK_UniformValue
is only relevant for vector versions, so the we currently compute exactly the same costs. But it is not necessary to preform the check for scalar VFs, so I made it conditional.
… (NFCI). (#153361) A number of recipes compute costs for the same opcodes for scalars or vectors, depending on the recipe. Move the common logic out to a helper in VPRecipeWithIRFlags, that is then used by VPReplicateRecipe, VPWidenRecipe and VPInstruction. This makes it easier to cover all relevant opcodes, without duplication. PR: llvm/llvm-project#153361
A number of recipes compute costs for the same opcodes for scalars or vectors, depending on the recipe.
Move the common logic out to a helper in VPRecipeWithIRFlags, that is then used by VPReplicateRecipe, VPWidenRecipe and VPInstruction.
This makes it easier to cover all relevant opcodes, without duplication.