-
Notifications
You must be signed in to change notification settings - Fork 14k
[VPlan] Explicitly handle scalar pointer inductions. #83068
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
A VPInstruction only has its first lane used if all users use its first lane only. Use vputils::onlyFirstLaneUsed to continue checking the recipe's users to handle more cases. Besides allowing additional introduction of scalar steps when interleaving in some cases, this also enables using an Add VPInstruction to model the increment.
At the moment, some VPInstructions create only a single scalar value, but use VPTransformatState's 'vector' storage for this value. Those values are effectively uniform-per-VF (or in some cases uniform-across-VF-and-UF). Using the vector/per-part storage doesn't interact well with other recipes, that more accurately using (Part, Lane) to look up scalar values and prevents VPInstructions creating scalars from interacting with other recipes working with scalars. This PR tries to unify handling of scalars by using (Part, 0) for scalar values where only the first lane is demanded. This allows using VPInstructions with other recipes like VPScalarCastRecipe and is also needed when using VPInstructions in more cases otuside the vector loop region to generate scalars. The patch is still a bit rough around the edges, but hopefully serves as start for a discussion how to model more scalar recipes. A potential alternative would be to split off the opcodes that generate scalars only to a separate recipe.
Add a new PtrAdd opcode to VPInstruction that corresponds to IRBuilder::CreatePtrAdd, which creates a GEP with source element type i8. This is then used to model scalarizing VPWidenPointerInductionRecipe by introducing scalar-steps to model the index increment followed by a PtrAdd. Note that PtrAdd needs to be able to generate code for only the first lane or for all lanes. This may warrant introducing a separate recipe for scalarizing that can be created without relying on the underlying IR.
…or-ptr-iv-to-scalar
…or-ptr-iv-to-scalar
✅ With the latest revision this PR passed the C/C++ code formatter. |
ping :) |
continue; | ||
|
||
assert(!WidenPhi->onlyScalarsGenerated(State->VF.isScalable()) && | ||
"recipe only generating scalars should have been replaced"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"recipe only generating scalars should have been replaced"); | |
"recipe generating only scalars should have been replaced"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, thanks!
@@ -1164,11 +1165,14 @@ class VPInstruction : public VPRecipeWithIRFlags { | |||
/// An optional name that can be used for the generated IR instruction. | |||
const std::string Name; | |||
|
|||
bool generatesScalars() const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth documenting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added, thanks!
/// Utility method serving execute(): generates a single instance of the | ||
/// modeled instruction. \returns the generated value for \p Part. | ||
/// In some cases an existing value is returned rather than a generated | ||
/// one. | ||
Value *generateInstruction(VPTransformState &State, unsigned Part); | ||
Value *generatePerPart(VPTransformState &State, unsigned Part); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth updating documentation?
generateInstructionPerPart(), generateInstructionPerLane()??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, thanks!
return | ||
Opcode == VPInstruction::PtrAdd; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return | |
Opcode == VPInstruction::PtrAdd; | |
return Opcode == VPInstruction::PtrAdd; |
@@ -273,8 +274,27 @@ VPInstruction::VPInstruction(unsigned Opcode, | |||
assert(isFPMathOp() && "this op can't take fast-math flags"); | |||
} | |||
|
|||
Value *VPInstruction::generateInstruction(VPTransformState &State, | |||
unsigned Part) { | |||
bool VPInstruction::generatesScalars() const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: worth clarifying, say, by calling it isGeneratesScalars(), to avoid confusion with the other generate*() methods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to doesGenerateScalars
. WDYT?
@@ -509,7 +530,19 @@ void VPInstruction::execute(VPTransformState &State) { | |||
if (hasFastMathFlags()) | |||
State.Builder.setFastMathFlags(getFastMathFlags()); | |||
for (unsigned Part = 0; Part < State.UF; ++Part) { | |||
Value *GeneratedValue = generateInstruction(State, Part); | |||
if (generatesScalars()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be clearer to further distinguish between singular and plural, i.e., generates-a-single-scalar and generates-multiple-scalars? The former case should arguably call State.set(this, P, Part, !IsVector);
where IsVector is set to false, rather than degenerate Lane to iterate over [0,1).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Separated while still calling generatePerLane, thanks!
vputils::onlyFirstLaneUsed(this) ? 1 : State.VF.getKnownMinValue(); | ||
if (getOpcode() == VPInstruction::ComputeReductionResult) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be folded to assign NumLanes once.
Would it help to fold onlyFirstLaneUsed(this) with opcode == ComputeReductionResult, here and below, and if so - how would it be called - onlySingleLaneDefined?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated, thanks!
/// If all users of VPWidenPointerInductionRecipe only use its scalar values, | ||
/// replace it with a PtrAdd (IndStart, ScalarIVSteps (0, Step)). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds more like a mandatory, functional legalization of induction recipes, rather than an optional, performance optimization - referring to the optimizeInductions()
name.
Furthermore, this conceptually introduces two types or stages of the recipes - before and after legalization - which could be represented as two distinct recipes/opcodes, or by recording an indicator whether the recipe was legalized or not. Although these seem unneeded atm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this is now required (similar to adjusting reductions). Should we move it to a separate transform or possibly clarify the name of the function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarifying the name of the function sounds fine to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated name and comments
@@ -116,8 +116,7 @@ define void @test_widen(ptr noalias %a, ptr readnone %b) #4 { | |||
; TFA_INTERLEAVE-NEXT: [[TMP18:%.*]] = getelementptr inbounds i64, ptr [[TMP15]], i64 [[TMP17]] | |||
; TFA_INTERLEAVE-NEXT: call void @llvm.masked.store.nxv2i64.p0(<vscale x 2 x i64> [[TMP13]], ptr [[TMP15]], i32 8, <vscale x 2 x i1> [[ACTIVE_LANE_MASK]]) | |||
; TFA_INTERLEAVE-NEXT: call void @llvm.masked.store.nxv2i64.p0(<vscale x 2 x i64> [[TMP14]], ptr [[TMP18]], i32 8, <vscale x 2 x i1> [[ACTIVE_LANE_MASK2]]) | |||
; TFA_INTERLEAVE-NEXT: [[INDEX_NEXT:%.*]] = add i64 [[INDEX]], [[TMP6]] | |||
; TFA_INTERLEAVE-NEXT: [[INDEX_NEXT4]] = add i64 [[INDEX]], [[TMP6]] | |||
; TFA_INTERLEAVE-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], [[TMP6]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The patches associated with this patch mostly move code around, but sometimes helps eliminate more redundancies? Anything worth adding regarding potential 'functional' changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those are spurious changes that were left over from some unrelated changes that I moved out of this patch, restored original version of this file.
PtrIV->replaceAllUsesWith(Recipe); | ||
continue; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps worth moving here some of the documentation above that described what happens next.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment, thanks!
ping :) |
/// Returns true if this VPInstruction generates scalar values only. | ||
bool doesGenerateScalars() const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: add Only
at the end, or at the beginning - onlyScalarsGenerated(), as in VPWidenPointerInductionRecipe above?
It's somewhat confusing, given VPInstructions such as ComputeReductionResult that also generate scalar(s) are excluded. See more below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to doesGeneratePerAllLanes
as suggested below.
@@ -44,6 +44,8 @@ Type *VPTypeAnalysis::inferScalarTypeForRecipe(const VPInstruction *R) { | |||
CachedTypes[OtherV] = ResTy; | |||
return ResTy; | |||
} | |||
case VPInstruction::PtrAdd: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: worth asserting and caching the type of the other operand, i.e., join the above cases of ICmp and FOR Splice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PtrAdd's operand have different types, with the first one being a pointer and the second one being an integer offset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, right, of course. Perhaps worth a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added, thanks!
if (!hasResult()) | ||
continue; | ||
assert(GeneratedValue && "generateInstruction must produce a value"); | ||
|
||
bool IsVector = GeneratedValue->getType()->isVectorTy(); | ||
State.set(this, GeneratedValue, Part, !IsVector); | ||
assert((IsVector || getOpcode() == VPInstruction::ComputeReductionResult || | ||
State.VF.isScalar() || vputils::onlyFirstLaneUsed(this)) && | ||
assert((IsVector || OnlyFirstLaneDefined || State.VF.isScalar()) && | ||
"scalar value but not only first lane used"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"scalar value but not only first lane used"); | |
"scalar value but not only first lane defined"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated, thanks!
continue; | ||
} | ||
|
||
Value *GeneratedValue = generatePerPart(State, Part); | ||
if (!hasResult()) | ||
continue; | ||
assert(GeneratedValue && "generateInstruction must produce a value"); | ||
|
||
bool IsVector = GeneratedValue->getType()->isVectorTy(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Confusion above cont'd: if (doesGenerateScalars()) above took care of scalars, IsVector here is expected to be true. But for ComputeReductionResult, e.g..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to use OnlyGenerateFirstLane
; in those cases a scalar must be returned and set.
vputils::onlyFirstLaneUsed(this) || | ||
getOpcode() == VPInstruction::ComputeReductionResult; | ||
if (doesGenerateScalars()) { | ||
if (OnlyFirstLaneDefined) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps something like the following would be clearer:
// First deal with generating multiple values - per all lanes.
if (doesGeneratePerAllLanes()) {
for (unsigned Lane = 0, NumLanes = State.VF.getKnownMinValue();
Lane != NumLanes; ++Lane) {
Value *P = generatePerLane(State, VPIteration(Part, Lane));
State.set(this, P, VPIteration(Part, Lane));
}
continue;
}
// Now deal with generating a single value - per first lane or per part.
Value *GeneratedValue = DoesGeneratePerFirstLaneOnly ?
generatePerLane(State, VPIteration(Part, 0)) :
generatePerPart(State, Part);
...
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated roughly as suggested. also updated to check if only a scalar is needed for the first lane and a scalar can be generated (set to OnlyGenerateFirstLane
).
This allows delegating to generatePerPart
from generatePerLane
, for cases not handled there (all except PtrAdd). This is a consequence of having ocpodes that may generate a vector or scalar per part, sharing codegen for both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This delegation may raise confusion, worth an explanation, or a clearer alternative?
Value *VPInstruction::generateInstruction(VPTransformState &State, | ||
unsigned Part) { | ||
bool VPInstruction::doesGenerateScalars() const { | ||
return Opcode == VPInstruction::PtrAdd; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit confusing: PtrAdd is (currently) the only VPInstruction that may generate multiple scalars - per lane, or a single scalar - per lane zero only, but there are other VPInstructions that generate a single scalar - per part.
Perhaps doesGeneratePerLane()
would be more accurate, or doesGeneratePerAllLanes()
? See below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to use doesGeneratePerAllLanes
.
/// If all users of VPWidenPointerInductionRecipe only use its scalar values, | ||
/// replace it with a PtrAdd (IndStart, ScalarIVSteps (0, Step)). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarifying the name of the function sounds fine to me.
} | ||
|
||
// Replace widened induction with scalar steps for users that only use | ||
// scalars. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be good if these two cases that createScalarIVSteps for scalar users only, would share something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think try to share createScalarIVSteps
for both would make things more complicated, as they have a lot of different arguments. left as is for now.
@@ -1155,6 +1155,10 @@ class VPInstruction : public VPRecipeWithIRFlags { | |||
BranchOnCount, | |||
BranchOnCond, | |||
ComputeReductionResult, | |||
// Add an offset in bytes (second operand) to a base pointer (first | |||
// operand). Only generates scalar valuse (either for the first lane only or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
values ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, thanks!
for (unsigned Part = 0; Part < State.UF; ++Part) { | ||
Value *GeneratedValue = generateInstruction(State, Part); | ||
if (doesGeneratePerAllLanes()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
even though it's not expensive check, but is worth to hoist it out
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved, thanks!
/// If any user of a VPWidenIntOrFpInductionRecipe needs scalar values, | ||
/// provide them by building scalar steps off of the canonical scalar IV and | ||
/// Legalize VPWidenPointer induction, by replacing it with a PtrAdd (IndStart, | ||
/// ScalarIVSteps (0, Step)) if only its scalar values are used. Als optimize |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adjusted, thanks!
HeaderVPBB->insert(Steps, IP); | ||
return Steps; | ||
} | ||
|
||
/// If any user of a VPWidenIntOrFpInductionRecipe needs scalar values, | ||
/// provide them by building scalar steps off of the canonical scalar IV and | ||
/// Legalize VPWidenPointer induction, by replacing it with a PtrAdd (IndStart, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please elaborate "legalization" part ? Is it because VPWidenPointerInductionRecipe
with onlyScalarsGenerated == true
cannot be handled by generation part ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, VPWidenPointerInductionRecipe
can only generate vector values after this change.
I put up #85688 to add some more initial info about this kind of gradual lowering to the docs. What's still missing is a proper name/terminology for recipes that cannot be code-gen'd like predicated VPReplicateRecipe before introducing replicate regions and VPWidenPointerInductionRecipe
after this change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to current main
and addressed comments, thanks!
@@ -1155,6 +1155,10 @@ class VPInstruction : public VPRecipeWithIRFlags { | |||
BranchOnCount, | |||
BranchOnCond, | |||
ComputeReductionResult, | |||
// Add an offset in bytes (second operand) to a base pointer (first | |||
// operand). Only generates scalar valuse (either for the first lane only or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, thanks!
for (unsigned Part = 0; Part < State.UF; ++Part) { | ||
Value *GeneratedValue = generateInstruction(State, Part); | ||
if (doesGeneratePerAllLanes()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved, thanks!
/// If any user of a VPWidenIntOrFpInductionRecipe needs scalar values, | ||
/// provide them by building scalar steps off of the canonical scalar IV and | ||
/// Legalize VPWidenPointer induction, by replacing it with a PtrAdd (IndStart, | ||
/// ScalarIVSteps (0, Step)) if only its scalar values are used. Als optimize |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adjusted, thanks!
HeaderVPBB->insert(Steps, IP); | ||
return Steps; | ||
} | ||
|
||
/// If any user of a VPWidenIntOrFpInductionRecipe needs scalar values, | ||
/// provide them by building scalar steps off of the canonical scalar IV and | ||
/// Legalize VPWidenPointer induction, by replacing it with a PtrAdd (IndStart, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, VPWidenPointerInductionRecipe
can only generate vector values after this change.
I put up #85688 to add some more initial info about this kind of gradual lowering to the docs. What's still missing is a proper name/terminology for recipes that cannot be code-gen'd like predicated VPReplicateRecipe before introducing replicate regions and VPWidenPointerInductionRecipe
after this change
@@ -44,6 +44,8 @@ Type *VPTypeAnalysis::inferScalarTypeForRecipe(const VPInstruction *R) { | |||
CachedTypes[OtherV] = ResTy; | |||
return ResTy; | |||
} | |||
case VPInstruction::PtrAdd: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, right, of course. Perhaps worth a comment.
/// If any user of a VPWidenIntOrFpInductionRecipe needs scalar values, | ||
/// provide them by building scalar steps off of the canonical scalar IV and | ||
/// Legalize VPWidenPointer induction, by replacing it with a PtrAdd (IndStart, | ||
/// ScalarIVSteps (0, Step)) if only its scalar values are used. Als optimize |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// ScalarIVSteps (0, Step)) if only its scalar values are used. Als optimize | |
/// ScalarIVSteps (0, Step)) if only its scalar values are used. Also optimize |
/// Legalize VPWidenPointer induction, by replacing it with a PtrAdd (IndStart, | ||
/// ScalarIVSteps (0, Step)) if only its scalar values are used. Als optimize | ||
/// VPWidenIntOrFpInductionRecipe, if any of its users needs scalar values, by | ||
/// providing them by building scalar steps off of the canonical scalar IV and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// providing them by building scalar steps off of the canonical scalar IV and | |
/// providing them scalar steps built on the canonical scalar IV and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adjusted, thanks!
/// provide them by building scalar steps off of the canonical scalar IV and | ||
/// Legalize VPWidenPointerInductionRecipe, by replacing it with a PtrAdd | ||
/// (IndStart, ScalarIVSteps (0, Step)) if only its scalar values are used, as | ||
/// VPWidenPointerInductionRecipe cannot generate scalars. Also optimize |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// VPWidenPointerInductionRecipe cannot generate scalars. Also optimize | |
/// VPWidenPointerInductionRecipe will generate vectors only. Also optimize |
So if scalar are used along with vectors of widen pointer induction, the former will be extracted from the latter; as opposed to widening int or fp inductions, where scalar users are fed scalar steps even if there are also vector users. Right? This preserves current behavior, but worth noting the discrepancy, potentially removing it later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tried to clarify, thanks!
@@ -511,17 +550,33 @@ void VPInstruction::execute(VPTransformState &State) { | |||
"Recipe not a FPMathOp but has fast-math flags?"); | |||
if (hasFastMathFlags()) | |||
State.Builder.setFastMathFlags(getFastMathFlags()); | |||
State.Builder.SetCurrentDebugLocation(getDebugLoc()); | |||
bool OnlyGenerateFirstLane = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bool OnlyGenerateFirstLane = | |
bool GeneratesPerFirstLaneOnly = |
to be more consistent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks!
assert((IsVector || getOpcode() == VPInstruction::ComputeReductionResult || | ||
State.VF.isScalar() || vputils::onlyFirstLaneUsed(this)) && | ||
"scalar value but not only first lane used"); | ||
State.set(this, GeneratedValue, Part, /*IsScalar*/ OnlyGenerateFirstLane); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: better to set State after the following assert.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Order flipped, thanks!
return P; | ||
} | ||
default: { | ||
Value *Res = generatePerPart(State, Lane.Part); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, this raises an eyebrow, or two.
When/is it hard for the caller to accurately call for generatePerLane, rather than generatePerPart? If the former is caller for all lanes, this may end up generating VF copies per same part. Could some assert guard against this from happening, e.g., Lane must be zero?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a consequence of generatePerLane
being responsible to generate all lanes and/or the first lane, while generatePerPart
can either generate the vector or scalar per-part, but would need to also support PtrAdd. It seems clearer to flip around the delegation and have generatePerPart call generatePerLane for PtrAdd. Updated.
Value *Cmp = | ||
Builder.CreateICmp(CmpInst::Predicate::ICMP_UGT, ScalarTC, Step); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still visible, missing a commit?
} | ||
default: { | ||
Value *Res = generatePerPart(State, Lane.Part); | ||
assert(!hasResult() || !Res->getType()->isVectorTy()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing error message.
This assert is taken care of by the caller of generatePerPart(); could this be done consistently?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assert is gone now.
vputils::onlyFirstLaneUsed(this) || | ||
getOpcode() == VPInstruction::ComputeReductionResult; | ||
if (doesGenerateScalars()) { | ||
if (OnlyFirstLaneDefined) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This delegation may raise confusion, worth an explanation, or a clearer alternative?
✅ With the latest revision this PR passed the Python code formatter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM, thanks! Adding last minor comments and suggestions.
case VPInstruction::PtrAdd: | ||
assert(vputils::onlyFirstLaneUsed(this) && | ||
"can only generate first lane for PtrAdd"); | ||
return generatePerLane(State, VPIteration(Part, 0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this be more consistent:
Value *Ptr = State.get(getOperand(0), Part, /* IsScalar */ true);
Value *Addend = State.get(getOperand(1), Part, /* IsScalar */ true);
return Builder.CreatePtrAdd(Ptr, Addend, Name);
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adjusted, thanks!
const VPIteration &Lane) { | ||
IRBuilderBase &Builder = State.Builder; | ||
|
||
assert(getOpcode() == VPInstruction::PtrAdd); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added, thanks!
bool GeneratesPerFirstLaneOnly = | ||
canGenerateScalarForFirstLane() && | ||
(vputils::onlyFirstLaneUsed(this) || | ||
getOpcode() == VPInstruction::ComputeReductionResult); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Unrelated to this patch, while we're here: why is ComputeReductionResult an exception?)
if (GeneratesPerAllLanes) { | ||
for (unsigned Lane = 0, NumLanes = State.VF.getKnownMinValue(); | ||
Lane != NumLanes; ++Lane) { | ||
Value *P = generatePerLane(State, VPIteration(Part, Lane)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can assert P here, consistent with asserts of GeneratedValue below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added, thanks!
continue; | ||
} | ||
|
||
Value *GeneratedValue = generatePerPart(State, Part); | ||
if (!hasResult()) | ||
continue; | ||
assert(GeneratedValue && "generateInstruction must produce a value"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
generatePerPart rather than generateInstruction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, thanks!
/// Utility method serving execute(): generates a single instance of the | ||
/// modeled instruction. \returns the generated value for \p Part. | ||
/// In some cases an existing value is returned rather than a generated | ||
/// Returns true if this VPInstruction generates scalar values for all lanes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding some context? E.g.,
// Most VPInstructions generate a single value per part, either vector or scalar. VPReplicateRecipe takes care of generating multiple (scalar) values per all lanes, stemming from an original ingredient. This method identifies the (rare) cases of VPInstructions that do so as well, w/o an underlying ingredient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added, thanks!
Following up on llvm#83068, when scalarizing VPWidenPointerInductionRecipe, The `onlyScalarsGenerated` checks whether VF is scalable. With scalable VF, it requires all user to use the first lane only. However if any user happens to be VPLiveOut, the check inevitably fails. This patch addresses this by implementing onlyFirstLaneUsed for the VPLiveOut class. It ensures that if the operand is a VPWidenPointerInductionRecipe, it returns true.
Following up on llvm#83068, when scalarizing VPWidenPointerInductionRecipe, The `onlyScalarsGenerated` checks whether VF is scalable. With scalable VF, it requires all user to use the first lane only. However if any user happens to be VPLiveOut, the check inevitably fails. This patch addresses this by implementing onlyFirstLaneUsed for the VPLiveOut class. It ensures that if the operand is a VPWidenPointerInductionRecipe, it returns true.
Add a new PtrAdd opcode to VPInstruction that corresponds to
IRBuilder::CreatePtrAdd, which creates a GEP with source element type
i8.
This is then used to model scalarizing VPWidenPointerInductionRecipe by
introducing scalar-steps to model the index increment followed by a
PtrAdd.
Note that PtrAdd needs to be able to generate code for only the first
lane or for all lanes. This may warrant introducing a separate recipe
for scalarizing that can be created without relying on the underlying
IR.
Depends on #80271