-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[VPlan] Model FOR extract of exit value in VPlan. #93395
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
c6067dc
254f9e6
f226ee4
54505a8
fb5997b
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -137,6 +137,7 @@ bool VPRecipeBase::mayHaveSideEffects() const { | |
case VPInstruction::Not: | ||
case VPInstruction::CalculateTripCountMinusVF: | ||
case VPInstruction::CanonicalIVIncrementForPart: | ||
case VPInstruction::ExtractFromEnd: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is this needed in this patch, given the LiveOut user, or only for the follow-up patch extracting the last element? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It is needed for the patch, as it always adds the extract and RAUW outside the loop, relying on DCE to clean up unneeded extracts. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh, this marks ExtractFromEnd as not having side effects. |
||
case VPInstruction::LogicalAnd: | ||
case VPInstruction::PtrAdd: | ||
return false; | ||
|
@@ -293,13 +294,13 @@ bool VPInstruction::doesGeneratePerAllLanes() const { | |
bool VPInstruction::canGenerateScalarForFirstLane() const { | ||
if (Instruction::isBinaryOp(getOpcode())) | ||
return true; | ||
|
||
if (isVectorToScalar()) | ||
return true; | ||
switch (Opcode) { | ||
case VPInstruction::BranchOnCond: | ||
case VPInstruction::BranchOnCount: | ||
case VPInstruction::CalculateTripCountMinusVF: | ||
case VPInstruction::CanonicalIVIncrementForPart: | ||
case VPInstruction::ComputeReductionResult: | ||
case VPInstruction::PtrAdd: | ||
case VPInstruction::ExplicitVectorLength: | ||
return true; | ||
|
@@ -558,6 +559,29 @@ Value *VPInstruction::generatePerPart(VPTransformState &State, unsigned Part) { | |
|
||
return ReducedPartRdx; | ||
} | ||
case VPInstruction::ExtractFromEnd: { | ||
if (Part != 0) | ||
return State.get(this, 0, /*IsScalar*/ true); | ||
|
||
auto *CI = cast<ConstantInt>(getOperand(1)->getLiveInIRValue()); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Document somewhere that 2nd operand of ExtractFromEnd must be a constant, either 1 or 2. This patch exercises Offset = 2 and follow-up #93396 exercises Offset = 1. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Added a comment to where the opcode is defined, thanks. I kept the code general for now, with the constraints that the lane to extract must be <= VF when vectorizing or <= UF otherwise. |
||
unsigned Offset = CI->getZExtValue(); | ||
assert(Offset > 0 && "Offset from end must be positive"); | ||
Value *Res; | ||
if (State.VF.isVector()) { | ||
assert(Offset <= State.VF.getKnownMinValue() && | ||
"invalid offset to extract from"); | ||
// Extract lane VF - Offset from the operand. | ||
Res = State.get( | ||
getOperand(0), | ||
VPIteration(State.UF - 1, VPLane::getLaneFromEnd(State.VF, Offset))); | ||
} else { | ||
assert(Offset <= State.UF && "invalid offset to extract from"); | ||
// When loop is unrolled without vectorizing, retrieve UF - Offset. | ||
Res = State.get(getOperand(0), State.UF - Offset); | ||
} | ||
Res->setName(Name); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This Name was found useless and destined to removal some patches ago: #81411 (review) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is used to retain the original name for the IR values, which still seems useful to keep the IR readable and reduce the test changes (in particular it makes it easier to trace back to where the IR instruction is coming from) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Those are available via Underlying Values? This Name was originally designed to facilitate new names, and as such appear redundant. (Issue independent of this patch.) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It is for cases where there is no underlying value, which is the case for this opcode and (most?) custom VPInstruction opcodes. |
||
return Res; | ||
} | ||
case VPInstruction::LogicalAnd: { | ||
Value *A = State.get(getOperand(0), Part); | ||
Value *B = State.get(getOperand(1), Part); | ||
|
@@ -575,6 +599,11 @@ Value *VPInstruction::generatePerPart(VPTransformState &State, unsigned Part) { | |
} | ||
} | ||
|
||
bool VPInstruction::isVectorToScalar() const { | ||
return getOpcode() == VPInstruction::ExtractFromEnd || | ||
getOpcode() == VPInstruction::ComputeReductionResult; | ||
} | ||
|
||
#if !defined(NDEBUG) | ||
bool VPInstruction::isFPMathOp() const { | ||
// Inspired by FPMathOperator::classof. Notable differences are that we don't | ||
|
@@ -597,8 +626,7 @@ void VPInstruction::execute(VPTransformState &State) { | |
State.setDebugLocFrom(getDebugLoc()); | ||
bool GeneratesPerFirstLaneOnly = | ||
canGenerateScalarForFirstLane() && | ||
(vputils::onlyFirstLaneUsed(this) || | ||
getOpcode() == VPInstruction::ComputeReductionResult); | ||
(vputils::onlyFirstLaneUsed(this) || isVectorToScalar()); | ||
bool GeneratesPerAllLanes = doesGeneratePerAllLanes(); | ||
for (unsigned Part = 0; Part < State.UF; ++Part) { | ||
if (GeneratesPerAllLanes) { | ||
|
@@ -692,6 +720,9 @@ void VPInstruction::print(raw_ostream &O, const Twine &Indent, | |
case VPInstruction::BranchOnCount: | ||
O << "branch-on-count"; | ||
break; | ||
case VPInstruction::ExtractFromEnd: | ||
O << "extract-from-end"; | ||
break; | ||
case VPInstruction::ComputeReductionResult: | ||
O << "compute-reduction-result"; | ||
break; | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -802,7 +802,7 @@ sinkRecurrenceUsersAfterPrevious(VPFirstOrderRecurrencePHIRecipe *FOR, | |
} | ||
|
||
bool VPlanTransforms::adjustFixedOrderRecurrences(VPlan &Plan, | ||
VPBuilder &Builder) { | ||
VPBuilder &LoopBuilder) { | ||
VPDominatorTree VPDT; | ||
VPDT.recalculate(Plan); | ||
|
||
|
@@ -812,6 +812,8 @@ bool VPlanTransforms::adjustFixedOrderRecurrences(VPlan &Plan, | |
if (auto *FOR = dyn_cast<VPFirstOrderRecurrencePHIRecipe>(&R)) | ||
RecurrencePhis.push_back(FOR); | ||
|
||
VPBuilder MiddleBuilder( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Worth renaming There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Updated, thanks! |
||
cast<VPBasicBlock>(Plan.getVectorLoopRegion()->getSingleSuccessor())); | ||
for (VPFirstOrderRecurrencePHIRecipe *FOR : RecurrencePhis) { | ||
SmallPtrSet<VPFirstOrderRecurrencePHIRecipe *, 4> SeenPhis; | ||
VPRecipeBase *Previous = FOR->getBackedgeValue()->getDefiningRecipe(); | ||
|
@@ -831,18 +833,28 @@ bool VPlanTransforms::adjustFixedOrderRecurrences(VPlan &Plan, | |
// fixed-order recurrence. | ||
VPBasicBlock *InsertBlock = Previous->getParent(); | ||
if (isa<VPHeaderPHIRecipe>(Previous)) | ||
Builder.setInsertPoint(InsertBlock, InsertBlock->getFirstNonPhi()); | ||
LoopBuilder.setInsertPoint(InsertBlock, InsertBlock->getFirstNonPhi()); | ||
else | ||
Builder.setInsertPoint(InsertBlock, std::next(Previous->getIterator())); | ||
LoopBuilder.setInsertPoint(InsertBlock, | ||
std::next(Previous->getIterator())); | ||
|
||
auto *RecurSplice = cast<VPInstruction>( | ||
Builder.createNaryOp(VPInstruction::FirstOrderRecurrenceSplice, | ||
{FOR, FOR->getBackedgeValue()})); | ||
LoopBuilder.createNaryOp(VPInstruction::FirstOrderRecurrenceSplice, | ||
{FOR, FOR->getBackedgeValue()})); | ||
|
||
FOR->replaceAllUsesWith(RecurSplice); | ||
// Set the first operand of RecurSplice to FOR again, after replacing | ||
// all users. | ||
RecurSplice->setOperand(0, FOR); | ||
|
||
Type *IntTy = Plan.getCanonicalIV()->getScalarType(); | ||
auto *Result = cast<VPInstruction>(MiddleBuilder.createNaryOp( | ||
VPInstruction::ExtractFromEnd, | ||
{FOR->getBackedgeValue(), | ||
Plan.getOrAddLiveIn(ConstantInt::get(IntTy, 2))}, | ||
{}, "vector.recur.extract.for.phi")); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ok, this is the unique Name assigned to this VPInstruction instance. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yep |
||
RecurSplice->replaceUsesWithIf( | ||
Result, [](VPUser &U, unsigned) { return isa<VPLiveOut>(&U); }); | ||
} | ||
return true; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works "horizontally" for Offsets between 1 and VF, inclusive, if VF >= 2,
and "vertically" for Offsets between 1 and UF, inclusive, if VF = 1, implying UF >= 2.
I.e., this works in general only for Offset = 1 and Offset = 2 across possible {VF, UF} combinations.
May be good to keep private and expose only public
last
andpenultimate
cases. And/or assert.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be worth supporting any lane w.r.t. to he constraints that it is <= VF in the vector case or <= UF for the scalar case. Added an assert for now. This also avoids explicitly checking the lane and delegating to multiple helper functions.