Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[VPlan] Hook IR blocks into VPlan during skeleton creation (NFC) #114292

Merged
merged 42 commits into from
Dec 12, 2024
Merged
Show file tree
Hide file tree
Changes from 14 commits
Commits
Show all changes
42 commits
Select commit Hold shift + click to select a range
1b89761
[VPlan] Hook IR blocks into VPlan during skeleton creation (NFC)
fhahn Oct 11, 2024
4e5c743
Merge remote-tracking branch 'origin/main' into vplan-runtime-checks
fhahn Nov 7, 2024
b87cf14
!fixup address comments, thanks!
fhahn Nov 7, 2024
4ddc87a
Merge remote-tracking branch 'origin/main' into vplan-runtime-checks-tmp
fhahn Nov 7, 2024
599e690
!fixup address latest comments, thanks!
fhahn Nov 7, 2024
1a77b55
[VPlan] Add PredIdx and SuccIdx arguments to connectBlocks (NFC).
fhahn Nov 9, 2024
a2137b4
[VPlan] Add insertOnEdge (NFC).
fhahn Nov 9, 2024
b4d0eac
Merge remote-tracking branch 'origin/main' into vplan-runtime-checks
fhahn Nov 9, 2024
2b4c71f
Merge branch 'main' into vplan-runtime-checks
fhahn Nov 9, 2024
be2c3a6
!fixup use insertOnEdge.
fhahn Nov 9, 2024
466b393
[VPlan] Add insertOnEdge to VPBlockUtils (NFC).
fhahn Nov 9, 2024
10a675e
Merge branch 'main' into vplan-runtime-checks
fhahn Nov 9, 2024
7996451
!fixup cleanup after merge.
fhahn Nov 9, 2024
5ea6f7b
Merge remote-tracking branch 'origin/main' into vplan-runtime-checks
fhahn Nov 9, 2024
64909aa
Merge remote-tracking branch 'origin/main' into vplan-runtime-checks
fhahn Nov 10, 2024
382380a
!fixup formatting
fhahn Nov 10, 2024
333536a
Merge remote-tracking branch 'origin/main' into vplan-runtime-checks
fhahn Nov 25, 2024
e5b8af3
!fixup address latest comments, thanks!
fhahn Nov 25, 2024
df6894a
Merge remote-tracking branch 'origin/main' into vplan-runtime-checks
fhahn Nov 25, 2024
927a66d
!fixup update verifier
fhahn Nov 25, 2024
5468f61
Merge remote-tracking branch 'origin/main' into vplan-runtime-checks
fhahn Dec 4, 2024
3eef601
!fixup address comments, thanks!
fhahn Dec 4, 2024
1d6db4a
Merge remote-tracking branch 'origin/main' into vplan-runtime-checks
fhahn Dec 5, 2024
22eeebe
!fixup address latest comments, thanks!
fhahn Dec 5, 2024
886bcc2
!fixup restore part of assert and fix formatting
fhahn Dec 5, 2024
65ac2d7
!fixup more formatting fixes
fhahn Dec 5, 2024
2f7d530
[VPlan] Use RPOT for VPlan codegen and printing.
fhahn Dec 6, 2024
ba08c2e
Merge branch 'main' into vplan-runtime-checks
fhahn Dec 6, 2024
4c76bec
!fixup update test
fhahn Dec 6, 2024
43c9186
Merge remote-tracking branch 'origin/main' into vplan-runtime-checks
fhahn Dec 6, 2024
7f08758
!fixup update test
fhahn Dec 6, 2024
3709f17
Merge remote-tracking branch 'origin/main' into vplan-runtime-checks
fhahn Dec 6, 2024
a72df24
Merge remote-tracking branch 'origin/main' into vplan-runtime-checks
fhahn Dec 7, 2024
16a6246
!Fixup update after merge
fhahn Dec 7, 2024
87f2815
Merge remote-tracking branch 'origin/main' into vplan-runtime-checks
fhahn Dec 7, 2024
b7b43a8
!fixup fix formatting
fhahn Dec 7, 2024
906603f
Merge remote-tracking branch 'origin/main' into vplan-runtime-checks
fhahn Dec 11, 2024
1e7cac7
Merge remote-tracking branch 'origin/main' into vplan-runtime-checks
fhahn Dec 11, 2024
4a51d29
!fixup use first successor for middle block.
fhahn Dec 12, 2024
f53cf1b
Merge remote-tracking branch 'origin/main' into vplan-runtime-checks
fhahn Dec 12, 2024
a0af583
!fixup update after merging #112138
fhahn Dec 12, 2024
968598b
!fixup use front instead of [0]
fhahn Dec 12, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
84 changes: 56 additions & 28 deletions llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2429,6 +2429,30 @@ InnerLoopVectorizer::getOrCreateVectorTripCount(BasicBlock *InsertBlock) {
return VectorTripCount;
}

/// Helper to connect both the vector and scalar preheaders to the Plan's
/// entry. This is used when adjusting \p Plan during skeleton
/// creation, i.e. adjusting the plan after introducing an initial runtime
/// check.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Helper to connect both the vector and scalar preheaders to the Plan's
/// entry. This is used when adjusting \p Plan during skeleton
/// creation, i.e. adjusting the plan after introducing an initial runtime
/// check.
/// Helper to connect the scalar preheader directly to the Plan's entry, as a bypass of the vector loop. This adjusts \p Plan during skeleton creation after introducing an initial runtime check at Entry's end. The scalar preheader is initially connected to the middle block only, as a remainder of the vector loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!

static void connectScalarPreheaderInVPlan(VPlan &Plan) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Placed here rather than in VPlan.h/VPBlockUtils expecting it to be a temporary local scaffold? Worth documenting?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, also added a comment, thanks!

VPBlockBase *VectorPH = Plan.getVectorPreheader();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
VPBlockBase *VectorPH = Plan.getVectorPreheader();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropped, thanks!

VPBlockBase *ScalarPH = Plan.getScalarPreheader();
VPBlockBase *PredVPB = Plan.getEntry();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
VPBlockBase *PredVPB = Plan.getEntry();
VPBlockBase *Entry = Plan.getEntry();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks

VPBlockUtils::connectBlocks(PredVPB, ScalarPH, -1, 0);
VPBlockUtils::connectBlocks(PredVPB, VectorPH, 0, -1);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VPlan starts with createInitialVPlan() producing a disconnected Entry and a predecessor-less vector_PH --> vector_region (with Header --> Latch) --> middle_block --> scalar_PH, plus an optional middle-block --> exit.

Would it be better to connect Entry-->vector_PH at the outset (ah, disregard - this indeed takes place below...), and here connect Entry-->scalar_PH followed by swapSuccessors(Entry)? The use of non-negative indices in connectBlocks() may be confusing, as it not only connects the two blocks but also disconnects them from existing successor and/or predecessor. I.e., replacePredecessor() or replaceSuccessor() may be more appropriate.

Furthermore, TCCheckBlock could then be inserted on this Entry-->vector_PH edge, so that connectScalarPreheaderInVPlan() assimilates introduceCheckBlockInVPlan().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to connect + swap, thanks.

Furthermore, TCCheckBlock could then be inserted on this Entry-->vector_PH edge, so that connectScalarPreheaderInVPlan() assimilates introduceCheckBlockInVPlan().

Not sure how exactly, left as is for now.

}

/// Introduces a new VPIRBasicBlock for \p CheckIRBB to \p Plan between the
/// vector preheader and its predecessor, also connecting to the scalar
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// vector preheader and its predecessor, also connecting to the scalar
/// vector preheader and its predecessor, also connecting the new block to the scalar

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks

/// preheader.
static void introduceCheckBlockInVPlan(VPlan &Plan, BasicBlock *CheckIRBB) {
VPBlockBase *ScalarPH = Plan.getScalarPreheader();
VPBlockBase *VectorPH = Plan.getVectorPreheader();
VPBlockBase *PreVectorPH = VectorPH->getSinglePredecessor();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ScalarPH is expected to be the other successor of PreVectorPH, so could be asserted (or another way to retrieve ScalarPH), although more general w/o this assert?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added an assert for now, than

VPIRBasicBlock *CheckVPIRBB = VPIRBasicBlock::fromBasicBlock(CheckIRBB);
VPBlockUtils::connectBlocks(CheckVPIRBB, ScalarPH);
VPBlockUtils::insertOnEdge(PreVectorPH, VectorPH, CheckVPIRBB);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
VPBlockUtils::connectBlocks(CheckVPIRBB, ScalarPH);
VPBlockUtils::insertOnEdge(PreVectorPH, VectorPH, CheckVPIRBB);
// Connect ScalarPH before VectorPH (via insert on edge), so that ScalarPH is the first successor of the new check block.
VPBlockUtils::connectBlocks(CheckVPIRBB, ScalarPH);
VPBlockUtils::insertOnEdge(PreVectorPH, VectorPH, CheckVPIRBB);

or first insert on edge, so that check block has vector loop preheader as its only successor, similar to the initial Entry block, and then connect it also to scalar loop preheader, followed by swapping the successors, aligning with the above.

}

void InnerLoopVectorizer::emitIterationCountCheck(BasicBlock *Bypass) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Independent): LoopScalarPreHeader is passed as parameter Bypass, while LoopVectorPreHeader is retrieved directly to set TCCheckBlock (before being reset). Would be better to be consistent.

Value *Count = getTripCount();
// Reuse existing vector loop preheader for TC checks.
Expand Down Expand Up @@ -2513,14 +2537,15 @@ void InnerLoopVectorizer::emitIterationCountCheck(BasicBlock *Bypass) {
DT->getNode(Bypass)->getIDom()) &&
"TC check is expected to dominate Bypass");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assert was probably added along with the lines below, but should remain, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is to check the DT updates from SplitBlock I think


// Update dominator for Bypass & LoopExit (if needed).
DT->changeImmediateDominator(Bypass, TCCheckBlock);
BranchInst &BI =
*BranchInst::Create(Bypass, LoopVectorPreHeader, CheckMinIters);
if (hasBranchWeightMD(*OrigLoop->getLoopLatch()->getTerminator()))
setBranchWeights(BI, MinItersBypassWeights, /*IsExpected=*/false);
ReplaceInstWithInst(TCCheckBlock->getTerminator(), &BI);
LoopBypassBlocks.push_back(TCCheckBlock);

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scalar preheader can be connected in VPlan from the outset, rather than here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few places that assume the scalar PH has a single predecessor, which would need to be updated. Could pull this in here or adjust as follow-up?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The single predecessor of Scalar PH in VPlan being middle_block, right?
I.e., scalar loop is initially connected as leftover/remainder loop only, and here connected also as alternative/bypass loop - to handle trip counts too small for the vector loop, and potentially other unvectorized cases.
Perhaps connectScalarAsBypassLoopInVPlan() is more accurate than connectScalarPreheaderInVPlan(), given that scalar PH is already connected in VPlan?

Applying this additional connection earlier, is fine as a later follow-up, perhaps w/ a TODO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, renamed and added TODO to connectScalarAsBypassLoopInVPlan

// TODO: Wrap LoopVectorPreHeader in VPIRBasicBlock here.
connectScalarPreheaderInVPlan(Plan);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth wrapping LoopVectorPreHeader in a VPIRBB via connectCheckBlockInVPlan() as well, perhaps as a TODO?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a TODO for now, thanks!

}

BasicBlock *InnerLoopVectorizer::emitSCEVChecks(BasicBlock *Bypass) {
Expand All @@ -2537,6 +2562,8 @@ BasicBlock *InnerLoopVectorizer::emitSCEVChecks(BasicBlock *Bypass) {
"Should already be a bypass block due to iteration count check");
LoopBypassBlocks.push_back(SCEVCheckBlock);
AddedSafetyChecks = true;

introduceCheckBlockInVPlan(Plan, SCEVCheckBlock);
return SCEVCheckBlock;
}

Expand Down Expand Up @@ -2573,6 +2600,7 @@ BasicBlock *InnerLoopVectorizer::emitMemRuntimeChecks(BasicBlock *Bypass) {

AddedSafetyChecks = true;

introduceCheckBlockInVPlan(Plan, MemCheckBlock);
return MemCheckBlock;
}

Expand Down Expand Up @@ -7643,20 +7671,15 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
OrigLoop->getHeader()->getContext());
VPlanTransforms::optimizeForVFAndUF(BestVPlan, BestVF, BestUF, PSE);

LLVM_DEBUG(dbgs() << "Executing best plan with VF=" << BestVF
<< ", UF=" << BestUF << '\n');
BestVPlan.setName("Final VPlan");
LLVM_DEBUG(BestVPlan.dump());

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved later to VPlan::execute(), after is replaces VPBBtoVPIRBB.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, can also split off once we are happy

// Perform the actual loop transformation.
VPTransformState State(BestVF, BestUF, LI, DT, ILV.Builder, &ILV, &BestVPlan);

// 0. Generate SCEV-dependent code into the preheader, including TripCount,
// before making any changes to the CFG.
if (!BestVPlan.getPreheader()->empty()) {
if (!BestVPlan.getEntry()->empty()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: worth renaming getPreheader() <--> getEntry() separately (?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Undone for now, thanks

State.CFG.PrevBB = OrigLoop->getLoopPreheader();
State.Builder.SetInsertPoint(OrigLoop->getLoopPreheader()->getTerminator());
BestVPlan.getPreheader()->execute(&State);
BestVPlan.getEntry()->execute(&State);
}
if (!ILV.getTripCount())
ILV.setTripCount(State.get(BestVPlan.getTripCount(), VPLane(0)));
Expand Down Expand Up @@ -7864,8 +7887,6 @@ EpilogueVectorizerMainLoop::emitIterationCountCheck(BasicBlock *Bypass,
DT->getNode(Bypass)->getIDom()) &&
"TC check is expected to dominate Bypass");

// Update dominator for Bypass.
DT->changeImmediateDominator(Bypass, TCCheckBlock);
LoopBypassBlocks.push_back(TCCheckBlock);

// Save the trip count so we don't have to regenerate it in the
Expand All @@ -7880,6 +7901,12 @@ EpilogueVectorizerMainLoop::emitIterationCountCheck(BasicBlock *Bypass,
setBranchWeights(BI, MinItersBypassWeights, /*IsExpected=*/false);
ReplaceInstWithInst(TCCheckBlock->getTerminator(), &BI);

VPBlockBase *VectorPH = Plan.getVectorPreheader();
VPBlockBase *PredVPB = VectorPH->getSinglePredecessor();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
VPBlockBase *PredVPB = VectorPH->getSinglePredecessor();
VPBlockBase *PreVectorPH = VectorPH->getSinglePredecessor();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!

if (PredVPB->getNumSuccessors() == 1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this distinction between having a single successor (VectorPH) and two (VectorPH and ... ScalarPH?) correspond to ForEpilogue or not (forMain)? Worth some explanation, compared to InnerLoopVectorizer::emitIterationCountCheck() which always applies connectScalarPreheaderInVPlan().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes ForEpilogue can be checked instead, updated and added a comment, thanks!

connectScalarPreheaderInVPlan(Plan);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth connecting scalar preheader from the outset rather than here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few places that assume the scalar PH has a single predecessor, which would need to be updated. Could pull this in here or adjust as follow-up?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up would be fine.

else
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth introducing TCCheckBlock also when it appears first, rather than "folding" it into Entry?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as in having the entry connect directly to TCCheckBlock?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's somewhat confusing to wrap TCCheckBlock only below under "else" but not above under "if", given that it is created in both cases. But this seems similar to InnerLoopVectorizer::emitIterationCountCheck().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the check as suggested above and added a comment

introduceCheckBlockInVPlan(Plan, TCCheckBlock);
return TCCheckBlock;
}

Expand Down Expand Up @@ -7910,9 +7937,6 @@ EpilogueVectorizerEpilogueLoop::createEpilogueVectorizedLoopSkeleton(
EPI.MainLoopIterationCountCheck->getTerminator()->replaceUsesOfWith(
VecEpilogueIterationCountCheck, LoopVectorPreHeader);

DT->changeImmediateDominator(LoopVectorPreHeader,
EPI.MainLoopIterationCountCheck);

EPI.EpilogueIterationCountCheck->getTerminator()->replaceUsesOfWith(
VecEpilogueIterationCountCheck, LoopScalarPreHeader);

Expand All @@ -7923,19 +7947,8 @@ EpilogueVectorizerEpilogueLoop::createEpilogueVectorizedLoopSkeleton(
EPI.MemSafetyCheck->getTerminator()->replaceUsesOfWith(
VecEpilogueIterationCountCheck, LoopScalarPreHeader);

DT->changeImmediateDominator(
VecEpilogueIterationCountCheck,
VecEpilogueIterationCountCheck->getSinglePredecessor());

DT->changeImmediateDominator(LoopScalarPreHeader,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one still needed?

(Review to be continued from here)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep this one is still needed for now.

EPI.EpilogueIterationCountCheck);
if (!Cost->requiresScalarEpilogue(EPI.EpilogueVF.isVector()))
// If there is an epilogue which must run, there's no edge from the
// middle block to exit blocks and thus no need to update the immediate
// dominator of the exit blocks.
DT->changeImmediateDominator(OrigLoop->getUniqueLatchExitBlock(),
EPI.EpilogueIterationCountCheck);

// Keep track of bypass blocks, as they feed start values to the induction and
// reduction phis in the scalar loop preheader.
if (EPI.SCEVSafetyCheck)
Expand Down Expand Up @@ -8038,6 +8051,20 @@ EpilogueVectorizerEpilogueLoop::emitMinimumVectorEpilogueIterCountCheck(
}
ReplaceInstWithInst(Insert->getTerminator(), &BI);
LoopBypassBlocks.push_back(Insert);

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Entry is conceptually an immutable VPIRBB that wraps the original scalar preheader, where SCEVs can be safely expanded, and all runtime checks are added as successors starting with minimal trip count check that is added as its terminal. Does this change when executing the VPlan of an epilog loop, whose Entry is replicated and transformed from old to new?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it changes conceptually, but here we need to wrap the new entry to be the new node created here, after executing the plan. (There is existing logic to re-use expanded SCEVs from the first execution of the main plan, to avoid duplicated expansions)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All VPlans are initialized with the original scalar preheader as their Entry, but this works only when vectorizing the main loop (w/ or w/o vectorizing the epilog). When vectorizing the epilog loop, Entry no longer serves to host SCEV-expand recipes but should instead refer to the block between Middle and "Vector Epilog PreHeader" (called "Vector Epilogue Trip Count Check", but that's also how the first block is named), as depicted in https://llvm.org/docs/Vectorizers.html#epilogue-vectorization. The documentation of Entry in createInitialVPlan() deserves update?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment, thanks!

// A new entry block has been created for the epilogue VPlan. Hook it in.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be done using replaceVPBBWithIRVPBB()?

Better merge NewEntry with OldEntry here, rather than insert NewEntry after OldEntry and let subsequent merging of VPBB pairs merge them, because the latter avoids handling VPIRBB's?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be done using replaceVPBBWithIRVPBB()?

I've simplified the code here and there should be nothing to merge at this point, I dropped the loop to remove the loop with VPIRInstructions. It may be possible to use replaceVPBBWithIRVPBB , but we still need to retrieve the new Block and update Entry. Left as is for now.

VPIRBasicBlock *NewEntry = VPIRBasicBlock::fromBasicBlock(Insert);
VPBasicBlock *OldEntry = Plan.getEntry();
VPBlockUtils::reassociateBlocks(OldEntry, NewEntry);
Plan.setEntry(NewEntry);
for (auto &R : make_early_inc_range(*NewEntry)) {
auto *VPIR = dyn_cast<VPIRInstruction>(&R);
if (!VPIR || !isa<PHINode>(VPIR->getInstruction()))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surely VPIR isn't null, having just created NewEntry fromBasicBlock Insert.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code removed, thanks!

break;
VPIR->eraseFromParent();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Erasing all of Insert's phi's from NewEntry?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be empty, code remove, thanks!

}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should OldEntry be deleted?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to remove it, thanks!


connectScalarPreheaderInVPlan(Plan);
return Insert;
}

Expand Down Expand Up @@ -10259,7 +10286,7 @@ bool LoopVectorizePass::processLoop(Loop *L) {
// should be removed once induction resume value creation is done
// directly in VPlan.
EpilogILV.setTripCount(MainILV.getTripCount());
for (auto &R : make_early_inc_range(*BestEpiPlan.getPreheader())) {
for (auto &R : make_early_inc_range(*BestEpiPlan.getEntry())) {
auto *ExpandR = dyn_cast<VPExpandSCEVRecipe>(&R);
if (!ExpandR)
continue;
Expand Down Expand Up @@ -10319,8 +10346,6 @@ bool LoopVectorizePass::processLoop(Loop *L) {
cast<VPHeaderPHIRecipe>(&R)->setStartValue(StartVal);
}

assert(DT->verify(DominatorTree::VerificationLevel::Fast) &&
"DT not preserved correctly");
LVP.executePlan(EPI.EpilogueVF, EPI.EpilogueUF, BestEpiPlan, EpilogILV,
DT, true, &ExpandedSCEVs);
++LoopsEpilogueVectorized;
Expand Down Expand Up @@ -10348,6 +10373,9 @@ bool LoopVectorizePass::processLoop(Loop *L) {
checkMixedPrecision(L, ORE);
}

assert(DT->verify(DominatorTree::VerificationLevel::Fast) &&
"DT not preserved correctly");

std::optional<MDNode *> RemainderLoopID =
makeFollowupLoopID(OrigLoopID, {LLVMLoopVectorizeFollowupAll,
LLVMLoopVectorizeFollowupEpilogue});
Expand Down
71 changes: 46 additions & 25 deletions llvm/lib/Transforms/Vectorize/VPlan.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -170,9 +170,8 @@ VPBasicBlock *VPBlockBase::getEntryBasicBlock() {
}

void VPBlockBase::setPlan(VPlan *ParentPlan) {
assert(
(ParentPlan->getEntry() == this || ParentPlan->getPreheader() == this) &&
"Can only set plan on its entry or preheader block.");
assert(ParentPlan->getEntry() == this &&
"Can only set plan on its entry or preheader block.");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Can only set plan on its entry or preheader block.");
"Can only set plan on its entry block.");

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!

Plan = ParentPlan;
}

Expand Down Expand Up @@ -840,9 +839,6 @@ VPlan::~VPlan() {
Block->dropAllReferences(&DummyValue);

VPBlockBase::deleteCFG(Entry);

Preheader->dropAllReferences(&DummyValue);
delete Preheader;
}
for (VPValue *VPV : VPLiveInsToFree)
delete VPV;
Expand All @@ -865,9 +861,10 @@ VPlanPtr VPlan::createInitialVPlan(Type *InductionTy,
VPIRBasicBlock *Entry =
VPIRBasicBlock::fromBasicBlock(TheLoop->getLoopPreheader());
VPBasicBlock *VecPreheader = new VPBasicBlock("vector.ph");
VPBlockUtils::connectBlocks(Entry, VecPreheader);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth noting that this connection is subject to insertion of runtime checks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment, thanks!

VPIRBasicBlock *ScalarHeader =
VPIRBasicBlock::fromBasicBlock(TheLoop->getHeader());
auto Plan = std::make_unique<VPlan>(Entry, VecPreheader, ScalarHeader);
auto Plan = std::make_unique<VPlan>(Entry, ScalarHeader);

// Create SCEV and VPValue for the trip count.

Expand Down Expand Up @@ -1010,8 +1007,9 @@ void VPlan::execute(VPTransformState *State) {
BasicBlock *VectorPreHeader = State->CFG.PrevBB;
State->Builder.SetInsertPoint(VectorPreHeader->getTerminator());

// Disconnect VectorPreHeader from ExitBB in both the CFG and DT.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to disconnect in DT only?
Retain the comment wrt DT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Retained both, as the DT update is required, (otherwise the sanity checks during DT updates will trigger)

cast<BranchInst>(VectorPreHeader->getTerminator())->setSuccessor(0, nullptr);
replaceVPBBWithIRVPBB(
cast<VPBasicBlock>(getVectorLoopRegion()->getSinglePredecessor()),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
cast<VPBasicBlock>(getVectorLoopRegion()->getSinglePredecessor()),
getVectorPreheader(),

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated thanks!

VectorPreHeader);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better placed next to the two replaceVPBBWithIRVPBB() below, and update the comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved, thanks!

State->CFG.DTU.applyUpdates(
{{DominatorTree::Delete, VectorPreHeader, State->CFG.ExitBB}});

Expand All @@ -1025,6 +1023,11 @@ void VPlan::execute(VPTransformState *State) {
replaceVPBBWithIRVPBB(getScalarPreheader(), ScalarPh);
replaceVPBBWithIRVPBB(MiddleVPBB, MiddleBB);

LLVM_DEBUG(dbgs() << "Executing best plan with VF=" <<State->VF
<< ", UF=" << getUF()<< '\n');
setName("Final VPlan");
LLVM_DEBUG(dump());

// Disconnect the middle block from its single successor (the scalar loop
// header) in both the CFG and DT. The branch will be recreated during VPlan
// execution.
Expand All @@ -1038,8 +1041,10 @@ void VPlan::execute(VPTransformState *State) {
State->CFG.DTU.applyUpdates(
{{DominatorTree::Delete, ScalarPh, ScalarPh->getSingleSuccessor()}});

ReversePostOrderTraversal<VPBlockShallowTraversalWrapper<VPBlockBase *>> RPOT(
Entry);
// Generate code in the loop pre-header and body.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment should be updated - code for runtime guards has already been generated (is simply skipped here), but code is generated also after the body (middle, scalar pre-header).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!

for (VPBlockBase *Block : vp_depth_first_shallow(Entry))
for (VPBlockBase *Block : make_range(RPOT.begin(), RPOT.end()))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would Block : RPOT suffice?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, earlier versions skipped the pre-header but that is not needed in the latest version, simplified, thanks

Block->execute(State);

VPBasicBlock *LatchVPBB = getVectorLoopRegion()->getExitingBasicBlock();
Expand Down Expand Up @@ -1090,9 +1095,6 @@ void VPlan::execute(VPTransformState *State) {
}

State->CFG.DTU.flush();
assert(State->CFG.DTU.getDomTree().verify(
DominatorTree::VerificationLevel::Fast) &&
"DT not preserved correctly");
}

InstructionCost VPlan::cost(ElementCount VF, VPCostContext &Ctx) {
Expand Down Expand Up @@ -1145,12 +1147,10 @@ void VPlan::print(raw_ostream &O) const {

printLiveIns(O);

if (!getPreheader()->empty()) {
O << "\n";
getPreheader()->print(O, "", SlotTracker);
}
ReversePostOrderTraversal<VPBlockShallowTraversalWrapper<const VPBlockBase *>>
RPOT(getEntry());

for (const VPBlockBase *Block : vp_depth_first_shallow(getEntry())) {
for (const VPBlockBase *Block : RPOT) {
O << '\n';
Block->print(O, "", SlotTracker);
}
Expand Down Expand Up @@ -1181,6 +1181,20 @@ std::string VPlan::getName() const {
return Out;
}

VPRegionBlock *VPlan::getVectorLoopRegion() {
for (VPBlockBase *B : vp_depth_first_shallow(getEntry()))
if (auto *R = dyn_cast<VPRegionBlock>(B))
return R;
return nullptr;
Comment on lines +1172 to +1175
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better cache the result? Can be done as follow-up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, mostly worried about invalidation (also when thinking about #117506 / #108378)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leave a TODO?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added thanks!

}

const VPRegionBlock *VPlan::getVectorLoopRegion() const {
for (const VPBlockBase *B : vp_depth_first_shallow(getEntry()))
if (auto *R = dyn_cast<VPRegionBlock>(B))
return R;
return nullptr;
}

LLVM_DUMP_METHOD
void VPlan::printDOT(raw_ostream &O) const {
VPlanPrinter Printer(O, *this);
Expand Down Expand Up @@ -1231,7 +1245,6 @@ static void remapOperands(VPBlockBase *Entry, VPBlockBase *NewEntry,

VPlan *VPlan::duplicate() {
// Clone blocks.
VPBasicBlock *NewPreheader = Preheader->clone();
const auto &[NewEntry, __] = cloneFrom(Entry);

BasicBlock *ScalarHeaderIRBB = getScalarHeader()->getIRBasicBlock();
Expand All @@ -1241,8 +1254,7 @@ VPlan *VPlan::duplicate() {
return VPIRBB && VPIRBB->getIRBasicBlock() == ScalarHeaderIRBB;
}));
// Create VPlan, clone live-ins and remap operands in the cloned blocks.
auto *NewPlan =
new VPlan(NewPreheader, cast<VPBasicBlock>(NewEntry), NewScalarHeader);
auto *NewPlan = new VPlan(cast<VPBasicBlock>(NewEntry), NewScalarHeader);
DenseMap<VPValue *, VPValue *> Old2NewVPValues;
for (VPValue *OldLiveIn : VPLiveInsToFree) {
Old2NewVPValues[OldLiveIn] =
Expand All @@ -1262,7 +1274,6 @@ VPlan *VPlan::duplicate() {
// else NewTripCount will be created and inserted into Old2NewVPValues when
// TripCount is cloned. In any case NewPlan->TripCount is updated below.

remapOperands(Preheader, NewPreheader, Old2NewVPValues);
remapOperands(Entry, NewEntry, Old2NewVPValues);

// Initialize remaining fields of cloned VPlan.
Expand All @@ -1276,6 +1287,19 @@ VPlan *VPlan::duplicate() {
return NewPlan;
}

VPBasicBlock *VPlan::getScalarPreheader() {
auto *MiddleVPBB =
cast<VPBasicBlock>(getVectorLoopRegion()->getSingleSuccessor());
if (MiddleVPBB->getNumSuccessors() == 2) {
// Order is strict: first is the exit block, second is the scalar preheader.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simpler to return last successor, provided it ain't a VPIRBB?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplified, thanks

return cast<VPBasicBlock>(MiddleVPBB->getSuccessors()[1]);
}
if (auto *IRVPBB = dyn_cast<VPBasicBlock>(MiddleVPBB->getSingleSuccessor()))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dyn_casting to VPBB but calling it IRVPBB?
Intention is to return the single successor if it's not an IRVPBB, as that represents the exit block?

Suggested change
if (auto *IRVPBB = dyn_cast<VPBasicBlock>(MiddleVPBB->getSingleSuccessor()))
if (isa<VPIRBasicBlock>(MiddleVPBB->getSingleSuccessor())) {
// Exit block is IRVPBB, scalar preheader is not.
return nullptr;
}
return cast<VPBasicBlock>(MiddleVPBB->getSingleSuccessor());

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated code as per comment above, thanks

return IRVPBB;

return nullptr;
}

#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)

Twine VPlanPrinter::getUID(const VPBlockBase *Block) {
Expand Down Expand Up @@ -1314,8 +1338,6 @@ void VPlanPrinter::dump() {
OS << "edge [fontname=Courier, fontsize=30]\n";
OS << "compound=true\n";

dumpBlock(Plan.getPreheader());

for (const VPBlockBase *Block : vp_depth_first_shallow(Plan.getEntry()))
dumpBlock(Block);

Expand Down Expand Up @@ -1576,7 +1598,6 @@ void VPSlotTracker::assignNames(const VPlan &Plan) {
assignName(Plan.BackedgeTakenCount);
for (VPValue *LI : Plan.VPLiveInsToFree)
assignName(LI);
assignNames(Plan.getPreheader());

ReversePostOrderTraversal<VPBlockDeepTraversalWrapper<const VPBlockBase *>>
RPOT(VPBlockDeepTraversalWrapper<const VPBlockBase *>(Plan.getEntry()));
Expand Down
Loading
Loading