Skip to content

[VPlan] Handle early exit before forming regions. (NFC) #138393

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
May 12, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 4 additions & 8 deletions llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -9383,7 +9383,8 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range,
VPlanTransforms::prepareForVectorization(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Independent) Should prepareForVectorization be renamed to a more informative name, perhaps canonicalizeTopLoop, as it takes care of canonicalizing header and latch blocks, introducing and connecting preheader, middle-block, scalar preheader, canonical IV recipes and trip-count value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will to separately.

*Plan, Legal->getWidestInductionType(), PSE, RequiresScalarEpilogueCheck,
CM.foldTailByMasking(), OrigLoop,
getDebugLocFromInstOrOperands(Legal->getPrimaryInduction()));
getDebugLocFromInstOrOperands(Legal->getPrimaryInduction()),
Legal->hasUncountableEarlyExit(), Range);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does passing Legal->hasUncountableEarlyExit() mean that (a) the loop has an uncountable early exit or (a+b) that this early exit should be sunk to the middle block - rather than (a+c: being sunk to the scalar remainder loop, i.e.,) leaving the last iteration be scalar?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Legal->hasUncountableEarlyExit() returns true if there's an uncountable early exit, for which we need to adjust the region exit condition + introduce an extra block to go to the early exit if needed.

If it is false, it means there are no uncountable early exits and we require a scalar epilogue

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thanks for clarifying!
"If it is false, it means there are no uncountable early exits, i.e., all early exits are countable which requires a scalar epilogue."

VPlanTransforms::createLoopRegions(*Plan);

// Don't use getDecisionAndClampRange here, because we don't know the UF
Expand Down Expand Up @@ -9584,12 +9585,6 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range,
R->setOperand(1, WideIV->getStepValue());
}

if (auto *UncountableExitingBlock =
Legal->getUncountableEarlyExitingBlock()) {
VPlanTransforms::runPass(VPlanTransforms::handleUncountableEarlyExit, *Plan,
OrigLoop, UncountableExitingBlock, RecipeBuilder,
Range);
}
DenseMap<VPValue *, VPValue *> IVEndValues;
addScalarResumePhis(RecipeBuilder, *Plan, IVEndValues);
SetVector<VPIRInstruction *> ExitUsersToFix =
Expand Down Expand Up @@ -9687,7 +9682,8 @@ VPlanPtr LoopVectorizationPlanner::tryToBuildVPlan(VFRange &Range) {
auto Plan = VPlanTransforms::buildPlainCFG(OrigLoop, *LI, VPB2IRBB);
VPlanTransforms::prepareForVectorization(
*Plan, Legal->getWidestInductionType(), PSE, true, false, OrigLoop,
getDebugLocFromInstOrOperands(Legal->getPrimaryInduction()));
getDebugLocFromInstOrOperands(Legal->getPrimaryInduction()), false,
Range);
VPlanTransforms::createLoopRegions(*Plan);

for (ElementCount VF : Range)
Expand Down
33 changes: 23 additions & 10 deletions llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -460,11 +460,10 @@ static void addCanonicalIVRecipes(VPlan &Plan, VPBasicBlock *HeaderVPBB,
{CanonicalIVIncrement, &Plan.getVectorTripCount()}, DL);
}

void VPlanTransforms::prepareForVectorization(VPlan &Plan, Type *InductionTy,
PredicatedScalarEvolution &PSE,
bool RequiresScalarEpilogueCheck,
bool TailFolded, Loop *TheLoop,
DebugLoc IVDL) {
void VPlanTransforms::prepareForVectorization(
VPlan &Plan, Type *InductionTy, PredicatedScalarEvolution &PSE,
bool RequiresScalarEpilogueCheck, bool TailFolded, Loop *TheLoop,
DebugLoc IVDL, bool HasUncountableEarlyExit, VFRange &Range) {
VPDominatorTree VPDT;
VPDT.recalculate(Plan);

Expand All @@ -491,19 +490,33 @@ void VPlanTransforms::prepareForVectorization(VPlan &Plan, Type *InductionTy,
addCanonicalIVRecipes(Plan, cast<VPBasicBlock>(HeaderVPB),
cast<VPBasicBlock>(LatchVPB), InductionTy, IVDL);

// Disconnect all edges to exit blocks other than from the middle block.
// TODO: VPlans with early exits should be explicitly converted to a form
// exiting only via the latch here, including adjusting the exit condition,
// instead of simply disconnecting the edges and adjusting the VPlan later.
for (VPBlockBase *EB : Plan.getExitBlocks()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps worth replacing the old comment with a new one explaining what the loop is doing at a high level?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks

[[maybe_unused]] bool HandledUncountableEarlyExit = false;
// Disconnect all early exits from the loop leaving it with a single exit from
// the latch. Early exits that are countable are left for a scalar epilog. The
// condition of uncountable early exits (currently at most one is supported)
// is fused into the latch exit, and used to branch from middle block to the
// early exit destination.
for (VPIRBasicBlock *EB : Plan.getExitBlocks()) {
for (VPBlockBase *Pred : to_vector(EB->getPredecessors())) {
if (Pred == MiddleVPBB)
continue;
if (HasUncountableEarlyExit) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Independent note: this condition disregards the actual Pred->EB edge itself.

assert(!HandledUncountableEarlyExit &&
"can handle exactly one uncountable early exit");
handleUncountableEarlyExit(cast<VPBasicBlock>(Pred), EB, Plan,
cast<VPBasicBlock>(HeaderVPB),
cast<VPBasicBlock>(LatchVPB), Range);
HandledUncountableEarlyExit = true;
}

cast<VPBasicBlock>(Pred)->getTerminator()->eraseFromParent();
VPBlockUtils::disconnectBlocks(Pred, EB);
}
}

assert((!HasUncountableEarlyExit || HandledUncountableEarlyExit) &&
"missed an uncountable exit that must be handled");

// Create SCEV and VPValue for the trip count.
// We use the symbolic max backedge-taken-count, which works also when
// vectorizing loops with uncountable early exits.
Expand Down
81 changes: 37 additions & 44 deletions llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2461,63 +2461,56 @@ void VPlanTransforms::convertToConcreteRecipes(VPlan &Plan,
}

void VPlanTransforms::handleUncountableEarlyExit(
VPlan &Plan, Loop *OrigLoop, BasicBlock *UncountableExitingBlock,
VPRecipeBuilder &RecipeBuilder, VFRange &Range) {
VPRegionBlock *LoopRegion = Plan.getVectorLoopRegion();
auto *LatchVPBB = cast<VPBasicBlock>(LoopRegion->getExiting());
VPBuilder Builder(LatchVPBB->getTerminator());
auto *MiddleVPBB = Plan.getMiddleBlock();
VPValue *IsEarlyExitTaken = nullptr;

// Process the uncountable exiting block. Update IsEarlyExitTaken, which
// tracks if the uncountable early exit has been taken. Also split the middle
// block and have it conditionally branch to the early exit block if
// EarlyExitTaken.
auto *EarlyExitingBranch =
cast<BranchInst>(UncountableExitingBlock->getTerminator());
BasicBlock *TrueSucc = EarlyExitingBranch->getSuccessor(0);
BasicBlock *FalseSucc = EarlyExitingBranch->getSuccessor(1);
BasicBlock *EarlyExitIRBB =
!OrigLoop->contains(TrueSucc) ? TrueSucc : FalseSucc;
VPIRBasicBlock *VPEarlyExitBlock = Plan.getExitBlock(EarlyExitIRBB);

VPValue *EarlyExitNotTakenCond = RecipeBuilder.getBlockInMask(
OrigLoop->contains(TrueSucc) ? TrueSucc : FalseSucc);
auto *EarlyExitTakenCond = Builder.createNot(EarlyExitNotTakenCond);
IsEarlyExitTaken =
Builder.createNaryOp(VPInstruction::AnyOf, {EarlyExitTakenCond});
VPBasicBlock *EarlyExitingVPBB, VPBasicBlock *EarlyExitVPBB, VPlan &Plan,
VPBasicBlock *HeaderVPBB, VPBasicBlock *LatchVPBB, VFRange &Range) {
using namespace llvm::VPlanPatternMatch;

VPBlockBase *MiddleVPBB = LatchVPBB->getSuccessors()[0];
if (!EarlyExitVPBB->getSinglePredecessor() &&
EarlyExitVPBB->getPredecessors()[1] == MiddleVPBB) {
assert(EarlyExitVPBB->getNumPredecessors() == 2 &&
EarlyExitVPBB->getPredecessors()[0] == EarlyExitingVPBB &&
"unsupported early exit VPBB");
// Early exit operand should always be last phi operand. If EarlyExitVPBB
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth adding an assert that MiddleVPBB is actually the second predecessor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, thanks

// has two predecessors and EarlyExitingVPBB is the first, swap the operands
// of the phis.
for (VPRecipeBase &R : EarlyExitVPBB->phis())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is making sure the operand corresponding to the middle block is always first, right? I guess in future if we do want to support multiple early exits this will get a bit more complicated, but swapping operands works for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep

cast<VPIRPhi>(&R)->swapOperands();
}

VPBuilder Builder(LatchVPBB->getTerminator());
VPBlockBase *TrueSucc = EarlyExitingVPBB->getSuccessors()[0];
assert(
match(EarlyExitingVPBB->getTerminator(), m_BranchOnCond(m_VPValue())) &&
"Terminator must be be BranchOnCond");
VPValue *CondOfEarlyExitingVPBB =
EarlyExitingVPBB->getTerminator()->getOperand(0);
auto *CondToEarlyExit = TrueSucc == EarlyExitVPBB
? CondOfEarlyExitingVPBB
: Builder.createNot(CondOfEarlyExitingVPBB);

// Split the middle block and have it conditionally branch to the early exit
// block if CondToEarlyExit.
VPValue *IsEarlyExitTaken =
Builder.createNaryOp(VPInstruction::AnyOf, {CondToEarlyExit});
VPBasicBlock *NewMiddle = Plan.createVPBasicBlock("middle.split");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Independent:

Suggested change
VPBasicBlock *NewMiddle = Plan.createVPBasicBlock("middle.split");
VPBasicBlock *MiddleSplit = Plan.createVPBasicBlock("middle.split");

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can do separately, thanks

VPBasicBlock *VectorEarlyExitVPBB =
Plan.createVPBasicBlock("vector.early.exit");
VPBlockUtils::insertOnEdge(LoopRegion, MiddleVPBB, NewMiddle);
VPBlockUtils::insertOnEdge(LatchVPBB, MiddleVPBB, NewMiddle);
VPBlockUtils::connectBlocks(NewMiddle, VectorEarlyExitVPBB);
NewMiddle->swapSuccessors();

VPBlockUtils::connectBlocks(VectorEarlyExitVPBB, VPEarlyExitBlock);
VPBlockUtils::connectBlocks(VectorEarlyExitVPBB, EarlyExitVPBB);

// Update the exit phis in the early exit block.
VPBuilder MiddleBuilder(NewMiddle);
VPBuilder EarlyExitB(VectorEarlyExitVPBB);
for (VPRecipeBase &R : VPEarlyExitBlock->phis()) {
for (VPRecipeBase &R : EarlyExitVPBB->phis()) {
auto *ExitIRI = cast<VPIRPhi>(&R);
// Early exit operand should always be last, i.e., 0 if VPEarlyExitBlock has
// Early exit operand should always be last, i.e., 0 if EarlyExitVPBB has
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment still valid given you've already swapped operands above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ep, this just re-states the expected order, as explanation for setting early exit index below.

// a single predecessor and 1 if it has two.
unsigned EarlyExitIdx = ExitIRI->getNumOperands() - 1;
if (!VPEarlyExitBlock->getSinglePredecessor()) {
// If VPEarlyExitBlock has two predecessors, they are already ordered such
// that early exit is second (and latch exit is first), by construction.
// But its underlying IRBB (EarlyExitIRBB) may have its predecessors
// ordered the other way around, and it is the order of the latter which
// corresponds to the order of operands of VPEarlyExitBlock's phi recipes.
// Therefore, if early exit (UncountableExitingBlock) is the first
// predecessor of EarlyExitIRBB, we swap the operands of phi recipes,
// thereby bringing them to match VPEarlyExitBlock's predecessor order,
// with early exit being last (second). Otherwise they already match.
if (*pred_begin(VPEarlyExitBlock->getIRBasicBlock()) ==
UncountableExitingBlock)
ExitIRI->swapOperands();

if (ExitIRI->getNumOperands() != 1) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not relevant for this patch, but the name extractLastLaneOfFirstOperand confused me at first because I was expecting it to return a value. It feels like it should be something like updatedFirstOperandToExtractLastLane.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be good to improve the naming, but it may be even better to create all required extracts up-front separately, which is something I am looking into

// The first of two operands corresponds to the latch exit, via MiddleVPBB
// predecessor. Extract its last lane.
ExitIRI->extractLastLaneOfFirstOperand(MiddleBuilder);
Expand All @@ -2533,7 +2526,7 @@ void VPlanTransforms::handleUncountableEarlyExit(
LoopVectorizationPlanner::getDecisionAndClampRange(IsVector, Range)) {
// Update the incoming value from the early exit.
VPValue *FirstActiveLane = EarlyExitB.createNaryOp(
VPInstruction::FirstActiveLane, {EarlyExitTakenCond}, nullptr,
VPInstruction::FirstActiveLane, {CondToEarlyExit}, nullptr,
"first.active.lane");
IncomingFromEarlyExit = EarlyExitB.createNaryOp(
Instruction::ExtractElement, {IncomingFromEarlyExit, FirstActiveLane},
Expand Down
20 changes: 11 additions & 9 deletions llvm/lib/Transforms/Vectorize/VPlanTransforms.h
Original file line number Diff line number Diff line change
Expand Up @@ -69,7 +69,8 @@ struct VPlanTransforms {
PredicatedScalarEvolution &PSE,
bool RequiresScalarEpilogueCheck,
bool TailFolded, Loop *TheLoop,
DebugLoc IVDL);
DebugLoc IVDL, bool HasUncountableExit,
VFRange &Range);

/// Replace loops in \p Plan's flat CFG with VPRegionBlocks, turning \p Plan's
/// flat CFG into a hierarchical CFG.
Expand Down Expand Up @@ -173,15 +174,16 @@ struct VPlanTransforms {
/// Remove dead recipes from \p Plan.
static void removeDeadRecipes(VPlan &Plan);

/// Update \p Plan to account for the uncountable early exit block in \p
/// UncountableExitingBlock by
/// * updating the condition exiting the vector loop to include the early
/// exit conditions
/// Update \p Plan to account for the uncountable early exit from \p
/// EarlyExitingVPBB to \p EarlyExitVPBB by
/// * updating the condition exiting the loop via the latch to include the
/// early exit condition,
/// * splitting the original middle block to branch to the early exit block
/// if taken.
static void handleUncountableEarlyExit(VPlan &Plan, Loop *OrigLoop,
BasicBlock *UncountableExitingBlock,
VPRecipeBuilder &RecipeBuilder,
/// conditionally - according to the early exit condition.
static void handleUncountableEarlyExit(VPBasicBlock *EarlyExitingVPBB,
VPBasicBlock *EarlyExitVPBB,
VPlan &Plan, VPBasicBlock *HeaderVPBB,
VPBasicBlock *LatchVPBB,
VFRange &Range);

/// Lower abstract recipes to concrete ones, that can be codegen'd. Use \p
Expand Down
4 changes: 3 additions & 1 deletion llvm/unittests/Transforms/Vectorize/VPlanTestBase.h
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@
#define LLVM_UNITTESTS_TRANSFORMS_VECTORIZE_VPLANTESTBASE_H

#include "../lib/Transforms/Vectorize/VPlan.h"
#include "../lib/Transforms/Vectorize/VPlanHelpers.h"
#include "../lib/Transforms/Vectorize/VPlanTransforms.h"
#include "llvm/Analysis/AssumptionCache.h"
#include "llvm/Analysis/BasicAliasAnalysis.h"
Expand Down Expand Up @@ -72,8 +73,9 @@ class VPlanTestIRBase : public testing::Test {
PredicatedScalarEvolution PSE(*SE, *L);
DenseMap<const VPBlockBase *, BasicBlock *> VPB2IRBB;
auto Plan = VPlanTransforms::buildPlainCFG(L, *LI, VPB2IRBB);
VFRange R(ElementCount::getFixed(1), ElementCount::getFixed(2));
VPlanTransforms::prepareForVectorization(*Plan, IntegerType::get(*Ctx, 64),
PSE, true, false, L, {});
PSE, true, false, L, {}, false, R);
VPlanTransforms::createLoopRegions(*Plan);
return Plan;
}
Expand Down
Loading