Skip to content

Commit

Permalink
[LV] Vectorize (some) early and multiple exit loops
Browse files Browse the repository at this point in the history
This patch is a major step towards supporting multiple exit loops in the vectorizer. This patch on it's own extends the loop forms allowed in two ways:

    single exit loops which are not bottom tested
    multiple exit loops w/ a single exit block reached from all exits and no phis in the exit block (because of LCSSA this implies no values defined in the loop used later)

The restrictions on multiple exit loop structures will be removed in follow up patches; disallowing cases for now makes the code changes smaller and more obvious. As before, we can only handle loops with entirely analyzable exits. Removing that restriction is much harder, and is not part of currently planned efforts.

The basic idea here is that we can force the last iteration to run in the scalar epilogue loop (if we have one). From the definition of SCEV's backedge taken count, we know that no earlier iteration can exit the vector body. As such, we can leave the decision on which exit to be taken to the scalar code and generate a bottom tested vector loop which runs all but the last iteration.

The existing code already had the notion of requiring one iteration in the scalar epilogue, this patch is mainly about generalizing that support slightly, making sure we don't try to use this mechanism when tail folding, and updating the code to reflect the difference between a single exit block and a unique exit block (very mechanical).

Differential Revision: https://reviews.llvm.org/D93317
  • Loading branch information
preames committed Dec 28, 2020
1 parent dcd2157 commit e4df6a4
Show file tree
Hide file tree
Showing 5 changed files with 514 additions and 50 deletions.
25 changes: 17 additions & 8 deletions llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1095,9 +1095,15 @@ bool LoopVectorizationLegality::canVectorizeLoopCFG(Loop *Lp,
return false;
}

// We must have a single exiting block.
if (!Lp->getExitingBlock()) {
reportVectorizationFailure("The loop must have an exiting block",
// We currently must have a single "exit block" after the loop. Note that
// multiple "exiting blocks" inside the loop are allowed, provided they all
// reach the single exit block.
// TODO: This restriction can be relaxed in the near future, it's here solely
// to allow separation of changes for review. We need to generalize the phi
// update logic in a number of places.
BasicBlock *ExitBB = Lp->getUniqueExitBlock();
if (!ExitBB) {
reportVectorizationFailure("The loop must have a unique exit block",
"loop control flow is not understood by vectorizer",
"CFGNotUnderstood", ORE, TheLoop);
if (DoExtraAnalysis)
Expand All @@ -1106,11 +1112,14 @@ bool LoopVectorizationLegality::canVectorizeLoopCFG(Loop *Lp,
return false;
}

// We only handle bottom-tested loops, i.e. loop in which the condition is
// checked at the end of each iteration. With that we can assume that all
// instructions in the loop are executed the same number of times.
if (Lp->getExitingBlock() != Lp->getLoopLatch()) {
reportVectorizationFailure("The exiting block is not the loop latch",
// The existing code assumes that LCSSA implies that phis are single entry
// (which was true when we had at most a single exiting edge from the latch).
// In general, there's nothing which prevents an LCSSA phi in exit block from
// having two or more values if there are multiple exiting edges leading to
// the exit block. (TODO: implement general case)
if (!empty(ExitBB->phis()) && !ExitBB->getSinglePredecessor()) {
reportVectorizationFailure("The loop must have no live-out values if "
"it has more than one exiting block",
"loop control flow is not understood by vectorizer",
"CFGNotUnderstood", ORE, TheLoop);
if (DoExtraAnalysis)
Expand Down
59 changes: 46 additions & 13 deletions llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -837,7 +837,8 @@ class InnerLoopVectorizer {
/// Middle Block between the vector and the scalar.
BasicBlock *LoopMiddleBlock;

/// The ExitBlock of the scalar loop.
/// The (unique) ExitBlock of the scalar loop. Note that
/// there can be multiple exiting edges reaching this block.
BasicBlock *LoopExitBlock;

/// The vector loop body.
Expand Down Expand Up @@ -1548,11 +1549,16 @@ class LoopVectorizationCostModel {
return InterleaveInfo.getInterleaveGroup(Instr);
}

/// Returns true if an interleaved group requires a scalar iteration
/// to handle accesses with gaps, and there is nothing preventing us from
/// creating a scalar epilogue.
/// Returns true if we're required to use a scalar epilogue for at least
/// the final iteration of the original loop.
bool requiresScalarEpilogue() const {
return isScalarEpilogueAllowed() && InterleaveInfo.requiresScalarEpilogue();
if (!isScalarEpilogueAllowed())
return false;
// If we might exit from anywhere but the latch, must run the exiting
// iteration in scalar form.
if (TheLoop->getExitingBlock() != TheLoop->getLoopLatch())
return true;
return InterleaveInfo.requiresScalarEpilogue();
}

/// Returns true if a scalar epilogue is not allowed due to optsize or a
Expand Down Expand Up @@ -2912,7 +2918,7 @@ PHINode *InnerLoopVectorizer::createInductionVariable(Loop *L, Value *Start,
Induction->addIncoming(Next, Latch);
// Create the compare.
Value *ICmp = Builder.CreateICmpEQ(Next, End);
Builder.CreateCondBr(ICmp, L->getExitBlock(), Header);
Builder.CreateCondBr(ICmp, L->getUniqueExitBlock(), Header);

// Now we have two terminators. Remove the old one from the block.
Latch->getTerminator()->eraseFromParent();
Expand Down Expand Up @@ -3000,13 +3006,17 @@ Value *InnerLoopVectorizer::getOrCreateVectorTripCount(Loop *L) {
// unroll factor (number of SIMD instructions).
Value *R = Builder.CreateURem(TC, Step, "n.mod.vf");

// If there is a non-reversed interleaved group that may speculatively access
// memory out-of-bounds, we need to ensure that there will be at least one
// iteration of the scalar epilogue loop. Thus, if the step evenly divides
// There are two cases where we need to ensure (at least) the last iteration
// runs in the scalar remainder loop. Thus, if the step evenly divides
// the trip count, we set the remainder to be equal to the step. If the step
// does not evenly divide the trip count, no adjustment is necessary since
// there will already be scalar iterations. Note that the minimum iterations
// check ensures that N >= Step.
// check ensures that N >= Step. The cases are:
// 1) If there is a non-reversed interleaved group that may speculatively
// access memory out-of-bounds.
// 2) If any instruction may follow a conditionally taken exit. That is, if
// the loop contains multiple exiting blocks, or a single exiting block
// which is not the latch.
if (VF.isVector() && Cost->requiresScalarEpilogue()) {
auto *IsZero = Builder.CreateICmpEQ(R, ConstantInt::get(R->getType(), 0));
R = Builder.CreateSelect(IsZero, Step, R);
Expand Down Expand Up @@ -3301,7 +3311,7 @@ Value *InnerLoopVectorizer::emitTransformedIndex(
Loop *InnerLoopVectorizer::createVectorLoopSkeleton(StringRef Prefix) {
LoopScalarBody = OrigLoop->getHeader();
LoopVectorPreHeader = OrigLoop->getLoopPreheader();
LoopExitBlock = OrigLoop->getExitBlock();
LoopExitBlock = OrigLoop->getUniqueExitBlock();
assert(LoopExitBlock && "Must have an exit block");
assert(LoopVectorPreHeader && "Invalid loop structure");

Expand Down Expand Up @@ -3572,7 +3582,7 @@ void InnerLoopVectorizer::fixupIVUsers(PHINode *OrigPhi,
// value (the value that feeds into the phi from the loop latch).
// We allow both, but they, obviously, have different values.

assert(OrigLoop->getExitBlock() && "Expected a single exit block");
assert(OrigLoop->getUniqueExitBlock() && "Expected a single exit block");

DenseMap<Value *, Value *> MissingVals;

Expand Down Expand Up @@ -5490,11 +5500,28 @@ LoopVectorizationCostModel::computeMaxVF(ElementCount UserVF, unsigned UserIC) {
// for size.
if (runtimeChecksRequired())
return None;

break;
}

// Now try the tail folding
// The only loops we can vectorize without a scalar epilogue, are loops with
// a bottom-test and a single exiting block. We'd have to handle the fact
// that not every instruction executes on the last iteration. This will
// require a lane mask which varies through the vector loop body. (TODO)
if (TheLoop->getExitingBlock() != TheLoop->getLoopLatch()) {
// If there was a tail-folding hint/switch, but we can't fold the tail by
// masking, fallback to a vectorization with a scalar epilogue.
if (ScalarEpilogueStatus == CM_ScalarEpilogueNotNeededUsePredicate) {
LLVM_DEBUG(dbgs() << "LV: Cannot fold tail by masking: vectorize with a "
"scalar epilogue instead.\n");
ScalarEpilogueStatus = CM_ScalarEpilogueAllowed;
return MaxVF;
}
return None;
}

// Now try the tail folding

// Invalidate interleave groups that require an epilogue if we can't mask
// the interleave-group.
if (!useMaskedInterleavedAccesses(TTI)) {
Expand Down Expand Up @@ -7921,6 +7948,12 @@ VPValue *VPRecipeBuilder::createEdgeMask(BasicBlock *Src, BasicBlock *Dst,
if (!BI->isConditional() || BI->getSuccessor(0) == BI->getSuccessor(1))
return EdgeMaskCache[Edge] = SrcMask;

// If source is an exiting block, we know the exit edge is dynamically dead
// in the vector loop, and thus we don't need to restrict the mask. Avoid
// adding uses of an otherwise potentially dead instruction.
if (OrigLoop->isLoopExiting(Src))
return EdgeMaskCache[Edge] = SrcMask;

VPValue *EdgeMask = Plan->getOrAddVPValue(BI->getCondition());
assert(EdgeMask && "No Edge Mask found for condition");

Expand Down
2 changes: 1 addition & 1 deletion llvm/test/Transforms/LoopVectorize/control-flow.ll
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@
; return 0;
; }

; CHECK: remark: source.cpp:5:9: loop not vectorized: loop control flow is not understood by vectorizer
; CHECK: remark: source.cpp:5:9: loop not vectorized: could not determine number of loop iterations
; CHECK: remark: source.cpp:5:9: loop not vectorized

; CHECK: _Z4testPii
Expand Down
Loading

0 comments on commit e4df6a4

Please sign in to comment.