
[LoopPeel] Support peeling last iteration with multiple exits. #141247


Open: wants to merge 1 commit into base: main

fhahn (Contributor) commented May 23, 2025

Generalize the logic for peeling from the end so it works for multi-exit loops, by checking the exit count of the latch instead of the backedge-taken count (which accounts for all exits). This allows peeling quite a few more loops: e.g., 250 loops are peeled with the change vs. 47 without, on an IR corpus including SPEC, llvm-test-suite and a few proprietary workloads.

Note that the current version won't peel loops where the backedge-taken count is less than the latch's exit count, i.e. loops that never exit via the latch. We can probably rely on other passes to remove such exits.
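To make the difference between the two checks concrete, here is a toy C++ model. It is not the SCEV API: `ExitInfo`, `backedgeTakenCount`, `canPeelLastOld` and `canPeelLastNew` are hypothetical names, each exit count is assumed to be either a known constant or unknown (which collapses symbolic counts such as the `iv == %n` exit in the tests), and the additional requirement that the latch branch be a single-use NE/EQ compare of a step-1 induction is left out.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <vector>

// Toy model, not the SCEV API: every exiting block has an exit count that is
// either a known constant or unknown (symbolic or uncomputable, e.g. an exit
// on `iv == %n` or on the result of a call to @cond()).
struct ExitInfo {
  std::optional<uint64_t> count; // backedges taken before this exit fires
  bool isLatch;                  // is this exit the loop latch?
};

// Backedge-taken count of the whole loop: the minimum over all exits, known
// only if every exit's count is known.
std::optional<uint64_t> backedgeTakenCount(const std::vector<ExitInfo> &exits) {
  std::optional<uint64_t> btc;
  for (const ExitInfo &e : exits) {
    if (!e.count)
      return std::nullopt;
    btc = btc ? std::min(*btc, *e.count) : *e.count;
  }
  return btc;
}

// Before the patch: the latch had to be the only exiting block and the
// backedge-taken count had to be provably greater than zero.
bool canPeelLastOld(const std::vector<ExitInfo> &exits) {
  if (exits.size() != 1 || !exits.front().isLatch)
    return false;
  std::optional<uint64_t> btc = backedgeTakenCount(exits);
  return btc && *btc > 0;
}

// After the patch: the latch must be exiting and its own exit count must be
// provably non-zero; other exits are allowed and may be unknown.
bool canPeelLastNew(const std::vector<ExitInfo> &exits) {
  for (const ExitInfo &e : exits)
    if (e.isLatch)
      return e.count && *e.count != 0;
  return false; // the latch is not an exiting block
}
```

For a loop shaped like the tests (a header exit on `iv == %n` plus a latch exit after 16 backedges), `canPeelLastOld({{std::nullopt, false}, {16, true}})` is false while `canPeelLastNew` on the same exits is true; for a single latch exit with count 16 both return true.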

llvmbot (Member) commented May 23, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Florian Hahn (fhahn)

Changes


Patch is 24.84 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/141247.diff

3 Files Affected:

  • (modified) llvm/lib/Transforms/Utils/LoopPeel.cpp (+33-21)
  • (modified) llvm/test/Transforms/LoopUnroll/peel-last-iteration-multi-exit.ll (+150-34)
  • (modified) llvm/test/Transforms/LoopUnroll/peel-last-iteration-with-constant-trip-count.ll (+31-8)
diff --git a/llvm/lib/Transforms/Utils/LoopPeel.cpp b/llvm/lib/Transforms/Utils/LoopPeel.cpp
index 4eaa3c9714370..353cee24ed3a8 100644
--- a/llvm/lib/Transforms/Utils/LoopPeel.cpp
+++ b/llvm/lib/Transforms/Utils/LoopPeel.cpp
@@ -327,26 +327,27 @@ static unsigned peelToTurnInvariantLoadsDerefencebale(Loop &L,
 }
 
 bool llvm::canPeelLastIteration(const Loop &L, ScalarEvolution &SE) {
-  const SCEV *BTC = SE.getBackedgeTakenCount(&L);
   Value *Inc;
   CmpPredicate Pred;
   BasicBlock *Succ1;
   BasicBlock *Succ2;
-  // The loop must execute at least 2 iterations to guarantee that peeled
-  // iteration executes.
+  BasicBlock *Latch = L.getLoopLatch();
+  // The loop must exit via the latch and additional exits are fine.
+  if (!Latch || !L.isLoopExiting(Latch))
+    return false;
+
+  // The loop's exit count via the latch must be at least 1 to guarantee that
+  // the peeled iteration executes.
   // TODO: Add checks during codegen.
-  if (isa<SCEVCouldNotCompute>(BTC) ||
-      !SE.isKnownPredicate(CmpInst::ICMP_UGT, BTC, SE.getZero(BTC->getType())))
+  const SCEV *EC = SE.getExitCount(&L, Latch);
+  if (isa<SCEVCouldNotCompute>(EC) ||
+      !SE.isKnownPredicate(CmpInst::ICMP_NE, EC, SE.getZero(EC->getType())))
     return false;
 
   // Check if the exit condition of the loop can be adjusted by the peeling
-  // codegen. For now, it must
-  // * exit via the latch,
-  // * the exit condition must be a NE/EQ compare of an induction with step
-  // of 1 and must only be used by the exiting branch.
-  BasicBlock *Latch = L.getLoopLatch();
-  return Latch && Latch == L.getExitingBlock() &&
-         match(Latch->getTerminator(),
+  // codegen. For now the exit condition of the latch must be a NE/EQ compare of
+  // an induction with step of 1 and must only be used by the exiting branch.
+  return match(Latch->getTerminator(),
                m_Br(m_OneUse(m_ICmp(Pred, m_Value(Inc), m_Value())),
                     m_BasicBlock(Succ1), m_BasicBlock(Succ2))) &&
          ((Pred == CmpInst::ICMP_EQ && Succ2 == L.getHeader()) ||
@@ -365,10 +366,10 @@ static bool shouldPeelLastIteration(Loop &L, CmpPredicate Pred,
   if (!canPeelLastIteration(L, SE))
     return false;
 
-  const SCEV *BTC = SE.getBackedgeTakenCount(&L);
-  const SCEV *ValAtLastIter = LeftAR->evaluateAtIteration(BTC, SE);
+  const SCEV *EC = SE.getExitCount(&L, L.getLoopLatch());
+  const SCEV *ValAtLastIter = LeftAR->evaluateAtIteration(EC, SE);
   const SCEV *ValAtSecondToLastIter = LeftAR->evaluateAtIteration(
-      SE.getMinusSCEV(BTC, SE.getOne(BTC->getType())), SE);
+      SE.getMinusSCEV(EC, SE.getOne(EC->getType())), SE);
 
   return SE.isKnownPredicate(ICmpInst::getInversePredicate(Pred), ValAtLastIter,
                              RightSCEV) &&
@@ -944,6 +945,8 @@ static void cloneLoopBlocks(
   // a value coming into the header.
   for (auto Edge : ExitEdges)
     for (PHINode &PHI : Edge.second->phis()) {
+      if (PeelLast && Edge.first == Latch)
+        continue;
       Value *LatchVal = PHI.getIncomingValueForBlock(Edge.first);
       Instruction *LatchInst = dyn_cast<Instruction>(LatchVal);
       if (LatchInst && L->contains(LatchInst))
@@ -1020,7 +1023,6 @@ bool llvm::peelLoop(Loop *L, unsigned PeelCount, bool PeelLast, LoopInfo *LI,
   BasicBlock *PreHeader = L->getLoopPreheader();
   BasicBlock *Latch = L->getLoopLatch();
   SmallVector<std::pair<BasicBlock *, BasicBlock *>, 4> ExitEdges;
-  L->getExitEdges(ExitEdges);
 
   // Remember dominators of blocks we might reach through exits to change them
   // later. Immediate dominator of such block might change, because we add more
@@ -1076,12 +1078,16 @@ bool llvm::peelLoop(Loop *L, unsigned PeelCount, bool PeelLast, LoopInfo *LI,
     // InsertBot:
     // Exit:
     // ...
-    BasicBlock *Exit = L->getExitBlock();
+    auto *LatchBr = cast<BranchInst>(Latch->getTerminator());
+    BasicBlock *Exit = L->contains(LatchBr->getSuccessor(0))
+                           ? LatchBr->getSuccessor(1)
+                           : LatchBr->getSuccessor(0);
     for (PHINode &P : Exit->phis())
       ExitValues[&P] = P.getIncomingValueForBlock(Latch);
 
     InsertTop = SplitEdge(Latch, Exit, &DT, LI);
     InsertBot = SplitBlock(InsertTop, InsertTop->getTerminator(), &DT, LI);
+    L->getExitEdges(ExitEdges);
 
     InsertTop->setName(Exit->getName() + ".peel.begin");
     InsertBot->setName(Exit->getName() + ".peel.next");
@@ -1138,6 +1144,7 @@ bool llvm::peelLoop(Loop *L, unsigned PeelCount, bool PeelLast, LoopInfo *LI,
     InsertTop->setName(Header->getName() + ".peel.begin");
     InsertBot->setName(Header->getName() + ".peel.next");
     NewPreHeader->setName(PreHeader->getName() + ".peel.newph");
+    L->getExitEdges(ExitEdges);
   }
 
   Instruction *LatchTerm =
@@ -1211,10 +1218,15 @@ bool llvm::peelLoop(Loop *L, unsigned PeelCount, bool PeelLast, LoopInfo *LI,
   }
 
   if (PeelLast) {
-    // Now adjust users of the original exit values by replacing them with the
-    // exit value from the peeled iteration.
-    for (const auto &[P, E] : ExitValues)
-      P->replaceAllUsesWith(isa<Constant>(E) ? E : &*VMap.lookup(E));
+    if (ExitEdges.size() == 1) {
+      // If we have a single exiting edge, adjust users of the original exit
+      // values by replacing them with the exit value from the peeled iteration.
+      // If there are multiple exiting edges, all users outside the loop are
+      // served by a common exit block with LCSSA phis that will get updated to
+      // use the value from the peeled iteration separately.
+      for (const auto &[P, E] : ExitValues)
+        P->replaceAllUsesWith(isa<Constant>(E) ? E : &*VMap.lookup(E));
+    }
     formLCSSA(*L, DT, LI, SE);
   } else {
     // Now adjust the phi nodes in the loop header to get their initial values
diff --git a/llvm/test/Transforms/LoopUnroll/peel-last-iteration-multi-exit.ll b/llvm/test/Transforms/LoopUnroll/peel-last-iteration-multi-exit.ll
index 89accea695bc8..11db0d7ba5185 100644
--- a/llvm/test/Transforms/LoopUnroll/peel-last-iteration-multi-exit.ll
+++ b/llvm/test/Transforms/LoopUnroll/peel-last-iteration-multi-exit.ll
@@ -62,16 +62,35 @@ define void @peel_last_multi_exit_btc_computable_no_exit_values(i32 %n) {
 ; CHECK-NEXT:  [[ENTRY:.*]]:
 ; CHECK-NEXT:    br label %[[LOOP_HEADER:.*]]
 ; CHECK:       [[LOOP_HEADER]]:
-; CHECK-NEXT:    [[IV_NEXT_LCSSA:%.*]] = phi i32 [ 0, %[[ENTRY]] ], [ [[IV_NEXT_PEEL:%.*]], %[[LOOP_LATCH:.*]] ]
-; CHECK-NEXT:    [[EC_0_PEEL:%.*]] = icmp eq i32 [[IV_NEXT_LCSSA]], [[N]]
-; CHECK-NEXT:    br i1 [[EC_0_PEEL]], label %[[EXIT:.*]], label %[[LOOP_LATCH]]
+; CHECK-NEXT:    [[IV:%.*]] = phi i32 [ 0, %[[ENTRY]] ], [ [[IV_NEXT:%.*]], %[[LOOP_LATCH:.*]] ]
+; CHECK-NEXT:    [[EC_0:%.*]] = icmp eq i32 [[IV]], [[N]]
+; CHECK-NEXT:    br i1 [[EC_0]], label %[[EXITSPLIT_LOOPEXIT:.*]], label %[[LOOP_LATCH]]
 ; CHECK:       [[LOOP_LATCH]]:
+; CHECK-NEXT:    call void @foo(i32 20)
+; CHECK-NEXT:    [[IV_NEXT]] = add nuw nsw i32 [[IV]], 1
+; CHECK-NEXT:    [[EC:%.*]] = icmp eq i32 [[IV_NEXT]], 16
+; CHECK-NEXT:    br i1 [[EC]], label %[[EXIT_PEEL_BEGIN:.*]], label %[[LOOP_HEADER]], !llvm.loop [[LOOP0:![0-9]+]]
+; CHECK:       [[EXIT_PEEL_BEGIN]]:
+; CHECK-NEXT:    [[IV_NEXT_LCSSA:%.*]] = phi i32 [ [[IV_NEXT]], %[[LOOP_LATCH]] ]
+; CHECK-NEXT:    br label %[[LOOP_HEADER_PEEL:.*]]
+; CHECK:       [[LOOP_HEADER_PEEL]]:
+; CHECK-NEXT:    [[EC_0_PEEL:%.*]] = icmp eq i32 [[IV_NEXT_LCSSA]], [[N]]
+; CHECK-NEXT:    br i1 [[EC_0_PEEL]], label %[[EXITSPLIT:.*]], label %[[LOOP_LATCH_PEEL:.*]]
+; CHECK:       [[LOOP_LATCH_PEEL]]:
 ; CHECK-NEXT:    [[C_PEEL:%.*]] = icmp eq i32 [[IV_NEXT_LCSSA]], 16
 ; CHECK-NEXT:    [[COND_PEEL:%.*]] = select i1 [[C_PEEL]], i32 10, i32 20
 ; CHECK-NEXT:    call void @foo(i32 [[COND_PEEL]])
-; CHECK-NEXT:    [[IV_NEXT_PEEL]] = add i32 [[IV_NEXT_LCSSA]], 1
+; CHECK-NEXT:    [[IV_NEXT_PEEL:%.*]] = add i32 [[IV_NEXT_LCSSA]], 1
 ; CHECK-NEXT:    [[EC_PEEL:%.*]] = icmp eq i32 [[IV_NEXT_PEEL]], 17
-; CHECK-NEXT:    br i1 [[EC_PEEL]], label %[[EXIT]], label %[[LOOP_HEADER]]
+; CHECK-NEXT:    br i1 [[EC_PEEL]], label %[[EXIT_PEEL_NEXT:.*]], label %[[EXIT_PEEL_NEXT]]
+; CHECK:       [[EXIT_PEEL_NEXT]]:
+; CHECK-NEXT:    br label %[[LOOP_HEADER_PEEL_NEXT:.*]]
+; CHECK:       [[LOOP_HEADER_PEEL_NEXT]]:
+; CHECK-NEXT:    br label %[[EXIT:.*]]
+; CHECK:       [[EXITSPLIT_LOOPEXIT]]:
+; CHECK-NEXT:    br label %[[EXITSPLIT]]
+; CHECK:       [[EXITSPLIT]]:
+; CHECK-NEXT:    br label %[[EXIT]]
 ; CHECK:       [[EXIT]]:
 ; CHECK-NEXT:    ret void
 ;
@@ -101,18 +120,40 @@ define i32 @peel_last_multi_exit_btc_computable_exit_constant_values(i32 %n) {
 ; CHECK-NEXT:  [[ENTRY:.*]]:
 ; CHECK-NEXT:    br label %[[LOOP_HEADER:.*]]
 ; CHECK:       [[LOOP_HEADER]]:
-; CHECK-NEXT:    [[IV_NEXT_LCSSA:%.*]] = phi i32 [ 0, %[[ENTRY]] ], [ [[IV_NEXT_PEEL:%.*]], %[[LOOP_LATCH:.*]] ]
-; CHECK-NEXT:    [[EC_0_PEEL:%.*]] = icmp eq i32 [[IV_NEXT_LCSSA]], [[N]]
-; CHECK-NEXT:    br i1 [[EC_0_PEEL]], label %[[EXIT:.*]], label %[[LOOP_LATCH]]
+; CHECK-NEXT:    [[IV:%.*]] = phi i32 [ 0, %[[ENTRY]] ], [ [[IV_NEXT:%.*]], %[[LOOP_LATCH:.*]] ]
+; CHECK-NEXT:    [[EC_0:%.*]] = icmp eq i32 [[IV]], [[N]]
+; CHECK-NEXT:    br i1 [[EC_0]], label %[[EXITSPLIT_LOOPEXIT:.*]], label %[[LOOP_LATCH]]
 ; CHECK:       [[LOOP_LATCH]]:
+; CHECK-NEXT:    call void @foo(i32 20)
+; CHECK-NEXT:    [[IV_NEXT]] = add nuw nsw i32 [[IV]], 1
+; CHECK-NEXT:    [[EC:%.*]] = icmp eq i32 [[IV_NEXT]], 16
+; CHECK-NEXT:    br i1 [[EC]], label %[[EXIT_PEEL_BEGIN:.*]], label %[[LOOP_HEADER]], !llvm.loop [[LOOP2:![0-9]+]]
+; CHECK:       [[EXIT_PEEL_BEGIN]]:
+; CHECK-NEXT:    [[IV_NEXT_LCSSA:%.*]] = phi i32 [ [[IV_NEXT]], %[[LOOP_LATCH]] ]
+; CHECK-NEXT:    [[SPLIT:%.*]] = phi i32 [ 2, %[[LOOP_LATCH]] ]
+; CHECK-NEXT:    br label %[[LOOP_HEADER_PEEL:.*]]
+; CHECK:       [[LOOP_HEADER_PEEL]]:
+; CHECK-NEXT:    [[EC_0_PEEL:%.*]] = icmp eq i32 [[IV_NEXT_LCSSA]], [[N]]
+; CHECK-NEXT:    br i1 [[EC_0_PEEL]], label %[[EXITSPLIT:.*]], label %[[LOOP_LATCH_PEEL:.*]]
+; CHECK:       [[LOOP_LATCH_PEEL]]:
 ; CHECK-NEXT:    [[C_PEEL:%.*]] = icmp eq i32 [[IV_NEXT_LCSSA]], 16
 ; CHECK-NEXT:    [[COND_PEEL:%.*]] = select i1 [[C_PEEL]], i32 10, i32 20
 ; CHECK-NEXT:    call void @foo(i32 [[COND_PEEL]])
-; CHECK-NEXT:    [[IV_NEXT_PEEL]] = add i32 [[IV_NEXT_LCSSA]], 1
+; CHECK-NEXT:    [[IV_NEXT_PEEL:%.*]] = add i32 [[IV_NEXT_LCSSA]], 1
 ; CHECK-NEXT:    [[EC_PEEL:%.*]] = icmp eq i32 [[IV_NEXT_PEEL]], 17
-; CHECK-NEXT:    br i1 [[EC_PEEL]], label %[[EXIT]], label %[[LOOP_HEADER]]
+; CHECK-NEXT:    br i1 [[EC_PEEL]], label %[[EXIT_PEEL_NEXT:.*]], label %[[EXIT_PEEL_NEXT]]
+; CHECK:       [[EXIT_PEEL_NEXT]]:
+; CHECK-NEXT:    br label %[[LOOP_HEADER_PEEL_NEXT:.*]]
+; CHECK:       [[LOOP_HEADER_PEEL_NEXT]]:
+; CHECK-NEXT:    br label %[[EXIT:.*]]
+; CHECK:       [[EXITSPLIT_LOOPEXIT]]:
+; CHECK-NEXT:    [[RES_PH_PH:%.*]] = phi i32 [ 1, %[[LOOP_HEADER]] ]
+; CHECK-NEXT:    br label %[[EXITSPLIT]]
+; CHECK:       [[EXITSPLIT]]:
+; CHECK-NEXT:    [[RES_PH:%.*]] = phi i32 [ 1, %[[LOOP_HEADER_PEEL]] ], [ [[RES_PH_PH]], %[[EXITSPLIT_LOOPEXIT]] ]
+; CHECK-NEXT:    br label %[[EXIT]]
 ; CHECK:       [[EXIT]]:
-; CHECK-NEXT:    [[RES:%.*]] = phi i32 [ 1, %[[LOOP_HEADER]] ], [ 2, %[[LOOP_LATCH]] ]
+; CHECK-NEXT:    [[RES:%.*]] = phi i32 [ [[SPLIT]], %[[LOOP_HEADER_PEEL_NEXT]] ], [ [[RES_PH]], %[[EXITSPLIT]] ]
 ; CHECK-NEXT:    ret i32 [[RES]]
 ;
 entry:
@@ -142,18 +183,40 @@ define i32 @peel_last_multi_exit_btc_computable_exit_values_from_loop(i32 %n) {
 ; CHECK-NEXT:  [[ENTRY:.*]]:
 ; CHECK-NEXT:    br label %[[LOOP_HEADER:.*]]
 ; CHECK:       [[LOOP_HEADER]]:
-; CHECK-NEXT:    [[IV_NEXT_LCSSA:%.*]] = phi i32 [ 0, %[[ENTRY]] ], [ [[IV_NEXT_PEEL:%.*]], %[[LOOP_LATCH:.*]] ]
-; CHECK-NEXT:    [[EC_0_PEEL:%.*]] = icmp eq i32 [[IV_NEXT_LCSSA]], [[N]]
-; CHECK-NEXT:    br i1 [[EC_0_PEEL]], label %[[EXIT:.*]], label %[[LOOP_LATCH]]
+; CHECK-NEXT:    [[IV:%.*]] = phi i32 [ 0, %[[ENTRY]] ], [ [[IV_NEXT:%.*]], %[[LOOP_LATCH:.*]] ]
+; CHECK-NEXT:    [[EC_0:%.*]] = icmp eq i32 [[IV]], [[N]]
+; CHECK-NEXT:    br i1 [[EC_0]], label %[[EXITSPLIT_LOOPEXIT:.*]], label %[[LOOP_LATCH]]
 ; CHECK:       [[LOOP_LATCH]]:
+; CHECK-NEXT:    call void @foo(i32 20)
+; CHECK-NEXT:    [[IV_NEXT]] = add nuw nsw i32 [[IV]], 1
+; CHECK-NEXT:    [[EC:%.*]] = icmp eq i32 [[IV_NEXT]], 16
+; CHECK-NEXT:    br i1 [[EC]], label %[[EXIT_PEEL_BEGIN:.*]], label %[[LOOP_HEADER]], !llvm.loop [[LOOP3:![0-9]+]]
+; CHECK:       [[EXIT_PEEL_BEGIN]]:
+; CHECK-NEXT:    [[IV_NEXT_LCSSA:%.*]] = phi i32 [ [[IV_NEXT]], %[[LOOP_LATCH]] ]
+; CHECK-NEXT:    [[SPLIT:%.*]] = phi i32 [ 20, %[[LOOP_LATCH]] ]
+; CHECK-NEXT:    br label %[[LOOP_HEADER_PEEL:.*]]
+; CHECK:       [[LOOP_HEADER_PEEL]]:
+; CHECK-NEXT:    [[EC_0_PEEL:%.*]] = icmp eq i32 [[IV_NEXT_LCSSA]], [[N]]
+; CHECK-NEXT:    br i1 [[EC_0_PEEL]], label %[[EXITSPLIT:.*]], label %[[LOOP_LATCH_PEEL:.*]]
+; CHECK:       [[LOOP_LATCH_PEEL]]:
 ; CHECK-NEXT:    [[C_PEEL:%.*]] = icmp eq i32 [[IV_NEXT_LCSSA]], 16
 ; CHECK-NEXT:    [[COND_PEEL:%.*]] = select i1 [[C_PEEL]], i32 10, i32 20
 ; CHECK-NEXT:    call void @foo(i32 [[COND_PEEL]])
-; CHECK-NEXT:    [[IV_NEXT_PEEL]] = add i32 [[IV_NEXT_LCSSA]], 1
+; CHECK-NEXT:    [[IV_NEXT_PEEL:%.*]] = add i32 [[IV_NEXT_LCSSA]], 1
 ; CHECK-NEXT:    [[EC_PEEL:%.*]] = icmp eq i32 [[IV_NEXT_PEEL]], 17
-; CHECK-NEXT:    br i1 [[EC_PEEL]], label %[[EXIT]], label %[[LOOP_HEADER]]
+; CHECK-NEXT:    br i1 [[EC_PEEL]], label %[[EXIT_PEEL_NEXT:.*]], label %[[EXIT_PEEL_NEXT]]
+; CHECK:       [[EXIT_PEEL_NEXT]]:
+; CHECK-NEXT:    br label %[[LOOP_HEADER_PEEL_NEXT:.*]]
+; CHECK:       [[LOOP_HEADER_PEEL_NEXT]]:
+; CHECK-NEXT:    br label %[[EXIT:.*]]
+; CHECK:       [[EXITSPLIT_LOOPEXIT]]:
+; CHECK-NEXT:    [[RES_PH_PH:%.*]] = phi i32 [ [[IV]], %[[LOOP_HEADER]] ]
+; CHECK-NEXT:    br label %[[EXITSPLIT]]
+; CHECK:       [[EXITSPLIT]]:
+; CHECK-NEXT:    [[RES_PH:%.*]] = phi i32 [ [[IV_NEXT_LCSSA]], %[[LOOP_HEADER_PEEL]] ], [ [[RES_PH_PH]], %[[EXITSPLIT_LOOPEXIT]] ]
+; CHECK-NEXT:    br label %[[EXIT]]
 ; CHECK:       [[EXIT]]:
-; CHECK-NEXT:    [[RES:%.*]] = phi i32 [ [[IV_NEXT_LCSSA]], %[[LOOP_HEADER]] ], [ [[COND_PEEL]], %[[LOOP_LATCH]] ]
+; CHECK-NEXT:    [[RES:%.*]] = phi i32 [ [[SPLIT]], %[[LOOP_HEADER_PEEL_NEXT]] ], [ [[RES_PH]], %[[EXITSPLIT]] ]
 ; CHECK-NEXT:    ret i32 [[RES]]
 ;
 entry:
@@ -183,21 +246,39 @@ define i32 @peel_last_multi_exit_btc_computable_exit_values_from_loop_multiple_e
 ; CHECK-NEXT:  [[ENTRY:.*]]:
 ; CHECK-NEXT:    br label %[[LOOP_HEADER:.*]]
 ; CHECK:       [[LOOP_HEADER]]:
-; CHECK-NEXT:    [[IV:%.*]] = phi i32 [ 0, %[[ENTRY]] ], [ [[IV_NEXT:%.*]], %[[LOOP_LATCH:.*]] ]
+; CHECK-NEXT:    [[IV:%.*]] = phi i32 [ 0, %[[ENTRY]] ], [ [[IV_NEXT1:%.*]], %[[LOOP_LATCH:.*]] ]
 ; CHECK-NEXT:    [[EC_0:%.*]] = icmp eq i32 [[IV]], [[N]]
-; CHECK-NEXT:    br i1 [[EC_0]], label %[[EXIT_0:.*]], label %[[LOOP_LATCH]]
+; CHECK-NEXT:    br i1 [[EC_0]], label %[[EXIT_0_LOOPEXIT:.*]], label %[[LOOP_LATCH]]
 ; CHECK:       [[LOOP_LATCH]]:
-; CHECK-NEXT:    [[C:%.*]] = icmp eq i32 [[IV]], 16
+; CHECK-NEXT:    call void @foo(i32 20)
+; CHECK-NEXT:    [[IV_NEXT1]] = add nuw nsw i32 [[IV]], 1
+; CHECK-NEXT:    [[EC1:%.*]] = icmp eq i32 [[IV_NEXT1]], 16
+; CHECK-NEXT:    br i1 [[EC1]], label %[[EXIT_1_PEEL_BEGIN:.*]], label %[[LOOP_HEADER]], !llvm.loop [[LOOP4:![0-9]+]]
+; CHECK:       [[EXIT_0_LOOPEXIT]]:
+; CHECK-NEXT:    [[RES_0_PH:%.*]] = phi i32 [ [[IV]], %[[LOOP_HEADER]] ]
+; CHECK-NEXT:    br label %[[EXIT_0:.*]]
+; CHECK:       [[EXIT_0]]:
+; CHECK-NEXT:    [[RES_0:%.*]] = phi i32 [ [[IV_NEXT_LCSSA:%.*]], %[[LOOP_HEADER_PEEL:.*]] ], [ [[RES_0_PH]], %[[EXIT_0_LOOPEXIT]] ]
+; CHECK-NEXT:    ret i32 [[RES_0]]
+; CHECK:       [[EXIT_1_PEEL_BEGIN]]:
+; CHECK-NEXT:    [[IV_NEXT_LCSSA]] = phi i32 [ [[IV_NEXT1]], %[[LOOP_LATCH]] ]
+; CHECK-NEXT:    [[RES_1:%.*]] = phi i32 [ 20, %[[LOOP_LATCH]] ]
+; CHECK-NEXT:    br label %[[LOOP_HEADER_PEEL]]
+; CHECK:       [[LOOP_HEADER_PEEL]]:
+; CHECK-NEXT:    [[EC_0_PEEL:%.*]] = icmp eq i32 [[IV_NEXT_LCSSA]], [[N]]
+; CHECK-NEXT:    br i1 [[EC_0_PEEL]], label %[[EXIT_0]], label %[[LOOP_LATCH_PEEL:.*]]
+; CHECK:       [[LOOP_LATCH_PEEL]]:
+; CHECK-NEXT:    [[C:%.*]] = icmp eq i32 [[IV_NEXT_LCSSA]], 16
 ; CHECK-NEXT:    [[COND:%.*]] = select i1 [[C]], i32 10, i32 20
 ; CHECK-NEXT:    call void @foo(i32 [[COND]])
-; CHECK-NEXT:    [[IV_NEXT]] = add i32 [[IV]], 1
+; CHECK-NEXT:    [[IV_NEXT:%.*]] = add i32 [[IV_NEXT_LCSSA]], 1
 ; CHECK-NEXT:    [[EC:%.*]] = icmp eq i32 [[IV_NEXT]], 17
-; CHECK-NEXT:    br i1 [[EC]], label %[[EXIT_1:.*]], label %[[LOOP_HEADER]]
-; CHECK:       [[EXIT_0]]:
-; CHECK-NEXT:    [[RES_0:%.*]] = phi i32 [ [[IV]], %[[LOOP_HEADER]] ]
-; CHECK-NEXT:    ret i32 [[RES_0]]
+; CHECK-NEXT:    br i1 [[EC]], label %[[EXIT_1_PEEL_NEXT:.*]], label %[[EXIT_1_PEEL_NEXT]]
+; CHECK:       [[EXIT_1_PEEL_NEXT]]:
+; CHECK-NEXT:    br label %[[LOOP_HEADER_PEEL_NEXT:.*]]
+; CHECK:       [[LOOP_HEADER_PEEL_NEXT]]:
+; CHECK-NEXT:    br label %[[EXIT_1:.*]]
 ; CHECK:       [[EXIT_1]]:
-; CHECK-NEXT:    [[RES_1:%.*]] = phi i32 [ [[COND]], %[[LOOP_LATCH]] ]
 ; CHECK-NEXT:    ret i32 [[RES_1]]
 ;
 entry:
@@ -230,22 +311,49 @@ define i64 @peel_last_btc_not_computable() {
 ; CHECK-NEXT:  [[ENTRY:.*]]:
 ; CHECK-NEXT:    br label %[[LOOP_HEADER:.*]]
 ; CHECK:       [[LOOP_HEADER]]:
-; CHECK-NEXT:    [[IV_NEXT_LCSSA:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ [[IV_NEXT_PEEL:%.*]], %[[LOOP_LATCH:.*]] ]
-; CHECK-NEXT:    [[EC_0_PEEL:%.*]] = call i1 @cond()
-; CHECK-NEXT:    br i1 [[EC_0_PEEL]], label %[[THEN_1:.*]], label %[[EXIT:.*]]
+; CHECK-NEXT:    [[IV:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ [[IV_NEXT:%.*]], %[[LOOP_LATCH:.*]] ]
+; CHECK-NEXT:    [[EC_0:%.*]] = call i1 @cond()
+; CHECK-NEXT:    br i1 [[EC_0]], label %[[THEN_1:.*]], label %[[EXITSPLIT_LOOPEXIT:.*]]
 ; CHECK:       [[THEN_1]]:
 ; CHECK-NEXT:    call void @foo(i32 1)
-; CHECK-NEXT:    [[C_PEEL:%.*]] = icmp eq i64 [[IV_NEXT_LCSSA]], 7
-; CHECK-NEXT:    br i1 [[C_PEEL]], label %[[LOOP_LATCH]], label %[[THEN_2:.*]]
+; CHECK-NEXT:    br i1 false, label %[[LOOP_LATCH]], label %[[THEN_2:.*]]
 ; CHECK:       [[THEN_2]]:
 ; CHECK-NEXT:    call void @foo(i32 2)
 ; CHECK-NEXT:    br label %[[LOOP_LATCH]]
 ; CHECK:       [[LOOP_LATCH]]:
-; CHECK-NEXT:    [[IV_NEXT_PEEL]] = add i64 [[IV_NEXT_LCSSA]], 1
+; CHECK-NEXT:    [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
+; CHECK-NEXT:    [[EXITCOND:%.*]] = icmp ne i64 [[IV_NEXT]], 7
+; CHECK-NEXT:    br i1 [[EXITCOND]], label %[[LOOP_HEADER]], label %[[EXIT_PEEL_BEGIN:.*]], !llvm.loop [[LOOP5:![0-9]+]]
+; CHECK:       [[EXIT_PEEL_BEGIN]]:
+; CHECK-NEXT:    [[IV_NEXT_LCSSA:%.*]] = phi i64 [ [[IV_NEXT]], %[[LOOP_LATCH]] ]
+; CHECK-NEXT:    [[SPLIT:%.*]] = phi i64 [ 1, %[[LOOP_LATCH]] ]
+; CHECK-NEXT:    br label %[[LOOP_HEADER_PEEL:.*]]
+; CHECK:       [[LOOP_HEADER_PEEL]]:
+; CHECK-NEXT:    [[EC_0_PEEL:%.*]] = call i1 @cond()
+; CHECK-NEXT:    br i1 [[EC_0_PEEL]], label %[[THEN_1_PEEL:.*]], label %[[EXITSPLIT:.*]]
+; CHECK:       [[THEN_1_PEEL]]:
+; CHECK-NEXT:    call void @foo(i32 1)
+; CHECK-NEXT:    [[C_PEEL:%.*]] = icmp eq i64 [[IV_NEXT_LCSSA]], 7
+; CHECK-NEXT:    br i1 [[C_PEEL]], label %[[LOOP_LATCH_PEEL:.*]], label %[[THEN_2_PEEL:.*]]
+; CHECK:       [[THEN_2_PEEL]]:
+; CHECK-NEXT:    call void @foo(i32 2)
+; CHECK-NEXT:    br label %[[LOOP_LATCH_PEEL]]
+; CHECK:       [[LOOP_LATCH_PEEL]]:
+; CHECK-NEXT:    [[IV_NEXT_PEEL:%.*]] = add i64 [[IV_NEXT_LCSSA]], 1
 ; CHECK-NEXT:    [[EXITCOND_PEEL:%.*]] = icmp ne i64 [[IV_NEXT_PEEL]], 8
-; CHECK-NEXT:    br i1 [[EXITCOND_PEEL]], label %[[LOOP_HEADER]], label %[[EXIT]]
+; CHECK-NEXT:    br i1 [[EXITCOND_PEEL]], label %[[EXIT_PEEL_NEXT:.*]], label %[[EXIT_PEEL_NEXT]]
+; CHECK:       [[EXIT_PEEL_NEXT]]:
+; CHECK-NEXT:    br label %[[LOOP_HEADER_PEEL_NEXT:.*]]
+; CHECK:       [[LOOP_HEADER_PEEL_NEXT]]:
+; CHECK-NEXT:    br label %[[EXIT:.*]]
+; CHECK:       [[EXITSPLIT_LOOPEXIT]]:
+; CHECK-NEXT:    [[RES_PH_PH:%.*]] = phi i64 [ 2, %[[LOOP_HEADER]] ]
+; CHECK-NEXT:    br label %[[EXITSPLIT]]
+; CHECK:       [[EXITSPLIT]]:
+; CHECK-NEXT:    [[RES_PH:%.*]] = phi i64 [ 2, %[[LOOP_HEADER_PEEL]] ], [ [[RES_PH_PH]], %[[EXITSPLIT_LOOPEXIT]] ]
+; CHECK-NEXT:    br label %[[EXIT]]
 ; CHECK:       [[EXIT]]:
-; CHECK-NEXT:    [[RES:%.*]] = phi i64 [ 1, %[[LOOP_LATCH]] ], [ 2, %[[LOOP_HEADER]] ]
+; CHECK-NEXT:    [[RES:%.*]] = phi i64 [ [[SPLIT]], %[[LOOP_HEADER_PEEL_NEXT]] ], [ [[RES_PH]], %[[EXITSPLIT]] ]
 ; CHECK-NEXT:    ret i64 [[RES]]
 ;
 entry:
@@ -313,3 +421,1...
[truncated]
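A hand-written C++ sketch of what peeling the last iteration is meant to achieve for a loop shaped like `peel_last_multi_exit_btc_computable_exit_values_from_loop`: an early exit on `iv == n` in the header and a latch exit once the incremented `iv` reaches 17. The source-level loop is reconstructed from the CHECK lines above, `foo` is a hypothetical stand-in that records its argument, and this illustrates the technique rather than reproducing the code the pass emits; the CHECK lines remain the record of the actual output.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the external @foo call: it records its argument
// so the two versions can be compared for side effects.
static std::vector<int> calls;
static void foo(int v) { calls.push_back(v); }

// Source-level shape of the test loop: the header exits when iv == n, the
// latch exits once the incremented iv reaches 17.
int original(int n) {
  int iv = 0;
  while (true) {
    if (iv == n)
      return iv; // early (header) exit
    int cond = (iv == 16) ? 10 : 20;
    foo(cond);
    ++iv;
    if (iv == 17)
      return cond; // latch exit
  }
}

// Last iteration peeled: the main loop's latch now exits one iteration early
// (at 16), so `iv == 16` is false inside it and the select folds to 20. The
// final iteration (iv == 16) runs once after the loop and must still check
// the early exit.
int peeledLast(int n) {
  int iv = 0;
  while (true) {
    if (iv == n)
      return iv;
    foo(20);
    ++iv;
    if (iv == 16)
      break;
  }
  assert(iv == 16 && "main loop stops right before the last iteration");
  if (iv == n)
    return iv;
  int cond = (iv == 16) ? 10 : 20; // always 10 in the peeled iteration
  foo(cond);
  return cond;
}
```

For example, with `n == 17` (so the early exit is never taken) both functions return 10 and record sixteen calls with 20 followed by one call with 10. The folded `foo(20)` in the main loop corresponds to the `call void @foo(i32 20)` in the new CHECK lines.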

preames (Collaborator) commented May 28, 2025

I think you might be approaching this in a non-ideal way. Let me try to expand; this is as much me thinking out loud as anything else.

  1. Single-exit loops which exit through the latch don't change with this patch.
  2. Single-exit loops which exit through a non-latch block might regress with this change. This is unlikely to matter in practice because we rotate such loops to become latch-exiting.
  3. Multiple-exit loops with entirely analyzable exits regress with this change.
  4. Multiple-exit loops with an analyzable latch exit but unanalyzable non-latch exits improve.
  5. Multiple-exit loops with an unanalyzable latch exit but other analyzable exits don't change.

I think the better way to approach this is to identify the set of analyzable exits which dominate the sole latch (i.e. the backedge), and peel based on that instead of the BTC. This would have the effect of removing all analyzable exits in one go (by having the dispatch between the exits in the peeled iteration).

Implementation-wise, this is what the SymbolicMaximum exit count kind in SCEV actually computes, but it isn't what the documented contract is. I think the existing users would be fine with the revised contract, and may even rely on it today.

I don't know if we can rely on generic folding to recognize that every analyzable exit count must be greater than symbolicmax - 1, so we might need to manually fold all the exits after peeling off the last iteration.
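A toy sketch of the bound described above, assuming each exit count is either a known constant or unanalyzable. `Exit`, `maxBackedgeTakenBound` and `exitFoldableAfterPeel` are hypothetical names, not SCEV's API, and the constant-only model only loosely approximates the SymbolicMaximum quantity mentioned in the comment. The reasoning: an exit that dominates the latch is evaluated on every iteration that reaches the backedge, so the backedge cannot be taken more often than that exit's count.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <vector>

// Toy model, not LLVM's API: an exiting block with a constant exit count (or
// none if it is unanalyzable) and whether it dominates the loop latch.
struct Exit {
  std::optional<uint64_t> count; // backedges taken before this exit fires
  bool dominatesLatch;
};

// The minimum over all analyzable exits that dominate the latch is an upper
// bound on the backedge-taken count. Exits that do not dominate the latch, or
// whose count is unknown, are skipped.
std::optional<uint64_t> maxBackedgeTakenBound(const std::vector<Exit> &exits) {
  std::optional<uint64_t> bound;
  for (const Exit &e : exits) {
    if (!e.dominatesLatch || !e.count)
      continue;
    bound = bound ? std::min(*bound, *e.count) : *e.count;
  }
  return bound;
}

// If the iteration at `bound` is peeled, the remaining loop runs iterations
// 0..bound-1, so an analyzable dominating exit (count >= bound) can never be
// taken there and is a candidate for folding away.
bool exitFoldableAfterPeel(const Exit &e, uint64_t bound) {
  return e.dominatesLatch && e.count && *e.count >= bound;
}
```

With two dominating exits of counts 10 and 16 plus one unanalyzable exit, the bound is 10, and both analyzable exits satisfy `exitFoldableAfterPeel(e, 10)`; a non-dominating exit with count 3 does not affect the bound in this model.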

fhahn (Contributor, Author) left a comment

I think you might be approaching this in a non-ideal way. Let me try to expand, this as much me thinking out loud as anything else.

1. Single exit loops which exit through the latch don't change with this patch.

2. Single exit loops which exit through non-latch might regress with this change.  This is unlikely to matter in practice because we rotate such loops to become latch exiting.

3. Multiple exit loops with entirely analyzeable exits regress with this change.

4. Multiple exit loops with an analyzeable latch exit, but unanalzeable non-latch exits improve.

5. Multiple exit loops with an unanalyzeable latch exit, but other analyzeable exits don't change.

I think the better way to approach this is to identify the set of analyzeable exits which dominate the sole latch (i.e. the backedge), and peel that instead of BTC. This would have the effect of removing all analyzable exits in one go. (By having the dispatch between the exits in the peeled iteration.)

Implementation wise, this is what the SymbolicMaximum exit count kind in SCEV actually computes, but isn't what the documented contract is. I think the existing users would be fine with the revised contract, and may even rely on it today.

I don't know if we can rely on generic folding to recognize that every analyzeable exit count must be greater than the symbolicmax - 1, so we might need to manually fold all the exits after peeling off the last iteration.

Thanks for writing this up! This change also needs additional work to make sure LoopInfo is preserved correctly in all cases. It will probably take me a bit more time to update the PR.
