[LoopPredication] Fix division by zero in case of zero branch weights #66506

Merged
merged 1 commit into from
Sep 19, 2023
3 changes: 3 additions & 0 deletions llvm/lib/Transforms/Scalar/LoopPredication.cpp
@@ -967,6 +967,9 @@ bool LoopPredication::isLoopProfitableToPredicate() {
Numerator += Weight;
Denominator += Weight;
}
// If all weights are zero act as if there was no profile data
if (Denominator == 0)
return BranchProbability::getBranchProbability(1, NumSucc);
return BranchProbability::getBranchProbability(Numerator, Denominator);
} else {
assert(LatchBlock != ExitingBlock &&
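The guard added in this hunk can be illustrated outside of LLVM. The following is a minimal sketch, not the pass's actual code: `Fraction` and `exitProbability` are hypothetical stand-ins for `BranchProbability` and the lambda being patched, and the check that selects which weight feeds the numerator is an assumption, since that line lies outside the hunk shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a branch probability: a plain fraction.
// LLVM's BranchProbability is a fixed-point class; this is only a model.
struct Fraction {
  uint64_t Num;
  uint64_t Den;
};

// Probability of leaving through successor ExitIdx, given one branch weight
// per successor. If every weight is zero, fall back to an even split across
// the successors instead of dividing by zero.
Fraction exitProbability(const std::vector<uint32_t> &Weights,
                         std::size_t ExitIdx) {
  assert(!Weights.empty() && ExitIdx < Weights.size());
  uint64_t Numerator = 0, Denominator = 0;
  for (std::size_t I = 0; I < Weights.size(); ++I) {
    if (I == ExitIdx) // assumed: only the exit edge counts toward the numerator
      Numerator += Weights[I];
    Denominator += Weights[I];
  }
  // All weights are zero: act as if there was no profile data.
  if (Denominator == 0)
    return Fraction{1, static_cast<uint64_t>(Weights.size())};
  return Fraction{Numerator, Denominator};
}
```

With weights {0, 0} the sketch yields 1/2, the same even split that a two-way branch without profile metadata would get.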
21 changes: 20 additions & 1 deletion llvm/test/Transforms/LoopPredication/pr66382.ll
@@ -1,12 +1,31 @@
; XFAIL: *
Collaborator

I don't think we usually land failing tests in tree (unless policies have changed? I'm not doing a ton of reviewing these days...). This makes it a bit harder to comment on them in the review.

Collaborator Author

I've seen it done before. Regarding assertions: without them, the crash would just be a division by zero. Otherwise, the buildbots would complain about the XFAIL test passing.

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 3
; RUN: opt -S -loop-predication-skip-profitability-checks=false -passes='require<scalar-evolution>,loop-mssa(loop-predication)' %s | FileCheck %s

target triple = "x86_64-unknown-linux-gnu"

; Function Attrs: nocallback nofree nosync willreturn
declare void @llvm.experimental.guard(i1, ...) #0

; Check that LoopPredication doesn't crash on all-zero branch weights
define void @foo() {
; CHECK-LABEL: define void @foo() {
; CHECK-NEXT: entry:
; CHECK-NEXT: br label [[HEADER:%.*]]
; CHECK: Header:
; CHECK-NEXT: [[J2:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[J_NEXT:%.*]], [[LATCH:%.*]] ]
; CHECK-NEXT: call void (i1, ...) @llvm.experimental.guard(i1 false, i32 0) [ "deopt"() ]
; CHECK-NEXT: [[J_NEXT]] = add i64 [[J2]], 1
; CHECK-NEXT: br i1 false, label [[LATCH]], label [[EXIT:%.*]]
; CHECK: Latch:
; CHECK-NEXT: [[SPECULATE_TRIP_COUNT:%.*]] = icmp ult i64 [[J2]], 0
; CHECK-NEXT: br i1 [[SPECULATE_TRIP_COUNT]], label [[HEADER]], label [[COMMON_RET_LOOPEXIT:%.*]], !prof [[PROF0:![0-9]+]]
; CHECK: common.ret.loopexit:
; CHECK-NEXT: br label [[COMMON_RET:%.*]]
; CHECK: common.ret:
; CHECK-NEXT: ret void
; CHECK: exit:
; CHECK-NEXT: br label [[COMMON_RET]]
;
Comment on lines +11 to +28
Collaborator

It's not clear to me if this is checking anything relevant about the expected output of the pass, in the context of the interpretation of the branch weights as an even distribution. If it is, can you explain?

If not, is there a way to do that? Maybe you can observe somehow that the branch weights are correctly interpreted as "even" by looking at DEBUG output (the DEBUG_TYPE for this pass is loop-predication), or maybe STATISTIC?

For example, I see that there's a command-line option -loop-predication-latch-probability-scale, which is a scaling factor applied to the latch probability. This affects how the branch weights are used, ultimately changing the return value of LoopPredication::isLoopProfitableToPredicate. Can you construct a test case where, if the branch probability is even (the correct interpretation of branch_weights of 0), two RUN lines with different -loop-predication-latch-probability-scale values give you different output for STATISTIC and/or DEBUG? If so, then we could have CHECK lines on the STATISTIC/DEBUG output.

Collaborator Author

Right now, it just checks that there are no crashes. experimental_guard is required; otherwise the LoopPredication pass would not run:

  // There is nothing to do if the module doesn't use guards
  auto *GuardDecl =
      M->getFunction(Intrinsic::getName(Intrinsic::experimental_guard));
  bool HasIntrinsicGuards = GuardDecl && !GuardDecl->use_empty();
  auto *WCDecl = M->getFunction(
      Intrinsic::getName(Intrinsic::experimental_widenable_condition));
  bool HasWidenableConditions =
      PredicateWidenableBranchGuards && WCDecl && !WCDecl->use_empty();
  if (!HasIntrinsicGuards && !HasWidenableConditions)
    return false;

The test was reduced with bugpoint and llvm-reduce, so I don't know if it can be meaningfully reduced further (which is also why it doesn't do much when there is no crash).

Trying to make it output something more meaningful would likely make the test bigger (and would require assertions).

entry:
br label %Header

259 changes: 259 additions & 0 deletions llvm/test/Transforms/LoopPredication/scale.ll
@@ -0,0 +1,259 @@
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 3
; RUN: opt -S -loop-predication-skip-profitability-checks=false -passes='require<scalar-evolution>,loop-mssa(loop-predication)' -verify-memoryssa -loop-predication-latch-probability-scale=2 %s 2>&1 | FileCheck %s --check-prefixes=CHECK-PROF
; RUN: opt -S -loop-predication-skip-profitability-checks=false -passes='require<scalar-evolution>,loop-mssa(loop-predication)' -verify-memoryssa -loop-predication-latch-probability-scale=1.9 %s 2>&1 | FileCheck %s --check-prefixes=CHECK-NOTPROF

; LatchExitProbability: 0x20000000 / 0x80000000 = 25.00%
; ExitingBlockProbability: 0x40000000 / 0x80000000 = 50.00%
; Predicate is profitable when the scale factor is 2 and not profitable if it's less than 2.
define i64 @predicate_eq_ones(ptr nocapture readonly %arg, i32 %length, ptr nocapture readonly %arg2, ptr nocapture readonly %n_addr, i64 %i) !prof !21 {
; CHECK-PROF-LABEL: define i64 @predicate_eq_ones(
; CHECK-PROF-SAME: ptr nocapture readonly [[ARG:%.*]], i32 [[LENGTH:%.*]], ptr nocapture readonly [[ARG2:%.*]], ptr nocapture readonly [[N_ADDR:%.*]], i64 [[I:%.*]]) !prof [[PROF0:![0-9]+]] {
; CHECK-PROF-NEXT: entry:
; CHECK-PROF-NEXT: [[LENGTH_EXT:%.*]] = zext i32 [[LENGTH]] to i64
; CHECK-PROF-NEXT: [[N_PRE:%.*]] = load i64, ptr [[N_ADDR]], align 4
; CHECK-PROF-NEXT: [[TMP0:%.*]] = icmp ule i64 1048576, [[LENGTH_EXT]]
; CHECK-PROF-NEXT: [[TMP1:%.*]] = icmp ult i64 0, [[LENGTH_EXT]]
; CHECK-PROF-NEXT: [[TMP2:%.*]] = and i1 [[TMP1]], [[TMP0]]
; CHECK-PROF-NEXT: [[TMP3:%.*]] = freeze i1 [[TMP2]]
; CHECK-PROF-NEXT: br label [[HEADER:%.*]]
; CHECK-PROF: Header:
; CHECK-PROF-NEXT: [[RESULT_IN3:%.*]] = phi ptr [ [[ARG2]], [[ENTRY:%.*]] ], [ [[ARG]], [[LATCH:%.*]] ]
; CHECK-PROF-NEXT: [[J2:%.*]] = phi i64 [ 0, [[ENTRY]] ], [ [[J_NEXT:%.*]], [[LATCH]] ]
; CHECK-PROF-NEXT: [[WITHIN_BOUNDS:%.*]] = icmp ult i64 [[J2]], [[LENGTH_EXT]]
; CHECK-PROF-NEXT: call void (i1, ...) @llvm.experimental.guard(i1 [[TMP3]], i32 9) [ "deopt"() ]
; CHECK-PROF-NEXT: call void @llvm.assume(i1 [[WITHIN_BOUNDS]])
; CHECK-PROF-NEXT: [[INNERCMP:%.*]] = icmp eq i64 [[J2]], [[N_PRE]]
; CHECK-PROF-NEXT: [[J_NEXT]] = add nuw nsw i64 [[J2]], 1
; CHECK-PROF-NEXT: br i1 [[INNERCMP]], label [[LATCH]], label [[EXIT:%.*]], !prof [[PROF1:![0-9]+]]
; CHECK-PROF: Latch:
; CHECK-PROF-NEXT: [[SPECULATE_TRIP_COUNT:%.*]] = icmp ult i64 [[J_NEXT]], 1048576
; CHECK-PROF-NEXT: br i1 [[SPECULATE_TRIP_COUNT]], label [[HEADER]], label [[EXITLATCH:%.*]], !prof [[PROF2:![0-9]+]]
; CHECK-PROF: exitLatch:
; CHECK-PROF-NEXT: ret i64 1
; CHECK-PROF: exit:
; CHECK-PROF-NEXT: [[RESULT_IN3_LCSSA:%.*]] = phi ptr [ [[RESULT_IN3]], [[HEADER]] ]
; CHECK-PROF-NEXT: [[RESULT_LE:%.*]] = load i64, ptr [[RESULT_IN3_LCSSA]], align 8
; CHECK-PROF-NEXT: ret i64 [[RESULT_LE]]
;
; CHECK-NOTPROF-LABEL: define i64 @predicate_eq_ones(
; CHECK-NOTPROF-SAME: ptr nocapture readonly [[ARG:%.*]], i32 [[LENGTH:%.*]], ptr nocapture readonly [[ARG2:%.*]], ptr nocapture readonly [[N_ADDR:%.*]], i64 [[I:%.*]]) !prof [[PROF0:![0-9]+]] {
; CHECK-NOTPROF-NEXT: entry:
; CHECK-NOTPROF-NEXT: [[LENGTH_EXT:%.*]] = zext i32 [[LENGTH]] to i64
; CHECK-NOTPROF-NEXT: [[N_PRE:%.*]] = load i64, ptr [[N_ADDR]], align 4
; CHECK-NOTPROF-NEXT: br label [[HEADER:%.*]]
; CHECK-NOTPROF: Header:
; CHECK-NOTPROF-NEXT: [[RESULT_IN3:%.*]] = phi ptr [ [[ARG2]], [[ENTRY:%.*]] ], [ [[ARG]], [[LATCH:%.*]] ]
; CHECK-NOTPROF-NEXT: [[J2:%.*]] = phi i64 [ 0, [[ENTRY]] ], [ [[J_NEXT:%.*]], [[LATCH]] ]
; CHECK-NOTPROF-NEXT: [[WITHIN_BOUNDS:%.*]] = icmp ult i64 [[J2]], [[LENGTH_EXT]]
; CHECK-NOTPROF-NEXT: call void (i1, ...) @llvm.experimental.guard(i1 [[WITHIN_BOUNDS]], i32 9) [ "deopt"() ]
; CHECK-NOTPROF-NEXT: [[INNERCMP:%.*]] = icmp eq i64 [[J2]], [[N_PRE]]
; CHECK-NOTPROF-NEXT: [[J_NEXT]] = add nuw nsw i64 [[J2]], 1
; CHECK-NOTPROF-NEXT: br i1 [[INNERCMP]], label [[LATCH]], label [[EXIT:%.*]], !prof [[PROF1:![0-9]+]]
; CHECK-NOTPROF: Latch:
; CHECK-NOTPROF-NEXT: [[SPECULATE_TRIP_COUNT:%.*]] = icmp ult i64 [[J_NEXT]], 1048576
; CHECK-NOTPROF-NEXT: br i1 [[SPECULATE_TRIP_COUNT]], label [[HEADER]], label [[EXITLATCH:%.*]], !prof [[PROF2:![0-9]+]]
; CHECK-NOTPROF: exitLatch:
; CHECK-NOTPROF-NEXT: ret i64 1
; CHECK-NOTPROF: exit:
; CHECK-NOTPROF-NEXT: [[RESULT_IN3_LCSSA:%.*]] = phi ptr [ [[RESULT_IN3]], [[HEADER]] ]
; CHECK-NOTPROF-NEXT: [[RESULT_LE:%.*]] = load i64, ptr [[RESULT_IN3_LCSSA]], align 8
; CHECK-NOTPROF-NEXT: ret i64 [[RESULT_LE]]
;
entry:
%length.ext = zext i32 %length to i64
%n.pre = load i64, ptr %n_addr, align 4
br label %Header

Header: ; preds = %entry, %Latch
%result.in3 = phi ptr [ %arg2, %entry ], [ %arg, %Latch ]
%j2 = phi i64 [ 0, %entry ], [ %j.next, %Latch ]
%within.bounds = icmp ult i64 %j2, %length.ext
call void (i1, ...) @llvm.experimental.guard(i1 %within.bounds, i32 9) [ "deopt"() ]
%innercmp = icmp eq i64 %j2, %n.pre
%j.next = add nuw nsw i64 %j2, 1
br i1 %innercmp, label %Latch, label %exit, !prof !0

Latch: ; preds = %Header
%speculate_trip_count = icmp ult i64 %j.next, 1048576
br i1 %speculate_trip_count, label %Header, label %exitLatch, !prof !2

exitLatch: ; preds = %Latch
ret i64 1

exit: ; preds = %Header
%result.in3.lcssa = phi ptr [ %result.in3, %Header ]
%result.le = load i64, ptr %result.in3.lcssa, align 8
ret i64 %result.le
}
!0 = !{!"branch_weights", i32 1, i32 1}

; Same as the previous one, but with zero weights (should be treated as if no profile - equal probability)
define i64 @predicate_eq_zeroes(ptr nocapture readonly %arg, i32 %length, ptr nocapture readonly %arg2, ptr nocapture readonly %n_addr, i64 %i) !prof !21 {
; CHECK-PROF-LABEL: define i64 @predicate_eq_zeroes(
; CHECK-PROF-SAME: ptr nocapture readonly [[ARG:%.*]], i32 [[LENGTH:%.*]], ptr nocapture readonly [[ARG2:%.*]], ptr nocapture readonly [[N_ADDR:%.*]], i64 [[I:%.*]]) !prof [[PROF0]] {
; CHECK-PROF-NEXT: entry:
; CHECK-PROF-NEXT: [[LENGTH_EXT:%.*]] = zext i32 [[LENGTH]] to i64
; CHECK-PROF-NEXT: [[N_PRE:%.*]] = load i64, ptr [[N_ADDR]], align 4
; CHECK-PROF-NEXT: [[TMP0:%.*]] = icmp ule i64 1048576, [[LENGTH_EXT]]
; CHECK-PROF-NEXT: [[TMP1:%.*]] = icmp ult i64 0, [[LENGTH_EXT]]
; CHECK-PROF-NEXT: [[TMP2:%.*]] = and i1 [[TMP1]], [[TMP0]]
; CHECK-PROF-NEXT: [[TMP3:%.*]] = freeze i1 [[TMP2]]
; CHECK-PROF-NEXT: br label [[HEADER:%.*]]
; CHECK-PROF: Header:
; CHECK-PROF-NEXT: [[RESULT_IN3:%.*]] = phi ptr [ [[ARG2]], [[ENTRY:%.*]] ], [ [[ARG]], [[LATCH:%.*]] ]
; CHECK-PROF-NEXT: [[J2:%.*]] = phi i64 [ 0, [[ENTRY]] ], [ [[J_NEXT:%.*]], [[LATCH]] ]
; CHECK-PROF-NEXT: [[WITHIN_BOUNDS:%.*]] = icmp ult i64 [[J2]], [[LENGTH_EXT]]
; CHECK-PROF-NEXT: call void (i1, ...) @llvm.experimental.guard(i1 [[TMP3]], i32 9) [ "deopt"() ]
; CHECK-PROF-NEXT: call void @llvm.assume(i1 [[WITHIN_BOUNDS]])
; CHECK-PROF-NEXT: [[INNERCMP:%.*]] = icmp eq i64 [[J2]], [[N_PRE]]
; CHECK-PROF-NEXT: [[J_NEXT]] = add nuw nsw i64 [[J2]], 1
; CHECK-PROF-NEXT: br i1 [[INNERCMP]], label [[LATCH]], label [[EXIT:%.*]], !prof [[PROF3:![0-9]+]]
; CHECK-PROF: Latch:
; CHECK-PROF-NEXT: [[SPECULATE_TRIP_COUNT:%.*]] = icmp ult i64 [[J_NEXT]], 1048576
; CHECK-PROF-NEXT: br i1 [[SPECULATE_TRIP_COUNT]], label [[HEADER]], label [[EXITLATCH:%.*]], !prof [[PROF2]]
; CHECK-PROF: exitLatch:
; CHECK-PROF-NEXT: ret i64 1
; CHECK-PROF: exit:
; CHECK-PROF-NEXT: [[RESULT_IN3_LCSSA:%.*]] = phi ptr [ [[RESULT_IN3]], [[HEADER]] ]
; CHECK-PROF-NEXT: [[RESULT_LE:%.*]] = load i64, ptr [[RESULT_IN3_LCSSA]], align 8
; CHECK-PROF-NEXT: ret i64 [[RESULT_LE]]
;
; CHECK-NOTPROF-LABEL: define i64 @predicate_eq_zeroes(
; CHECK-NOTPROF-SAME: ptr nocapture readonly [[ARG:%.*]], i32 [[LENGTH:%.*]], ptr nocapture readonly [[ARG2:%.*]], ptr nocapture readonly [[N_ADDR:%.*]], i64 [[I:%.*]]) !prof [[PROF0]] {
; CHECK-NOTPROF-NEXT: entry:
; CHECK-NOTPROF-NEXT: [[LENGTH_EXT:%.*]] = zext i32 [[LENGTH]] to i64
; CHECK-NOTPROF-NEXT: [[N_PRE:%.*]] = load i64, ptr [[N_ADDR]], align 4
; CHECK-NOTPROF-NEXT: br label [[HEADER:%.*]]
; CHECK-NOTPROF: Header:
; CHECK-NOTPROF-NEXT: [[RESULT_IN3:%.*]] = phi ptr [ [[ARG2]], [[ENTRY:%.*]] ], [ [[ARG]], [[LATCH:%.*]] ]
; CHECK-NOTPROF-NEXT: [[J2:%.*]] = phi i64 [ 0, [[ENTRY]] ], [ [[J_NEXT:%.*]], [[LATCH]] ]
; CHECK-NOTPROF-NEXT: [[WITHIN_BOUNDS:%.*]] = icmp ult i64 [[J2]], [[LENGTH_EXT]]
; CHECK-NOTPROF-NEXT: call void (i1, ...) @llvm.experimental.guard(i1 [[WITHIN_BOUNDS]], i32 9) [ "deopt"() ]
; CHECK-NOTPROF-NEXT: [[INNERCMP:%.*]] = icmp eq i64 [[J2]], [[N_PRE]]
; CHECK-NOTPROF-NEXT: [[J_NEXT]] = add nuw nsw i64 [[J2]], 1
; CHECK-NOTPROF-NEXT: br i1 [[INNERCMP]], label [[LATCH]], label [[EXIT:%.*]], !prof [[PROF3:![0-9]+]]
; CHECK-NOTPROF: Latch:
; CHECK-NOTPROF-NEXT: [[SPECULATE_TRIP_COUNT:%.*]] = icmp ult i64 [[J_NEXT]], 1048576
; CHECK-NOTPROF-NEXT: br i1 [[SPECULATE_TRIP_COUNT]], label [[HEADER]], label [[EXITLATCH:%.*]], !prof [[PROF2]]
; CHECK-NOTPROF: exitLatch:
; CHECK-NOTPROF-NEXT: ret i64 1
; CHECK-NOTPROF: exit:
; CHECK-NOTPROF-NEXT: [[RESULT_IN3_LCSSA:%.*]] = phi ptr [ [[RESULT_IN3]], [[HEADER]] ]
; CHECK-NOTPROF-NEXT: [[RESULT_LE:%.*]] = load i64, ptr [[RESULT_IN3_LCSSA]], align 8
; CHECK-NOTPROF-NEXT: ret i64 [[RESULT_LE]]
;
entry:
%length.ext = zext i32 %length to i64
%n.pre = load i64, ptr %n_addr, align 4
br label %Header

Header: ; preds = %entry, %Latch
%result.in3 = phi ptr [ %arg2, %entry ], [ %arg, %Latch ]
%j2 = phi i64 [ 0, %entry ], [ %j.next, %Latch ]
%within.bounds = icmp ult i64 %j2, %length.ext
call void (i1, ...) @llvm.experimental.guard(i1 %within.bounds, i32 9) [ "deopt"() ]
%innercmp = icmp eq i64 %j2, %n.pre
%j.next = add nuw nsw i64 %j2, 1
br i1 %innercmp, label %Latch, label %exit, !prof !1

Latch: ; preds = %Header
%speculate_trip_count = icmp ult i64 %j.next, 1048576
br i1 %speculate_trip_count, label %Header, label %exitLatch, !prof !2

exitLatch: ; preds = %Latch
ret i64 1

exit: ; preds = %Header
%result.in3.lcssa = phi ptr [ %result.in3, %Header ]
%result.le = load i64, ptr %result.in3.lcssa, align 8
ret i64 %result.le
}
!1 = !{!"branch_weights", i32 0, i32 0}

; No profile on br in Header
define i64 @predicate_eq_none(ptr nocapture readonly %arg, i32 %length, ptr nocapture readonly %arg2, ptr nocapture readonly %n_addr, i64 %i) !prof !21 {
; CHECK-PROF-LABEL: define i64 @predicate_eq_none(
; CHECK-PROF-SAME: ptr nocapture readonly [[ARG:%.*]], i32 [[LENGTH:%.*]], ptr nocapture readonly [[ARG2:%.*]], ptr nocapture readonly [[N_ADDR:%.*]], i64 [[I:%.*]]) !prof [[PROF0]] {
; CHECK-PROF-NEXT: entry:
; CHECK-PROF-NEXT: [[LENGTH_EXT:%.*]] = zext i32 [[LENGTH]] to i64
; CHECK-PROF-NEXT: [[N_PRE:%.*]] = load i64, ptr [[N_ADDR]], align 4
; CHECK-PROF-NEXT: [[TMP0:%.*]] = icmp ule i64 1048576, [[LENGTH_EXT]]
; CHECK-PROF-NEXT: [[TMP1:%.*]] = icmp ult i64 0, [[LENGTH_EXT]]
; CHECK-PROF-NEXT: [[TMP2:%.*]] = and i1 [[TMP1]], [[TMP0]]
; CHECK-PROF-NEXT: [[TMP3:%.*]] = freeze i1 [[TMP2]]
; CHECK-PROF-NEXT: br label [[HEADER:%.*]]
; CHECK-PROF: Header:
; CHECK-PROF-NEXT: [[RESULT_IN3:%.*]] = phi ptr [ [[ARG2]], [[ENTRY:%.*]] ], [ [[ARG]], [[LATCH:%.*]] ]
; CHECK-PROF-NEXT: [[J2:%.*]] = phi i64 [ 0, [[ENTRY]] ], [ [[J_NEXT:%.*]], [[LATCH]] ]
; CHECK-PROF-NEXT: [[WITHIN_BOUNDS:%.*]] = icmp ult i64 [[J2]], [[LENGTH_EXT]]
; CHECK-PROF-NEXT: call void (i1, ...) @llvm.experimental.guard(i1 [[TMP3]], i32 9) [ "deopt"() ]
; CHECK-PROF-NEXT: call void @llvm.assume(i1 [[WITHIN_BOUNDS]])
; CHECK-PROF-NEXT: [[INNERCMP:%.*]] = icmp eq i64 [[J2]], [[N_PRE]]
; CHECK-PROF-NEXT: [[J_NEXT]] = add nuw nsw i64 [[J2]], 1
; CHECK-PROF-NEXT: br i1 [[INNERCMP]], label [[LATCH]], label [[EXIT:%.*]]
; CHECK-PROF: Latch:
; CHECK-PROF-NEXT: [[SPECULATE_TRIP_COUNT:%.*]] = icmp ult i64 [[J_NEXT]], 1048576
; CHECK-PROF-NEXT: br i1 [[SPECULATE_TRIP_COUNT]], label [[HEADER]], label [[EXITLATCH:%.*]], !prof [[PROF2]]
; CHECK-PROF: exitLatch:
; CHECK-PROF-NEXT: ret i64 1
; CHECK-PROF: exit:
; CHECK-PROF-NEXT: [[RESULT_IN3_LCSSA:%.*]] = phi ptr [ [[RESULT_IN3]], [[HEADER]] ]
; CHECK-PROF-NEXT: [[RESULT_LE:%.*]] = load i64, ptr [[RESULT_IN3_LCSSA]], align 8
; CHECK-PROF-NEXT: ret i64 [[RESULT_LE]]
;
; CHECK-NOTPROF-LABEL: define i64 @predicate_eq_none(
; CHECK-NOTPROF-SAME: ptr nocapture readonly [[ARG:%.*]], i32 [[LENGTH:%.*]], ptr nocapture readonly [[ARG2:%.*]], ptr nocapture readonly [[N_ADDR:%.*]], i64 [[I:%.*]]) !prof [[PROF0]] {
; CHECK-NOTPROF-NEXT: entry:
; CHECK-NOTPROF-NEXT: [[LENGTH_EXT:%.*]] = zext i32 [[LENGTH]] to i64
; CHECK-NOTPROF-NEXT: [[N_PRE:%.*]] = load i64, ptr [[N_ADDR]], align 4
; CHECK-NOTPROF-NEXT: br label [[HEADER:%.*]]
; CHECK-NOTPROF: Header:
; CHECK-NOTPROF-NEXT: [[RESULT_IN3:%.*]] = phi ptr [ [[ARG2]], [[ENTRY:%.*]] ], [ [[ARG]], [[LATCH:%.*]] ]
; CHECK-NOTPROF-NEXT: [[J2:%.*]] = phi i64 [ 0, [[ENTRY]] ], [ [[J_NEXT:%.*]], [[LATCH]] ]
; CHECK-NOTPROF-NEXT: [[WITHIN_BOUNDS:%.*]] = icmp ult i64 [[J2]], [[LENGTH_EXT]]
; CHECK-NOTPROF-NEXT: call void (i1, ...) @llvm.experimental.guard(i1 [[WITHIN_BOUNDS]], i32 9) [ "deopt"() ]
; CHECK-NOTPROF-NEXT: [[INNERCMP:%.*]] = icmp eq i64 [[J2]], [[N_PRE]]
; CHECK-NOTPROF-NEXT: [[J_NEXT]] = add nuw nsw i64 [[J2]], 1
; CHECK-NOTPROF-NEXT: br i1 [[INNERCMP]], label [[LATCH]], label [[EXIT:%.*]]
; CHECK-NOTPROF: Latch:
; CHECK-NOTPROF-NEXT: [[SPECULATE_TRIP_COUNT:%.*]] = icmp ult i64 [[J_NEXT]], 1048576
; CHECK-NOTPROF-NEXT: br i1 [[SPECULATE_TRIP_COUNT]], label [[HEADER]], label [[EXITLATCH:%.*]], !prof [[PROF2]]
; CHECK-NOTPROF: exitLatch:
; CHECK-NOTPROF-NEXT: ret i64 1
; CHECK-NOTPROF: exit:
; CHECK-NOTPROF-NEXT: [[RESULT_IN3_LCSSA:%.*]] = phi ptr [ [[RESULT_IN3]], [[HEADER]] ]
; CHECK-NOTPROF-NEXT: [[RESULT_LE:%.*]] = load i64, ptr [[RESULT_IN3_LCSSA]], align 8
; CHECK-NOTPROF-NEXT: ret i64 [[RESULT_LE]]
;
entry:
%length.ext = zext i32 %length to i64
%n.pre = load i64, ptr %n_addr, align 4
br label %Header

Header: ; preds = %entry, %Latch
%result.in3 = phi ptr [ %arg2, %entry ], [ %arg, %Latch ]
%j2 = phi i64 [ 0, %entry ], [ %j.next, %Latch ]
%within.bounds = icmp ult i64 %j2, %length.ext
call void (i1, ...) @llvm.experimental.guard(i1 %within.bounds, i32 9) [ "deopt"() ]
%innercmp = icmp eq i64 %j2, %n.pre
%j.next = add nuw nsw i64 %j2, 1
br i1 %innercmp, label %Latch, label %exit

Latch: ; preds = %Header
%speculate_trip_count = icmp ult i64 %j.next, 1048576
br i1 %speculate_trip_count, label %Header, label %exitLatch, !prof !2

exitLatch: ; preds = %Latch
ret i64 1

exit: ; preds = %Header
%result.in3.lcssa = phi ptr [ %result.in3, %Header ]
%result.le = load i64, ptr %result.in3.lcssa, align 8
ret i64 %result.le
}

!2 = !{!"branch_weights", i32 3, i32 1}
!21 = !{!"function_entry_count", i64 20000}

declare i64 @llvm.experimental.deoptimize.i64(...)
declare void @llvm.experimental.guard(i1, ...)
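
The arithmetic stated in the comment at the top of scale.ll (latch exit probability 25%, exiting block probability 50%, profitable at scale 2 but not below) can be checked with a small sketch. It rests on assumptions: `exitProb` and `isProfitable` are hypothetical helpers, plain `double` is used instead of LLVM's fixed-point `BranchProbability`, and predication is taken to be unprofitable when the exiting-block probability exceeds the latch-exit probability multiplied by the scale.

```cpp
#include <cassert>
#include <cstdint>

// Exit probability of a two-way branch from its weights. An even split is
// used when both weights are zero, matching the behavior this PR introduces.
double exitProb(uint32_t StayWeight, uint32_t ExitWeight) {
  uint64_t Total = uint64_t(StayWeight) + ExitWeight;
  return Total == 0 ? 0.5 : double(ExitWeight) / double(Total);
}

// Assumed rule: predication stops being profitable once an exiting edge is
// more likely than the scaled latch-exit edge.
bool isProfitable(double ExitingProb, double LatchExitProb, double Scale) {
  return !(ExitingProb > LatchExitProb * Scale);
}
```

Under these assumptions, the latch weights 3 and 1 give a 25% latch exit probability, and the Header weights 1:1, 0:0, or no weights at all give 50%. At scale 2 the threshold is 50%, which the exiting probability does not exceed, so the loop is predicated (CHECK-PROF); at scale 1.9 the threshold drops to 47.5% and it is not (CHECK-NOTPROF).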