-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[LAA] Keep pointer checks on partial analysis #139719
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[LAA] Keep pointer checks on partial analysis #139719
Conversation
Currently if there's any memory access that AccessAnalysis couldn't analyze then all of the runtime pointer check results are discarded. This patch changes that so they're not, and we generate the runtime check information for those pointers that we could analyze, as transformations may still be able to make use of the partial information. Of the transformations that use LoopAccessAnalysis, only LoopVersioningLICM changes behaviour as a result of this change. This is because the others either: * Check canVectorizeMemory, which will return false when we have partial pointer information as analyzeLoop() will return false. * Examine the dependencies returned by getDepChecker(), which will be empty as we exit analyzeLoop if we have partial pointer information before calling areDepsSafe(), which is what fills in the dependency information.
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-llvm-analysis Author: John Brawn (john-brawn-arm) ChangesCurrently if there's any memory access that AccessAnalysis couldn't analyze then all of the runtime pointer check results are discarded. This patch changes that so they're not, and we generate the runtime check information for those pointers that we could analyze, as transformations may still be able to make use of the partial information. Of the transformations that use LoopAccessAnalysis, only LoopVersioningLICM changes behaviour as a result of this change. This is because the others either:
Patch is 64.45 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/139719.diff 9 Files Affected:
diff --git a/llvm/include/llvm/Analysis/LoopAccessAnalysis.h b/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
index f715e0ec8dbb4..0abd7f1182108 100644
--- a/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
+++ b/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
@@ -192,7 +192,8 @@ class MemoryDepChecker {
/// of a write access.
void addAccess(LoadInst *LI);
- /// Check whether the dependencies between the accesses are safe.
+ /// Check whether the dependencies between the accesses are safe, and records
+ /// the dependence information in Dependences if so.
///
/// Only checks sets with elements in \p CheckDeps.
bool areDepsSafe(const DepCandidates &AccessSets,
@@ -779,10 +780,14 @@ class LoopAccessInfo {
/// We need to check that all of the pointers in this list are disjoint
/// at runtime. Using std::unique_ptr to make using move ctor simpler.
+ /// This list may contain only partial information when we've failed to
+ /// analyze all the memory accesses in the loop (i.e. CanVecMem is false).
std::unique_ptr<RuntimePointerChecking> PtrRtChecking;
- /// the Memory Dependence Checker which can determine the
+ /// The Memory Dependence Checker which can determine the
/// loop-independent and loop-carried dependences between memory accesses.
+ /// This will be empty if we've failed to analyze all the memory access in the
+ /// loop (i.e. CanVecMem is false).
std::unique_ptr<MemoryDepChecker> DepChecker;
Loop *TheLoop;
diff --git a/llvm/lib/Analysis/LoopAccessAnalysis.cpp b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
index af1a3c593c514..4d912abdf59a8 100644
--- a/llvm/lib/Analysis/LoopAccessAnalysis.cpp
+++ b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
@@ -529,8 +529,10 @@ void RuntimePointerChecking::groupChecks(
// equivalence class, the iteration order is deterministic.
for (auto M : DepCands.members(Access)) {
auto PointerI = PositionMap.find(M.getPointer());
- assert(PointerI != PositionMap.end() &&
- "pointer in equivalence class not found in PositionMap");
+ // If we can't find the pointer in PositionMap that means we can't
+ // generate a memcheck for it.
+ if (PointerI == PositionMap.end())
+ continue;
for (unsigned Pointer : PointerI->second) {
bool Merged = false;
// Mark this pointer as seen.
@@ -682,7 +684,9 @@ class AccessAnalysis {
/// non-intersection.
///
/// Returns true if we need no check or if we do and we can generate them
- /// (i.e. the pointers have computable bounds).
+ /// (i.e. the pointers have computable bounds). A return value of false means
+ /// we couldn't analyze and generate runtime checks for all pointers in the
+ /// loop, but we will have checks for those pointers we could analyze.
bool canCheckPtrAtRT(RuntimePointerChecking &RtCheck, ScalarEvolution *SE,
Loop *TheLoop,
const DenseMap<Value *, const SCEV *> &Strides,
@@ -1273,7 +1277,6 @@ bool AccessAnalysis::canCheckPtrAtRT(
/*Assume=*/true)) {
CanDoAliasSetRT = false;
UncomputablePtr = Access.getPointer();
- break;
}
}
}
@@ -1313,7 +1316,7 @@ bool AccessAnalysis::canCheckPtrAtRT(
}
}
- if (MayNeedRTCheck && CanDoRT)
+ if (MayNeedRTCheck)
RtCheck.generateChecks(DepCands, IsDepCheckNeeded);
LLVM_DEBUG(dbgs() << "LAA: We need to do " << RtCheck.getNumberOfChecks()
@@ -1323,11 +1326,7 @@ bool AccessAnalysis::canCheckPtrAtRT(
// are needed. This can happen when all pointers point to the same underlying
// object for example.
RtCheck.Need = CanDoRT ? RtCheck.getNumberOfChecks() != 0 : MayNeedRTCheck;
-
- bool CanDoRTIfNeeded = !RtCheck.Need || CanDoRT;
- if (!CanDoRTIfNeeded)
- RtCheck.reset();
- return CanDoRTIfNeeded;
+ return !RtCheck.Need || CanDoRT;
}
void AccessAnalysis::processMemAccesses() {
diff --git a/llvm/test/Analysis/LoopAccessAnalysis/forked-pointers.ll b/llvm/test/Analysis/LoopAccessAnalysis/forked-pointers.ll
index cd388b4ee87f2..f0885cf0a6043 100644
--- a/llvm/test/Analysis/LoopAccessAnalysis/forked-pointers.ll
+++ b/llvm/test/Analysis/LoopAccessAnalysis/forked-pointers.ll
@@ -633,7 +633,18 @@ define dso_local void @forked_ptrs_same_base_different_offset(ptr nocapture read
; CHECK-NEXT: Report: cannot identify array bounds
; CHECK-NEXT: Dependences:
; CHECK-NEXT: Run-time memory checks:
+; CHECK-NEXT: Check 0:
+; CHECK-NEXT: Comparing group ([[GRP47:0x[0-9a-f]+]]):
+; CHECK-NEXT: %arrayidx5 = getelementptr inbounds float, ptr %Dest, i64 %indvars.iv
+; CHECK-NEXT: Against group ([[GRP48:0x[0-9a-f]+]]):
+; CHECK-NEXT: %arrayidx = getelementptr inbounds i32, ptr %Preds, i64 %indvars.iv
; CHECK-NEXT: Grouped accesses:
+; CHECK-NEXT: Group [[GRP47]]:
+; CHECK-NEXT: (Low: %Dest High: (400 + %Dest))
+; CHECK-NEXT: Member: {%Dest,+,4}<nuw><%for.body>
+; CHECK-NEXT: Group [[GRP48]]:
+; CHECK-NEXT: (Low: %Preds High: (400 + %Preds))
+; CHECK-NEXT: Member: {%Preds,+,4}<nuw><%for.body>
; CHECK-EMPTY:
; CHECK-NEXT: Non vectorizable stores to invariant address were not found in loop.
; CHECK-NEXT: SCEV assumptions:
@@ -645,7 +656,18 @@ define dso_local void @forked_ptrs_same_base_different_offset(ptr nocapture read
; RECURSE-NEXT: Report: cannot identify array bounds
; RECURSE-NEXT: Dependences:
; RECURSE-NEXT: Run-time memory checks:
+; RECURSE-NEXT: Check 0:
+; RECURSE-NEXT: Comparing group ([[GRP49:0x[0-9a-f]+]]):
+; RECURSE-NEXT: %arrayidx5 = getelementptr inbounds float, ptr %Dest, i64 %indvars.iv
+; RECURSE-NEXT: Against group ([[GRP50:0x[0-9a-f]+]]):
+; RECURSE-NEXT: %arrayidx = getelementptr inbounds i32, ptr %Preds, i64 %indvars.iv
; RECURSE-NEXT: Grouped accesses:
+; RECURSE-NEXT: Group [[GRP49]]:
+; RECURSE-NEXT: (Low: %Dest High: (400 + %Dest))
+; RECURSE-NEXT: Member: {%Dest,+,4}<nuw><%for.body>
+; RECURSE-NEXT: Group [[GRP50]]:
+; RECURSE-NEXT: (Low: %Preds High: (400 + %Preds))
+; RECURSE-NEXT: Member: {%Preds,+,4}<nuw><%for.body>
; RECURSE-EMPTY:
; RECURSE-NEXT: Non vectorizable stores to invariant address were not found in loop.
; RECURSE-NEXT: SCEV assumptions:
@@ -684,24 +706,24 @@ define dso_local void @forked_ptrs_add_to_offset(ptr nocapture readonly %Base, p
; CHECK-NEXT: Dependences:
; CHECK-NEXT: Run-time memory checks:
; CHECK-NEXT: Check 0:
-; CHECK-NEXT: Comparing group ([[GRP47:0x[0-9a-f]+]]):
+; CHECK-NEXT: Comparing group ([[GRP51:0x[0-9a-f]+]]):
; CHECK-NEXT: %arrayidx5 = getelementptr inbounds float, ptr %Dest, i64 %indvars.iv
-; CHECK-NEXT: Against group ([[GRP48:0x[0-9a-f]+]]):
+; CHECK-NEXT: Against group ([[GRP52:0x[0-9a-f]+]]):
; CHECK-NEXT: %arrayidx = getelementptr inbounds i32, ptr %Preds, i64 %indvars.iv
; CHECK-NEXT: Check 1:
-; CHECK-NEXT: Comparing group ([[GRP47]]):
+; CHECK-NEXT: Comparing group ([[GRP51]]):
; CHECK-NEXT: %arrayidx5 = getelementptr inbounds float, ptr %Dest, i64 %indvars.iv
-; CHECK-NEXT: Against group ([[GRP49:0x[0-9a-f]+]]):
+; CHECK-NEXT: Against group ([[GRP53:0x[0-9a-f]+]]):
; CHECK-NEXT: %arrayidx3 = getelementptr inbounds float, ptr %Base, i64 %offset
; CHECK-NEXT: %arrayidx3 = getelementptr inbounds float, ptr %Base, i64 %offset
; CHECK-NEXT: Grouped accesses:
-; CHECK-NEXT: Group [[GRP47]]:
+; CHECK-NEXT: Group [[GRP51]]:
; CHECK-NEXT: (Low: %Dest High: (400 + %Dest))
; CHECK-NEXT: Member: {%Dest,+,4}<nuw><%for.body>
-; CHECK-NEXT: Group [[GRP48]]:
+; CHECK-NEXT: Group [[GRP52]]:
; CHECK-NEXT: (Low: %Preds High: (400 + %Preds))
; CHECK-NEXT: Member: {%Preds,+,4}<nuw><%for.body>
-; CHECK-NEXT: Group [[GRP49]]:
+; CHECK-NEXT: Group [[GRP53]]:
; CHECK-NEXT: (Low: ((4 * %extra_offset) + %Base) High: (404 + (4 * %extra_offset) + %Base))
; CHECK-NEXT: Member: {(4 + (4 * %extra_offset) + %Base),+,4}<%for.body>
; CHECK-NEXT: Member: {((4 * %extra_offset) + %Base),+,4}<%for.body>
@@ -716,7 +738,18 @@ define dso_local void @forked_ptrs_add_to_offset(ptr nocapture readonly %Base, p
; RECURSE-NEXT: Report: cannot identify array bounds
; RECURSE-NEXT: Dependences:
; RECURSE-NEXT: Run-time memory checks:
+; RECURSE-NEXT: Check 0:
+; RECURSE-NEXT: Comparing group ([[GRP54:0x[0-9a-f]+]]):
+; RECURSE-NEXT: %arrayidx5 = getelementptr inbounds float, ptr %Dest, i64 %indvars.iv
+; RECURSE-NEXT: Against group ([[GRP55:0x[0-9a-f]+]]):
+; RECURSE-NEXT: %arrayidx = getelementptr inbounds i32, ptr %Preds, i64 %indvars.iv
; RECURSE-NEXT: Grouped accesses:
+; RECURSE-NEXT: Group [[GRP54]]:
+; RECURSE-NEXT: (Low: %Dest High: (400 + %Dest))
+; RECURSE-NEXT: Member: {%Dest,+,4}<nuw><%for.body>
+; RECURSE-NEXT: Group [[GRP55]]:
+; RECURSE-NEXT: (Low: %Preds High: (400 + %Preds))
+; RECURSE-NEXT: Member: {%Preds,+,4}<nuw><%for.body>
; RECURSE-EMPTY:
; RECURSE-NEXT: Non vectorizable stores to invariant address were not found in loop.
; RECURSE-NEXT: SCEV assumptions:
@@ -752,24 +785,24 @@ define dso_local void @forked_ptrs_sub_from_offset(ptr nocapture readonly %Base,
; CHECK-NEXT: Dependences:
; CHECK-NEXT: Run-time memory checks:
; CHECK-NEXT: Check 0:
-; CHECK-NEXT: Comparing group ([[GRP50:0x[0-9a-f]+]]):
+; CHECK-NEXT: Comparing group ([[GRP56:0x[0-9a-f]+]]):
; CHECK-NEXT: %arrayidx5 = getelementptr inbounds float, ptr %Dest, i64 %indvars.iv
-; CHECK-NEXT: Against group ([[GRP51:0x[0-9a-f]+]]):
+; CHECK-NEXT: Against group ([[GRP57:0x[0-9a-f]+]]):
; CHECK-NEXT: %arrayidx = getelementptr inbounds i32, ptr %Preds, i64 %indvars.iv
; CHECK-NEXT: Check 1:
-; CHECK-NEXT: Comparing group ([[GRP50]]):
+; CHECK-NEXT: Comparing group ([[GRP56]]):
; CHECK-NEXT: %arrayidx5 = getelementptr inbounds float, ptr %Dest, i64 %indvars.iv
-; CHECK-NEXT: Against group ([[GRP52:0x[0-9a-f]+]]):
+; CHECK-NEXT: Against group ([[GRP58:0x[0-9a-f]+]]):
; CHECK-NEXT: %arrayidx3 = getelementptr inbounds float, ptr %Base, i64 %offset
; CHECK-NEXT: %arrayidx3 = getelementptr inbounds float, ptr %Base, i64 %offset
; CHECK-NEXT: Grouped accesses:
-; CHECK-NEXT: Group [[GRP50]]:
+; CHECK-NEXT: Group [[GRP56]]:
; CHECK-NEXT: (Low: %Dest High: (400 + %Dest))
; CHECK-NEXT: Member: {%Dest,+,4}<nuw><%for.body>
-; CHECK-NEXT: Group [[GRP51]]:
+; CHECK-NEXT: Group [[GRP57]]:
; CHECK-NEXT: (Low: %Preds High: (400 + %Preds))
; CHECK-NEXT: Member: {%Preds,+,4}<nuw><%for.body>
-; CHECK-NEXT: Group [[GRP52]]:
+; CHECK-NEXT: Group [[GRP58]]:
; CHECK-NEXT: (Low: ((-4 * %extra_offset) + %Base) High: (404 + (-4 * %extra_offset) + %Base))
; CHECK-NEXT: Member: {(4 + (-4 * %extra_offset) + %Base),+,4}<%for.body>
; CHECK-NEXT: Member: {((-4 * %extra_offset) + %Base),+,4}<%for.body>
@@ -784,7 +817,18 @@ define dso_local void @forked_ptrs_sub_from_offset(ptr nocapture readonly %Base,
; RECURSE-NEXT: Report: cannot identify array bounds
; RECURSE-NEXT: Dependences:
; RECURSE-NEXT: Run-time memory checks:
+; RECURSE-NEXT: Check 0:
+; RECURSE-NEXT: Comparing group ([[GRP59:0x[0-9a-f]+]]):
+; RECURSE-NEXT: %arrayidx5 = getelementptr inbounds float, ptr %Dest, i64 %indvars.iv
+; RECURSE-NEXT: Against group ([[GRP60:0x[0-9a-f]+]]):
+; RECURSE-NEXT: %arrayidx = getelementptr inbounds i32, ptr %Preds, i64 %indvars.iv
; RECURSE-NEXT: Grouped accesses:
+; RECURSE-NEXT: Group [[GRP59]]:
+; RECURSE-NEXT: (Low: %Dest High: (400 + %Dest))
+; RECURSE-NEXT: Member: {%Dest,+,4}<nuw><%for.body>
+; RECURSE-NEXT: Group [[GRP60]]:
+; RECURSE-NEXT: (Low: %Preds High: (400 + %Preds))
+; RECURSE-NEXT: Member: {%Preds,+,4}<nuw><%for.body>
; RECURSE-EMPTY:
; RECURSE-NEXT: Non vectorizable stores to invariant address were not found in loop.
; RECURSE-NEXT: SCEV assumptions:
@@ -820,24 +864,24 @@ define dso_local void @forked_ptrs_add_sub_offset(ptr nocapture readonly %Base,
; CHECK-NEXT: Dependences:
; CHECK-NEXT: Run-time memory checks:
; CHECK-NEXT: Check 0:
-; CHECK-NEXT: Comparing group ([[GRP53:0x[0-9a-f]+]]):
+; CHECK-NEXT: Comparing group ([[GRP61:0x[0-9a-f]+]]):
; CHECK-NEXT: %arrayidx5 = getelementptr inbounds float, ptr %Dest, i64 %indvars.iv
-; CHECK-NEXT: Against group ([[GRP54:0x[0-9a-f]+]]):
+; CHECK-NEXT: Against group ([[GRP62:0x[0-9a-f]+]]):
; CHECK-NEXT: %arrayidx = getelementptr inbounds i32, ptr %Preds, i64 %indvars.iv
; CHECK-NEXT: Check 1:
-; CHECK-NEXT: Comparing group ([[GRP53]]):
+; CHECK-NEXT: Comparing group ([[GRP61]]):
; CHECK-NEXT: %arrayidx5 = getelementptr inbounds float, ptr %Dest, i64 %indvars.iv
-; CHECK-NEXT: Against group ([[GRP55:0x[0-9a-f]+]]):
+; CHECK-NEXT: Against group ([[GRP63:0x[0-9a-f]+]]):
; CHECK-NEXT: %arrayidx3 = getelementptr inbounds float, ptr %Base, i64 %offset
; CHECK-NEXT: %arrayidx3 = getelementptr inbounds float, ptr %Base, i64 %offset
; CHECK-NEXT: Grouped accesses:
-; CHECK-NEXT: Group [[GRP53]]:
+; CHECK-NEXT: Group [[GRP61]]:
; CHECK-NEXT: (Low: %Dest High: (400 + %Dest))
; CHECK-NEXT: Member: {%Dest,+,4}<nuw><%for.body>
-; CHECK-NEXT: Group [[GRP54]]:
+; CHECK-NEXT: Group [[GRP62]]:
; CHECK-NEXT: (Low: %Preds High: (400 + %Preds))
; CHECK-NEXT: Member: {%Preds,+,4}<nuw><%for.body>
-; CHECK-NEXT: Group [[GRP55]]:
+; CHECK-NEXT: Group [[GRP63]]:
; CHECK-NEXT: (Low: ((4 * %to_add) + (-4 * %to_sub) + %Base) High: (404 + (4 * %to_add) + (-4 * %to_sub) + %Base))
; CHECK-NEXT: Member: {(4 + (4 * %to_add) + (-4 * %to_sub) + %Base),+,4}<%for.body>
; CHECK-NEXT: Member: {((4 * %to_add) + (-4 * %to_sub) + %Base),+,4}<%for.body>
@@ -852,7 +896,18 @@ define dso_local void @forked_ptrs_add_sub_offset(ptr nocapture readonly %Base,
; RECURSE-NEXT: Report: cannot identify array bounds
; RECURSE-NEXT: Dependences:
; RECURSE-NEXT: Run-time memory checks:
+; RECURSE-NEXT: Check 0:
+; RECURSE-NEXT: Comparing group ([[GRP64:0x[0-9a-f]+]]):
+; RECURSE-NEXT: %arrayidx5 = getelementptr inbounds float, ptr %Dest, i64 %indvars.iv
+; RECURSE-NEXT: Against group ([[GRP65:0x[0-9a-f]+]]):
+; RECURSE-NEXT: %arrayidx = getelementptr inbounds i32, ptr %Preds, i64 %indvars.iv
; RECURSE-NEXT: Grouped accesses:
+; RECURSE-NEXT: Group [[GRP64]]:
+; RECURSE-NEXT: (Low: %Dest High: (400 + %Dest))
+; RECURSE-NEXT: Member: {%Dest,+,4}<nuw><%for.body>
+; RECURSE-NEXT: Group [[GRP65]]:
+; RECURSE-NEXT: (Low: %Preds High: (400 + %Preds))
+; RECURSE-NEXT: Member: {%Preds,+,4}<nuw><%for.body>
; RECURSE-EMPTY:
; RECURSE-NEXT: Non vectorizable stores to invariant address were not found in loop.
; RECURSE-NEXT: SCEV assumptions:
@@ -890,7 +945,18 @@ define dso_local void @forked_ptrs_mul_by_offset(ptr nocapture readonly %Base, p
; CHECK-NEXT: Report: cannot identify array bounds
; CHECK-NEXT: Dependences:
; CHECK-NEXT: Run-time memory checks:
+; CHECK-NEXT: Check 0:
+; CHECK-NEXT: Comparing group ([[GRP66:0x[0-9a-f]+]]):
+; CHECK-NEXT: %arrayidx5 = getelementptr inbounds float, ptr %Dest, i64 %indvars.iv
+; CHECK-NEXT: Against group ([[GRP67:0x[0-9a-f]+]]):
+; CHECK-NEXT: %arrayidx = getelementptr inbounds i32, ptr %Preds, i64 %indvars.iv
; CHECK-NEXT: Grouped accesses:
+; CHECK-NEXT: Group [[GRP66]]:
+; CHECK-NEXT: (Low: %Dest High: (400 + %Dest))
+; CHECK-NEXT: Member: {%Dest,+,4}<nuw><%for.body>
+; CHECK-NEXT: Group [[GRP67]]:
+; CHECK-NEXT: (Low: %Preds High: (400 + %Preds))
+; CHECK-NEXT: Member: {%Preds,+,4}<nuw><%for.body>
; CHECK-EMPTY:
; CHECK-NEXT: Non vectorizable stores to invariant address were not found in loop.
; CHECK-NEXT: SCEV assumptions:
@@ -902,7 +968,18 @@ define dso_local void @forked_ptrs_mul_by_offset(ptr nocapture readonly %Base, p
; RECURSE-NEXT: Report: cannot identify array bounds
; RECURSE-NEXT: Dependences:
; RECURSE-NEXT: Run-time memory checks:
+; RECURSE-NEXT: Check 0:
+; RECURSE-NEXT: Comparing group ([[GRP68:0x[0-9a-f]+]]):
+; RECURSE-NEXT: %arrayidx5 = getelementptr inbounds float, ptr %Dest, i64 %indvars.iv
+; RECURSE-NEXT: Against group ([[GRP69:0x[0-9a-f]+]]):
+; RECURSE-NEXT: %arrayidx = getelementptr inbounds i32, ptr %Preds, i64 %indvars.iv
; RECURSE-NEXT: Grouped accesses:
+; RECURSE-NEXT: Group [[GRP68]]:
+; RECURSE-NEXT: (Low: %Dest High: (400 + %Dest))
+; RECURSE-NEXT: Member: {%Dest,+,4}<nuw><%for.body>
+; RECURSE-NEXT: Group [[GRP69]]:
+; RECURSE-NEXT: (Low: %Preds High: (400 + %Preds))
+; RECURSE-NEXT: Member: {%Preds,+,4}<nuw><%for.body>
; RECURSE-EMPTY:
; RECURSE-NEXT: Non vectorizable stores to invariant address were not found in loop.
; RECURSE-NEXT: SCEV assumptions:
@@ -940,7 +1017,18 @@ define dso_local void @forked_ptrs_uniform_and_strided_forks(ptr nocapture reado
; CHECK-NEXT: Report: cannot identify array bounds
; CHECK-NEXT: Dependences:
; CHECK-NEXT: Run-time memory checks:
+; CHECK-NEXT: Check 0:
+; CHECK-NEXT: Comparing group ([[GRP70:0x[0-9a-f]+]]):
+; CHECK-NEXT: %arrayidx5 = getelementptr inbounds float, ptr %Dest, i64 %indvars.iv
+; CHECK-NEXT: Against group ([[GRP71:0x[0-9a-f]+]]):
+; CHECK-NEXT: %arrayidx = getelementptr inbounds i32, ptr %Preds, i64 %indvars.iv
; CHECK-NEXT: Grouped accesses:
+; CHECK-NEXT: Group [[GRP70]]:
+; CHECK-NEXT: (Low: %Dest High: (400 + %Dest))
+; CHECK-NEXT: Member: {%Dest,+,4}<nuw><%for.body>
+; CHECK-NEXT: Group [[GRP71]]:
+; CHECK-NEXT: (Low: %Preds High: (400 + %Preds))
+; CHECK-NEXT: Member: {%Preds,+,4}<nuw><%for.body>
; CHECK-EMPTY:
; CHECK-NEXT: Non vectorizable stores to invariant address were not found in loop.
; CHECK-NEXT: SCEV assumptions:
@@ -952,7 +1040,18 @@ define dso_local void @forked_ptrs_uniform_and_strided_forks(ptr nocapture reado
; RECURSE-NEXT: Report: cannot identify array bounds
; RECURSE-NEXT: Dependences:
; RECURSE-NEXT: Run-time memory checks:
+; RECURSE-NEXT: Check 0:
+; RECURSE-NEXT: Comparing group ([[GRP72:0x[0-9a-f]+]]):
+; RECURSE-NEXT: %arrayidx5 = getelementptr inbounds float, ptr %Dest, i64 %indvars.iv
+; RECURSE-NEXT: Against group ([[GRP73:0x[0-9a-f]+]]):
+; RECURSE-NEXT: %arrayidx = getelementptr inbounds i32, ptr %Preds, i64 %indvars.iv
; RECURSE-NEXT: Grouped accesses:
+; RECURSE-NEXT: Group [[GRP72]]:
+; RECURSE-NEXT: (Low: %Dest High: (400 + %Dest))
+; RECURSE-NEXT: Member: {%Dest,+,4}<nuw><%for.body>
+; RECURSE-NEXT: Group [[GRP73]]:
+; RECURSE-NEXT: (Low: %Preds High: (400 + %Preds))
+; RECURSE-NEXT: Member: {%Preds,+,4}<nuw><%for.body>
; RECURSE-EMPTY:
; RECURSE-NEXT: Non vectorizable stores to invariant address were not found in loop.
; RECURSE-NEXT: SCEV assumptions:
@@ -995,7 +1094,18 @@ define dso_local void @forked_ptrs_gather_and_contiguous_forks(ptr nocapture rea
; CHECK-NEXT: Report: cannot identify array bounds
; CHECK-NEXT: Dependences:
; CHECK-NEXT: Run-time memory checks:
+; CHECK-NEXT: Check 0:
+; CHECK-NEXT: Comparing group ([[GRP74:0x[0-9a-f]+]]):
+; CHECK-NEXT: %1 = getelementptr inbounds float, ptr %Dest, i64 %indvars.iv
+; CHECK-...
[truncated]
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a sound change, and the test updates to LoopVersioningLICM look good!
@@ -633,7 +633,18 @@ define dso_local void @forked_ptrs_same_base_different_offset(ptr nocapture read | |||
; CHECK-NEXT: Report: cannot identify array bounds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, but why does the report say this? Can we update this message to "Cannot generate RT checks for pointers: ..."?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This piece of text is what goes in the -Rpass-analysis=loop-vectorize output, so here it's
remarks.c:8:12: remark: loop not vectorized: cannot identify array bounds [-Rpass-analysis=loop-vectorize]
8 | B[i] = A[offset];
| ^
It's intended to mean "cannot identify array bounds of this specific memory access".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh. Hm, then is there something we can add to the output to indicate that runtime checks were not generated for some pointers? Otherwise, the output seems a bit misleading.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added an extra line to the output saying this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be good to go, modulo the spurious messages. I checked LVerLICM, and it seems to do the right thing:
// Don't allow stores that we don't have runtime checks for, as we won't be
// able to mark them noalias meaning they would prevent any code motion.
auto &Pointers = LAI->getRuntimePointerChecking()->Pointers;
if (!any_of(Pointers, [&](auto &P) { return P.PointerValue == Ptr; })) {
LLVM_DEBUG(dbgs() << " Found a store without a runtime check.\n");
return false;
}
@@ -3002,6 +3001,8 @@ void LoopAccessInfo::print(raw_ostream &OS, unsigned Depth) const { | |||
|
|||
// List the pair of accesses need run-time checks to prove independence. | |||
PtrRtChecking->print(OS, Depth); | |||
if (PtrRtChecking->Need && !CanVecMem) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this check is a bit too lax, and generates spurious messages. Can you check CheckingGroups against Pointers instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't work, as CheckingGroups is generated from Pointers in groupChecks, so there won't be any Pointer that doesn't have a corresponding CheckingGroup. I've gone instead with recording what the result of canCheckPtrAtRT was.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to make this behavior opt-in? Otherwise we unnecessarily increase compile-time for all other users that do not need this, just to benefit LoopVerLICM which isn't on by default.
Expanding low/end for groups can be costly in some cases.
Doing a simple test with the following:
and doing:
I get: |
See 234cc40 ([LAA] Limit no-overlap check to at least one loop-invariant accesses.): @fhahn has good reason to believe that compile-time will blow up. From the comment: // Check if we can prove that Sink only accesses memory after Src's end or
// vice versa. At the moment this is limited to cases where either source or
// sink are loop invariant to avoid compile-time increases. This is not
// required for correctness. Please use llvm-compile-time-tracker for real measurements. |
This records whether canCheckPtrAtRT succeeded, and is used to decide if we print the "checks are incomplete" message.
LLVM compile-time tracker results: https://llvm-compile-time-tracker.com/compare.php?from=a7ede51b556f40163db9e3cc67c98c27ba2364d8&to=a4dd603c004278e86f80cf6d62597535f0fee126&stat=instructions%3Au |
Regressions of 0.01-0.02% are noise (the compile time tracker doesn't color them), 0.03% is a noticeable regression (the compile time tracker colors these in light red), and 0.05%+ is serious (compile time tracker colors these in deep red). |
I've added an option to LoopAccessAnalysis to control whether partial results are allowed, and enabled it only in LoopVersioningLICM. With that there are no reported compile time regressions: https://llvm-compile-time-tracker.com/compare.php?from=5c28af409978c19a35021855a29dcaa65e95da00&to=c21056925992f7e536faa766b8773c6582ebd916&stat=instructions:u |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks! A few minor suggestions follow.
[](StringRef Params) { | ||
return PassBuilder::parseSinglePassOption(Params, "allow-partial", | ||
"LoopAccessInfoPrinterPass"); | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can lift this out into parseLoopAccessInfoPrinterOptions to match the rest of the code?
@@ -1325,11 +1330,10 @@ bool AccessAnalysis::canCheckPtrAtRT( | |||
// are needed. This can happen when all pointers point to the same underlying | |||
// object for example. | |||
RtCheck.Need = CanDoRT ? RtCheck.getNumberOfChecks() != 0 : MayNeedRTCheck; | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stray newline strip?
; RUN: opt -disable-output -passes='print<access-info><allow-partial>,print<access-info>' %s 2>&1 | FileCheck %s --check-prefixes=ALLOW-BEFORE | ||
; RUN: opt -disable-output -passes='print<access-info>,print<access-info><allow-partial>' %s 2>&1 | FileCheck %s --check-prefixes=ALLOW-AFTER |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice test!
@@ -654,7 +655,8 @@ class RuntimePointerChecking { | |||
/// For memory dependences that cannot be determined at compile time, it | |||
/// generates run-time checks to prove independence. This is done by | |||
/// AccessAnalysis::canCheckPtrAtRT and the checks are maintained by the | |||
/// RuntimePointerCheck class. | |||
/// RuntimePointerCheck class. AllowPartial determines whether partial checks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
\p AllowPartial to aid Doxygen?
/// (i.e. the pointers have computable bounds). | ||
/// (i.e. the pointers have computable bounds). A return value of false means | ||
/// we couldn't analyze and generate runtime checks for all pointers in the | ||
/// loop, but we will have checks for those pointers we could analyze. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... if \p AllowPartial is set?
std::unique_ptr<MemoryDepChecker> DepChecker; | ||
|
||
Loop *TheLoop; | ||
|
||
bool AllowPartial; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs some documentation. As AllowPartial is the default, it might be better to have the flag indicate that all runtime that can be generated will be collected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about the naming?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what you're asking here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It just seemed like the comment might have been missed?
bool CanDoRTIfNeeded = Accesses.canCheckPtrAtRT( | ||
*PtrRtChecking, TheLoop, SymbolicStrides, UncomputablePtr); | ||
if (!CanDoRTIfNeeded) { | ||
HasCompletePtrRtChecking = Accesses.canCheckPtrAtRT( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this need to be a member variable? Can we keep the name which still seems accurate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is used in LoopAccessInfo::print to decide whether to print the "Generated run-time checks are incomplete" message.
// We need to create the LoopAccessInfo if either we don't already have one, | ||
// or if it was created with a different value of AllowPartial. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we keep the original one if it has AllowPartial = true and AllowPartial = false?
It might be clearer if the flag is renamed to indicate that all runtime checks are collected, even if not valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had thought about that, but I think having consistent output is more useful than any minor efficiency gain from not having to recompute the LoopAccessInfo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok would have been good to resolve this discussion before merging.
What consistent output are you thinking about? The first only output change would be from the printing pass?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we consider test/Analysis/LoopAccessAnalysis/allow-partial.ll, then with this change if we compare the output of -passes='print<access-info>'
and -passes='print<access-info><allow-partial>,print<access-info>'
then the print<access-info>
gives different output in the second as it re-uses the LoopAccessInfo from the previous pass.
Potentially we could see the same thing if we has a transform pass that has AllowPartial=true followed by print<access-info>, but the only transform that has AllowPartial=true is LoopVersioningLICM which constructs LoopAccessInfoManager itself instead of using the LoopAccessAnalysis pass, so the analysis results won't be re-used anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see, thanks. But that would only impact the print<access-info>
, which is only used for testing?
@@ -0,0 +1,99 @@ | |||
; RUN: opt -disable-output -passes='print<access-info><allow-partial>,print<access-info>' %s 2>&1 | FileCheck %s --check-prefixes=ALLOW-BEFORE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we have other print passes that take arguments already, but would it be more consistent to pare this as print<access-info<allow-partial>>
to indicate this is an argument to access-info instead of print<>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This unfortunately can't be currently done with how passes are defined in PassRegistry.def: "print<access-info>" is the name of the pass, and the options come after that.
Currently if there's any memory access that AccessAnalysis couldn't analyze then all of the runtime pointer check results are discarded. This patch changes that so they're not, and we generate the runtime check information for those pointers that we could analyze, as transformations may still be able to make use of the partial information.
Of the transformations that use LoopAccessAnalysis, only LoopVersioningLICM changes behaviour as a result of this change. This is because the others either: