Skip to content

[LAA] Keep pointer checks on partial analysis #139719

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

john-brawn-arm
Copy link
Collaborator

Currently if there's any memory access that AccessAnalysis couldn't analyze then all of the runtime pointer check results are discarded. This patch changes that so they're not, and we generate the runtime check information for those pointers that we could analyze, as transformations may still be able to make use of the partial information.

Of the transformations that use LoopAccessAnalysis, only LoopVersioningLICM changes behaviour as a result of this change. This is because the others either:

  • Check canVectorizeMemory, which will return false when we have partial pointer information as analyzeLoop() will return false.
  • Examine the dependencies returned by getDepChecker(), which will be empty as we exit analyzeLoop if we have partial pointer information before calling areDepsSafe(), which is what fills in the dependency information.

Currently if there's any memory access that AccessAnalysis couldn't
analyze then all of the runtime pointer check results are discarded.
This patch changes that so they're not, and we generate the runtime
check information for those pointers that we could analyze, as
transformations may still be able to make use of the partial
information.

Of the transformations that use LoopAccessAnalysis, only
LoopVersioningLICM changes behaviour as a result of this change. This
is because the others either:
 * Check canVectorizeMemory, which will return false when we have
   partial pointer information as analyzeLoop() will return false.
 * Examine the dependencies returned by getDepChecker(), which will be
   empty as we exit analyzeLoop if we have partial pointer information
   before calling areDepsSafe(), which is what fills in the dependency
   information.
@llvmbot
Copy link
Member

llvmbot commented May 13, 2025

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-llvm-analysis

Author: John Brawn (john-brawn-arm)

Changes

Currently if there's any memory access that AccessAnalysis couldn't analyze then all of the runtime pointer check results are discarded. This patch changes that so they're not, and we generate the runtime check information for those pointers that we could analyze, as transformations may still be able to make use of the partial information.

Of the transformations that use LoopAccessAnalysis, only LoopVersioningLICM changes behaviour as a result of this change. This is because the others either:

  • Check canVectorizeMemory, which will return false when we have partial pointer information as analyzeLoop() will return false.
  • Examine the dependencies returned by getDepChecker(), which will be empty as we exit analyzeLoop if we have partial pointer information before calling areDepsSafe(), which is what fills in the dependency information.

Patch is 64.45 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/139719.diff

9 Files Affected:

  • (modified) llvm/include/llvm/Analysis/LoopAccessAnalysis.h (+7-2)
  • (modified) llvm/lib/Analysis/LoopAccessAnalysis.cpp (+9-10)
  • (modified) llvm/test/Analysis/LoopAccessAnalysis/forked-pointers.ll (+254-53)
  • (modified) llvm/test/Analysis/LoopAccessAnalysis/non-constant-strides-backward.ll (+12)
  • (modified) llvm/test/Analysis/LoopAccessAnalysis/non-constant-strides-forward.ll (+9)
  • (modified) llvm/test/Analysis/LoopAccessAnalysis/pointer-phis.ll (+12-9)
  • (modified) llvm/test/Analysis/LoopAccessAnalysis/retry-runtime-checks-after-dependence-analysis-forked-pointers.ll (+14)
  • (modified) llvm/test/Analysis/LoopAccessAnalysis/underlying-object-loop-varying-phi.ll (+14)
  • (modified) llvm/test/Transforms/LoopVersioningLICM/load-from-unknown-address.ll (+140-28)
diff --git a/llvm/include/llvm/Analysis/LoopAccessAnalysis.h b/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
index f715e0ec8dbb4..0abd7f1182108 100644
--- a/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
+++ b/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
@@ -192,7 +192,8 @@ class MemoryDepChecker {
   /// of a write access.
   void addAccess(LoadInst *LI);
 
-  /// Check whether the dependencies between the accesses are safe.
+  /// Check whether the dependencies between the accesses are safe, and records
+  /// the dependence information in Dependences if so.
   ///
   /// Only checks sets with elements in \p CheckDeps.
   bool areDepsSafe(const DepCandidates &AccessSets,
@@ -779,10 +780,14 @@ class LoopAccessInfo {
 
   /// We need to check that all of the pointers in this list are disjoint
   /// at runtime. Using std::unique_ptr to make using move ctor simpler.
+  /// This list may contain only partial information when we've failed to
+  /// analyze all the memory accesses in the loop (i.e. CanVecMem is false).
   std::unique_ptr<RuntimePointerChecking> PtrRtChecking;
 
-  /// the Memory Dependence Checker which can determine the
+  /// The Memory Dependence Checker which can determine the
   /// loop-independent and loop-carried dependences between memory accesses.
+  /// This will be empty if we've failed to analyze all the memory access in the
+  /// loop (i.e. CanVecMem is false).
   std::unique_ptr<MemoryDepChecker> DepChecker;
 
   Loop *TheLoop;
diff --git a/llvm/lib/Analysis/LoopAccessAnalysis.cpp b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
index af1a3c593c514..4d912abdf59a8 100644
--- a/llvm/lib/Analysis/LoopAccessAnalysis.cpp
+++ b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
@@ -529,8 +529,10 @@ void RuntimePointerChecking::groupChecks(
     // equivalence class, the iteration order is deterministic.
     for (auto M : DepCands.members(Access)) {
       auto PointerI = PositionMap.find(M.getPointer());
-      assert(PointerI != PositionMap.end() &&
-             "pointer in equivalence class not found in PositionMap");
+      // If we can't find the pointer in PositionMap that means we can't
+      // generate a memcheck for it.
+      if (PointerI == PositionMap.end())
+        continue;
       for (unsigned Pointer : PointerI->second) {
         bool Merged = false;
         // Mark this pointer as seen.
@@ -682,7 +684,9 @@ class AccessAnalysis {
   /// non-intersection.
   ///
   /// Returns true if we need no check or if we do and we can generate them
-  /// (i.e. the pointers have computable bounds).
+  /// (i.e. the pointers have computable bounds). A return value of false means
+  /// we couldn't analyze and generate runtime checks for all pointers in the
+  /// loop, but we will have checks for those pointers we could analyze.
   bool canCheckPtrAtRT(RuntimePointerChecking &RtCheck, ScalarEvolution *SE,
                        Loop *TheLoop,
                        const DenseMap<Value *, const SCEV *> &Strides,
@@ -1273,7 +1277,6 @@ bool AccessAnalysis::canCheckPtrAtRT(
                                   /*Assume=*/true)) {
           CanDoAliasSetRT = false;
           UncomputablePtr = Access.getPointer();
-          break;
         }
       }
     }
@@ -1313,7 +1316,7 @@ bool AccessAnalysis::canCheckPtrAtRT(
     }
   }
 
-  if (MayNeedRTCheck && CanDoRT)
+  if (MayNeedRTCheck)
     RtCheck.generateChecks(DepCands, IsDepCheckNeeded);
 
   LLVM_DEBUG(dbgs() << "LAA: We need to do " << RtCheck.getNumberOfChecks()
@@ -1323,11 +1326,7 @@ bool AccessAnalysis::canCheckPtrAtRT(
   // are needed. This can happen when all pointers point to the same underlying
   // object for example.
   RtCheck.Need = CanDoRT ? RtCheck.getNumberOfChecks() != 0 : MayNeedRTCheck;
-
-  bool CanDoRTIfNeeded = !RtCheck.Need || CanDoRT;
-  if (!CanDoRTIfNeeded)
-    RtCheck.reset();
-  return CanDoRTIfNeeded;
+  return !RtCheck.Need || CanDoRT;
 }
 
 void AccessAnalysis::processMemAccesses() {
diff --git a/llvm/test/Analysis/LoopAccessAnalysis/forked-pointers.ll b/llvm/test/Analysis/LoopAccessAnalysis/forked-pointers.ll
index cd388b4ee87f2..f0885cf0a6043 100644
--- a/llvm/test/Analysis/LoopAccessAnalysis/forked-pointers.ll
+++ b/llvm/test/Analysis/LoopAccessAnalysis/forked-pointers.ll
@@ -633,7 +633,18 @@ define dso_local void @forked_ptrs_same_base_different_offset(ptr nocapture read
 ; CHECK-NEXT:      Report: cannot identify array bounds
 ; CHECK-NEXT:      Dependences:
 ; CHECK-NEXT:      Run-time memory checks:
+; CHECK-NEXT:      Check 0:
+; CHECK-NEXT:        Comparing group ([[GRP47:0x[0-9a-f]+]]):
+; CHECK-NEXT:          %arrayidx5 = getelementptr inbounds float, ptr %Dest, i64 %indvars.iv
+; CHECK-NEXT:        Against group ([[GRP48:0x[0-9a-f]+]]):
+; CHECK-NEXT:          %arrayidx = getelementptr inbounds i32, ptr %Preds, i64 %indvars.iv
 ; CHECK-NEXT:      Grouped accesses:
+; CHECK-NEXT:        Group [[GRP47]]:
+; CHECK-NEXT:          (Low: %Dest High: (400 + %Dest))
+; CHECK-NEXT:            Member: {%Dest,+,4}<nuw><%for.body>
+; CHECK-NEXT:        Group [[GRP48]]:
+; CHECK-NEXT:          (Low: %Preds High: (400 + %Preds))
+; CHECK-NEXT:            Member: {%Preds,+,4}<nuw><%for.body>
 ; CHECK-EMPTY:
 ; CHECK-NEXT:      Non vectorizable stores to invariant address were not found in loop.
 ; CHECK-NEXT:      SCEV assumptions:
@@ -645,7 +656,18 @@ define dso_local void @forked_ptrs_same_base_different_offset(ptr nocapture read
 ; RECURSE-NEXT:      Report: cannot identify array bounds
 ; RECURSE-NEXT:      Dependences:
 ; RECURSE-NEXT:      Run-time memory checks:
+; RECURSE-NEXT:      Check 0:
+; RECURSE-NEXT:        Comparing group ([[GRP49:0x[0-9a-f]+]]):
+; RECURSE-NEXT:          %arrayidx5 = getelementptr inbounds float, ptr %Dest, i64 %indvars.iv
+; RECURSE-NEXT:        Against group ([[GRP50:0x[0-9a-f]+]]):
+; RECURSE-NEXT:          %arrayidx = getelementptr inbounds i32, ptr %Preds, i64 %indvars.iv
 ; RECURSE-NEXT:      Grouped accesses:
+; RECURSE-NEXT:        Group [[GRP49]]:
+; RECURSE-NEXT:          (Low: %Dest High: (400 + %Dest))
+; RECURSE-NEXT:            Member: {%Dest,+,4}<nuw><%for.body>
+; RECURSE-NEXT:        Group [[GRP50]]:
+; RECURSE-NEXT:          (Low: %Preds High: (400 + %Preds))
+; RECURSE-NEXT:            Member: {%Preds,+,4}<nuw><%for.body>
 ; RECURSE-EMPTY:
 ; RECURSE-NEXT:      Non vectorizable stores to invariant address were not found in loop.
 ; RECURSE-NEXT:      SCEV assumptions:
@@ -684,24 +706,24 @@ define dso_local void @forked_ptrs_add_to_offset(ptr nocapture readonly %Base, p
 ; CHECK-NEXT:      Dependences:
 ; CHECK-NEXT:      Run-time memory checks:
 ; CHECK-NEXT:      Check 0:
-; CHECK-NEXT:        Comparing group ([[GRP47:0x[0-9a-f]+]]):
+; CHECK-NEXT:        Comparing group ([[GRP51:0x[0-9a-f]+]]):
 ; CHECK-NEXT:          %arrayidx5 = getelementptr inbounds float, ptr %Dest, i64 %indvars.iv
-; CHECK-NEXT:        Against group ([[GRP48:0x[0-9a-f]+]]):
+; CHECK-NEXT:        Against group ([[GRP52:0x[0-9a-f]+]]):
 ; CHECK-NEXT:          %arrayidx = getelementptr inbounds i32, ptr %Preds, i64 %indvars.iv
 ; CHECK-NEXT:      Check 1:
-; CHECK-NEXT:        Comparing group ([[GRP47]]):
+; CHECK-NEXT:        Comparing group ([[GRP51]]):
 ; CHECK-NEXT:          %arrayidx5 = getelementptr inbounds float, ptr %Dest, i64 %indvars.iv
-; CHECK-NEXT:        Against group ([[GRP49:0x[0-9a-f]+]]):
+; CHECK-NEXT:        Against group ([[GRP53:0x[0-9a-f]+]]):
 ; CHECK-NEXT:          %arrayidx3 = getelementptr inbounds float, ptr %Base, i64 %offset
 ; CHECK-NEXT:          %arrayidx3 = getelementptr inbounds float, ptr %Base, i64 %offset
 ; CHECK-NEXT:      Grouped accesses:
-; CHECK-NEXT:        Group [[GRP47]]:
+; CHECK-NEXT:        Group [[GRP51]]:
 ; CHECK-NEXT:          (Low: %Dest High: (400 + %Dest))
 ; CHECK-NEXT:            Member: {%Dest,+,4}<nuw><%for.body>
-; CHECK-NEXT:        Group [[GRP48]]:
+; CHECK-NEXT:        Group [[GRP52]]:
 ; CHECK-NEXT:          (Low: %Preds High: (400 + %Preds))
 ; CHECK-NEXT:            Member: {%Preds,+,4}<nuw><%for.body>
-; CHECK-NEXT:        Group [[GRP49]]:
+; CHECK-NEXT:        Group [[GRP53]]:
 ; CHECK-NEXT:          (Low: ((4 * %extra_offset) + %Base) High: (404 + (4 * %extra_offset) + %Base))
 ; CHECK-NEXT:            Member: {(4 + (4 * %extra_offset) + %Base),+,4}<%for.body>
 ; CHECK-NEXT:            Member: {((4 * %extra_offset) + %Base),+,4}<%for.body>
@@ -716,7 +738,18 @@ define dso_local void @forked_ptrs_add_to_offset(ptr nocapture readonly %Base, p
 ; RECURSE-NEXT:      Report: cannot identify array bounds
 ; RECURSE-NEXT:      Dependences:
 ; RECURSE-NEXT:      Run-time memory checks:
+; RECURSE-NEXT:      Check 0:
+; RECURSE-NEXT:        Comparing group ([[GRP54:0x[0-9a-f]+]]):
+; RECURSE-NEXT:          %arrayidx5 = getelementptr inbounds float, ptr %Dest, i64 %indvars.iv
+; RECURSE-NEXT:        Against group ([[GRP55:0x[0-9a-f]+]]):
+; RECURSE-NEXT:          %arrayidx = getelementptr inbounds i32, ptr %Preds, i64 %indvars.iv
 ; RECURSE-NEXT:      Grouped accesses:
+; RECURSE-NEXT:        Group [[GRP54]]:
+; RECURSE-NEXT:          (Low: %Dest High: (400 + %Dest))
+; RECURSE-NEXT:            Member: {%Dest,+,4}<nuw><%for.body>
+; RECURSE-NEXT:        Group [[GRP55]]:
+; RECURSE-NEXT:          (Low: %Preds High: (400 + %Preds))
+; RECURSE-NEXT:            Member: {%Preds,+,4}<nuw><%for.body>
 ; RECURSE-EMPTY:
 ; RECURSE-NEXT:      Non vectorizable stores to invariant address were not found in loop.
 ; RECURSE-NEXT:      SCEV assumptions:
@@ -752,24 +785,24 @@ define dso_local void @forked_ptrs_sub_from_offset(ptr nocapture readonly %Base,
 ; CHECK-NEXT:      Dependences:
 ; CHECK-NEXT:      Run-time memory checks:
 ; CHECK-NEXT:      Check 0:
-; CHECK-NEXT:        Comparing group ([[GRP50:0x[0-9a-f]+]]):
+; CHECK-NEXT:        Comparing group ([[GRP56:0x[0-9a-f]+]]):
 ; CHECK-NEXT:          %arrayidx5 = getelementptr inbounds float, ptr %Dest, i64 %indvars.iv
-; CHECK-NEXT:        Against group ([[GRP51:0x[0-9a-f]+]]):
+; CHECK-NEXT:        Against group ([[GRP57:0x[0-9a-f]+]]):
 ; CHECK-NEXT:          %arrayidx = getelementptr inbounds i32, ptr %Preds, i64 %indvars.iv
 ; CHECK-NEXT:      Check 1:
-; CHECK-NEXT:        Comparing group ([[GRP50]]):
+; CHECK-NEXT:        Comparing group ([[GRP56]]):
 ; CHECK-NEXT:          %arrayidx5 = getelementptr inbounds float, ptr %Dest, i64 %indvars.iv
-; CHECK-NEXT:        Against group ([[GRP52:0x[0-9a-f]+]]):
+; CHECK-NEXT:        Against group ([[GRP58:0x[0-9a-f]+]]):
 ; CHECK-NEXT:          %arrayidx3 = getelementptr inbounds float, ptr %Base, i64 %offset
 ; CHECK-NEXT:          %arrayidx3 = getelementptr inbounds float, ptr %Base, i64 %offset
 ; CHECK-NEXT:      Grouped accesses:
-; CHECK-NEXT:        Group [[GRP50]]:
+; CHECK-NEXT:        Group [[GRP56]]:
 ; CHECK-NEXT:          (Low: %Dest High: (400 + %Dest))
 ; CHECK-NEXT:            Member: {%Dest,+,4}<nuw><%for.body>
-; CHECK-NEXT:        Group [[GRP51]]:
+; CHECK-NEXT:        Group [[GRP57]]:
 ; CHECK-NEXT:          (Low: %Preds High: (400 + %Preds))
 ; CHECK-NEXT:            Member: {%Preds,+,4}<nuw><%for.body>
-; CHECK-NEXT:        Group [[GRP52]]:
+; CHECK-NEXT:        Group [[GRP58]]:
 ; CHECK-NEXT:          (Low: ((-4 * %extra_offset) + %Base) High: (404 + (-4 * %extra_offset) + %Base))
 ; CHECK-NEXT:            Member: {(4 + (-4 * %extra_offset) + %Base),+,4}<%for.body>
 ; CHECK-NEXT:            Member: {((-4 * %extra_offset) + %Base),+,4}<%for.body>
@@ -784,7 +817,18 @@ define dso_local void @forked_ptrs_sub_from_offset(ptr nocapture readonly %Base,
 ; RECURSE-NEXT:      Report: cannot identify array bounds
 ; RECURSE-NEXT:      Dependences:
 ; RECURSE-NEXT:      Run-time memory checks:
+; RECURSE-NEXT:      Check 0:
+; RECURSE-NEXT:        Comparing group ([[GRP59:0x[0-9a-f]+]]):
+; RECURSE-NEXT:          %arrayidx5 = getelementptr inbounds float, ptr %Dest, i64 %indvars.iv
+; RECURSE-NEXT:        Against group ([[GRP60:0x[0-9a-f]+]]):
+; RECURSE-NEXT:          %arrayidx = getelementptr inbounds i32, ptr %Preds, i64 %indvars.iv
 ; RECURSE-NEXT:      Grouped accesses:
+; RECURSE-NEXT:        Group [[GRP59]]:
+; RECURSE-NEXT:          (Low: %Dest High: (400 + %Dest))
+; RECURSE-NEXT:            Member: {%Dest,+,4}<nuw><%for.body>
+; RECURSE-NEXT:        Group [[GRP60]]:
+; RECURSE-NEXT:          (Low: %Preds High: (400 + %Preds))
+; RECURSE-NEXT:            Member: {%Preds,+,4}<nuw><%for.body>
 ; RECURSE-EMPTY:
 ; RECURSE-NEXT:      Non vectorizable stores to invariant address were not found in loop.
 ; RECURSE-NEXT:      SCEV assumptions:
@@ -820,24 +864,24 @@ define dso_local void @forked_ptrs_add_sub_offset(ptr nocapture readonly %Base,
 ; CHECK-NEXT:      Dependences:
 ; CHECK-NEXT:      Run-time memory checks:
 ; CHECK-NEXT:      Check 0:
-; CHECK-NEXT:        Comparing group ([[GRP53:0x[0-9a-f]+]]):
+; CHECK-NEXT:        Comparing group ([[GRP61:0x[0-9a-f]+]]):
 ; CHECK-NEXT:          %arrayidx5 = getelementptr inbounds float, ptr %Dest, i64 %indvars.iv
-; CHECK-NEXT:        Against group ([[GRP54:0x[0-9a-f]+]]):
+; CHECK-NEXT:        Against group ([[GRP62:0x[0-9a-f]+]]):
 ; CHECK-NEXT:          %arrayidx = getelementptr inbounds i32, ptr %Preds, i64 %indvars.iv
 ; CHECK-NEXT:      Check 1:
-; CHECK-NEXT:        Comparing group ([[GRP53]]):
+; CHECK-NEXT:        Comparing group ([[GRP61]]):
 ; CHECK-NEXT:          %arrayidx5 = getelementptr inbounds float, ptr %Dest, i64 %indvars.iv
-; CHECK-NEXT:        Against group ([[GRP55:0x[0-9a-f]+]]):
+; CHECK-NEXT:        Against group ([[GRP63:0x[0-9a-f]+]]):
 ; CHECK-NEXT:          %arrayidx3 = getelementptr inbounds float, ptr %Base, i64 %offset
 ; CHECK-NEXT:          %arrayidx3 = getelementptr inbounds float, ptr %Base, i64 %offset
 ; CHECK-NEXT:      Grouped accesses:
-; CHECK-NEXT:        Group [[GRP53]]:
+; CHECK-NEXT:        Group [[GRP61]]:
 ; CHECK-NEXT:          (Low: %Dest High: (400 + %Dest))
 ; CHECK-NEXT:            Member: {%Dest,+,4}<nuw><%for.body>
-; CHECK-NEXT:        Group [[GRP54]]:
+; CHECK-NEXT:        Group [[GRP62]]:
 ; CHECK-NEXT:          (Low: %Preds High: (400 + %Preds))
 ; CHECK-NEXT:            Member: {%Preds,+,4}<nuw><%for.body>
-; CHECK-NEXT:        Group [[GRP55]]:
+; CHECK-NEXT:        Group [[GRP63]]:
 ; CHECK-NEXT:          (Low: ((4 * %to_add) + (-4 * %to_sub) + %Base) High: (404 + (4 * %to_add) + (-4 * %to_sub) + %Base))
 ; CHECK-NEXT:            Member: {(4 + (4 * %to_add) + (-4 * %to_sub) + %Base),+,4}<%for.body>
 ; CHECK-NEXT:            Member: {((4 * %to_add) + (-4 * %to_sub) + %Base),+,4}<%for.body>
@@ -852,7 +896,18 @@ define dso_local void @forked_ptrs_add_sub_offset(ptr nocapture readonly %Base,
 ; RECURSE-NEXT:      Report: cannot identify array bounds
 ; RECURSE-NEXT:      Dependences:
 ; RECURSE-NEXT:      Run-time memory checks:
+; RECURSE-NEXT:      Check 0:
+; RECURSE-NEXT:        Comparing group ([[GRP64:0x[0-9a-f]+]]):
+; RECURSE-NEXT:          %arrayidx5 = getelementptr inbounds float, ptr %Dest, i64 %indvars.iv
+; RECURSE-NEXT:        Against group ([[GRP65:0x[0-9a-f]+]]):
+; RECURSE-NEXT:          %arrayidx = getelementptr inbounds i32, ptr %Preds, i64 %indvars.iv
 ; RECURSE-NEXT:      Grouped accesses:
+; RECURSE-NEXT:        Group [[GRP64]]:
+; RECURSE-NEXT:          (Low: %Dest High: (400 + %Dest))
+; RECURSE-NEXT:            Member: {%Dest,+,4}<nuw><%for.body>
+; RECURSE-NEXT:        Group [[GRP65]]:
+; RECURSE-NEXT:          (Low: %Preds High: (400 + %Preds))
+; RECURSE-NEXT:            Member: {%Preds,+,4}<nuw><%for.body>
 ; RECURSE-EMPTY:
 ; RECURSE-NEXT:      Non vectorizable stores to invariant address were not found in loop.
 ; RECURSE-NEXT:      SCEV assumptions:
@@ -890,7 +945,18 @@ define dso_local void @forked_ptrs_mul_by_offset(ptr nocapture readonly %Base, p
 ; CHECK-NEXT:      Report: cannot identify array bounds
 ; CHECK-NEXT:      Dependences:
 ; CHECK-NEXT:      Run-time memory checks:
+; CHECK-NEXT:      Check 0:
+; CHECK-NEXT:        Comparing group ([[GRP66:0x[0-9a-f]+]]):
+; CHECK-NEXT:          %arrayidx5 = getelementptr inbounds float, ptr %Dest, i64 %indvars.iv
+; CHECK-NEXT:        Against group ([[GRP67:0x[0-9a-f]+]]):
+; CHECK-NEXT:          %arrayidx = getelementptr inbounds i32, ptr %Preds, i64 %indvars.iv
 ; CHECK-NEXT:      Grouped accesses:
+; CHECK-NEXT:        Group [[GRP66]]:
+; CHECK-NEXT:          (Low: %Dest High: (400 + %Dest))
+; CHECK-NEXT:            Member: {%Dest,+,4}<nuw><%for.body>
+; CHECK-NEXT:        Group [[GRP67]]:
+; CHECK-NEXT:          (Low: %Preds High: (400 + %Preds))
+; CHECK-NEXT:            Member: {%Preds,+,4}<nuw><%for.body>
 ; CHECK-EMPTY:
 ; CHECK-NEXT:      Non vectorizable stores to invariant address were not found in loop.
 ; CHECK-NEXT:      SCEV assumptions:
@@ -902,7 +968,18 @@ define dso_local void @forked_ptrs_mul_by_offset(ptr nocapture readonly %Base, p
 ; RECURSE-NEXT:      Report: cannot identify array bounds
 ; RECURSE-NEXT:      Dependences:
 ; RECURSE-NEXT:      Run-time memory checks:
+; RECURSE-NEXT:      Check 0:
+; RECURSE-NEXT:        Comparing group ([[GRP68:0x[0-9a-f]+]]):
+; RECURSE-NEXT:          %arrayidx5 = getelementptr inbounds float, ptr %Dest, i64 %indvars.iv
+; RECURSE-NEXT:        Against group ([[GRP69:0x[0-9a-f]+]]):
+; RECURSE-NEXT:          %arrayidx = getelementptr inbounds i32, ptr %Preds, i64 %indvars.iv
 ; RECURSE-NEXT:      Grouped accesses:
+; RECURSE-NEXT:        Group [[GRP68]]:
+; RECURSE-NEXT:          (Low: %Dest High: (400 + %Dest))
+; RECURSE-NEXT:            Member: {%Dest,+,4}<nuw><%for.body>
+; RECURSE-NEXT:        Group [[GRP69]]:
+; RECURSE-NEXT:          (Low: %Preds High: (400 + %Preds))
+; RECURSE-NEXT:            Member: {%Preds,+,4}<nuw><%for.body>
 ; RECURSE-EMPTY:
 ; RECURSE-NEXT:      Non vectorizable stores to invariant address were not found in loop.
 ; RECURSE-NEXT:      SCEV assumptions:
@@ -940,7 +1017,18 @@ define dso_local void @forked_ptrs_uniform_and_strided_forks(ptr nocapture reado
 ; CHECK-NEXT:      Report: cannot identify array bounds
 ; CHECK-NEXT:      Dependences:
 ; CHECK-NEXT:      Run-time memory checks:
+; CHECK-NEXT:      Check 0:
+; CHECK-NEXT:        Comparing group ([[GRP70:0x[0-9a-f]+]]):
+; CHECK-NEXT:          %arrayidx5 = getelementptr inbounds float, ptr %Dest, i64 %indvars.iv
+; CHECK-NEXT:        Against group ([[GRP71:0x[0-9a-f]+]]):
+; CHECK-NEXT:          %arrayidx = getelementptr inbounds i32, ptr %Preds, i64 %indvars.iv
 ; CHECK-NEXT:      Grouped accesses:
+; CHECK-NEXT:        Group [[GRP70]]:
+; CHECK-NEXT:          (Low: %Dest High: (400 + %Dest))
+; CHECK-NEXT:            Member: {%Dest,+,4}<nuw><%for.body>
+; CHECK-NEXT:        Group [[GRP71]]:
+; CHECK-NEXT:          (Low: %Preds High: (400 + %Preds))
+; CHECK-NEXT:            Member: {%Preds,+,4}<nuw><%for.body>
 ; CHECK-EMPTY:
 ; CHECK-NEXT:      Non vectorizable stores to invariant address were not found in loop.
 ; CHECK-NEXT:      SCEV assumptions:
@@ -952,7 +1040,18 @@ define dso_local void @forked_ptrs_uniform_and_strided_forks(ptr nocapture reado
 ; RECURSE-NEXT:      Report: cannot identify array bounds
 ; RECURSE-NEXT:      Dependences:
 ; RECURSE-NEXT:      Run-time memory checks:
+; RECURSE-NEXT:      Check 0:
+; RECURSE-NEXT:        Comparing group ([[GRP72:0x[0-9a-f]+]]):
+; RECURSE-NEXT:          %arrayidx5 = getelementptr inbounds float, ptr %Dest, i64 %indvars.iv
+; RECURSE-NEXT:        Against group ([[GRP73:0x[0-9a-f]+]]):
+; RECURSE-NEXT:          %arrayidx = getelementptr inbounds i32, ptr %Preds, i64 %indvars.iv
 ; RECURSE-NEXT:      Grouped accesses:
+; RECURSE-NEXT:        Group [[GRP72]]:
+; RECURSE-NEXT:          (Low: %Dest High: (400 + %Dest))
+; RECURSE-NEXT:            Member: {%Dest,+,4}<nuw><%for.body>
+; RECURSE-NEXT:        Group [[GRP73]]:
+; RECURSE-NEXT:          (Low: %Preds High: (400 + %Preds))
+; RECURSE-NEXT:            Member: {%Preds,+,4}<nuw><%for.body>
 ; RECURSE-EMPTY:
 ; RECURSE-NEXT:      Non vectorizable stores to invariant address were not found in loop.
 ; RECURSE-NEXT:      SCEV assumptions:
@@ -995,7 +1094,18 @@ define dso_local void @forked_ptrs_gather_and_contiguous_forks(ptr nocapture rea
 ; CHECK-NEXT:      Report: cannot identify array bounds
 ; CHECK-NEXT:      Dependences:
 ; CHECK-NEXT:      Run-time memory checks:
+; CHECK-NEXT:      Check 0:
+; CHECK-NEXT:        Comparing group ([[GRP74:0x[0-9a-f]+]]):
+; CHECK-NEXT:          %1 = getelementptr inbounds float, ptr %Dest, i64 %indvars.iv
+; CHECK-...
[truncated]

Copy link
Contributor

@artagnon artagnon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a sound change, and the test updates to LoopVersioningLICM look good!

@@ -633,7 +633,18 @@ define dso_local void @forked_ptrs_same_base_different_offset(ptr nocapture read
; CHECK-NEXT: Report: cannot identify array bounds
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, but why does the report say this? Can we update this message to "Cannot generate RT checks for pointers: ..."?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This piece of text is what goes in the -Rpass-analysis=loop-vectorize output, so here it's

remarks.c:8:12: remark: loop not vectorized: cannot identify array bounds [-Rpass-analysis=loop-vectorize]
    8 |     B[i] = A[offset];
      |            ^

It's intended to mean "cannot identify array bounds of this specific memory access".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh. Hm, then is there something we can add to the output to indicate that runtime checks were not generated for some pointers? Otherwise, the output seems a bit misleading.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added an extra line to the output saying this.

@artagnon artagnon changed the title [LoopAccessAnalysis] Keep pointer checks on partial analysis [LAA] Keep pointer checks on partial analysis May 13, 2025
Copy link
Contributor

@artagnon artagnon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be good to go, modulo the spurious messages. I checked LVerLICM, and it seems to do the right thing:

    // Don't allow stores that we don't have runtime checks for, as we won't be
    // able to mark them noalias meaning they would prevent any code motion.
    auto &Pointers = LAI->getRuntimePointerChecking()->Pointers;
    if (!any_of(Pointers, [&](auto &P) { return P.PointerValue == Ptr; })) {
      LLVM_DEBUG(dbgs() << "    Found a store without a runtime check.\n");
      return false;
    }

@@ -3002,6 +3001,8 @@ void LoopAccessInfo::print(raw_ostream &OS, unsigned Depth) const {

// List the pair of accesses need run-time checks to prove independence.
PtrRtChecking->print(OS, Depth);
if (PtrRtChecking->Need && !CanVecMem)
Copy link
Contributor

@artagnon artagnon May 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this check is a bit too lax, and generates spurious messages. Can you check CheckingGroups against Pointers instead?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't work, as CheckingGroups is generated from Pointers in groupChecks, so there won't be any Pointer that doesn't have a corresponding CheckingGroup. I've gone instead with recording what the result of canCheckPtrAtRT was.

Copy link
Contributor

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to make this behavior opt-in? Otherwise we unnecessarily increase compile-time for all other users that do not need this, just to benefit LoopVerLICM which isn't on by default.

Expanding low/end for groups can be costly in some cases.

@john-brawn-arm
Copy link
Collaborator Author

john-brawn-arm commented May 19, 2025

Would it be possible to make this behavior opt-in? Otherwise we unnecessarily increase compile-time for all other users that do not need this, just to benefit LoopVerLICM which isn't on by default.

Expanding low/end for groups can be costly in some cases.

Doing a simple test with the following:

void fn(char *p, char *q, char **r, int n) {
#define LOOP \
  for (int i = 0; i < n; i++) { \
    p[i] += q[i] + r[0][i]; \
    p[i] += q[i+1] + r[1][i]; \
    p[i] += q[i+2] + r[2][i]; \
    p[i] += q[i+3] + r[3][i]; \
    p[i] += q[i+4] + r[4][i]; \
    p[i] += q[i+5] + r[5][i]; \
  }

#define DO10 LOOP LOOP LOOP LOOP LOOP LOOP LOOP LOOP LOOP LOOP
#define DO100 DO10 DO10 DO10 DO10 DO10 DO10 DO10 DO10 DO10 DO10
#define DO1000 DO100 DO100 DO100 DO100 DO100 DO100 DO100 DO100 DO100 DO100

  DO1000
}

and doing:

  • Compile with clang -O2 -emit-llvm
  • Run opt -passes="print<access-info>" -disable-output --time-trace 20 times with and without this change
  • Get the duration of LoopAccessAnalysis (which is in microseconds) from the time trace file

I get:
Without: mean 1307, median 1347, standard deviation 119
With: mean 1301, median 1335, standard deviation 116
So it looks like any increase in compile-time is negligible and well within the measurement noise, for this test case at least.

@artagnon
Copy link
Contributor

artagnon commented May 19, 2025

See 234cc40 ([LAA] Limit no-overlap check to at least one loop-invariant accesses.): @fhahn has good reason to believe that compile-time will blow up. From the comment:

  // Check if we can prove that Sink only accesses memory after Src's end or
  // vice versa. At the moment this is limited to cases where either source or
  // sink are loop invariant to avoid compile-time increases. This is not
  // required for correctness.

Please use llvm-compile-time-tracker for real measurements.

This records whether canCheckPtrAtRT succeeded, and is used to decide
if we print the "checks are incomplete" message.
@john-brawn-arm
Copy link
Collaborator Author

LLVM compile-time tracker results: https://llvm-compile-time-tracker.com/compare.php?from=a7ede51b556f40163db9e3cc67c98c27ba2364d8&to=a4dd603c004278e86f80cf6d62597535f0fee126&stat=instructions%3Au
There's a couple of benchmarks (mafft and lencod) that are showing as red, but the largest regression is 0.06% which doesn't seem like a lot. Is it still large enough to need to do something about it?

@artagnon
Copy link
Contributor

but the largest regression is 0.06% which doesn't seem like a lot. Is it still large enough to need to do something about it?

Regressions of 0.01-0.02% are noise (the compile time tracker doesn't color them), 0.03% is a noticeable regression (the compile time tracker colors these in light red), and 0.05%+ is serious (compile time tracker colors these in deep red).

@john-brawn-arm
Copy link
Collaborator Author

I've added an option to LoopAccessAnalysis to control whether partial results are allowed, and enabled it only in LoopVersioningLICM. With that there are no reported compile time regressions: https://llvm-compile-time-tracker.com/compare.php?from=5c28af409978c19a35021855a29dcaa65e95da00&to=c21056925992f7e536faa766b8773c6582ebd916&stat=instructions:u

Copy link
Contributor

@artagnon artagnon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks! A few minor suggestions follow.

Comment on lines +590 to +593
[](StringRef Params) {
return PassBuilder::parseSinglePassOption(Params, "allow-partial",
"LoopAccessInfoPrinterPass");
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can lift this out into parseLoopAccessInfoPrinterOptions to match the rest of the code?

@@ -1325,11 +1330,10 @@ bool AccessAnalysis::canCheckPtrAtRT(
// are needed. This can happen when all pointers point to the same underlying
// object for example.
RtCheck.Need = CanDoRT ? RtCheck.getNumberOfChecks() != 0 : MayNeedRTCheck;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stray newline strip?

Comment on lines +1 to +2
; RUN: opt -disable-output -passes='print<access-info><allow-partial>,print<access-info>' %s 2>&1 | FileCheck %s --check-prefixes=ALLOW-BEFORE
; RUN: opt -disable-output -passes='print<access-info>,print<access-info><allow-partial>' %s 2>&1 | FileCheck %s --check-prefixes=ALLOW-AFTER
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice test!

@@ -654,7 +655,8 @@ class RuntimePointerChecking {
/// For memory dependences that cannot be determined at compile time, it
/// generates run-time checks to prove independence. This is done by
/// AccessAnalysis::canCheckPtrAtRT and the checks are maintained by the
/// RuntimePointerCheck class.
/// RuntimePointerCheck class. AllowPartial determines whether partial checks
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

\p AllowPartial to aid Doxygen?

/// (i.e. the pointers have computable bounds).
/// (i.e. the pointers have computable bounds). A return value of false means
/// we couldn't analyze and generate runtime checks for all pointers in the
/// loop, but we will have checks for those pointers we could analyze.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... if \p AllowPartial is set?

std::unique_ptr<MemoryDepChecker> DepChecker;

Loop *TheLoop;

bool AllowPartial;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs some documentation. As AllowPartial is the default, it might be better to have the flag indicate that all runtime that can be generated will be collected?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the naming?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you're asking here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It just seemed like the comment might have been missed?

bool CanDoRTIfNeeded = Accesses.canCheckPtrAtRT(
*PtrRtChecking, TheLoop, SymbolicStrides, UncomputablePtr);
if (!CanDoRTIfNeeded) {
HasCompletePtrRtChecking = Accesses.canCheckPtrAtRT(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this need to be a member variable? Can we keep the name which still seems accurate?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is used in LoopAccessInfo::print to decide whether to print the "Generated run-time checks are incomplete" message.

Comment on lines +2995 to +2996
// We need to create the LoopAccessInfo if either we don't already have one,
// or if it was created with a different value of AllowPartial.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we keep the original one if it has AllowPartial = true and AllowPartial = false?

It might be clearer if the flag is renamed to indicate that all runtime checks are collected, even if not valid.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had thought about that, but I think having consistent output is more useful than any minor efficiency gain from not having to recompute the LoopAccessInfo.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok would have been good to resolve this discussion before merging.

What consistent output are you thinking about? The first only output change would be from the printing pass?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we consider test/Analysis/LoopAccessAnalysis/allow-partial.ll, then with this change if we compare the output of -passes='print<access-info>' and -passes='print<access-info><allow-partial>,print<access-info>' then the print<access-info> gives different output in the second as it re-uses the LoopAccessInfo from the previous pass.

Potentially we could see the same thing if we has a transform pass that has AllowPartial=true followed by print<access-info>, but the only transform that has AllowPartial=true is LoopVersioningLICM which constructs LoopAccessInfoManager itself instead of using the LoopAccessAnalysis pass, so the analysis results won't be re-used anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, thanks. But that would only impact the print<access-info>, which is only used for testing?

@@ -0,0 +1,99 @@
; RUN: opt -disable-output -passes='print<access-info><allow-partial>,print<access-info>' %s 2>&1 | FileCheck %s --check-prefixes=ALLOW-BEFORE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we have other print passes that take arguments already, but would it be more consistent to pare this as print<access-info<allow-partial>> to indicate this is an argument to access-info instead of print<>?

Copy link
Collaborator Author

@john-brawn-arm john-brawn-arm Jun 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This unfortunately can't be currently done with how passes are defined in PassRegistry.def: "print<access-info>" is the name of the pass, and the options come after that.

@john-brawn-arm john-brawn-arm merged commit 81d3189 into llvm:main Jun 4, 2025
7 of 10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants