Skip to content

[LAA] Support different strides & non constant dep distances using SCEV. #88039

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Apr 25, 2024

Conversation

fhahn
Copy link
Contributor

@fhahn fhahn commented Apr 8, 2024

Extend LoopAccessAnalysis to support different strides and as a consequence non-constant distances between dependences using SCEV to reason about the direction of the dependence.

In multiple places, logic to rule out dependences using the stride has been updated to only be used if StrideA == StrideB, i.e. there's a common stride.

We now also may bail out at multiple places where we may have to set FoundNonConstantDistanceDependence. This is done when we need to bail out and the distance is not constant to preserve original behavior.

Fixes #87336

Extend LoopAccessAnalysis to support different strides and as a
consequence non-constant distances between dependences using SCEV to
reason about the direction of the dependence.

In multiple places, logic to rule out dependences using the stride has
been updated to only be used if StrideA == StrideB, i.e. there's a
common stride.

We now also may bail out at multiple places where we may have to set
FoundNonConstantDistanceDependence. This is done when we need to bail
out and the distance is not constant to preserve original behavior.

I'd like to call out the changes in global_alias.ll in particular.
In the modified mayAlias01, and mayAlias02 tests, there should be no
aliasing in the original versions of the test, as they are accessing 2
different 100 element fields in a loop with 100 iterations.

I moved the original tests to noAlias15 and noAlias16 respectively,
while updating the original tests to use a variable trip count.

In some cases, like different_non_constant_strides_known_backward_min_distance_3,
we now vectorize with runtime checks, even though the runtime checks
will always be false. I'll also share a follow-up patch, that also uses
SCEV to more accurately identify backwards dependences with non-constant
distances.

Fixes llvm#87336
@llvmbot
Copy link
Member

llvmbot commented Apr 8, 2024

@llvm/pr-subscribers-llvm-analysis

@llvm/pr-subscribers-llvm-transforms

Author: Florian Hahn (fhahn)

Changes

Extend LoopAccessAnalysis to support different strides and as a consequence non-constant distances between dependences using SCEV to reason about the direction of the dependence.

In multiple places, logic to rule out dependences using the stride has been updated to only be used if StrideA == StrideB, i.e. there's a common stride.

We now also may bail out at multiple places where we may have to set FoundNonConstantDistanceDependence. This is done when we need to bail out and the distance is not constant to preserve original behavior.

I'd like to call out the changes in global_alias.ll in particular. In the modified mayAlias01, and mayAlias02 tests, there should be no aliasing in the original versions of the test, as they are accessing 2 different 100 element fields in a loop with 100 iterations.

I moved the original tests to noAlias15 and noAlias16 respectively, while updating the original tests to use a variable trip count.

In some cases, like different_non_constant_strides_known_backward_min_distance_3, we now vectorize with runtime checks, even though the runtime checks will always be false. I'll also share a follow-up patch, that also uses SCEV to more accurately identify backwards dependences with non-constant distances.

Fixes #87336


Patch is 29.09 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/88039.diff

6 Files Affected:

  • (modified) llvm/lib/Analysis/LoopAccessAnalysis.cpp (+78-37)
  • (modified) llvm/lib/Transforms/Scalar/LoopLoadElimination.cpp (+3-1)
  • (modified) llvm/test/Analysis/LoopAccessAnalysis/non-constant-strides-backward.ll (+60-30)
  • (modified) llvm/test/Analysis/LoopAccessAnalysis/non-constant-strides-forward.ll (+4-6)
  • (modified) llvm/test/Transforms/LoopVectorize/global_alias.ll (+92-10)
  • (modified) llvm/test/Transforms/PhaseOrdering/single-iteration-loop-sroa.ll (+37-3)
diff --git a/llvm/lib/Analysis/LoopAccessAnalysis.cpp b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
index c25eede96a1859..314484e11c4a7c 100644
--- a/llvm/lib/Analysis/LoopAccessAnalysis.cpp
+++ b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
@@ -1923,8 +1923,9 @@ isLoopVariantIndirectAddress(ArrayRef<const Value *> UnderlyingObjects,
 // of various temporary variables, like A/BPtr, StrideA/BPtr and others.
 // Returns either the dependence result, if it could already be determined, or a
 // tuple with (Distance, Stride, TypeSize, AIsWrite, BIsWrite).
-static std::variant<MemoryDepChecker::Dependence::DepType,
-                    std::tuple<const SCEV *, uint64_t, uint64_t, bool, bool>>
+static std::variant<
+    MemoryDepChecker::Dependence::DepType,
+    std::tuple<const SCEV *, uint64_t, uint64_t, uint64_t, bool, bool>>
 getDependenceDistanceStrideAndSize(
     const AccessAnalysis::MemAccessInfo &A, Instruction *AInst,
     const AccessAnalysis::MemAccessInfo &B, Instruction *BInst,
@@ -1982,7 +1983,7 @@ getDependenceDistanceStrideAndSize(
   // Need accesses with constant stride. We don't want to vectorize
   // "A[B[i]] += ..." and similar code or pointer arithmetic that could wrap
   // in the address space.
-  if (!StrideAPtr || !StrideBPtr || StrideAPtr != StrideBPtr) {
+  if (!StrideAPtr || !StrideBPtr) {
     LLVM_DEBUG(dbgs() << "Pointer access with non-constant stride\n");
     return MemoryDepChecker::Dependence::Unknown;
   }
@@ -1992,8 +1993,8 @@ getDependenceDistanceStrideAndSize(
       DL.getTypeStoreSizeInBits(ATy) == DL.getTypeStoreSizeInBits(BTy);
   if (!HasSameSize)
     TypeByteSize = 0;
-  uint64_t Stride = std::abs(StrideAPtr);
-  return std::make_tuple(Dist, Stride, TypeByteSize, AIsWrite, BIsWrite);
+  return std::make_tuple(Dist, std::abs(StrideAPtr), std::abs(StrideBPtr),
+                         TypeByteSize, AIsWrite, BIsWrite);
 }
 
 MemoryDepChecker::Dependence::DepType MemoryDepChecker::isDependent(
@@ -2011,68 +2012,108 @@ MemoryDepChecker::Dependence::DepType MemoryDepChecker::isDependent(
   if (std::holds_alternative<Dependence::DepType>(Res))
     return std::get<Dependence::DepType>(Res);
 
-  const auto &[Dist, Stride, TypeByteSize, AIsWrite, BIsWrite] =
-      std::get<std::tuple<const SCEV *, uint64_t, uint64_t, bool, bool>>(Res);
+  const auto &[Dist, StrideA, StrideB, TypeByteSize, AIsWrite, BIsWrite] =
+      std::get<
+          std::tuple<const SCEV *, uint64_t, uint64_t, uint64_t, bool, bool>>(
+          Res);
   bool HasSameSize = TypeByteSize > 0;
 
+  uint64_t CommonStride = StrideA == StrideB ? StrideA : 0;
+  if (isa<SCEVCouldNotCompute>(Dist)) {
+      FoundNonConstantDistanceDependence = true;
+    LLVM_DEBUG(dbgs() << "LAA: Dependence because of uncomputable distance.\n");
+    return Dependence::Unknown;
+  }
+
   ScalarEvolution &SE = *PSE.getSE();
   auto &DL = InnermostLoop->getHeader()->getModule()->getDataLayout();
-  if (!isa<SCEVCouldNotCompute>(Dist) && HasSameSize &&
+  if (HasSameSize && CommonStride &&
       isSafeDependenceDistance(DL, SE, *(PSE.getBackedgeTakenCount()), *Dist,
-                               Stride, TypeByteSize))
+                               CommonStride, TypeByteSize))
     return Dependence::NoDep;
 
   const SCEVConstant *C = dyn_cast<SCEVConstant>(Dist);
-  if (!C) {
-    LLVM_DEBUG(dbgs() << "LAA: Dependence because of non-constant distance\n");
-    FoundNonConstantDistanceDependence = true;
-    return Dependence::Unknown;
-  }
-
-  const APInt &Val = C->getAPInt();
-  int64_t Distance = Val.getSExtValue();
 
   // Attempt to prove strided accesses independent.
-  if (std::abs(Distance) > 0 && Stride > 1 && HasSameSize &&
-      areStridedAccessesIndependent(std::abs(Distance), Stride, TypeByteSize)) {
-    LLVM_DEBUG(dbgs() << "LAA: Strided accesses are independent\n");
-    return Dependence::NoDep;
+  if (C) {
+    const APInt &Val = C->getAPInt();
+    int64_t Distance = Val.getSExtValue();
+
+    if (std::abs(Distance) > 0 && CommonStride > 1 && HasSameSize &&
+        areStridedAccessesIndependent(std::abs(Distance), CommonStride,
+                                      TypeByteSize)) {
+      LLVM_DEBUG(dbgs() << "LAA: Strided accesses are independent\n");
+      return Dependence::NoDep;
+    }
+  }
+
+  // Write to the same location with the same size.
+  if (SE.isKnownPredicate(CmpInst::ICMP_EQ, Dist,
+                          SE.getZero(Dist->getType()))) {
+    if (HasSameSize)
+      return Dependence::Forward;
+    LLVM_DEBUG(
+        dbgs() << "LAA: Zero dependence difference but different type sizes\n");
+    return Dependence::Unknown;
   }
 
   // Negative distances are not plausible dependencies.
-  if (Val.isNegative()) {
+  if (SE.isKnownNonPositive(Dist)) {
+    if (!SE.isKnownNegative(Dist) && !HasSameSize) {
+      LLVM_DEBUG(dbgs() << "LAA: possibly zero dependence difference but "
+                           "different type sizes\n");
+      return Dependence::Unknown;
+    }
+
     bool IsTrueDataDependence = (AIsWrite && !BIsWrite);
     // There is no need to update MaxSafeVectorWidthInBits after call to
     // couldPreventStoreLoadForward, even if it changed MinDepDistBytes,
     // since a forward dependency will allow vectorization using any width.
-    if (IsTrueDataDependence && EnableForwardingConflictDetection &&
-        (!HasSameSize || couldPreventStoreLoadForward(Val.abs().getZExtValue(),
-                                                      TypeByteSize))) {
-      LLVM_DEBUG(dbgs() << "LAA: Forward but may prevent st->ld forwarding\n");
-      return Dependence::ForwardButPreventsForwarding;
+    if (IsTrueDataDependence && EnableForwardingConflictDetection) {
+      if (!C) {
+        FoundNonConstantDistanceDependence = true;
+        return Dependence::Unknown;
+      }
+      if (!HasSameSize ||
+          couldPreventStoreLoadForward(C->getAPInt().abs().getZExtValue(),
+                                       TypeByteSize)) {
+        LLVM_DEBUG(
+            dbgs() << "LAA: Forward but may prevent st->ld forwarding\n");
+        return Dependence::ForwardButPreventsForwarding;
+      }
     }
 
     LLVM_DEBUG(dbgs() << "LAA: Dependence is negative\n");
     return Dependence::Forward;
   }
 
-  // Write to the same location with the same size.
-  if (Val == 0) {
-    if (HasSameSize)
-      return Dependence::Forward;
-    LLVM_DEBUG(
-        dbgs() << "LAA: Zero dependence difference but different type sizes\n");
+  if (!SE.isKnownPositive(Dist)) {
+    if (!C)
+      FoundNonConstantDistanceDependence = true;
     return Dependence::Unknown;
   }
 
-  assert(Val.isStrictlyPositive() && "Expect a positive value");
-
   if (!HasSameSize) {
     LLVM_DEBUG(dbgs() << "LAA: ReadWrite-Write positive dependency with "
                          "different type sizes\n");
+    if (!C)
+      FoundNonConstantDistanceDependence = true;
     return Dependence::Unknown;
   }
 
+  if (!C) {
+    LLVM_DEBUG(dbgs() << "LAA: Dependence because of non-constant distance\n");
+    FoundNonConstantDistanceDependence = true;
+    return Dependence::Unknown;
+  }
+
+  // The logic below currently only supports StrideA ==  StrideB, i.e. there's a common stride.
+  if (!CommonStride)
+    return Dependence::Unknown;
+
+  const APInt &Val = C->getAPInt();
+  int64_t Distance = Val.getSExtValue();
+
   // Bail out early if passed-in parameters make vectorization not feasible.
   unsigned ForcedFactor = (VectorizerParams::VectorizationFactor ?
                            VectorizerParams::VectorizationFactor : 1);
@@ -2108,7 +2149,7 @@ MemoryDepChecker::Dependence::DepType MemoryDepChecker::isDependent(
   // the minimum distance needed is 28, which is greater than distance. It is
   // not safe to do vectorization.
   uint64_t MinDistanceNeeded =
-      TypeByteSize * Stride * (MinNumIter - 1) + TypeByteSize;
+      TypeByteSize * CommonStride * (MinNumIter - 1) + TypeByteSize;
   if (MinDistanceNeeded > static_cast<uint64_t>(Distance)) {
     LLVM_DEBUG(dbgs() << "LAA: Failure because of positive distance "
                       << Distance << '\n');
@@ -2157,7 +2198,7 @@ MemoryDepChecker::Dependence::DepType MemoryDepChecker::isDependent(
 
   // An update to MinDepDistBytes requires an update to MaxSafeVectorWidthInBits
   // since there is a backwards dependency.
-  uint64_t MaxVF = MinDepDistBytes / (TypeByteSize * Stride);
+  uint64_t MaxVF = MinDepDistBytes / (TypeByteSize * CommonStride);
   LLVM_DEBUG(dbgs() << "LAA: Positive distance " << Val.getSExtValue()
                     << " with max VF = " << MaxVF << '\n');
   uint64_t MaxVFInBits = MaxVF * TypeByteSize * 8;
diff --git a/llvm/lib/Transforms/Scalar/LoopLoadElimination.cpp b/llvm/lib/Transforms/Scalar/LoopLoadElimination.cpp
index edddfb1b92402f..059900f357e64b 100644
--- a/llvm/lib/Transforms/Scalar/LoopLoadElimination.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopLoadElimination.cpp
@@ -126,8 +126,10 @@ struct StoreToLoadForwardingCandidate {
 
     // We don't need to check non-wrapping here because forward/backward
     // dependence wouldn't be valid if these weren't monotonic accesses.
-    auto *Dist = cast<SCEVConstant>(
+    auto *Dist = dyn_cast<SCEVConstant>(
         PSE.getSE()->getMinusSCEV(StorePtrSCEV, LoadPtrSCEV));
+    if (!Dist)
+      return false;
     const APInt &Val = Dist->getAPInt();
     return Val == TypeByteSize * StrideLoad;
   }
diff --git a/llvm/test/Analysis/LoopAccessAnalysis/non-constant-strides-backward.ll b/llvm/test/Analysis/LoopAccessAnalysis/non-constant-strides-backward.ll
index 416742a94e0d36..8102611174c854 100644
--- a/llvm/test/Analysis/LoopAccessAnalysis/non-constant-strides-backward.ll
+++ b/llvm/test/Analysis/LoopAccessAnalysis/non-constant-strides-backward.ll
@@ -45,15 +45,21 @@ exit:
 define void @different_non_constant_strides_known_backward_distance_larger_than_trip_count(ptr %A) {
 ; CHECK-LABEL: 'different_non_constant_strides_known_backward_distance_larger_than_trip_count'
 ; CHECK-NEXT:    loop:
-; CHECK-NEXT:      Report: unsafe dependent memory operations in loop. Use #pragma clang loop distribute(enable) to allow loop distribution to attempt to isolate the offending operations into a separate loop
-; CHECK-NEXT:  Unknown data dependence.
+; CHECK-NEXT:      Memory dependences are safe with run-time checks
 ; CHECK-NEXT:      Dependences:
-; CHECK-NEXT:        Unknown:
-; CHECK-NEXT:            %l = load i32, ptr %gep, align 4 ->
-; CHECK-NEXT:            store i32 %add, ptr %gep.mul.2, align 4
-; CHECK-EMPTY:
 ; CHECK-NEXT:      Run-time memory checks:
+; CHECK-NEXT:      Check 0:
+; CHECK-NEXT:        Comparing group ([[GRP1:0x[0-9a-f]+]]):
+; CHECK-NEXT:          %gep.mul.2 = getelementptr inbounds i32, ptr %A.1024, i64 %iv.mul.2
+; CHECK-NEXT:        Against group ([[GRP2:0x[0-9a-f]+]]):
+; CHECK-NEXT:          %gep = getelementptr inbounds i32, ptr %A, i64 %iv
 ; CHECK-NEXT:      Grouped accesses:
+; CHECK-NEXT:        Group [[GRP1]]:
+; CHECK-NEXT:          (Low: (1024 + %A)<nuw> High: (3068 + %A))
+; CHECK-NEXT:            Member: {(1024 + %A)<nuw>,+,8}<nuw><%loop>
+; CHECK-NEXT:        Group [[GRP2]]:
+; CHECK-NEXT:          (Low: %A High: (1024 + %A)<nuw>)
+; CHECK-NEXT:            Member: {%A,+,4}<nuw><%loop>
 ; CHECK-EMPTY:
 ; CHECK-NEXT:      Non vectorizable stores to invariant address were not found in loop.
 ; CHECK-NEXT:      SCEV assumptions:
@@ -83,15 +89,21 @@ exit:
 define void @different_non_constant_strides_known_backward_min_distance_16(ptr %A) {
 ; CHECK-LABEL: 'different_non_constant_strides_known_backward_min_distance_16'
 ; CHECK-NEXT:    loop:
-; CHECK-NEXT:      Report: unsafe dependent memory operations in loop. Use #pragma clang loop distribute(enable) to allow loop distribution to attempt to isolate the offending operations into a separate loop
-; CHECK-NEXT:  Unknown data dependence.
+; CHECK-NEXT:      Memory dependences are safe with run-time checks
 ; CHECK-NEXT:      Dependences:
-; CHECK-NEXT:        Unknown:
-; CHECK-NEXT:            %l = load i32, ptr %gep, align 4 ->
-; CHECK-NEXT:            store i32 %add, ptr %gep.mul.2, align 4
-; CHECK-EMPTY:
 ; CHECK-NEXT:      Run-time memory checks:
+; CHECK-NEXT:      Check 0:
+; CHECK-NEXT:        Comparing group ([[GRP3:0x[0-9a-f]+]]):
+; CHECK-NEXT:          %gep.mul.2 = getelementptr inbounds i32, ptr %A.16, i64 %iv.mul.2
+; CHECK-NEXT:        Against group ([[GRP4:0x[0-9a-f]+]]):
+; CHECK-NEXT:          %gep = getelementptr inbounds i32, ptr %A, i64 %iv
 ; CHECK-NEXT:      Grouped accesses:
+; CHECK-NEXT:        Group [[GRP3]]:
+; CHECK-NEXT:          (Low: (16 + %A)<nuw> High: (2060 + %A))
+; CHECK-NEXT:            Member: {(16 + %A)<nuw>,+,8}<nuw><%loop>
+; CHECK-NEXT:        Group [[GRP4]]:
+; CHECK-NEXT:          (Low: %A High: (1024 + %A))
+; CHECK-NEXT:            Member: {%A,+,4}<nuw><%loop>
 ; CHECK-EMPTY:
 ; CHECK-NEXT:      Non vectorizable stores to invariant address were not found in loop.
 ; CHECK-NEXT:      SCEV assumptions:
@@ -121,15 +133,21 @@ exit:
 define void @different_non_constant_strides_known_backward_min_distance_15(ptr %A) {
 ; CHECK-LABEL: 'different_non_constant_strides_known_backward_min_distance_15'
 ; CHECK-NEXT:    loop:
-; CHECK-NEXT:      Report: unsafe dependent memory operations in loop. Use #pragma clang loop distribute(enable) to allow loop distribution to attempt to isolate the offending operations into a separate loop
-; CHECK-NEXT:  Unknown data dependence.
+; CHECK-NEXT:      Memory dependences are safe with run-time checks
 ; CHECK-NEXT:      Dependences:
-; CHECK-NEXT:        Unknown:
-; CHECK-NEXT:            %l = load i32, ptr %gep, align 4 ->
-; CHECK-NEXT:            store i32 %add, ptr %gep.mul.2, align 4
-; CHECK-EMPTY:
 ; CHECK-NEXT:      Run-time memory checks:
+; CHECK-NEXT:      Check 0:
+; CHECK-NEXT:        Comparing group ([[GRP5:0x[0-9a-f]+]]):
+; CHECK-NEXT:          %gep.mul.2 = getelementptr inbounds i32, ptr %A.15, i64 %iv.mul.2
+; CHECK-NEXT:        Against group ([[GRP6:0x[0-9a-f]+]]):
+; CHECK-NEXT:          %gep = getelementptr inbounds i32, ptr %A, i64 %iv
 ; CHECK-NEXT:      Grouped accesses:
+; CHECK-NEXT:        Group [[GRP5]]:
+; CHECK-NEXT:          (Low: (15 + %A)<nuw> High: (2059 + %A))
+; CHECK-NEXT:            Member: {(15 + %A)<nuw>,+,8}<nuw><%loop>
+; CHECK-NEXT:        Group [[GRP6]]:
+; CHECK-NEXT:          (Low: %A High: (1024 + %A))
+; CHECK-NEXT:            Member: {%A,+,4}<nuw><%loop>
 ; CHECK-EMPTY:
 ; CHECK-NEXT:      Non vectorizable stores to invariant address were not found in loop.
 ; CHECK-NEXT:      SCEV assumptions:
@@ -159,15 +177,21 @@ exit:
 define void @different_non_constant_strides_known_backward_min_distance_8(ptr %A) {
 ; CHECK-LABEL: 'different_non_constant_strides_known_backward_min_distance_8'
 ; CHECK-NEXT:    loop:
-; CHECK-NEXT:      Report: unsafe dependent memory operations in loop. Use #pragma clang loop distribute(enable) to allow loop distribution to attempt to isolate the offending operations into a separate loop
-; CHECK-NEXT:  Unknown data dependence.
+; CHECK-NEXT:      Memory dependences are safe with run-time checks
 ; CHECK-NEXT:      Dependences:
-; CHECK-NEXT:        Unknown:
-; CHECK-NEXT:            %l = load i32, ptr %gep, align 4 ->
-; CHECK-NEXT:            store i32 %add, ptr %gep.mul.2, align 4
-; CHECK-EMPTY:
 ; CHECK-NEXT:      Run-time memory checks:
+; CHECK-NEXT:      Check 0:
+; CHECK-NEXT:        Comparing group ([[GRP7:0x[0-9a-f]+]]):
+; CHECK-NEXT:          %gep.mul.2 = getelementptr inbounds i32, ptr %A.8, i64 %iv.mul.2
+; CHECK-NEXT:        Against group ([[GRP8:0x[0-9a-f]+]]):
+; CHECK-NEXT:          %gep = getelementptr inbounds i32, ptr %A, i64 %iv
 ; CHECK-NEXT:      Grouped accesses:
+; CHECK-NEXT:        Group [[GRP7]]:
+; CHECK-NEXT:          (Low: (8 + %A)<nuw> High: (2052 + %A))
+; CHECK-NEXT:            Member: {(8 + %A)<nuw>,+,8}<nuw><%loop>
+; CHECK-NEXT:        Group [[GRP8]]:
+; CHECK-NEXT:          (Low: %A High: (1024 + %A))
+; CHECK-NEXT:            Member: {%A,+,4}<nuw><%loop>
 ; CHECK-EMPTY:
 ; CHECK-NEXT:      Non vectorizable stores to invariant address were not found in loop.
 ; CHECK-NEXT:      SCEV assumptions:
@@ -197,15 +221,21 @@ exit:
 define void @different_non_constant_strides_known_backward_min_distance_3(ptr %A) {
 ; CHECK-LABEL: 'different_non_constant_strides_known_backward_min_distance_3'
 ; CHECK-NEXT:    loop:
-; CHECK-NEXT:      Report: unsafe dependent memory operations in loop. Use #pragma clang loop distribute(enable) to allow loop distribution to attempt to isolate the offending operations into a separate loop
-; CHECK-NEXT:  Unknown data dependence.
+; CHECK-NEXT:      Memory dependences are safe with run-time checks
 ; CHECK-NEXT:      Dependences:
-; CHECK-NEXT:        Unknown:
-; CHECK-NEXT:            %l = load i32, ptr %gep, align 4 ->
-; CHECK-NEXT:            store i32 %add, ptr %gep.mul.2, align 4
-; CHECK-EMPTY:
 ; CHECK-NEXT:      Run-time memory checks:
+; CHECK-NEXT:      Check 0:
+; CHECK-NEXT:        Comparing group ([[GRP9:0x[0-9a-f]+]]):
+; CHECK-NEXT:          %gep.mul.2 = getelementptr inbounds i32, ptr %A.3, i64 %iv.mul.2
+; CHECK-NEXT:        Against group ([[GRP10:0x[0-9a-f]+]]):
+; CHECK-NEXT:          %gep = getelementptr inbounds i32, ptr %A, i64 %iv
 ; CHECK-NEXT:      Grouped accesses:
+; CHECK-NEXT:        Group [[GRP9]]:
+; CHECK-NEXT:          (Low: (3 + %A)<nuw> High: (2047 + %A))
+; CHECK-NEXT:            Member: {(3 + %A)<nuw>,+,8}<nuw><%loop>
+; CHECK-NEXT:        Group [[GRP10]]:
+; CHECK-NEXT:          (Low: %A High: (1024 + %A))
+; CHECK-NEXT:            Member: {%A,+,4}<nuw><%loop>
 ; CHECK-EMPTY:
 ; CHECK-NEXT:      Non vectorizable stores to invariant address were not found in loop.
 ; CHECK-NEXT:      SCEV assumptions:
diff --git a/llvm/test/Analysis/LoopAccessAnalysis/non-constant-strides-forward.ll b/llvm/test/Analysis/LoopAccessAnalysis/non-constant-strides-forward.ll
index aa22a2143352d2..7afbae2854374c 100644
--- a/llvm/test/Analysis/LoopAccessAnalysis/non-constant-strides-forward.ll
+++ b/llvm/test/Analysis/LoopAccessAnalysis/non-constant-strides-forward.ll
@@ -8,10 +8,9 @@ declare void @llvm.assume(i1)
 define void @different_non_constant_strides_known_forward(ptr %A) {
 ; CHECK-LABEL: 'different_non_constant_strides_known_forward'
 ; CHECK-NEXT:    loop:
-; CHECK-NEXT:      Report: unsafe dependent memory operations in loop. Use #pragma clang loop distribute(enable) to allow loop distribution to attempt to isolate the offending operations into a separate loop
-; CHECK-NEXT:  Unknown data dependence.
+; CHECK-NEXT:      Memory dependences are safe
 ; CHECK-NEXT:      Dependences:
-; CHECK-NEXT:        Unknown:
+; CHECK-NEXT:        Forward:
 ; CHECK-NEXT:            %l = load i32, ptr %gep.mul.2, align 4 ->
 ; CHECK-NEXT:            store i32 %add, ptr %gep, align 4
 ; CHECK-EMPTY:
@@ -45,10 +44,9 @@ exit:
 define void @different_non_constant_strides_known_forward_min_distance_3(ptr %A) {
 ; CHECK-LABEL: 'different_non_constant_strides_known_forward_min_distance_3'
 ; CHECK-NEXT:    loop:
-; CHECK-NEXT:      Report: unsafe dependent memory operations in loop. Use #pragma clang loop distribute(enable) to allow loop distribution to attempt to isolate the offending operations into a separate loop
-; CHECK-NEXT:  Unknown data dependence.
+; CHECK-NEXT:      Memory dependences are safe
 ; CHECK-NEXT:      Dependences:
-; CHECK-NEXT:        Unknown:
+; CHECK-NEXT:        Forward:
 ; CHECK-NEXT:            %l = load i32, ptr %gep.mul.2, align 4 ->
 ; CHECK-NEXT:            store i32 %add, ptr %gep, align 4
 ; CHECK-EMPTY:
diff --git a/llvm/test/Transforms/LoopVectorize/global_alias.ll b/llvm/test/Transforms/LoopVectorize/global_alias.ll
index 336e462a4cf63c..84aee9ec62cf27 100644
--- a/llvm/test/Transforms/LoopVectorize/global_alias.ll
+++ b/llvm/test/Transforms/LoopVectorize/global_alias.ll
@@ -491,22 +491,101 @@ for.end:                                          ; preds = %for.body
   ret i32 %1
 }
 
+; /// Different objects, swapped induction, alias at the end
+; int noAlias15 (int a) {
+;   int i;
+;   for (i=0; i<SIZE; i++)
+;     Foo.A[i] = Foo.B[SIZE-i-1] + a;
+;   return Foo.A[a];
+; }
+; CHECK-LABEL: define i32 @noAlias15(
+; CHECK:      vector.memcheck:
+; CHECK-NEXT:    br i1 false, label %scalar.ph, label %vector.ph
+; CHECK: add nsw <4 x i32>
+; CHECK: ret
+
+define i32 @noAlias15(i32 %a) nounwind {
+entry:
+  br label %for...
[truncated]

Copy link

github-actions bot commented Apr 8, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(just some drive-by comments)

fhahn added a commit that referenced this pull request Apr 10, 2024
As suggested in #88039, replace
the tuple with a struct, to make it easier to extend.
Copy link
Member

@Meinersbur Meinersbur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would enjoy more textual description of what every condition is meant to check.

std::get<DepDistanceStrideAndSizeInfo>(Res);
bool HasSameSize = TypeByteSize > 0;

uint64_t CommonStride = StrideA == StrideB ? StrideA : 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't 0 a valid stride (i.e. indexed by a constant)?

Suggested change
uint64_t CommonStride = StrideA == StrideB ? StrideA : 0;
std::optional<uint64_t> CommonStride = StrideA == StrideB ? StrideA : None;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment, both strides need to be != 0 I think (getDependenceDistanceStrideAndSize returns Unknown dependence if any stride is == 0).

But using std::optional makes this future proof (I am planning to also remove that restriction as follow up), updated, thanks!

Comment on lines 2045 to 2047
if (HasSameSize && CommonStride &&
isSafeDependenceDistance(DL, SE, *(PSE.getBackedgeTakenCount()), *Dist,
Stride, TypeByteSize))
CommonStride, TypeByteSize))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isSafeDependenceDistance doesn't depend on there being a common stride. It could either be the larger one, or a case-distinction of either Dist positive (Src < Sink) or positive (Sink < Src).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment the comment in the function implies that it assumes equal strides which is why I left it as-is for now, to limit the scope of the initial patch.

It tries to prove |Dist| > BackedgeTakenCount * Stride, so using the maximum should conservatively be correct.

A similar reasoning can be applied at the end of the function too, which also is currently guarded by having a common stride. I am also planning on addressing that separately.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a TODO?

Comment on lines +129 to +132
auto *Dist = dyn_cast<SCEVConstant>(
PSE.getSE()->getMinusSCEV(StorePtrSCEV, LoadPtrSCEV));
if (!Dist)
return false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this related? LAA doesn't seem to be involved.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the pass uses LoopAccessInfo in findStoreToLoadDependences to collect dependences. Before this patch, all identified dependences would have a constant distance which is why this change is needed

return Dependence::ForwardButPreventsForwarding;
if (IsTrueDataDependence && EnableForwardingConflictDetection) {
if (!C) {
FoundNonConstantDistanceDependence |= CommonStride;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is no common stride (CommonStride == 0) then any of the two could still be non-constant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the reason for only setting it for CommonStride == 0 is to only set it in cases where we would set it in the original code as well; if there was no common stride, we would have returned Unknown much earlier, without setting FoundNonConstantDistanceDependence.

I added a TODO to relax this requirement, but would like to keep it in the patch to limit the impact.

Comment on lines +2125 to +2126
if (!CommonStride)
return Dependence::Unknown;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many checks above also test (CommonStride > 1). Move them below this check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could only find the one guarding areStridedAccessesIndependent and I don't think it can be moved here, as it tries to identify independent strided accesses early on to return NoDep. Kept the order roughly the same for now.

Comment on lines 2077 to 2081
if (!SE.isKnownNegative(Dist) && !HasSameSize) {
LLVM_DEBUG(dbgs() << "LAA: possibly zero dependence difference but "
"different type sizes\n");
return Dependence::Unknown;
}
Copy link
Member

@Meinersbur Meinersbur Apr 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My impression is that most of these return Unknown; should fall through instead to give other checks a chance to match. If there is an invariant after a certain point (e.g. "type size is equal") it should be rejected there. It avoids having to maintain several conditions e.g. updating FoundNonConstantDistanceDependence which seems to be missing here as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've merged the code from above that handles C == 0 with the code here. The main reason we need to update FoundNonConstantDistanceDependence in multiple places now is that there now are multiple exit paths that return Unknown with non-constant dependences.

The latest version of the patch limits setting FoundNonConstantDistanceDependence to 3 times, with the one before couldPreventStoreLoadForward potentially being removed as a follow-up by making couldPreventStoreLoadForward work with non-constant distances.

Copy link
Contributor Author

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would enjoy more textual description of what every condition is meant to check.

There are multiple places that hand off reasoning to called functions, would you like to have a summary of what the function checks there? Could do as separate patch, as this would be independent of the current patch?

std::get<DepDistanceStrideAndSizeInfo>(Res);
bool HasSameSize = TypeByteSize > 0;

uint64_t CommonStride = StrideA == StrideB ? StrideA : 0;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment, both strides need to be != 0 I think (getDependenceDistanceStrideAndSize returns Unknown dependence if any stride is == 0).

But using std::optional makes this future proof (I am planning to also remove that restriction as follow up), updated, thanks!

Comment on lines +129 to +132
auto *Dist = dyn_cast<SCEVConstant>(
PSE.getSE()->getMinusSCEV(StorePtrSCEV, LoadPtrSCEV));
if (!Dist)
return false;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the pass uses LoopAccessInfo in findStoreToLoadDependences to collect dependences. Before this patch, all identified dependences would have a constant distance which is why this change is needed

Comment on lines +2125 to +2126
if (!CommonStride)
return Dependence::Unknown;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could only find the one guarding areStridedAccessesIndependent and I don't think it can be moved here, as it tries to identify independent strided accesses early on to return NoDep. Kept the order roughly the same for now.

return Dependence::ForwardButPreventsForwarding;
if (IsTrueDataDependence && EnableForwardingConflictDetection) {
if (!C) {
FoundNonConstantDistanceDependence |= CommonStride;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the reason for only setting it for CommonStride == 0 is to only set it in cases where we would set it in the original code as well; if there was no common stride, we would have returned Unknown much earlier, without setting FoundNonConstantDistanceDependence.

I added a TODO to relax this requirement, but would like to keep it in the patch to limit the impact.

Comment on lines 2077 to 2081
if (!SE.isKnownNegative(Dist) && !HasSameSize) {
LLVM_DEBUG(dbgs() << "LAA: possibly zero dependence difference but "
"different type sizes\n");
return Dependence::Unknown;
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've merged the code from above that handles C == 0 with the code here. The main reason we need to update FoundNonConstantDistanceDependence in multiple places now is that there now are multiple exit paths that return Unknown with non-constant dependences.

The latest version of the patch limits setting FoundNonConstantDistanceDependence to 3 times, with the one before couldPreventStoreLoadForward potentially being removed as a follow-up by making couldPreventStoreLoadForward work with non-constant distances.

Comment on lines 2045 to 2047
if (HasSameSize && CommonStride &&
isSafeDependenceDistance(DL, SE, *(PSE.getBackedgeTakenCount()), *Dist,
Stride, TypeByteSize))
CommonStride, TypeByteSize))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment the comment in the function implies that it assumes equal strides which is why I left it as-is for now, to limit the scope of the initial patch.

It tries to prove |Dist| > BackedgeTakenCount * Stride, so using the maximum should conservatively be correct.

A similar reasoning can be applied at the end of the function too, which also is currently guarded by having a common stride. I am also planning on addressing that separately.

@Meinersbur
Copy link
Member

I would enjoy more textual description of what every condition is meant to check.

There are multiple places that hand off reasoning to called functions, would you like to have a summary of what the function checks there? Could do as separate patch, as this would be independent of the current patch?

I was not familiar with this code, trying to reduce the impact of this patch doesn't help me to understand it and convince myself that it does not cause miscompilation, it rather makes it even more difficult since conditions are now all over the place.

For instance, the comment

// Attempt to prove strided accesses independent.

is useless. The entire function tries to prove access independency with a possible stride. Documentation for areStridedAccessesIndependent (itself not descriptive) is far away. A better comment could be

// If the offset and strides are known constants, check whether the accesses interlace each other.

fhahn added a commit that referenced this pull request Apr 19, 2024
As suggested in #88039, add
extra documentation for reasoning in isDependent.
@fhahn fhahn changed the base branch from main to users/fhahn/laa-doc-isdependent April 19, 2024 13:00
@fhahn
Copy link
Contributor Author

fhahn commented Apr 19, 2024

I would enjoy more textual description of what every condition is meant to check.

There are multiple places that hand off reasoning to called functions, would you like to have a summary of what the function checks there? Could do as separate patch, as this would be independent of the current patch?

I was not familiar with this code, trying to reduce the impact of this patch doesn't help me to understand it and convince myself that it does not cause miscompilation, it rather makes it even more difficult since conditions are now all over the place.

Thanks, I put up #89381 to add extra documentation and updated this PR to be based on #89381

ScalarEvolution &SE = *PSE.getSE();
auto &DL = InnermostLoop->getHeader()->getModule()->getDataLayout();
// If the distance between the acecsses is larger than their absolute stride
// multiplied by the backedge taken count, the accesses are independet, i.e.
// they are far enough appart that accesses won't access the same location
// across all loop ierations.
if (!isa<SCEVCouldNotCompute>(Dist) && HasSameSize &&
if (HasSameSize && CommonStride &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isSafeDependenceDistance could be easily modified as well to support different sizes, but it's unrelated to this patch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I plan to address this separately.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up now available: #90036

}

// Negative distances are not plausible dependencies.
if (Val.isNegative()) {
if (SE.isKnownNonPositive(Dist)) {
if (!SE.isKnownNegative(Dist)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[serious] Not knowing something doesn't imply anything.

Suggested change
if (!SE.isKnownNegative(Dist)) {
// KnownNonPositive && KnownNonNegative => KnownZero
if (SE.isKnownNonNegative(Dist)) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, updated!

Comment on lines 2115 to 2123
if (!C) {
// TODO: Relax requirement that there is a common stride to retry with
// non-constant distance dependencies.
FoundNonConstantDistanceDependence |= !!CommonStride;
LLVM_DEBUG(dbgs() << "LAA: Dependence because of non-constant distance\n");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the bail-out moved here from line 2047. Is this done so return Dependence::Forward with negative distance can be returned also on non-constant distances?

FoundNonConstantDistanceDependence is not set if there is no common stride, but what does it have to do with the distance between the pointers? Before this patch it would set that flag independent of any stride.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is independent of whether there is a common stride or not, but previously getDependenceDistanceStrideAndSize would exit early with Unknown if the strides weren't matching, so FoundNonConstantDistanceDependence was only ever set if there was a common stride which is what the current patch tries to preserve in the initial version.

The current FoundNonConstantDistanceDependence can lead to runtime check generation in cases where the runtime checks will always fail unfortunately, which is why I'd like to keep its original behavior for now, as we regress codegen otherwise in some cases.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that I like the arbitrary-looking !!CommonStride scattered around the code, but at least I understand now. FoundNonConstantDistanceDependence sounded like something that's safer to set to true when in doubt. Should really be called ReconsiderWithRuntimeChecks. If I may suggest an alternative TODO:

// TODO: FoundNonConstantDistanceDependence is used as a necessary condition to consider retrying
// with runtime checks. Historically, we did not set it when strides were different but there is no inherent
// reason to.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, adjusted as suggested and replace !!CommonStride with CommonStride.has_value()

fhahn added a commit that referenced this pull request Apr 22, 2024
…9381)

As suggested in #88039, add
extra documentation for reasoning in isDependent.

PR: #89381
@fhahn fhahn deleted the branch llvm:main April 22, 2024 13:10
@fhahn fhahn closed this Apr 22, 2024
@fhahn fhahn reopened this Apr 22, 2024
@fhahn fhahn changed the base branch from users/fhahn/laa-doc-isdependent to main April 22, 2024 13:12
fhahn added a commit that referenced this pull request Apr 22, 2024
Extra tests with strides with different signs for
#88039.
@fhahn fhahn force-pushed the laa-nonconst-dist-forward branch from a4db999 to 7bec335 Compare April 22, 2024 15:48
Copy link
Contributor Author

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latest comments should be addressed/responded to, thanks!

I also added a missing check that the strides must have the same direction for now (extra tests added in 5138ccd). Without this check, we would at least try to vectorize with runtime checks in cases where the runtime checks would always be false.

}

// Negative distances are not plausible dependencies.
if (Val.isNegative()) {
if (SE.isKnownNonPositive(Dist)) {
if (!SE.isKnownNegative(Dist)) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, updated!

ScalarEvolution &SE = *PSE.getSE();
auto &DL = InnermostLoop->getHeader()->getModule()->getDataLayout();
// If the distance between the acecsses is larger than their absolute stride
// multiplied by the backedge taken count, the accesses are independet, i.e.
// they are far enough appart that accesses won't access the same location
// across all loop ierations.
if (!isa<SCEVCouldNotCompute>(Dist) && HasSameSize &&
if (HasSameSize && CommonStride &&
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I plan to address this separately.

Comment on lines 2115 to 2123
if (!C) {
// TODO: Relax requirement that there is a common stride to retry with
// non-constant distance dependencies.
FoundNonConstantDistanceDependence |= !!CommonStride;
LLVM_DEBUG(dbgs() << "LAA: Dependence because of non-constant distance\n");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is independent of whether there is a common stride or not, but previously getDependenceDistanceStrideAndSize would exit early with Unknown if the strides weren't matching, so FoundNonConstantDistanceDependence was only ever set if there was a common stride which is what the current patch tries to preserve in the initial version.

The current FoundNonConstantDistanceDependence can lead to runtime check generation in cases where the runtime checks will always fail unfortunately, which is why I'd like to keep its original behavior for now, as we regress codegen otherwise in some cases.

fhahn added a commit that referenced this pull request Apr 25, 2024
Tests to add support for different strides with isSafeDependenceDistance
as follow-up to #88039.
fhahn added a commit to fhahn/llvm-project that referenced this pull request Apr 25, 2024
fhahn added a commit to fhahn/llvm-project that referenced this pull request Apr 25, 2024
As discussed in llvm#88039, support
different strides with isSafeDependenceDistance by passing the maximum
of both strides.

isSafeDependenceDistance tries to prove that
    |Dist| > BackedgeTakenCount * Step
holds. Chosing the maximum stride computes the maximum range accesed by
the loop for all strides.
Copy link
Member

@Meinersbur Meinersbur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explaining the code for me.

Comment on lines 2115 to 2123
if (!C) {
// TODO: Relax requirement that there is a common stride to retry with
// non-constant distance dependencies.
FoundNonConstantDistanceDependence |= !!CommonStride;
LLVM_DEBUG(dbgs() << "LAA: Dependence because of non-constant distance\n");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that I like the arbitrary-looking !!CommonStride scattered around the code, but at least I understand now. FoundNonConstantDistanceDependence sounded like something that's safer to set to true when in doubt. Should really be called ReconsiderWithRuntimeChecks. If I may suggest an alternative TODO:

// TODO: FoundNonConstantDistanceDependence is used as a necessary condition to consider retrying
// with runtime checks. Historically, we did not set it when strides were different but there is no inherent
// reason to.

@fhahn fhahn merged commit 933f492 into llvm:main Apr 25, 2024
3 of 4 checks passed
@fhahn fhahn deleted the laa-nonconst-dist-forward branch April 25, 2024 20:38
fhahn added a commit that referenced this pull request Apr 30, 2024
As discussed in #88039, support
different strides with isSafeDependenceDistance by passing the maximum
of both strides.

isSafeDependenceDistance tries to prove that
    |Dist| > BackedgeTakenCount * Step
holds. Chosing the maximum stride computes the maximum range accesed by
the loop for all strides.

PR: #90036
@psteinfeld
Copy link
Contributor

@fhahn, after this change, we started failing one of our tests -- the Fortran Polyhedron benchmark "rnflow". When I build flang-new and compile the benchmark with the "flang-new -Ofast -march=native rnflow.f90", then run the benchmark, and then run the Polyhedron utility "pbvalid", you'll see failures. The failures only happen when both compilation options are used.

Do you have access to the Polyhedron "rnflow" benchmark?

@fhahn
Copy link
Contributor Author

fhahn commented May 1, 2024

@psteinfeld thanks for the heads up. Unfortunately I don't think I have access to that benchmark and I am also not really familiar with flang. Could you share the IR with and without the patch? It is likely that the patch enabled additional vectorization.

@vzakhari
Copy link
Contributor

vzakhari commented May 2, 2024

Regarding rnflow benchmark: yes, this change causes a loop vectorization with a memcheck. The loop contains an FP division, which is being rewritten with Newton-Raphson division approximation with RCP in ISel. The approximation uses muls/adds/subs, and ISel encodes them as FMAs. This altogether causes the accuracy change that affects the benchmark results.

Without this change, the loop is not vectorized and the scalar FP division is not selected into Newton-Raphson approximation. Gfortran does not vectorize the loop and also uses divss for the division.

It looks like the benchmark is sensitive to FP computations accuracy (that eventually affects even the integer results of the benchmark). Disabling FMA generation resolves the accuracy issue.

@psteinfeld
Copy link
Contributor

Newton-Raphson division approximation
Thanks, @vzakhari. I'll try to "fix" this by disabling one of the options that affect lower level transformations.

@fhahn, there's nothing for you to do here.

@fhahn
Copy link
Contributor Author

fhahn commented May 2, 2024

@vzakhari @psteinfeld good to know, thanks for checking!

fhahn added a commit to fhahn/llvm-project that referenced this pull request May 6, 2024
Update groupChecks to always use DepCands to try and merge runtime
checks. DepCands contains the dependency partition, grouping together
all accessed pointers to he same underlying objects.

If we computed the dependencies, We only need to check accesses to the
same underlying object, if there is an unknown dependency for this
underlying object; otherwise we already proved that all accesses withing
the underlying object are safe w.r.t. vectorization and we only need
to check that accesses to the underlying object don't overlap with
accesses to other underlying objects.

To ensure runtime checks are generated for the case with unknown
dependencies, remove equivalence classes containing accesses involved in
unknown dependencies.

This reduces the number of runtime checks needed in case non-constant
dependence distances are found, and is in preparation for removing the
restriction that the accesses need to have the same stride which was
added in llvm#88039.
fhahn added a commit to fhahn/llvm-project that referenced this pull request May 7, 2024
Update groupChecks to always use DepCands to try and merge runtime
checks. DepCands contains the dependency partition, grouping together
all accessed pointers to he same underlying objects.

If we computed the dependencies, We only need to check accesses to the
same underlying object, if there is an unknown dependency for this
underlying object; otherwise we already proved that all accesses withing
the underlying object are safe w.r.t. vectorization and we only need
to check that accesses to the underlying object don't overlap with
accesses to other underlying objects.

To ensure runtime checks are generated for the case with unknown
dependencies, remove equivalence classes containing accesses involved in
unknown dependencies.

This reduces the number of runtime checks needed in case non-constant
dependence distances are found, and is in preparation for removing the
restriction that the accesses need to have the same stride which was
added in llvm#88039.
fhahn added a commit to fhahn/llvm-project that referenced this pull request May 7, 2024
Update groupChecks to always use DepCands to try and merge runtime
checks. DepCands contains the dependency partition, grouping together
all accessed pointers to he same underlying objects.

If we computed the dependencies, We only need to check accesses to the
same underlying object, if there is an unknown dependency for this
underlying object; otherwise we already proved that all accesses withing
the underlying object are safe w.r.t. vectorization and we only need
to check that accesses to the underlying object don't overlap with
accesses to other underlying objects.

To ensure runtime checks are generated for the case with unknown
dependencies, remove equivalence classes containing accesses involved in
unknown dependencies.

This reduces the number of runtime checks needed in case non-constant
dependence distances are found, and is in preparation for removing the
restriction that the accesses need to have the same stride which was
added in llvm#88039.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Loop Vectorizer] Loop not vectorized when accessing the same pointer with different offsets.
6 participants