-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[LAA] Support different strides & non constant dep distances using SCEV. #88039
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Extend LoopAccessAnalysis to support different strides and as a consequence non-constant distances between dependences using SCEV to reason about the direction of the dependence. In multiple places, logic to rule out dependences using the stride has been updated to only be used if StrideA == StrideB, i.e. there's a common stride. We now also may bail out at multiple places where we may have to set FoundNonConstantDistanceDependence. This is done when we need to bail out and the distance is not constant to preserve original behavior. I'd like to call out the changes in global_alias.ll in particular. In the modified mayAlias01, and mayAlias02 tests, there should be no aliasing in the original versions of the test, as they are accessing 2 different 100 element fields in a loop with 100 iterations. I moved the original tests to noAlias15 and noAlias16 respectively, while updating the original tests to use a variable trip count. In some cases, like different_non_constant_strides_known_backward_min_distance_3, we now vectorize with runtime checks, even though the runtime checks will always be false. I'll also share a follow-up patch, that also uses SCEV to more accurately identify backwards dependences with non-constant distances. Fixes llvm#87336
@llvm/pr-subscribers-llvm-analysis @llvm/pr-subscribers-llvm-transforms Author: Florian Hahn (fhahn) ChangesExtend LoopAccessAnalysis to support different strides and as a consequence non-constant distances between dependences using SCEV to reason about the direction of the dependence. In multiple places, logic to rule out dependences using the stride has been updated to only be used if StrideA == StrideB, i.e. there's a common stride. We now also may bail out at multiple places where we may have to set FoundNonConstantDistanceDependence. This is done when we need to bail out and the distance is not constant to preserve original behavior. I'd like to call out the changes in global_alias.ll in particular. In the modified mayAlias01, and mayAlias02 tests, there should be no aliasing in the original versions of the test, as they are accessing 2 different 100 element fields in a loop with 100 iterations. I moved the original tests to noAlias15 and noAlias16 respectively, while updating the original tests to use a variable trip count. In some cases, like different_non_constant_strides_known_backward_min_distance_3, we now vectorize with runtime checks, even though the runtime checks will always be false. I'll also share a follow-up patch, that also uses SCEV to more accurately identify backwards dependences with non-constant distances. Fixes #87336 Patch is 29.09 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/88039.diff 6 Files Affected:
diff --git a/llvm/lib/Analysis/LoopAccessAnalysis.cpp b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
index c25eede96a1859..314484e11c4a7c 100644
--- a/llvm/lib/Analysis/LoopAccessAnalysis.cpp
+++ b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
@@ -1923,8 +1923,9 @@ isLoopVariantIndirectAddress(ArrayRef<const Value *> UnderlyingObjects,
// of various temporary variables, like A/BPtr, StrideA/BPtr and others.
// Returns either the dependence result, if it could already be determined, or a
// tuple with (Distance, Stride, TypeSize, AIsWrite, BIsWrite).
-static std::variant<MemoryDepChecker::Dependence::DepType,
- std::tuple<const SCEV *, uint64_t, uint64_t, bool, bool>>
+static std::variant<
+ MemoryDepChecker::Dependence::DepType,
+ std::tuple<const SCEV *, uint64_t, uint64_t, uint64_t, bool, bool>>
getDependenceDistanceStrideAndSize(
const AccessAnalysis::MemAccessInfo &A, Instruction *AInst,
const AccessAnalysis::MemAccessInfo &B, Instruction *BInst,
@@ -1982,7 +1983,7 @@ getDependenceDistanceStrideAndSize(
// Need accesses with constant stride. We don't want to vectorize
// "A[B[i]] += ..." and similar code or pointer arithmetic that could wrap
// in the address space.
- if (!StrideAPtr || !StrideBPtr || StrideAPtr != StrideBPtr) {
+ if (!StrideAPtr || !StrideBPtr) {
LLVM_DEBUG(dbgs() << "Pointer access with non-constant stride\n");
return MemoryDepChecker::Dependence::Unknown;
}
@@ -1992,8 +1993,8 @@ getDependenceDistanceStrideAndSize(
DL.getTypeStoreSizeInBits(ATy) == DL.getTypeStoreSizeInBits(BTy);
if (!HasSameSize)
TypeByteSize = 0;
- uint64_t Stride = std::abs(StrideAPtr);
- return std::make_tuple(Dist, Stride, TypeByteSize, AIsWrite, BIsWrite);
+ return std::make_tuple(Dist, std::abs(StrideAPtr), std::abs(StrideBPtr),
+ TypeByteSize, AIsWrite, BIsWrite);
}
MemoryDepChecker::Dependence::DepType MemoryDepChecker::isDependent(
@@ -2011,68 +2012,108 @@ MemoryDepChecker::Dependence::DepType MemoryDepChecker::isDependent(
if (std::holds_alternative<Dependence::DepType>(Res))
return std::get<Dependence::DepType>(Res);
- const auto &[Dist, Stride, TypeByteSize, AIsWrite, BIsWrite] =
- std::get<std::tuple<const SCEV *, uint64_t, uint64_t, bool, bool>>(Res);
+ const auto &[Dist, StrideA, StrideB, TypeByteSize, AIsWrite, BIsWrite] =
+ std::get<
+ std::tuple<const SCEV *, uint64_t, uint64_t, uint64_t, bool, bool>>(
+ Res);
bool HasSameSize = TypeByteSize > 0;
+ uint64_t CommonStride = StrideA == StrideB ? StrideA : 0;
+ if (isa<SCEVCouldNotCompute>(Dist)) {
+ FoundNonConstantDistanceDependence = true;
+ LLVM_DEBUG(dbgs() << "LAA: Dependence because of uncomputable distance.\n");
+ return Dependence::Unknown;
+ }
+
ScalarEvolution &SE = *PSE.getSE();
auto &DL = InnermostLoop->getHeader()->getModule()->getDataLayout();
- if (!isa<SCEVCouldNotCompute>(Dist) && HasSameSize &&
+ if (HasSameSize && CommonStride &&
isSafeDependenceDistance(DL, SE, *(PSE.getBackedgeTakenCount()), *Dist,
- Stride, TypeByteSize))
+ CommonStride, TypeByteSize))
return Dependence::NoDep;
const SCEVConstant *C = dyn_cast<SCEVConstant>(Dist);
- if (!C) {
- LLVM_DEBUG(dbgs() << "LAA: Dependence because of non-constant distance\n");
- FoundNonConstantDistanceDependence = true;
- return Dependence::Unknown;
- }
-
- const APInt &Val = C->getAPInt();
- int64_t Distance = Val.getSExtValue();
// Attempt to prove strided accesses independent.
- if (std::abs(Distance) > 0 && Stride > 1 && HasSameSize &&
- areStridedAccessesIndependent(std::abs(Distance), Stride, TypeByteSize)) {
- LLVM_DEBUG(dbgs() << "LAA: Strided accesses are independent\n");
- return Dependence::NoDep;
+ if (C) {
+ const APInt &Val = C->getAPInt();
+ int64_t Distance = Val.getSExtValue();
+
+ if (std::abs(Distance) > 0 && CommonStride > 1 && HasSameSize &&
+ areStridedAccessesIndependent(std::abs(Distance), CommonStride,
+ TypeByteSize)) {
+ LLVM_DEBUG(dbgs() << "LAA: Strided accesses are independent\n");
+ return Dependence::NoDep;
+ }
+ }
+
+ // Write to the same location with the same size.
+ if (SE.isKnownPredicate(CmpInst::ICMP_EQ, Dist,
+ SE.getZero(Dist->getType()))) {
+ if (HasSameSize)
+ return Dependence::Forward;
+ LLVM_DEBUG(
+ dbgs() << "LAA: Zero dependence difference but different type sizes\n");
+ return Dependence::Unknown;
}
// Negative distances are not plausible dependencies.
- if (Val.isNegative()) {
+ if (SE.isKnownNonPositive(Dist)) {
+ if (!SE.isKnownNegative(Dist) && !HasSameSize) {
+ LLVM_DEBUG(dbgs() << "LAA: possibly zero dependence difference but "
+ "different type sizes\n");
+ return Dependence::Unknown;
+ }
+
bool IsTrueDataDependence = (AIsWrite && !BIsWrite);
// There is no need to update MaxSafeVectorWidthInBits after call to
// couldPreventStoreLoadForward, even if it changed MinDepDistBytes,
// since a forward dependency will allow vectorization using any width.
- if (IsTrueDataDependence && EnableForwardingConflictDetection &&
- (!HasSameSize || couldPreventStoreLoadForward(Val.abs().getZExtValue(),
- TypeByteSize))) {
- LLVM_DEBUG(dbgs() << "LAA: Forward but may prevent st->ld forwarding\n");
- return Dependence::ForwardButPreventsForwarding;
+ if (IsTrueDataDependence && EnableForwardingConflictDetection) {
+ if (!C) {
+ FoundNonConstantDistanceDependence = true;
+ return Dependence::Unknown;
+ }
+ if (!HasSameSize ||
+ couldPreventStoreLoadForward(C->getAPInt().abs().getZExtValue(),
+ TypeByteSize)) {
+ LLVM_DEBUG(
+ dbgs() << "LAA: Forward but may prevent st->ld forwarding\n");
+ return Dependence::ForwardButPreventsForwarding;
+ }
}
LLVM_DEBUG(dbgs() << "LAA: Dependence is negative\n");
return Dependence::Forward;
}
- // Write to the same location with the same size.
- if (Val == 0) {
- if (HasSameSize)
- return Dependence::Forward;
- LLVM_DEBUG(
- dbgs() << "LAA: Zero dependence difference but different type sizes\n");
+ if (!SE.isKnownPositive(Dist)) {
+ if (!C)
+ FoundNonConstantDistanceDependence = true;
return Dependence::Unknown;
}
- assert(Val.isStrictlyPositive() && "Expect a positive value");
-
if (!HasSameSize) {
LLVM_DEBUG(dbgs() << "LAA: ReadWrite-Write positive dependency with "
"different type sizes\n");
+ if (!C)
+ FoundNonConstantDistanceDependence = true;
return Dependence::Unknown;
}
+ if (!C) {
+ LLVM_DEBUG(dbgs() << "LAA: Dependence because of non-constant distance\n");
+ FoundNonConstantDistanceDependence = true;
+ return Dependence::Unknown;
+ }
+
+ // The logic below currently only supports StrideA == StrideB, i.e. there's a common stride.
+ if (!CommonStride)
+ return Dependence::Unknown;
+
+ const APInt &Val = C->getAPInt();
+ int64_t Distance = Val.getSExtValue();
+
// Bail out early if passed-in parameters make vectorization not feasible.
unsigned ForcedFactor = (VectorizerParams::VectorizationFactor ?
VectorizerParams::VectorizationFactor : 1);
@@ -2108,7 +2149,7 @@ MemoryDepChecker::Dependence::DepType MemoryDepChecker::isDependent(
// the minimum distance needed is 28, which is greater than distance. It is
// not safe to do vectorization.
uint64_t MinDistanceNeeded =
- TypeByteSize * Stride * (MinNumIter - 1) + TypeByteSize;
+ TypeByteSize * CommonStride * (MinNumIter - 1) + TypeByteSize;
if (MinDistanceNeeded > static_cast<uint64_t>(Distance)) {
LLVM_DEBUG(dbgs() << "LAA: Failure because of positive distance "
<< Distance << '\n');
@@ -2157,7 +2198,7 @@ MemoryDepChecker::Dependence::DepType MemoryDepChecker::isDependent(
// An update to MinDepDistBytes requires an update to MaxSafeVectorWidthInBits
// since there is a backwards dependency.
- uint64_t MaxVF = MinDepDistBytes / (TypeByteSize * Stride);
+ uint64_t MaxVF = MinDepDistBytes / (TypeByteSize * CommonStride);
LLVM_DEBUG(dbgs() << "LAA: Positive distance " << Val.getSExtValue()
<< " with max VF = " << MaxVF << '\n');
uint64_t MaxVFInBits = MaxVF * TypeByteSize * 8;
diff --git a/llvm/lib/Transforms/Scalar/LoopLoadElimination.cpp b/llvm/lib/Transforms/Scalar/LoopLoadElimination.cpp
index edddfb1b92402f..059900f357e64b 100644
--- a/llvm/lib/Transforms/Scalar/LoopLoadElimination.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopLoadElimination.cpp
@@ -126,8 +126,10 @@ struct StoreToLoadForwardingCandidate {
// We don't need to check non-wrapping here because forward/backward
// dependence wouldn't be valid if these weren't monotonic accesses.
- auto *Dist = cast<SCEVConstant>(
+ auto *Dist = dyn_cast<SCEVConstant>(
PSE.getSE()->getMinusSCEV(StorePtrSCEV, LoadPtrSCEV));
+ if (!Dist)
+ return false;
const APInt &Val = Dist->getAPInt();
return Val == TypeByteSize * StrideLoad;
}
diff --git a/llvm/test/Analysis/LoopAccessAnalysis/non-constant-strides-backward.ll b/llvm/test/Analysis/LoopAccessAnalysis/non-constant-strides-backward.ll
index 416742a94e0d36..8102611174c854 100644
--- a/llvm/test/Analysis/LoopAccessAnalysis/non-constant-strides-backward.ll
+++ b/llvm/test/Analysis/LoopAccessAnalysis/non-constant-strides-backward.ll
@@ -45,15 +45,21 @@ exit:
define void @different_non_constant_strides_known_backward_distance_larger_than_trip_count(ptr %A) {
; CHECK-LABEL: 'different_non_constant_strides_known_backward_distance_larger_than_trip_count'
; CHECK-NEXT: loop:
-; CHECK-NEXT: Report: unsafe dependent memory operations in loop. Use #pragma clang loop distribute(enable) to allow loop distribution to attempt to isolate the offending operations into a separate loop
-; CHECK-NEXT: Unknown data dependence.
+; CHECK-NEXT: Memory dependences are safe with run-time checks
; CHECK-NEXT: Dependences:
-; CHECK-NEXT: Unknown:
-; CHECK-NEXT: %l = load i32, ptr %gep, align 4 ->
-; CHECK-NEXT: store i32 %add, ptr %gep.mul.2, align 4
-; CHECK-EMPTY:
; CHECK-NEXT: Run-time memory checks:
+; CHECK-NEXT: Check 0:
+; CHECK-NEXT: Comparing group ([[GRP1:0x[0-9a-f]+]]):
+; CHECK-NEXT: %gep.mul.2 = getelementptr inbounds i32, ptr %A.1024, i64 %iv.mul.2
+; CHECK-NEXT: Against group ([[GRP2:0x[0-9a-f]+]]):
+; CHECK-NEXT: %gep = getelementptr inbounds i32, ptr %A, i64 %iv
; CHECK-NEXT: Grouped accesses:
+; CHECK-NEXT: Group [[GRP1]]:
+; CHECK-NEXT: (Low: (1024 + %A)<nuw> High: (3068 + %A))
+; CHECK-NEXT: Member: {(1024 + %A)<nuw>,+,8}<nuw><%loop>
+; CHECK-NEXT: Group [[GRP2]]:
+; CHECK-NEXT: (Low: %A High: (1024 + %A)<nuw>)
+; CHECK-NEXT: Member: {%A,+,4}<nuw><%loop>
; CHECK-EMPTY:
; CHECK-NEXT: Non vectorizable stores to invariant address were not found in loop.
; CHECK-NEXT: SCEV assumptions:
@@ -83,15 +89,21 @@ exit:
define void @different_non_constant_strides_known_backward_min_distance_16(ptr %A) {
; CHECK-LABEL: 'different_non_constant_strides_known_backward_min_distance_16'
; CHECK-NEXT: loop:
-; CHECK-NEXT: Report: unsafe dependent memory operations in loop. Use #pragma clang loop distribute(enable) to allow loop distribution to attempt to isolate the offending operations into a separate loop
-; CHECK-NEXT: Unknown data dependence.
+; CHECK-NEXT: Memory dependences are safe with run-time checks
; CHECK-NEXT: Dependences:
-; CHECK-NEXT: Unknown:
-; CHECK-NEXT: %l = load i32, ptr %gep, align 4 ->
-; CHECK-NEXT: store i32 %add, ptr %gep.mul.2, align 4
-; CHECK-EMPTY:
; CHECK-NEXT: Run-time memory checks:
+; CHECK-NEXT: Check 0:
+; CHECK-NEXT: Comparing group ([[GRP3:0x[0-9a-f]+]]):
+; CHECK-NEXT: %gep.mul.2 = getelementptr inbounds i32, ptr %A.16, i64 %iv.mul.2
+; CHECK-NEXT: Against group ([[GRP4:0x[0-9a-f]+]]):
+; CHECK-NEXT: %gep = getelementptr inbounds i32, ptr %A, i64 %iv
; CHECK-NEXT: Grouped accesses:
+; CHECK-NEXT: Group [[GRP3]]:
+; CHECK-NEXT: (Low: (16 + %A)<nuw> High: (2060 + %A))
+; CHECK-NEXT: Member: {(16 + %A)<nuw>,+,8}<nuw><%loop>
+; CHECK-NEXT: Group [[GRP4]]:
+; CHECK-NEXT: (Low: %A High: (1024 + %A))
+; CHECK-NEXT: Member: {%A,+,4}<nuw><%loop>
; CHECK-EMPTY:
; CHECK-NEXT: Non vectorizable stores to invariant address were not found in loop.
; CHECK-NEXT: SCEV assumptions:
@@ -121,15 +133,21 @@ exit:
define void @different_non_constant_strides_known_backward_min_distance_15(ptr %A) {
; CHECK-LABEL: 'different_non_constant_strides_known_backward_min_distance_15'
; CHECK-NEXT: loop:
-; CHECK-NEXT: Report: unsafe dependent memory operations in loop. Use #pragma clang loop distribute(enable) to allow loop distribution to attempt to isolate the offending operations into a separate loop
-; CHECK-NEXT: Unknown data dependence.
+; CHECK-NEXT: Memory dependences are safe with run-time checks
; CHECK-NEXT: Dependences:
-; CHECK-NEXT: Unknown:
-; CHECK-NEXT: %l = load i32, ptr %gep, align 4 ->
-; CHECK-NEXT: store i32 %add, ptr %gep.mul.2, align 4
-; CHECK-EMPTY:
; CHECK-NEXT: Run-time memory checks:
+; CHECK-NEXT: Check 0:
+; CHECK-NEXT: Comparing group ([[GRP5:0x[0-9a-f]+]]):
+; CHECK-NEXT: %gep.mul.2 = getelementptr inbounds i32, ptr %A.15, i64 %iv.mul.2
+; CHECK-NEXT: Against group ([[GRP6:0x[0-9a-f]+]]):
+; CHECK-NEXT: %gep = getelementptr inbounds i32, ptr %A, i64 %iv
; CHECK-NEXT: Grouped accesses:
+; CHECK-NEXT: Group [[GRP5]]:
+; CHECK-NEXT: (Low: (15 + %A)<nuw> High: (2059 + %A))
+; CHECK-NEXT: Member: {(15 + %A)<nuw>,+,8}<nuw><%loop>
+; CHECK-NEXT: Group [[GRP6]]:
+; CHECK-NEXT: (Low: %A High: (1024 + %A))
+; CHECK-NEXT: Member: {%A,+,4}<nuw><%loop>
; CHECK-EMPTY:
; CHECK-NEXT: Non vectorizable stores to invariant address were not found in loop.
; CHECK-NEXT: SCEV assumptions:
@@ -159,15 +177,21 @@ exit:
define void @different_non_constant_strides_known_backward_min_distance_8(ptr %A) {
; CHECK-LABEL: 'different_non_constant_strides_known_backward_min_distance_8'
; CHECK-NEXT: loop:
-; CHECK-NEXT: Report: unsafe dependent memory operations in loop. Use #pragma clang loop distribute(enable) to allow loop distribution to attempt to isolate the offending operations into a separate loop
-; CHECK-NEXT: Unknown data dependence.
+; CHECK-NEXT: Memory dependences are safe with run-time checks
; CHECK-NEXT: Dependences:
-; CHECK-NEXT: Unknown:
-; CHECK-NEXT: %l = load i32, ptr %gep, align 4 ->
-; CHECK-NEXT: store i32 %add, ptr %gep.mul.2, align 4
-; CHECK-EMPTY:
; CHECK-NEXT: Run-time memory checks:
+; CHECK-NEXT: Check 0:
+; CHECK-NEXT: Comparing group ([[GRP7:0x[0-9a-f]+]]):
+; CHECK-NEXT: %gep.mul.2 = getelementptr inbounds i32, ptr %A.8, i64 %iv.mul.2
+; CHECK-NEXT: Against group ([[GRP8:0x[0-9a-f]+]]):
+; CHECK-NEXT: %gep = getelementptr inbounds i32, ptr %A, i64 %iv
; CHECK-NEXT: Grouped accesses:
+; CHECK-NEXT: Group [[GRP7]]:
+; CHECK-NEXT: (Low: (8 + %A)<nuw> High: (2052 + %A))
+; CHECK-NEXT: Member: {(8 + %A)<nuw>,+,8}<nuw><%loop>
+; CHECK-NEXT: Group [[GRP8]]:
+; CHECK-NEXT: (Low: %A High: (1024 + %A))
+; CHECK-NEXT: Member: {%A,+,4}<nuw><%loop>
; CHECK-EMPTY:
; CHECK-NEXT: Non vectorizable stores to invariant address were not found in loop.
; CHECK-NEXT: SCEV assumptions:
@@ -197,15 +221,21 @@ exit:
define void @different_non_constant_strides_known_backward_min_distance_3(ptr %A) {
; CHECK-LABEL: 'different_non_constant_strides_known_backward_min_distance_3'
; CHECK-NEXT: loop:
-; CHECK-NEXT: Report: unsafe dependent memory operations in loop. Use #pragma clang loop distribute(enable) to allow loop distribution to attempt to isolate the offending operations into a separate loop
-; CHECK-NEXT: Unknown data dependence.
+; CHECK-NEXT: Memory dependences are safe with run-time checks
; CHECK-NEXT: Dependences:
-; CHECK-NEXT: Unknown:
-; CHECK-NEXT: %l = load i32, ptr %gep, align 4 ->
-; CHECK-NEXT: store i32 %add, ptr %gep.mul.2, align 4
-; CHECK-EMPTY:
; CHECK-NEXT: Run-time memory checks:
+; CHECK-NEXT: Check 0:
+; CHECK-NEXT: Comparing group ([[GRP9:0x[0-9a-f]+]]):
+; CHECK-NEXT: %gep.mul.2 = getelementptr inbounds i32, ptr %A.3, i64 %iv.mul.2
+; CHECK-NEXT: Against group ([[GRP10:0x[0-9a-f]+]]):
+; CHECK-NEXT: %gep = getelementptr inbounds i32, ptr %A, i64 %iv
; CHECK-NEXT: Grouped accesses:
+; CHECK-NEXT: Group [[GRP9]]:
+; CHECK-NEXT: (Low: (3 + %A)<nuw> High: (2047 + %A))
+; CHECK-NEXT: Member: {(3 + %A)<nuw>,+,8}<nuw><%loop>
+; CHECK-NEXT: Group [[GRP10]]:
+; CHECK-NEXT: (Low: %A High: (1024 + %A))
+; CHECK-NEXT: Member: {%A,+,4}<nuw><%loop>
; CHECK-EMPTY:
; CHECK-NEXT: Non vectorizable stores to invariant address were not found in loop.
; CHECK-NEXT: SCEV assumptions:
diff --git a/llvm/test/Analysis/LoopAccessAnalysis/non-constant-strides-forward.ll b/llvm/test/Analysis/LoopAccessAnalysis/non-constant-strides-forward.ll
index aa22a2143352d2..7afbae2854374c 100644
--- a/llvm/test/Analysis/LoopAccessAnalysis/non-constant-strides-forward.ll
+++ b/llvm/test/Analysis/LoopAccessAnalysis/non-constant-strides-forward.ll
@@ -8,10 +8,9 @@ declare void @llvm.assume(i1)
define void @different_non_constant_strides_known_forward(ptr %A) {
; CHECK-LABEL: 'different_non_constant_strides_known_forward'
; CHECK-NEXT: loop:
-; CHECK-NEXT: Report: unsafe dependent memory operations in loop. Use #pragma clang loop distribute(enable) to allow loop distribution to attempt to isolate the offending operations into a separate loop
-; CHECK-NEXT: Unknown data dependence.
+; CHECK-NEXT: Memory dependences are safe
; CHECK-NEXT: Dependences:
-; CHECK-NEXT: Unknown:
+; CHECK-NEXT: Forward:
; CHECK-NEXT: %l = load i32, ptr %gep.mul.2, align 4 ->
; CHECK-NEXT: store i32 %add, ptr %gep, align 4
; CHECK-EMPTY:
@@ -45,10 +44,9 @@ exit:
define void @different_non_constant_strides_known_forward_min_distance_3(ptr %A) {
; CHECK-LABEL: 'different_non_constant_strides_known_forward_min_distance_3'
; CHECK-NEXT: loop:
-; CHECK-NEXT: Report: unsafe dependent memory operations in loop. Use #pragma clang loop distribute(enable) to allow loop distribution to attempt to isolate the offending operations into a separate loop
-; CHECK-NEXT: Unknown data dependence.
+; CHECK-NEXT: Memory dependences are safe
; CHECK-NEXT: Dependences:
-; CHECK-NEXT: Unknown:
+; CHECK-NEXT: Forward:
; CHECK-NEXT: %l = load i32, ptr %gep.mul.2, align 4 ->
; CHECK-NEXT: store i32 %add, ptr %gep, align 4
; CHECK-EMPTY:
diff --git a/llvm/test/Transforms/LoopVectorize/global_alias.ll b/llvm/test/Transforms/LoopVectorize/global_alias.ll
index 336e462a4cf63c..84aee9ec62cf27 100644
--- a/llvm/test/Transforms/LoopVectorize/global_alias.ll
+++ b/llvm/test/Transforms/LoopVectorize/global_alias.ll
@@ -491,22 +491,101 @@ for.end: ; preds = %for.body
ret i32 %1
}
+; /// Different objects, swapped induction, alias at the end
+; int noAlias15 (int a) {
+; int i;
+; for (i=0; i<SIZE; i++)
+; Foo.A[i] = Foo.B[SIZE-i-1] + a;
+; return Foo.A[a];
+; }
+; CHECK-LABEL: define i32 @noAlias15(
+; CHECK: vector.memcheck:
+; CHECK-NEXT: br i1 false, label %scalar.ph, label %vector.ph
+; CHECK: add nsw <4 x i32>
+; CHECK: ret
+
+define i32 @noAlias15(i32 %a) nounwind {
+entry:
+ br label %for...
[truncated]
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(just some drive-by comments)
As suggested in #88039, replace the tuple with a struct, to make it easier to extend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would enjoy more textual description of what every condition is meant to check.
std::get<DepDistanceStrideAndSizeInfo>(Res); | ||
bool HasSameSize = TypeByteSize > 0; | ||
|
||
uint64_t CommonStride = StrideA == StrideB ? StrideA : 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't 0
a valid stride (i.e. indexed by a constant)?
uint64_t CommonStride = StrideA == StrideB ? StrideA : 0; | |
std::optional<uint64_t> CommonStride = StrideA == StrideB ? StrideA : None; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the moment, both strides need to be != 0 I think (getDependenceDistanceStrideAndSize
returns Unknown
dependence if any stride is == 0).
But using std::optional makes this future proof (I am planning to also remove that restriction as follow up), updated, thanks!
if (HasSameSize && CommonStride && | ||
isSafeDependenceDistance(DL, SE, *(PSE.getBackedgeTakenCount()), *Dist, | ||
Stride, TypeByteSize)) | ||
CommonStride, TypeByteSize)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isSafeDependenceDistance
doesn't depend on there being a common stride. It could either be the larger one, or a case-distinction of either Dist positive (Src < Sink) or positive (Sink < Src).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the moment the comment in the function implies that it assumes equal strides which is why I left it as-is for now, to limit the scope of the initial patch.
It tries to prove |Dist| > BackedgeTakenCount * Stride
, so using the maximum should conservatively be correct.
A similar reasoning can be applied at the end of the function too, which also is currently guarded by having a common stride. I am also planning on addressing that separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a TODO?
auto *Dist = dyn_cast<SCEVConstant>( | ||
PSE.getSE()->getMinusSCEV(StorePtrSCEV, LoadPtrSCEV)); | ||
if (!Dist) | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this related? LAA doesn't seem to be involved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the pass uses LoopAccessInfo in findStoreToLoadDependences
to collect dependences. Before this patch, all identified dependences would have a constant distance which is why this change is needed
return Dependence::ForwardButPreventsForwarding; | ||
if (IsTrueDataDependence && EnableForwardingConflictDetection) { | ||
if (!C) { | ||
FoundNonConstantDistanceDependence |= CommonStride; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is no common stride (CommonStride == 0
) then any of the two could still be non-constant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the reason for only setting it for CommonStride == 0
is to only set it in cases where we would set it in the original code as well; if there was no common stride, we would have returned Unknown
much earlier, without setting FoundNonConstantDistanceDependence
.
I added a TODO to relax this requirement, but would like to keep it in the patch to limit the impact.
if (!CommonStride) | ||
return Dependence::Unknown; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many checks above also test (CommonStride > 1
). Move them below this check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could only find the one guarding areStridedAccessesIndependent
and I don't think it can be moved here, as it tries to identify independent strided accesses early on to return NoDep. Kept the order roughly the same for now.
if (!SE.isKnownNegative(Dist) && !HasSameSize) { | ||
LLVM_DEBUG(dbgs() << "LAA: possibly zero dependence difference but " | ||
"different type sizes\n"); | ||
return Dependence::Unknown; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My impression is that most of these return Unknown;
should fall through instead to give other checks a chance to match. If there is an invariant after a certain point (e.g. "type size is equal") it should be rejected there. It avoids having to maintain several conditions e.g. updating FoundNonConstantDistanceDependence
which seems to be missing here as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've merged the code from above that handles C == 0 with the code here. The main reason we need to update FoundNonConstantDistanceDependence
in multiple places now is that there now are multiple exit paths that return Unknown with non-constant dependences.
The latest version of the patch limits setting FoundNonConstantDistanceDependence
to 3 times, with the one before couldPreventStoreLoadForward
potentially being removed as a follow-up by making couldPreventStoreLoadForward
work with non-constant distances.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would enjoy more textual description of what every condition is meant to check.
There are multiple places that hand off reasoning to called functions, would you like to have a summary of what the function checks there? Could do as separate patch, as this would be independent of the current patch?
std::get<DepDistanceStrideAndSizeInfo>(Res); | ||
bool HasSameSize = TypeByteSize > 0; | ||
|
||
uint64_t CommonStride = StrideA == StrideB ? StrideA : 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the moment, both strides need to be != 0 I think (getDependenceDistanceStrideAndSize
returns Unknown
dependence if any stride is == 0).
But using std::optional makes this future proof (I am planning to also remove that restriction as follow up), updated, thanks!
auto *Dist = dyn_cast<SCEVConstant>( | ||
PSE.getSE()->getMinusSCEV(StorePtrSCEV, LoadPtrSCEV)); | ||
if (!Dist) | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the pass uses LoopAccessInfo in findStoreToLoadDependences
to collect dependences. Before this patch, all identified dependences would have a constant distance which is why this change is needed
if (!CommonStride) | ||
return Dependence::Unknown; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could only find the one guarding areStridedAccessesIndependent
and I don't think it can be moved here, as it tries to identify independent strided accesses early on to return NoDep. Kept the order roughly the same for now.
return Dependence::ForwardButPreventsForwarding; | ||
if (IsTrueDataDependence && EnableForwardingConflictDetection) { | ||
if (!C) { | ||
FoundNonConstantDistanceDependence |= CommonStride; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the reason for only setting it for CommonStride == 0
is to only set it in cases where we would set it in the original code as well; if there was no common stride, we would have returned Unknown
much earlier, without setting FoundNonConstantDistanceDependence
.
I added a TODO to relax this requirement, but would like to keep it in the patch to limit the impact.
if (!SE.isKnownNegative(Dist) && !HasSameSize) { | ||
LLVM_DEBUG(dbgs() << "LAA: possibly zero dependence difference but " | ||
"different type sizes\n"); | ||
return Dependence::Unknown; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've merged the code from above that handles C == 0 with the code here. The main reason we need to update FoundNonConstantDistanceDependence
in multiple places now is that there now are multiple exit paths that return Unknown with non-constant dependences.
The latest version of the patch limits setting FoundNonConstantDistanceDependence
to 3 times, with the one before couldPreventStoreLoadForward
potentially being removed as a follow-up by making couldPreventStoreLoadForward
work with non-constant distances.
if (HasSameSize && CommonStride && | ||
isSafeDependenceDistance(DL, SE, *(PSE.getBackedgeTakenCount()), *Dist, | ||
Stride, TypeByteSize)) | ||
CommonStride, TypeByteSize)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the moment the comment in the function implies that it assumes equal strides which is why I left it as-is for now, to limit the scope of the initial patch.
It tries to prove |Dist| > BackedgeTakenCount * Stride
, so using the maximum should conservatively be correct.
A similar reasoning can be applied at the end of the function too, which also is currently guarded by having a common stride. I am also planning on addressing that separately.
I was not familiar with this code, trying to reduce the impact of this patch doesn't help me to understand it and convince myself that it does not cause miscompilation, it rather makes it even more difficult since conditions are now all over the place. For instance, the comment
is useless. The entire function tries to prove access independency with a possible stride. Documentation for
|
As suggested in #88039, add extra documentation for reasoning in isDependent.
Thanks, I put up #89381 to add extra documentation and updated this PR to be based on #89381 |
ScalarEvolution &SE = *PSE.getSE(); | ||
auto &DL = InnermostLoop->getHeader()->getModule()->getDataLayout(); | ||
// If the distance between the acecsses is larger than their absolute stride | ||
// multiplied by the backedge taken count, the accesses are independet, i.e. | ||
// they are far enough appart that accesses won't access the same location | ||
// across all loop ierations. | ||
if (!isa<SCEVCouldNotCompute>(Dist) && HasSameSize && | ||
if (HasSameSize && CommonStride && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isSafeDependenceDistance
could be easily modified as well to support different sizes, but it's unrelated to this patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I plan to address this separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Follow-up now available: #90036
} | ||
|
||
// Negative distances are not plausible dependencies. | ||
if (Val.isNegative()) { | ||
if (SE.isKnownNonPositive(Dist)) { | ||
if (!SE.isKnownNegative(Dist)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[serious] Not knowing something doesn't imply anything.
if (!SE.isKnownNegative(Dist)) { | |
// KnownNonPositive && KnownNonNegative => KnownZero | |
if (SE.isKnownNonNegative(Dist)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, updated!
if (!C) { | ||
// TODO: Relax requirement that there is a common stride to retry with | ||
// non-constant distance dependencies. | ||
FoundNonConstantDistanceDependence |= !!CommonStride; | ||
LLVM_DEBUG(dbgs() << "LAA: Dependence because of non-constant distance\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the bail-out moved here from line 2047. Is this done so return Dependence::Forward with negative distance can be returned also on non-constant distances?
FoundNonConstantDistanceDependence
is not set if there is no common stride, but what does it have to do with the distance between the pointers? Before this patch it would set that flag independent of any stride.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is independent of whether there is a common stride or not, but previously getDependenceDistanceStrideAndSize
would exit early with Unknown
if the strides weren't matching, so FoundNonConstantDistanceDependence
was only ever set if there was a common stride which is what the current patch tries to preserve in the initial version.
The current FoundNonConstantDistanceDependence
can lead to runtime check generation in cases where the runtime checks will always fail unfortunately, which is why I'd like to keep its original behavior for now, as we regress codegen otherwise in some cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not that I like the arbitrary-looking !!CommonStride
scattered around the code, but at least I understand now. FoundNonConstantDistanceDependence
sounded like something that's safer to set to true when in doubt. Should really be called ReconsiderWithRuntimeChecks
. If I may suggest an alternative TODO:
// TODO: FoundNonConstantDistanceDependence is used as a necessary condition to consider retrying
// with runtime checks. Historically, we did not set it when strides were different but there is no inherent
// reason to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, adjusted as suggested and replace !!CommonStride
with CommonStride.has_value()
Extra tests with strides with different signs for #88039.
a4db999
to
7bec335
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The latest comments should be addressed/responded to, thanks!
I also added a missing check that the strides must have the same direction for now (extra tests added in 5138ccd). Without this check, we would at least try to vectorize with runtime checks in cases where the runtime checks would always be false.
} | ||
|
||
// Negative distances are not plausible dependencies. | ||
if (Val.isNegative()) { | ||
if (SE.isKnownNonPositive(Dist)) { | ||
if (!SE.isKnownNegative(Dist)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, updated!
ScalarEvolution &SE = *PSE.getSE(); | ||
auto &DL = InnermostLoop->getHeader()->getModule()->getDataLayout(); | ||
// If the distance between the acecsses is larger than their absolute stride | ||
// multiplied by the backedge taken count, the accesses are independet, i.e. | ||
// they are far enough appart that accesses won't access the same location | ||
// across all loop ierations. | ||
if (!isa<SCEVCouldNotCompute>(Dist) && HasSameSize && | ||
if (HasSameSize && CommonStride && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I plan to address this separately.
if (!C) { | ||
// TODO: Relax requirement that there is a common stride to retry with | ||
// non-constant distance dependencies. | ||
FoundNonConstantDistanceDependence |= !!CommonStride; | ||
LLVM_DEBUG(dbgs() << "LAA: Dependence because of non-constant distance\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is independent of whether there is a common stride or not, but previously getDependenceDistanceStrideAndSize
would exit early with Unknown
if the strides weren't matching, so FoundNonConstantDistanceDependence
was only ever set if there was a common stride which is what the current patch tries to preserve in the initial version.
The current FoundNonConstantDistanceDependence
can lead to runtime check generation in cases where the runtime checks will always fail unfortunately, which is why I'd like to keep its original behavior for now, as we regress codegen otherwise in some cases.
Tests to add support for different strides with isSafeDependenceDistance as follow-up to #88039.
As discussed in llvm#88039, support different strides with isSafeDependenceDistance by passing the maximum of both strides. isSafeDependenceDistance tries to prove that |Dist| > BackedgeTakenCount * Step holds. Chosing the maximum stride computes the maximum range accesed by the loop for all strides.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for explaining the code for me.
if (!C) { | ||
// TODO: Relax requirement that there is a common stride to retry with | ||
// non-constant distance dependencies. | ||
FoundNonConstantDistanceDependence |= !!CommonStride; | ||
LLVM_DEBUG(dbgs() << "LAA: Dependence because of non-constant distance\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not that I like the arbitrary-looking !!CommonStride
scattered around the code, but at least I understand now. FoundNonConstantDistanceDependence
sounded like something that's safer to set to true when in doubt. Should really be called ReconsiderWithRuntimeChecks
. If I may suggest an alternative TODO:
// TODO: FoundNonConstantDistanceDependence is used as a necessary condition to consider retrying
// with runtime checks. Historically, we did not set it when strides were different but there is no inherent
// reason to.
As discussed in #88039, support different strides with isSafeDependenceDistance by passing the maximum of both strides. isSafeDependenceDistance tries to prove that |Dist| > BackedgeTakenCount * Step holds. Chosing the maximum stride computes the maximum range accesed by the loop for all strides. PR: #90036
@fhahn, after this change, we started failing one of our tests -- the Fortran Polyhedron benchmark "rnflow". When I build flang-new and compile the benchmark with the "flang-new -Ofast -march=native rnflow.f90", then run the benchmark, and then run the Polyhedron utility "pbvalid", you'll see failures. The failures only happen when both compilation options are used. Do you have access to the Polyhedron "rnflow" benchmark? |
@psteinfeld thanks for the heads up. Unfortunately I don't think I have access to that benchmark and I am also not really familiar with flang. Could you share the IR with and without the patch? It is likely that the patch enabled additional vectorization. |
Regarding Without this change, the loop is not vectorized and the scalar FP division is not selected into Newton-Raphson approximation. Gfortran does not vectorize the loop and also uses divss for the division. It looks like the benchmark is sensitive to FP computations accuracy (that eventually affects even the integer results of the benchmark). Disabling FMA generation resolves the accuracy issue. |
@vzakhari @psteinfeld good to know, thanks for checking! |
Update groupChecks to always use DepCands to try and merge runtime checks. DepCands contains the dependency partition, grouping together all accessed pointers to he same underlying objects. If we computed the dependencies, We only need to check accesses to the same underlying object, if there is an unknown dependency for this underlying object; otherwise we already proved that all accesses withing the underlying object are safe w.r.t. vectorization and we only need to check that accesses to the underlying object don't overlap with accesses to other underlying objects. To ensure runtime checks are generated for the case with unknown dependencies, remove equivalence classes containing accesses involved in unknown dependencies. This reduces the number of runtime checks needed in case non-constant dependence distances are found, and is in preparation for removing the restriction that the accesses need to have the same stride which was added in llvm#88039.
Update groupChecks to always use DepCands to try and merge runtime checks. DepCands contains the dependency partition, grouping together all accessed pointers to he same underlying objects. If we computed the dependencies, We only need to check accesses to the same underlying object, if there is an unknown dependency for this underlying object; otherwise we already proved that all accesses withing the underlying object are safe w.r.t. vectorization and we only need to check that accesses to the underlying object don't overlap with accesses to other underlying objects. To ensure runtime checks are generated for the case with unknown dependencies, remove equivalence classes containing accesses involved in unknown dependencies. This reduces the number of runtime checks needed in case non-constant dependence distances are found, and is in preparation for removing the restriction that the accesses need to have the same stride which was added in llvm#88039.
Update groupChecks to always use DepCands to try and merge runtime checks. DepCands contains the dependency partition, grouping together all accessed pointers to he same underlying objects. If we computed the dependencies, We only need to check accesses to the same underlying object, if there is an unknown dependency for this underlying object; otherwise we already proved that all accesses withing the underlying object are safe w.r.t. vectorization and we only need to check that accesses to the underlying object don't overlap with accesses to other underlying objects. To ensure runtime checks are generated for the case with unknown dependencies, remove equivalence classes containing accesses involved in unknown dependencies. This reduces the number of runtime checks needed in case non-constant dependence distances are found, and is in preparation for removing the restriction that the accesses need to have the same stride which was added in llvm#88039.
Extend LoopAccessAnalysis to support different strides and as a consequence non-constant distances between dependences using SCEV to reason about the direction of the dependence.
In multiple places, logic to rule out dependences using the stride has been updated to only be used if StrideA == StrideB, i.e. there's a common stride.
We now also may bail out at multiple places where we may have to set FoundNonConstantDistanceDependence. This is done when we need to bail out and the distance is not constant to preserve original behavior.
Fixes #87336