Skip to content

[SLP] Fix crash of shuffle poison #106857

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 1, 2024
Merged

[SLP] Fix crash of shuffle poison #106857

merged 1 commit into from
Sep 1, 2024

Conversation

tcwzxx
Copy link
Member

@tcwzxx tcwzxx commented Aug 31, 2024

When the shuffle masks are PoisonMaskElem, there is not need to check the cost of SK_ExtractSubvector. It is free. Otherwise, it will cause the compiler to crash.

Assertion `(Idx + EltsPerVector) <= alignTo(NumElts, EltsPerVector) && "SK_ExtractSubvector index out of range"' failed.

@llvmbot
Copy link
Member

llvmbot commented Aug 31, 2024

@llvm/pr-subscribers-llvm-transforms

Author: tcwzxx (tcwzxx)

Changes

When the shuffle masks are PoisonMaskElem, there is not need to check the cost of SK_ExtractSubvector. It is free. Otherwise, it will cause the compiler to crash.

Assertion `(Idx + EltsPerVector) <= alignTo(NumElts, EltsPerVector) && "SK_ExtractSubvector index out of range"' failed.


Full diff: https://github.com/llvm/llvm-project/pull/106857.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp (+2-1)
  • (added) llvm/test/Transforms/SLPVectorizer/crash_extractelement_from_null.ll (+34)
diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index 3d41c978281351..fa16f4632174de 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -8703,7 +8703,8 @@ class BoUpSLP::ShuffleCostEstimator : public BaseShuffleAnalysis {
     auto CheckPerRegistersShuffle = [&](MutableArrayRef<int> Mask,
                                         SmallVectorImpl<unsigned> &Indices)
         -> std::optional<TTI::ShuffleKind> {
-      if (NumElts <= EltsPerVector)
+      if (NumElts <= EltsPerVector ||
+          all_of(Mask, [](int I) { return I == PoisonMaskElem; }))
         return std::nullopt;
       int OffsetReg0 =
           alignDown(std::accumulate(Mask.begin(), Mask.end(), INT_MAX,
diff --git a/llvm/test/Transforms/SLPVectorizer/crash_extractelement_from_null.ll b/llvm/test/Transforms/SLPVectorizer/crash_extractelement_from_null.ll
new file mode 100644
index 00000000000000..87d92706bf36a4
--- /dev/null
+++ b/llvm/test/Transforms/SLPVectorizer/crash_extractelement_from_null.ll
@@ -0,0 +1,34 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+
+; RUN: opt -S --passes=slp-vectorizer < %s | FileCheck %s
+
+define void @test(i8 %0, i8 %1) {
+; CHECK-LABEL: define void @test(
+; CHECK-SAME: i8 [[TMP0:%.*]], i8 [[TMP1:%.*]]) {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[L:%.*]] = load <4 x i8>, ptr getelementptr (i8, ptr null, i32 8), align 1
+; CHECK-NEXT:    [[LI15:%.*]] = extractelement <4 x i8> [[L]], i64 15
+; CHECK-NEXT:    [[TMP2:%.*]] = icmp ne i8 [[TMP0]], 0
+; CHECK-NEXT:    [[TMP3:%.*]] = icmp ne i8 [[TMP1]], 0
+; CHECK-NEXT:    [[TMP4:%.*]] = icmp ne i8 [[TMP0]], 0
+; CHECK-NEXT:    [[DOTI15:%.*]] = icmp ne i8 [[LI15]], 0
+; CHECK-NEXT:    [[I0244:%.*]] = insertelement <4 x i1> zeroinitializer, i1 [[TMP2]], i64 0
+; CHECK-NEXT:    [[I1245:%.*]] = insertelement <4 x i1> [[I0244]], i1 [[TMP3]], i64 1
+; CHECK-NEXT:    [[I2246:%.*]] = insertelement <4 x i1> [[I1245]], i1 [[TMP4]], i64 2
+; CHECK-NEXT:    [[TMP5:%.*]] = insertelement <4 x i1> [[I2246]], i1 [[DOTI15]], i64 3
+; CHECK-NEXT:    ret void
+;
+entry:
+  %l = load <4 x i8>, ptr getelementptr (i8, ptr null, i32 8), align 1
+  %li15 = extractelement <4 x i8> %l, i64 15
+  %2 = icmp ne i8 %0, 0
+  %3 = icmp ne i8 %1, 0
+  %4 = icmp ne i8 %0, 0
+  %.i15 = icmp ne i8 %li15, 0
+
+  %i0244 = insertelement <4 x i1> zeroinitializer, i1 %2, i64 0
+  %i1245 = insertelement <4 x i1> %i0244, i1 %3, i64 1
+  %i2246 = insertelement <4 x i1> %i1245, i1 %4, i64 2
+  %14 = insertelement <4 x i1> %i2246, i1 %.i15, i64 3
+  ret void
+}

@alexey-bataev
Copy link
Member

alexey-bataev commented Aug 31, 2024

Better solution is just to exclude such extractelements completely so, that they do not go into computeExtractCost function at all

@tcwzxx
Copy link
Member Author

tcwzxx commented Aug 31, 2024

they do not go into computeExtractCost function at all

If NumParts > 1, some parts may be skipped while others cannot.

@alexey-bataev

This comment was marked as outdated.

Copy link
Member

@alexey-bataev alexey-bataev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG, thanks

@tcwzxx tcwzxx merged commit 24a043a into llvm:main Sep 1, 2024
5 of 8 checks passed
@tcwzxx tcwzxx deleted the slp branch September 1, 2024 12:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants