Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RISCV] Check isFixedLengthVector before calling getVectorNumElements in getSingleShuffleSrc. #125455

Closed
wants to merge 2 commits into from

Conversation

topperc
Copy link
Collaborator

@topperc topperc commented Feb 3, 2025

I have been unsuccessful at further reducing the test. The failure requires a shuffle with 2 scalable->fixed extracts with the same source. 0 is the only valid index for a scalable->fixed extract so the 2 sources must be the same extract. Shuffles with the same source are aggressively canonicalized to a unary shuffle. So it requires the extracts to become identical through other optimizations without the shuffle being canonicalized before it is lowered.

Fixes #125306.

… in getSingleShuffleSrc.

I have been unsuccessful at further reducing the test. The
failure requires a shuffle with 2 scalable->fixed extracts with
the same source. 0 is the only valid index for a scalable->fixed
extract so the 2 sources must be the same extract. Shuffles with
the same source are aggressively canonicalized to a unary shuffle.
So it requires the extracts to become identical through other
optimizations without the shuffle being canonicalized before it is
lowered.

Fixes llvm#125306.
@llvmbot
Copy link
Member

llvmbot commented Feb 3, 2025

@llvm/pr-subscribers-llvm-selectiondag
@llvm/pr-subscribers-tablegen

@llvm/pr-subscribers-backend-risc-v

Author: Craig Topper (topperc)

Changes

I have been unsuccessful at further reducing the test. The failure requires a shuffle with 2 scalable->fixed extracts with the same source. 0 is the only valid index for a scalable->fixed extract so the 2 sources must be the same extract. Shuffles with the same source are aggressively canonicalized to a unary shuffle. So it requires the extracts to become identical through other optimizations without the shuffle being canonicalized before it is lowered.

Fixes #125306.


Full diff: https://github.com/llvm/llvm-project/pull/125455.diff

2 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (+2-1)
  • (added) llvm/test/CodeGen/RISCV/rvv/pr125306.ll (+110)
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 07e3390f3fbb22f..8e3caf51d876b9e 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -4510,7 +4510,8 @@ static SDValue getSingleShuffleSrc(MVT VT, MVT ContainerVT, SDValue V1,
 
   // Src needs to have twice the number of elements.
   unsigned NumElts = VT.getVectorNumElements();
-  if (Src.getValueType().getVectorNumElements() != (NumElts * 2))
+  if (!Src.getValueType().isFixedLengthVector() ||
+      Src.getValueType().getVectorNumElements() != (NumElts * 2))
     return SDValue();
 
   // The extracts must extract the two halves of the source.
diff --git a/llvm/test/CodeGen/RISCV/rvv/pr125306.ll b/llvm/test/CodeGen/RISCV/rvv/pr125306.ll
new file mode 100644
index 000000000000000..b341d15634d55a2
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/rvv/pr125306.ll
@@ -0,0 +1,110 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s -mtriple=riscv64 -mattr=+m,+v | FileCheck %s
+
+define <2 x i32> @main(ptr %0) {
+; CHECK-LABEL: main:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    vsetivli zero, 16, e32, m4, ta, ma
+; CHECK-NEXT:    vmv.v.i v8, 0
+; CHECK-NEXT:    vse32.v v8, (zero)
+; CHECK-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
+; CHECK-NEXT:    vmv.v.i v8, 0
+; CHECK-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
+; CHECK-NEXT:    vmv.v.i v10, 0
+; CHECK-NEXT:    li a2, 64
+; CHECK-NEXT:    sw zero, 80(zero)
+; CHECK-NEXT:    lui a1, 7
+; CHECK-NEXT:    lui a3, 1
+; CHECK-NEXT:    vsetivli zero, 2, e32, mf2, ta, ma
+; CHECK-NEXT:    vid.v v11
+; CHECK-NEXT:    li a4, 16
+; CHECK-NEXT:    lui a5, 2
+; CHECK-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
+; CHECK-NEXT:    vse32.v v10, (a2)
+; CHECK-NEXT:    vsetivli zero, 2, e32, mf2, ta, ma
+; CHECK-NEXT:    vmv.v.i v10, 0
+; CHECK-NEXT:    li a2, 24
+; CHECK-NEXT:    sh zero, -392(a3)
+; CHECK-NEXT:    sh zero, 534(a3)
+; CHECK-NEXT:    sh zero, 1460(a3)
+; CHECK-NEXT:    li a3, 32
+; CHECK-NEXT:    vse32.v v10, (a2)
+; CHECK-NEXT:    li a2, 40
+; CHECK-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
+; CHECK-NEXT:    vse32.v v8, (a0)
+; CHECK-NEXT:    sh zero, -1710(a5)
+; CHECK-NEXT:    sh zero, -784(a5)
+; CHECK-NEXT:    sh zero, 142(a5)
+; CHECK-NEXT:    lw a5, -304(a1)
+; CHECK-NEXT:    vsetivli zero, 2, e32, mf2, ta, ma
+; CHECK-NEXT:    vadd.vi v9, v11, -1
+; CHECK-NEXT:    vse32.v v10, (a3)
+; CHECK-NEXT:    sh zero, 0(a0)
+; CHECK-NEXT:    lw a0, -188(a1)
+; CHECK-NEXT:    vse32.v v10, (a2)
+; CHECK-NEXT:    lw a2, -188(a1)
+; CHECK-NEXT:    lw a3, 1244(a1)
+; CHECK-NEXT:    vmv.v.x v8, a0
+; CHECK-NEXT:    lw a0, 1244(a1)
+; CHECK-NEXT:    lw a1, -304(a1)
+; CHECK-NEXT:    vmv.v.x v10, a3
+; CHECK-NEXT:    vmv.v.x v11, a5
+; CHECK-NEXT:    vslide1down.vx v8, v8, zero
+; CHECK-NEXT:    vslide1down.vx v10, v10, zero
+; CHECK-NEXT:    vmin.vv v8, v10, v8
+; CHECK-NEXT:    vmv.v.x v10, a0
+; CHECK-NEXT:    vslide1down.vx v11, v11, zero
+; CHECK-NEXT:    vmin.vx v10, v10, a2
+; CHECK-NEXT:    vmin.vx v10, v10, a1
+; CHECK-NEXT:    vmin.vv v11, v8, v11
+; CHECK-NEXT:    vmv1r.v v8, v10
+; CHECK-NEXT:    vand.vv v9, v11, v9
+; CHECK-NEXT:    vslideup.vi v8, v10, 1
+; CHECK-NEXT:    vse32.v v9, (a4)
+; CHECK-NEXT:    sh zero, 0(zero)
+; CHECK-NEXT:    ret
+entry:
+  store <16 x i32> zeroinitializer, ptr null, align 4
+  store <8 x i32> zeroinitializer, ptr %0, align 4
+  store <4 x i32> zeroinitializer, ptr getelementptr inbounds nuw (i8, ptr null, i64 64), align 4
+  store i32 0, ptr getelementptr inbounds nuw (i8, ptr null, i64 80), align 4
+  %1 = load i32, ptr getelementptr inbounds nuw (i8, ptr null, i64 29916), align 4
+  %broadcast.splatinsert53 = insertelement <4 x i32> zeroinitializer, i32 %1, i64 0
+  %2 = load i32, ptr getelementptr inbounds nuw (i8, ptr null, i64 28484), align 4
+  %broadcast.splatinsert55 = insertelement <4 x i32> zeroinitializer, i32 %2, i64 0
+  %3 = call <4 x i32> @llvm.smin.v4i32(<4 x i32> %broadcast.splatinsert53, <4 x i32> %broadcast.splatinsert55)
+  %4 = load i32, ptr getelementptr inbounds nuw (i8, ptr null, i64 28368), align 4
+  %broadcast.splatinsert57 = insertelement <4 x i32> zeroinitializer, i32 %4, i64 0
+  %5 = call <4 x i32> @llvm.smin.v4i32(<4 x i32> %3, <4 x i32> %broadcast.splatinsert57)
+  store i16 0, ptr getelementptr inbounds nuw (i8, ptr null, i64 3704), align 2
+  store i16 0, ptr getelementptr inbounds nuw (i8, ptr null, i64 4630), align 2
+  %6 = shufflevector <4 x i32> %5, <4 x i32> zeroinitializer, <2 x i32> <i32 0, i32 4>
+  store <2 x i32> %6, ptr getelementptr inbounds nuw (i8, ptr null, i64 16), align 4
+  store i16 0, ptr getelementptr inbounds nuw (i8, ptr null, i64 5556), align 2
+  store i16 0, ptr getelementptr inbounds nuw (i8, ptr null, i64 6482), align 2
+  store <2 x i32> zeroinitializer, ptr getelementptr inbounds nuw (i8, ptr null, i64 24), align 4
+  store i16 0, ptr getelementptr inbounds nuw (i8, ptr null, i64 7408), align 2
+  store i16 0, ptr getelementptr inbounds nuw (i8, ptr null, i64 8334), align 2
+  store <2 x i32> zeroinitializer, ptr getelementptr inbounds nuw (i8, ptr null, i64 32), align 4
+  store i16 0, ptr %0, align 2
+  store <2 x i32> zeroinitializer, ptr getelementptr inbounds nuw (i8, ptr null, i64 40), align 4
+  %7 = load i32, ptr getelementptr inbounds nuw (i8, ptr null, i64 29916), align 4
+  %broadcast.splatinsert165 = insertelement <4 x i32> poison, i32 %7, i64 0
+  %8 = load i32, ptr getelementptr inbounds nuw (i8, ptr null, i64 28484), align 4
+  %broadcast.splatinsert167 = insertelement <4 x i32> poison, i32 %8, i64 0
+  %9 = call <4 x i32> @llvm.smin.v4i32(<4 x i32> %broadcast.splatinsert165, <4 x i32> %broadcast.splatinsert167)
+  %10 = load i32, ptr getelementptr inbounds nuw (i8, ptr null, i64 28368), align 4
+  %broadcast.splatinsert169 = insertelement <4 x i32> poison, i32 %10, i64 0
+  %11 = call <4 x i32> @llvm.smin.v4i32(<4 x i32> %9, <4 x i32> %broadcast.splatinsert169)
+  store i16 0, ptr null, align 2
+  %12 = load i32, ptr getelementptr inbounds nuw (i8, ptr null, i64 29916), align 4
+  %broadcast.splatinsert179 = insertelement <4 x i32> poison, i32 %12, i64 0
+  %13 = load i32, ptr getelementptr inbounds nuw (i8, ptr null, i64 28484), align 4
+  %broadcast.splatinsert181 = insertelement <4 x i32> poison, i32 %13, i64 0
+  %14 = call <4 x i32> @llvm.smin.v4i32(<4 x i32> %broadcast.splatinsert179, <4 x i32> %broadcast.splatinsert181)
+  %15 = load i32, ptr getelementptr inbounds nuw (i8, ptr null, i64 28368), align 4
+  %broadcast.splatinsert183 = insertelement <4 x i32> poison, i32 %15, i64 0
+  %16 = call <4 x i32> @llvm.smin.v4i32(<4 x i32> %14, <4 x i32> %broadcast.splatinsert183)
+  %17 = shufflevector <4 x i32> %11, <4 x i32> %16, <2 x i32> <i32 0, i32 4>
+  ret <2 x i32> %17
+}

Copy link
Contributor

@lukel97 lukel97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Collaborator

@preames preames left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Can you add a comment to the test file describing the case being exercised? Your explanation from the review description would make a good starting point.

@llvmbot llvmbot added tablegen llvm:SelectionDAG SelectionDAGISel as well labels Feb 3, 2025
topperc added a commit that referenced this pull request Feb 3, 2025
… in getSingleShuffleSrc. (#125455)

I have been unsuccessful at further reducing the test. The
failure requires a shuffle with 2 scalable->fixed extracts with
the same source. 0 is the only valid index for a scalable->fixed
extract so the 2 sources must be the same extract. Shuffles with
the same source are aggressively canonicalized to a unary shuffle.
So it requires the extracts to become identical through other
optimizations without the shuffle being canonicalized before it is
lowered.

Fixes #125306.
@topperc
Copy link
Collaborator Author

topperc commented Feb 3, 2025

Committed as 7c5100d

@topperc topperc closed this Feb 3, 2025
swift-ci pushed a commit to swiftlang/llvm-project that referenced this pull request Feb 10, 2025
… in getSingleShuffleSrc. (llvm#125455)

I have been unsuccessful at further reducing the test. The
failure requires a shuffle with 2 scalable->fixed extracts with
the same source. 0 is the only valid index for a scalable->fixed
extract so the 2 sources must be the same extract. Shuffles with
the same source are aggressively canonicalized to a unary shuffle.
So it requires the extracts to become identical through other
optimizations without the shuffle being canonicalized before it is
lowered.

Fixes llvm#125306.

(cherry picked from commit 7c5100d)
Icohedron pushed a commit to Icohedron/llvm-project that referenced this pull request Feb 11, 2025
… in getSingleShuffleSrc. (llvm#125455)

I have been unsuccessful at further reducing the test. The
failure requires a shuffle with 2 scalable->fixed extracts with
the same source. 0 is the only valid index for a scalable->fixed
extract so the 2 sources must be the same extract. Shuffles with
the same source are aggressively canonicalized to a unary shuffle.
So it requires the extracts to become identical through other
optimizations without the shuffle being canonicalized before it is
lowered.

Fixes llvm#125306.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[RISC-V] LLVM ERROR: Invalid size request on a scalable vector
4 participants