Conversation

@wangpc-pp (Contributor) commented Dec 1, 2025

According to the spec:

> A vector register cannot be used to provide source operands with more
> than one EEW for a single instruction. A mask register source is
> considered to have EEW=1 for this constraint.

`vmerge` variants always take their mask in `V0`, so the vector source operands must use register classes that exclude `V0`.

This fixes #169905.
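
An illustrative RISC-V assembly sketch of the constraint (not taken from the patch): the mask source of vmerge is always v0 and counts as EEW=1, so v0 cannot simultaneously supply a data operand at a wider EEW.

    vsetvli t0, zero, e32, m1, ta, ma
    vmerge.vvm v8, v1, v2, v0   # OK: v0 only supplies the EEW=1 mask
    vmerge.vvm v8, v0, v2, v0   # violates the quoted constraint: v0 would
                                # supply both an EEW=1 mask and an EEW=32
                                # data source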

Co-authored-by: Luke Lau luke@igalia.com

@wangpc-pp wangpc-pp marked this pull request as ready for review December 1, 2025 06:09
@wangpc-pp wangpc-pp requested a review from topperc December 1, 2025 06:10
@llvmbot (Member) commented Dec 1, 2025

@llvm/pr-subscribers-backend-risc-v

@llvm/pr-subscribers-llvm-globalisel

Author: Pengcheng Wang (wangpc-pp)

Changes

According to the spec:

> A vector register cannot be used to provide source operands with more
> than one EEW for a single instruction. A mask register source is
> considered to have EEW=1 for this constraint.

`vmerge` variants always take their mask in `V0`, so the vector source operands must use register classes that exclude `V0`.

This fixes #169905.


Patch is 303.74 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/170070.diff

33 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td (+15-15)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/instruction-select/rvv/select.mir (+22-22)
  • (modified) llvm/test/CodeGen/RISCV/rvv/combine-reduce-add-to-vcpop.ll (+164-82)
  • (modified) llvm/test/CodeGen/RISCV/rvv/copyprop.mir (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vector-i8-index-cornercase.ll (+27-28)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-interleaved-access.ll (+411-411)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-reduction-fp.ll (+286-148)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-select-addsub.ll (+25-14)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-int-interleave.ll (+26-53)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vselect-vp.ll (+25-8)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fmaximum-sdnode.ll (+82-179)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fminimum-sdnode.ll (+82-179)
  • (modified) llvm/test/CodeGen/RISCV/rvv/mask-reg-alloc.mir (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/pr88576.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/rvv-peephole-vmerge-to-vmv.mir (+18-18)
  • (modified) llvm/test/CodeGen/RISCV/rvv/rvv-peephole-vmerge-vops.ll (+11-9)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vector-splice.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfmacc-vp.ll (+87-57)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfmsac-vp.ll (+87-57)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfnmacc-vp.ll (+87-57)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfnmsac-vp.ll (+87-57)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vl-opt-op-info.mir (+18-18)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vl-optimizer-subreg-assert.mir (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vmacc-vp.ll (+96-62)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vmadd-vp.ll (+96-53)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vmerge-peephole.mir (+7-7)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vmerge.ll (+3-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vmv.s.x.ll (+5-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vmv.v.v-peephole.mir (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vnmsac-vp.ll (+96-62)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vpmerge-sdnode.ll (+4-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vselect-fp.ll (+25-8)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vselect-int.ll (+3-2)
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td b/llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td
index eb3c9b0defccb..e36204c536c0d 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td
@@ -2982,21 +2982,21 @@ multiclass VPseudoVFWALU_WV_WF_RM {
 multiclass VPseudoVMRG_VM_XM_IM {
   foreach m = MxList in {
     defvar mx = m.MX;
-    def "_VVM" # "_" # m.MX:
-      VPseudoTiedBinaryCarryIn<GetVRegNoV0<m.vrclass>.R,
-                               m.vrclass, m.vrclass, m>,
-      SchedBinary<"WriteVIMergeV", "ReadVIMergeV", "ReadVIMergeV", mx,
-                          forcePassthruRead=true>;
-    def "_VXM" # "_" # m.MX:
-      VPseudoTiedBinaryCarryIn<GetVRegNoV0<m.vrclass>.R,
-                               m.vrclass, GPR, m>,
-      SchedBinary<"WriteVIMergeX", "ReadVIMergeV", "ReadVIMergeX", mx,
-                          forcePassthruRead=true>;
-    def "_VIM" # "_" # m.MX:
-      VPseudoTiedBinaryCarryIn<GetVRegNoV0<m.vrclass>.R,
-                               m.vrclass, simm5, m>,
-      SchedUnary<"WriteVIMergeI", "ReadVIMergeV", mx,
-                          forcePassthruRead=true>;
+    def "_VVM"#"_"#m.MX : VPseudoTiedBinaryCarryIn<GetVRegNoV0<m.vrclass>.R,
+                                                   GetVRegNoV0<m.vrclass>.R,
+                                                   GetVRegNoV0<m.vrclass>.R, m>,
+        SchedBinary<"WriteVIMergeV", "ReadVIMergeV", "ReadVIMergeV", mx,
+                    forcePassthruRead = true>;
+    def "_VXM"#"_"#m.MX
+        : VPseudoTiedBinaryCarryIn<GetVRegNoV0<m.vrclass>.R,
+                                   GetVRegNoV0<m.vrclass>.R, GPR, m>,
+        SchedBinary<"WriteVIMergeX", "ReadVIMergeV", "ReadVIMergeX", mx,
+                    forcePassthruRead = true>;
+    def "_VIM"#"_"#m.MX
+        : VPseudoTiedBinaryCarryIn<GetVRegNoV0<m.vrclass>.R,
+                                   GetVRegNoV0<m.vrclass>.R, simm5, m>,
+        SchedUnary<"WriteVIMergeI", "ReadVIMergeV", mx,
+                   forcePassthruRead = true>;
   }
 }
 
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/instruction-select/rvv/select.mir b/llvm/test/CodeGen/RISCV/GlobalISel/instruction-select/rvv/select.mir
index f8061462c6220..ada76a43639d7 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/instruction-select/rvv/select.mir
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/instruction-select/rvv/select.mir
@@ -11,7 +11,7 @@ body:             |
   bb.0.entry:
     ; RV32I-LABEL: name: select_nxv1i8
     ; RV32I: [[DEF:%[0-9]+]]:vmv0 = IMPLICIT_DEF
-    ; RV32I-NEXT: [[DEF1:%[0-9]+]]:vr = IMPLICIT_DEF
+    ; RV32I-NEXT: [[DEF1:%[0-9]+]]:vrnov0 = IMPLICIT_DEF
     ; RV32I-NEXT: [[DEF2:%[0-9]+]]:vrnov0 = IMPLICIT_DEF
     ; RV32I-NEXT: [[PseudoVMERGE_VVM_MF4_:%[0-9]+]]:vrnov0 = PseudoVMERGE_VVM_MF4 [[DEF2]], [[DEF1]], [[DEF1]], [[DEF]], -1, 3 /* e8 */
     ; RV32I-NEXT: $v8 = COPY [[PseudoVMERGE_VVM_MF4_]]
@@ -19,7 +19,7 @@ body:             |
     ;
     ; RV64I-LABEL: name: select_nxv1i8
     ; RV64I: [[DEF:%[0-9]+]]:vmv0 = IMPLICIT_DEF
-    ; RV64I-NEXT: [[DEF1:%[0-9]+]]:vr = IMPLICIT_DEF
+    ; RV64I-NEXT: [[DEF1:%[0-9]+]]:vrnov0 = IMPLICIT_DEF
     ; RV64I-NEXT: [[DEF2:%[0-9]+]]:vrnov0 = IMPLICIT_DEF
     ; RV64I-NEXT: [[PseudoVMERGE_VVM_MF4_:%[0-9]+]]:vrnov0 = PseudoVMERGE_VVM_MF4 [[DEF2]], [[DEF1]], [[DEF1]], [[DEF]], -1, 3 /* e8 */
     ; RV64I-NEXT: $v8 = COPY [[PseudoVMERGE_VVM_MF4_]]
@@ -40,7 +40,7 @@ body:             |
   bb.0.entry:
     ; RV32I-LABEL: name: select_nxv4i8
     ; RV32I: [[DEF:%[0-9]+]]:vmv0 = IMPLICIT_DEF
-    ; RV32I-NEXT: [[DEF1:%[0-9]+]]:vr = IMPLICIT_DEF
+    ; RV32I-NEXT: [[DEF1:%[0-9]+]]:vrnov0 = IMPLICIT_DEF
     ; RV32I-NEXT: [[DEF2:%[0-9]+]]:vrnov0 = IMPLICIT_DEF
     ; RV32I-NEXT: [[PseudoVMERGE_VVM_M1_:%[0-9]+]]:vrnov0 = PseudoVMERGE_VVM_M1 [[DEF2]], [[DEF1]], [[DEF1]], [[DEF]], -1, 3 /* e8 */
     ; RV32I-NEXT: $v8 = COPY [[PseudoVMERGE_VVM_M1_]]
@@ -48,7 +48,7 @@ body:             |
     ;
     ; RV64I-LABEL: name: select_nxv4i8
     ; RV64I: [[DEF:%[0-9]+]]:vmv0 = IMPLICIT_DEF
-    ; RV64I-NEXT: [[DEF1:%[0-9]+]]:vr = IMPLICIT_DEF
+    ; RV64I-NEXT: [[DEF1:%[0-9]+]]:vrnov0 = IMPLICIT_DEF
     ; RV64I-NEXT: [[DEF2:%[0-9]+]]:vrnov0 = IMPLICIT_DEF
     ; RV64I-NEXT: [[PseudoVMERGE_VVM_M1_:%[0-9]+]]:vrnov0 = PseudoVMERGE_VVM_M1 [[DEF2]], [[DEF1]], [[DEF1]], [[DEF]], -1, 3 /* e8 */
     ; RV64I-NEXT: $v8 = COPY [[PseudoVMERGE_VVM_M1_]]
@@ -69,7 +69,7 @@ body:             |
   bb.0.entry:
     ; RV32I-LABEL: name: select_nxv16i8
     ; RV32I: [[DEF:%[0-9]+]]:vmv0 = IMPLICIT_DEF
-    ; RV32I-NEXT: [[DEF1:%[0-9]+]]:vrm4 = IMPLICIT_DEF
+    ; RV32I-NEXT: [[DEF1:%[0-9]+]]:vrm4nov0 = IMPLICIT_DEF
     ; RV32I-NEXT: [[DEF2:%[0-9]+]]:vrm4nov0 = IMPLICIT_DEF
     ; RV32I-NEXT: [[PseudoVMERGE_VVM_M4_:%[0-9]+]]:vrm4nov0 = PseudoVMERGE_VVM_M4 [[DEF2]], [[DEF1]], [[DEF1]], [[DEF]], -1, 3 /* e8 */
     ; RV32I-NEXT: $v8m4 = COPY [[PseudoVMERGE_VVM_M4_]]
@@ -77,7 +77,7 @@ body:             |
     ;
     ; RV64I-LABEL: name: select_nxv16i8
     ; RV64I: [[DEF:%[0-9]+]]:vmv0 = IMPLICIT_DEF
-    ; RV64I-NEXT: [[DEF1:%[0-9]+]]:vrm4 = IMPLICIT_DEF
+    ; RV64I-NEXT: [[DEF1:%[0-9]+]]:vrm4nov0 = IMPLICIT_DEF
     ; RV64I-NEXT: [[DEF2:%[0-9]+]]:vrm4nov0 = IMPLICIT_DEF
     ; RV64I-NEXT: [[PseudoVMERGE_VVM_M4_:%[0-9]+]]:vrm4nov0 = PseudoVMERGE_VVM_M4 [[DEF2]], [[DEF1]], [[DEF1]], [[DEF]], -1, 3 /* e8 */
     ; RV64I-NEXT: $v8m4 = COPY [[PseudoVMERGE_VVM_M4_]]
@@ -98,7 +98,7 @@ body:             |
   bb.0.entry:
     ; RV32I-LABEL: name: select_nxv64i8
     ; RV32I: [[DEF:%[0-9]+]]:vmv0 = IMPLICIT_DEF
-    ; RV32I-NEXT: [[DEF1:%[0-9]+]]:vr = IMPLICIT_DEF
+    ; RV32I-NEXT: [[DEF1:%[0-9]+]]:vrnov0 = IMPLICIT_DEF
     ; RV32I-NEXT: [[DEF2:%[0-9]+]]:vrnov0 = IMPLICIT_DEF
     ; RV32I-NEXT: [[PseudoVMERGE_VVM_MF4_:%[0-9]+]]:vrnov0 = PseudoVMERGE_VVM_MF4 [[DEF2]], [[DEF1]], [[DEF1]], [[DEF]], -1, 4 /* e16 */
     ; RV32I-NEXT: $v8 = COPY [[PseudoVMERGE_VVM_MF4_]]
@@ -106,7 +106,7 @@ body:             |
     ;
     ; RV64I-LABEL: name: select_nxv64i8
     ; RV64I: [[DEF:%[0-9]+]]:vmv0 = IMPLICIT_DEF
-    ; RV64I-NEXT: [[DEF1:%[0-9]+]]:vr = IMPLICIT_DEF
+    ; RV64I-NEXT: [[DEF1:%[0-9]+]]:vrnov0 = IMPLICIT_DEF
     ; RV64I-NEXT: [[DEF2:%[0-9]+]]:vrnov0 = IMPLICIT_DEF
     ; RV64I-NEXT: [[PseudoVMERGE_VVM_MF4_:%[0-9]+]]:vrnov0 = PseudoVMERGE_VVM_MF4 [[DEF2]], [[DEF1]], [[DEF1]], [[DEF]], -1, 4 /* e16 */
     ; RV64I-NEXT: $v8 = COPY [[PseudoVMERGE_VVM_MF4_]]
@@ -127,7 +127,7 @@ body:             |
   bb.0.entry:
     ; RV32I-LABEL: name: select_nxv2i16
     ; RV32I: [[DEF:%[0-9]+]]:vmv0 = IMPLICIT_DEF
-    ; RV32I-NEXT: [[DEF1:%[0-9]+]]:vr = IMPLICIT_DEF
+    ; RV32I-NEXT: [[DEF1:%[0-9]+]]:vrnov0 = IMPLICIT_DEF
     ; RV32I-NEXT: [[DEF2:%[0-9]+]]:vrnov0 = IMPLICIT_DEF
     ; RV32I-NEXT: [[PseudoVMERGE_VVM_M1_:%[0-9]+]]:vrnov0 = PseudoVMERGE_VVM_M1 [[DEF2]], [[DEF1]], [[DEF1]], [[DEF]], -1, 4 /* e16 */
     ; RV32I-NEXT: $v8 = COPY [[PseudoVMERGE_VVM_M1_]]
@@ -135,7 +135,7 @@ body:             |
     ;
     ; RV64I-LABEL: name: select_nxv2i16
     ; RV64I: [[DEF:%[0-9]+]]:vmv0 = IMPLICIT_DEF
-    ; RV64I-NEXT: [[DEF1:%[0-9]+]]:vr = IMPLICIT_DEF
+    ; RV64I-NEXT: [[DEF1:%[0-9]+]]:vrnov0 = IMPLICIT_DEF
     ; RV64I-NEXT: [[DEF2:%[0-9]+]]:vrnov0 = IMPLICIT_DEF
     ; RV64I-NEXT: [[PseudoVMERGE_VVM_M1_:%[0-9]+]]:vrnov0 = PseudoVMERGE_VVM_M1 [[DEF2]], [[DEF1]], [[DEF1]], [[DEF]], -1, 4 /* e16 */
     ; RV64I-NEXT: $v8 = COPY [[PseudoVMERGE_VVM_M1_]]
@@ -156,7 +156,7 @@ body:             |
   bb.0.entry:
     ; RV32I-LABEL: name: select_nxv8i16
     ; RV32I: [[DEF:%[0-9]+]]:vmv0 = IMPLICIT_DEF
-    ; RV32I-NEXT: [[DEF1:%[0-9]+]]:vrm4 = IMPLICIT_DEF
+    ; RV32I-NEXT: [[DEF1:%[0-9]+]]:vrm4nov0 = IMPLICIT_DEF
     ; RV32I-NEXT: [[DEF2:%[0-9]+]]:vrm4nov0 = IMPLICIT_DEF
     ; RV32I-NEXT: [[PseudoVMERGE_VVM_M4_:%[0-9]+]]:vrm4nov0 = PseudoVMERGE_VVM_M4 [[DEF2]], [[DEF1]], [[DEF1]], [[DEF]], -1, 4 /* e16 */
     ; RV32I-NEXT: $v8m4 = COPY [[PseudoVMERGE_VVM_M4_]]
@@ -164,7 +164,7 @@ body:             |
     ;
     ; RV64I-LABEL: name: select_nxv8i16
     ; RV64I: [[DEF:%[0-9]+]]:vmv0 = IMPLICIT_DEF
-    ; RV64I-NEXT: [[DEF1:%[0-9]+]]:vrm4 = IMPLICIT_DEF
+    ; RV64I-NEXT: [[DEF1:%[0-9]+]]:vrm4nov0 = IMPLICIT_DEF
     ; RV64I-NEXT: [[DEF2:%[0-9]+]]:vrm4nov0 = IMPLICIT_DEF
     ; RV64I-NEXT: [[PseudoVMERGE_VVM_M4_:%[0-9]+]]:vrm4nov0 = PseudoVMERGE_VVM_M4 [[DEF2]], [[DEF1]], [[DEF1]], [[DEF]], -1, 4 /* e16 */
     ; RV64I-NEXT: $v8m4 = COPY [[PseudoVMERGE_VVM_M4_]]
@@ -185,7 +185,7 @@ body:             |
   bb.0.entry:
     ; RV32I-LABEL: name: select_nxv32i16
     ; RV32I: [[DEF:%[0-9]+]]:vmv0 = IMPLICIT_DEF
-    ; RV32I-NEXT: [[DEF1:%[0-9]+]]:vr = IMPLICIT_DEF
+    ; RV32I-NEXT: [[DEF1:%[0-9]+]]:vrnov0 = IMPLICIT_DEF
     ; RV32I-NEXT: [[DEF2:%[0-9]+]]:vrnov0 = IMPLICIT_DEF
     ; RV32I-NEXT: [[PseudoVMERGE_VVM_MF2_:%[0-9]+]]:vrnov0 = PseudoVMERGE_VVM_MF2 [[DEF2]], [[DEF1]], [[DEF1]], [[DEF]], -1, 5 /* e32 */
     ; RV32I-NEXT: $v8 = COPY [[PseudoVMERGE_VVM_MF2_]]
@@ -193,7 +193,7 @@ body:             |
     ;
     ; RV64I-LABEL: name: select_nxv32i16
     ; RV64I: [[DEF:%[0-9]+]]:vmv0 = IMPLICIT_DEF
-    ; RV64I-NEXT: [[DEF1:%[0-9]+]]:vr = IMPLICIT_DEF
+    ; RV64I-NEXT: [[DEF1:%[0-9]+]]:vrnov0 = IMPLICIT_DEF
     ; RV64I-NEXT: [[DEF2:%[0-9]+]]:vrnov0 = IMPLICIT_DEF
     ; RV64I-NEXT: [[PseudoVMERGE_VVM_MF2_:%[0-9]+]]:vrnov0 = PseudoVMERGE_VVM_MF2 [[DEF2]], [[DEF1]], [[DEF1]], [[DEF]], -1, 5 /* e32 */
     ; RV64I-NEXT: $v8 = COPY [[PseudoVMERGE_VVM_MF2_]]
@@ -214,7 +214,7 @@ body:             |
   bb.0.entry:
     ; RV32I-LABEL: name: select_nxv2i32
     ; RV32I: [[DEF:%[0-9]+]]:vmv0 = IMPLICIT_DEF
-    ; RV32I-NEXT: [[DEF1:%[0-9]+]]:vrm2 = IMPLICIT_DEF
+    ; RV32I-NEXT: [[DEF1:%[0-9]+]]:vrm2nov0 = IMPLICIT_DEF
     ; RV32I-NEXT: [[DEF2:%[0-9]+]]:vrm2nov0 = IMPLICIT_DEF
     ; RV32I-NEXT: [[PseudoVMERGE_VVM_M2_:%[0-9]+]]:vrm2nov0 = PseudoVMERGE_VVM_M2 [[DEF2]], [[DEF1]], [[DEF1]], [[DEF]], -1, 5 /* e32 */
     ; RV32I-NEXT: $v8m2 = COPY [[PseudoVMERGE_VVM_M2_]]
@@ -222,7 +222,7 @@ body:             |
     ;
     ; RV64I-LABEL: name: select_nxv2i32
     ; RV64I: [[DEF:%[0-9]+]]:vmv0 = IMPLICIT_DEF
-    ; RV64I-NEXT: [[DEF1:%[0-9]+]]:vrm2 = IMPLICIT_DEF
+    ; RV64I-NEXT: [[DEF1:%[0-9]+]]:vrm2nov0 = IMPLICIT_DEF
     ; RV64I-NEXT: [[DEF2:%[0-9]+]]:vrm2nov0 = IMPLICIT_DEF
     ; RV64I-NEXT: [[PseudoVMERGE_VVM_M2_:%[0-9]+]]:vrm2nov0 = PseudoVMERGE_VVM_M2 [[DEF2]], [[DEF1]], [[DEF1]], [[DEF]], -1, 5 /* e32 */
     ; RV64I-NEXT: $v8m2 = COPY [[PseudoVMERGE_VVM_M2_]]
@@ -243,7 +243,7 @@ body:             |
   bb.0.entry:
     ; RV32I-LABEL: name: select_nxv8i32
     ; RV32I: [[DEF:%[0-9]+]]:vmv0 = IMPLICIT_DEF
-    ; RV32I-NEXT: [[DEF1:%[0-9]+]]:vrm8 = IMPLICIT_DEF
+    ; RV32I-NEXT: [[DEF1:%[0-9]+]]:vrm8nov0 = IMPLICIT_DEF
     ; RV32I-NEXT: [[DEF2:%[0-9]+]]:vrm8nov0 = IMPLICIT_DEF
     ; RV32I-NEXT: [[PseudoVMERGE_VVM_M8_:%[0-9]+]]:vrm8nov0 = PseudoVMERGE_VVM_M8 [[DEF2]], [[DEF1]], [[DEF1]], [[DEF]], -1, 5 /* e32 */
     ; RV32I-NEXT: $v8m8 = COPY [[PseudoVMERGE_VVM_M8_]]
@@ -251,7 +251,7 @@ body:             |
     ;
     ; RV64I-LABEL: name: select_nxv8i32
     ; RV64I: [[DEF:%[0-9]+]]:vmv0 = IMPLICIT_DEF
-    ; RV64I-NEXT: [[DEF1:%[0-9]+]]:vrm8 = IMPLICIT_DEF
+    ; RV64I-NEXT: [[DEF1:%[0-9]+]]:vrm8nov0 = IMPLICIT_DEF
     ; RV64I-NEXT: [[DEF2:%[0-9]+]]:vrm8nov0 = IMPLICIT_DEF
     ; RV64I-NEXT: [[PseudoVMERGE_VVM_M8_:%[0-9]+]]:vrm8nov0 = PseudoVMERGE_VVM_M8 [[DEF2]], [[DEF1]], [[DEF1]], [[DEF]], -1, 5 /* e32 */
     ; RV64I-NEXT: $v8m8 = COPY [[PseudoVMERGE_VVM_M8_]]
@@ -272,7 +272,7 @@ body:             |
   bb.0.entry:
     ; RV32I-LABEL: name: select_nxv1i64
     ; RV32I: [[DEF:%[0-9]+]]:vmv0 = IMPLICIT_DEF
-    ; RV32I-NEXT: [[DEF1:%[0-9]+]]:vrm2 = IMPLICIT_DEF
+    ; RV32I-NEXT: [[DEF1:%[0-9]+]]:vrm2nov0 = IMPLICIT_DEF
     ; RV32I-NEXT: [[DEF2:%[0-9]+]]:vrm2nov0 = IMPLICIT_DEF
     ; RV32I-NEXT: [[PseudoVMERGE_VVM_M2_:%[0-9]+]]:vrm2nov0 = PseudoVMERGE_VVM_M2 [[DEF2]], [[DEF1]], [[DEF1]], [[DEF]], -1, 6 /* e64 */
     ; RV32I-NEXT: $v8m2 = COPY [[PseudoVMERGE_VVM_M2_]]
@@ -280,7 +280,7 @@ body:             |
     ;
     ; RV64I-LABEL: name: select_nxv1i64
     ; RV64I: [[DEF:%[0-9]+]]:vmv0 = IMPLICIT_DEF
-    ; RV64I-NEXT: [[DEF1:%[0-9]+]]:vrm2 = IMPLICIT_DEF
+    ; RV64I-NEXT: [[DEF1:%[0-9]+]]:vrm2nov0 = IMPLICIT_DEF
     ; RV64I-NEXT: [[DEF2:%[0-9]+]]:vrm2nov0 = IMPLICIT_DEF
     ; RV64I-NEXT: [[PseudoVMERGE_VVM_M2_:%[0-9]+]]:vrm2nov0 = PseudoVMERGE_VVM_M2 [[DEF2]], [[DEF1]], [[DEF1]], [[DEF]], -1, 6 /* e64 */
     ; RV64I-NEXT: $v8m2 = COPY [[PseudoVMERGE_VVM_M2_]]
@@ -301,7 +301,7 @@ body:             |
   bb.0.entry:
     ; RV32I-LABEL: name: select_nxv4i64
     ; RV32I: [[DEF:%[0-9]+]]:vmv0 = IMPLICIT_DEF
-    ; RV32I-NEXT: [[DEF1:%[0-9]+]]:vrm8 = IMPLICIT_DEF
+    ; RV32I-NEXT: [[DEF1:%[0-9]+]]:vrm8nov0 = IMPLICIT_DEF
     ; RV32I-NEXT: [[DEF2:%[0-9]+]]:vrm8nov0 = IMPLICIT_DEF
     ; RV32I-NEXT: [[PseudoVMERGE_VVM_M8_:%[0-9]+]]:vrm8nov0 = PseudoVMERGE_VVM_M8 [[DEF2]], [[DEF1]], [[DEF1]], [[DEF]], -1, 6 /* e64 */
     ; RV32I-NEXT: $v8m8 = COPY [[PseudoVMERGE_VVM_M8_]]
@@ -309,7 +309,7 @@ body:             |
     ;
     ; RV64I-LABEL: name: select_nxv4i64
     ; RV64I: [[DEF:%[0-9]+]]:vmv0 = IMPLICIT_DEF
-    ; RV64I-NEXT: [[DEF1:%[0-9]+]]:vrm8 = IMPLICIT_DEF
+    ; RV64I-NEXT: [[DEF1:%[0-9]+]]:vrm8nov0 = IMPLICIT_DEF
     ; RV64I-NEXT: [[DEF2:%[0-9]+]]:vrm8nov0 = IMPLICIT_DEF
     ; RV64I-NEXT: [[PseudoVMERGE_VVM_M8_:%[0-9]+]]:vrm8nov0 = PseudoVMERGE_VVM_M8 [[DEF2]], [[DEF1]], [[DEF1]], [[DEF]], -1, 6 /* e64 */
     ; RV64I-NEXT: $v8m8 = COPY [[PseudoVMERGE_VVM_M8_]]
diff --git a/llvm/test/CodeGen/RISCV/rvv/combine-reduce-add-to-vcpop.ll b/llvm/test/CodeGen/RISCV/rvv/combine-reduce-add-to-vcpop.ll
index 2d4fce68f9545..27b53befbf4a7 100644
--- a/llvm/test/CodeGen/RISCV/rvv/combine-reduce-add-to-vcpop.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/combine-reduce-add-to-vcpop.ll
@@ -288,54 +288,89 @@ define i32 @test_nxv128i1(<vscale x 128 x i1> %x) {
 ; CHECK-NEXT:    .cfi_def_cfa_offset 16
 ; CHECK-NEXT:    csrr a0, vlenb
 ; CHECK-NEXT:    slli a0, a0, 3
+; CHECK-NEXT:    mv a1, a0
+; CHECK-NEXT:    slli a0, a0, 1
+; CHECK-NEXT:    add a0, a0, a1
 ; CHECK-NEXT:    sub sp, sp, a0
-; CHECK-NEXT:    .cfi_escape 0x0f, 0x0d, 0x72, 0x00, 0x11, 0x10, 0x22, 0x11, 0x08, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 16 + 8 * vlenb
+; CHECK-NEXT:    .cfi_escape 0x0f, 0x0d, 0x72, 0x00, 0x11, 0x10, 0x22, 0x11, 0x18, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 16 + 24 * vlenb
 ; CHECK-NEXT:    vsetvli a0, zero, e32, m8, ta, ma
 ; CHECK-NEXT:    vmv1r.v v7, v8
 ; CHECK-NEXT:    vmv1r.v v6, v0
 ; CHECK-NEXT:    vmv.v.i v16, 0
 ; CHECK-NEXT:    csrr a0, vlenb
 ; CHECK-NEXT:    vmerge.vim v8, v16, 1, v0
-; CHECK-NEXT:    addi a1, sp, 16
+; CHECK-NEXT:    csrr a1, vlenb
+; CHECK-NEXT:    slli a1, a1, 4
+; CHECK-NEXT:    add a1, sp, a1
+; CHECK-NEXT:    addi a1, a1, 16
 ; CHECK-NEXT:    vs8r.v v8, (a1) # vscale x 64-byte Folded Spill
 ; CHECK-NEXT:    srli a1, a0, 1
 ; CHECK-NEXT:    vsetvli a2, zero, e8, m1, ta, ma
 ; CHECK-NEXT:    vslidedown.vx v0, v0, a1
 ; CHECK-NEXT:    srli a0, a0, 2
-; CHECK-NEXT:    vmv8r.v v8, v16
 ; CHECK-NEXT:    vsetvli a2, zero, e32, m8, ta, ma
-; CHECK-NEXT:    vmerge.vim v24, v16, 1, v0
+; CHECK-NEXT:    vmerge.vim v8, v16, 1, v0
+; CHECK-NEXT:    csrr a2, vlenb
+; CHECK-NEXT:    slli a2, a2, 3
+; CHECK-NEXT:    add a2, sp, a2
+; CHECK-NEXT:    addi a2, a2, 16
+; CHECK-NEXT:    vs8r.v v8, (a2) # vscale x 64-byte Folded Spill
 ; CHECK-NEXT:    vsetvli a2, zero, e8, mf2, ta, ma
 ; CHECK-NEXT:    vslidedown.vx v0, v0, a0
 ; CHECK-NEXT:    vsetvli a2, zero, e32, m8, ta, ma
-; CHECK-NEXT:    vmerge.vim v16, v16, 1, v0
+; CHECK-NEXT:    vmerge.vim v8, v16, 1, v0
 ; CHECK-NEXT:    vsetvli a2, zero, e8, mf2, ta, ma
 ; CHECK-NEXT:    vslidedown.vx v0, v6, a0
+; CHECK-NEXT:    vsetvli a2, zero, e32, m8, ta, ma
+; CHECK-NEXT:    vmerge.vim v16, v16, 1, v0
+; CHECK-NEXT:    vsetvli a2, zero, e8, mf2, ta, ma
+; CHECK-NEXT:    vslidedown.vx v0, v7, a0
+; CHECK-NEXT:    vsetvli a2, zero, e32, m8, ta, ma
+; CHECK-NEXT:    vadd.vi v24, v16, 1
+; CHECK-NEXT:    vmerge.vvm v16, v16, v24, v0
 ; CHECK-NEXT:    vsetvli a2, zero, e8, m1, ta, ma
 ; CHECK-NEXT:    vslidedown.vx v6, v7, a1
-; CHECK-NEXT:    vsetvli a1, zero, e32, m8, ta, ma
-; CHECK-NEXT:    vmerge.vim v8, v8, 1, v0
 ; CHECK-NEXT:    vsetvli a1, zero, e8, mf2, ta, ma
-; CHECK-NEXT:    vslidedown.vx v0, v7, a0
-; CHECK-NEXT:    vslidedown.vx v5, v6, a0
-; CHECK-NEXT:    vsetvli a0, zero, e32, m8, ta, mu
-; CHECK-NEXT:    vadd.vi v8, v8, 1, v0.t
-; CHECK-NEXT:    vmv1r.v v0, v5
-; CHECK-NEXT:    vadd.vi v16, v16, 1, v0.t
-; CHECK-NEXT:    vadd.vv v8, v8, v16
+; CHECK-NEXT:    vslidedown.vx v0, v6, a0
+; CHECK-NEXT:    vsetvli a0, zero, e32, m8, ta, ma
+; CHECK-NEXT:    vadd.vi v24, v8, 1
+; CHECK-NEXT:    vmerge.vvm v8, v8, v24, v0
+; CHECK-NEXT:    vadd.vv v8, v16, v8
+; CHECK-NEXT:    addi a0, sp, 16
+; CHECK-NEXT:    vs8r.v v8, (a0) # vscale x 64-byte Folded Spill
+; CHECK-NEXT:    csrr a0, vlenb
+; CHECK-NEXT:    slli a0, a0, 3
+; CHECK-NEXT:    add a0, sp, a0
+; CHECK-NEXT:    addi a0, a0, 16
+; CHECK-NEXT:    vl8r.v v8, (a0) # vscale x 64-byte Folded Reload
+; CHECK-NEXT:    vadd.vi v16, v8, 1
 ; CHECK-NEXT:    vmv1r.v v0, v6
-; CHECK-NEXT:    vadd.vi v24, v24, 1, v0.t
+; CHECK-NEXT:    vmerge.vvm v16, v8, v16, v0
+; CHECK-NEXT:    csrr a0, vlenb
+; CHECK-NEXT:    slli a0, a0, 4
+; CHECK-NEXT:    add a0, sp, a0
+; CHECK-NEXT:    addi a0, a0, 16
+; CHECK-NEXT:    vl8r.v v24, (a0) # vscale x 64-byte Folded Reload
+; CHECK-NEXT:    vadd.vi v24, v24, 1
 ; CHECK-NEXT:    vmv1r.v v0, v7
+; CHECK-NEXT:    csrr a0, vlenb
+; CHECK-NEXT:    slli a0, a0, 4
+; CHECK-NEXT:    add a0, sp, a0
+; CHECK-NEXT:    addi a0, a0, 16
+; CHECK-NEXT:    vl8r.v v8, (a0) # vscale x 64-byte Folded Reload
+; CHECK-NEXT:    vmerge.vvm v24, v8, v24, v0
+; CHECK-NEXT:    vadd.vv v16, v24, v16
 ; CHECK-NEXT:    addi a0, sp, 16
-; CHECK-NEXT:    vl8r.v v16, (a0) # vscale x 64-byte Folded Reload
-; CHECK-NEXT:    vadd.vi v16, v16, 1, v0.t
-; CHECK-NEXT:    vadd.vv v16, v16, v24
+; CHECK-NEXT:    vl8r.v v8, (a0) # vscale x 64-byte Folded Reload
 ; CHECK-NEXT:    vadd.vv v8, v16, v8
 ; CHECK-NEXT:    vmv.s.x v16, zero
 ; CHECK-NEXT:    vredsum.vs v8, v8, v16
 ; CHECK-NEXT:    vmv.x.s a0, v8
 ; CHECK-NEXT:    csrr a1, vlenb
 ; CHECK-NEXT:    slli a1, a1, 3
+; CHECK-NEXT:    mv a2, a1
+; CHECK-NEXT:    slli a1, a1, 1
+; CHECK-NEXT:    add a1, a1, a2
 ; CHECK-NEXT:    add sp, sp, a1
 ; CHECK-NEXT:    .cfi_def_cfa sp, 16
 ; CHECK-NEXT:    addi sp, sp, 16
@@ -353,12 +388,14 @@ define i32 @test_nxv256i1(<vscale x 256 x i1> %x) {
 ; CHECK-NEXT:    addi sp, sp, -16
 ; CHECK-NEXT:    .cfi_def_cfa_offset 16
 ; CHECK-NEXT:    csrr a0, vlenb
-; CHECK-NEXT:    slli a0, a0, 4
+; CHECK-NEXT:    slli a0, a0, 3
 ; CHECK-NEXT:    mv a1, a0
 ; CHECK-NEXT:    slli a0, a0, 1
+; CHECK-NEXT:    add a1, a1, a0
+; CHECK-NEXT:    slli a0, a0, 1
 ; CHECK-NEXT:    add a0, a0, a1
 ; CHECK-NEXT:    sub sp, sp, a0
-; CHECK-NEXT:    .cfi_escape 0x0f, 0x0d, 0x72, 0x00, 0x11, 0x10, 0x22, 0x11, 0x30, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 16 + 48 * vlenb
+; CHECK-NEXT:    .cfi_escape 0x0f, 0x0d, 0x72, 0x00, 0x11, 0x10, 0x22, 0x11, 0x38, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 16 + 56 * vlenb
 ; CHECK-NEXT:    vsetvli a0, zero, e32, m8, ta, ma
 ; CHECK-NEXT:    vmv1r.v v6, v10
 ; CHECK-NEXT:    vmv1r.v v7, v9
@@ -368,9 +405,9 @@ define i32 @test_nxv256i1(<vscale x 256 x i1> %x) {
 ; CHECK-NEXT:    csrr a1, vlenb
 ; CHECK-NEXT:    vmerge.vim v8, v16, 1, v0
 ; CHECK-NEXT:    csrr a0, vlenb
-; CHECK-NEXT:    slli a0, a0, 3
+; CHECK-NEXT:    slli a0, a0, 4
 ; CHECK-NEXT:    mv a2, a0
-; CHECK-NEXT:    slli a0, a0, 2
+; CHECK-NEXT:    slli a0, a0, 1
 ; CHECK-NEXT:    add a0, a0, a2
 ; CHECK-NEXT:    add a0, sp, a0
 ; CHECK-NEXT:    addi a0, a0, 16
@@ -378,7 +415,10 @@ define i32 @test_nxv256i1(<vscale x 256 x i1> %x) {
 ; CHECK-NEXT:    vmv1r.v v0, v5
 ; CHECK-NEXT:    vmerge.vim v8, v16, 1, v0
 ; CHECK-NEXT:    csrr a0, vlenb
-; CHECK-NEXT:    slli a0, a0, 5
+; CHECK-NEXT:    slli a0, a0, 3
+; CHECK-NEXT:    mv a2, a0
+; CHECK-NEXT:    slli a0, a0, 2
+; CHECK-NEXT:    add a0, a0, a2
 ; CHECK-NEXT:    add a0, sp, a0
 ; CHECK-NEXT:    addi a0, a0, 16
 ; CHECK-NEXT:    vs8r.v v8, (a0) # vscale x 64-byte Folded Spill
@@ -391,127 +431,169 @@ define i32 @test_nxv256i1(<vscale x 256 x i1> %x) {
 ; CHECK-NEXT:    vsetvli...
[truncated]

lukel97 added a commit to lukel97/llvm-project that referenced this pull request Dec 1, 2025
In llvm#170070, PseudoVMERGE_V* instructions will have copies to NoV0 reg classes in their operands. In order to continue folding them, we need to look through these copies.

We previously looked through copies when comparing if the false and passthru operands were equivalent, but didn't look through copies for the true operand.

This looks through the copies up front for all operands, and not just when we're comparing equality.
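
A minimal C++ sketch of the idea, assuming LLVM's MachineRegisterInfo API (the helper name lookThruCopies appears later in this thread; the body here is an illustration, not the actual patch):

    #include "llvm/CodeGen/MachineRegisterInfo.h"
    using namespace llvm;

    // Walk through plain full COPYs to find the underlying virtual register,
    // so operands that were re-constrained to NoV0 register classes can
    // still be matched against their original sources.
    static Register lookThruCopies(Register Reg,
                                   const MachineRegisterInfo &MRI) {
      while (Reg.isVirtual()) {
        MachineInstr *Def = MRI.getVRegDef(Reg);
        if (!Def || !Def->isFullCopy()) // stop at anything but a plain COPY
          break;
        Reg = Def->getOperand(1).getReg(); // continue from the COPY's source
      }
      return Reg;
    }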
; RV32-NEXT: mul a2, a2, a3
; RV32-NEXT: sub sp, sp, a2
; RV32-NEXT: .cfi_escape 0x0f, 0x0e, 0x72, 0x00, 0x11, 0x10, 0x22, 0x11, 0xe4, 0x00, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 16 + 100 * vlenb
; RV32-NEXT: .cfi_escape 0x0f, 0x0e, 0x72, 0x00, 0x11, 0x10, 0x22, 0x11, 0xd4, 0x00, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 16 + 84 * vlenb
@wangpc-pp (Contributor, Author) commented:

Somehow this change reduces some spills/reloads?

; CHECK-NEXT: add a1, a1, a2
; CHECK-NEXT: sub sp, sp, a1
; CHECK-NEXT: .cfi_escape 0x0f, 0x0d, 0x72, 0x00, 0x11, 0x10, 0x22, 0x11, 0x10, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 16 + 16 * vlenb
; CHECK-NEXT: .cfi_escape 0x0f, 0x0d, 0x72, 0x00, 0x11, 0x10, 0x22, 0x11, 0x18, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 16 + 24 * vlenb
@wangpc-pp (Contributor, Author) commented:

This is a regression.

A contributor replied:

The high LMUL test cases are usually very noisy because they depend on the register allocator; I think we can ignore these. Hopefully nothing internal to LLVM will ever generate a type this large.

@lukel97 (Contributor) left a comment:

LGTM. But you should probably review the RISCVVectorPeephole.cpp changes too :)

@wangpc-pp wangpc-pp requested a review from preames December 1, 2025 07:43
@wangpc-pp (Contributor, Author) commented:

I will wait for one more day and merge it on Tuesday.

@wangpc-pp wangpc-pp force-pushed the main-riscv-rvv-vmerge-overlap branch from ccb65dc to 712b77d December 2, 2025 12:04
@topperc (Collaborator) left a comment:

LGTM

@wangpc-pp wangpc-pp merged commit 76cb984 into llvm:main Dec 3, 2025
10 checks passed
@wangpc-pp wangpc-pp deleted the main-riscv-rvv-vmerge-overlap branch December 3, 2025 02:55
wangpc-pp added a commit to wangpc-pp/llvm-project that referenced this pull request Dec 3, 2025
Or we can't pass the MachineVerifier because of using a killed virtual
register.

This was found when backporting llvm#170070 to 21.x branch.
wangpc-pp added a commit to wangpc-pp/llvm-project that referenced this pull request Dec 3, 2025
Or we can't pass the MachineVerifier because of using a killed virtual
register.

This was found when backporting llvm#170070 to 21.x branch.
@asb (Contributor) commented Dec 3, 2025

This commit is breaking all of the RVV buildbots, and #170438 doesn't fix it.

Here is a minimised testcase:

; ModuleID = 'reduced.ll'
source_filename = "reduced.ll"
target datalayout = "e-m:e-p:64:64-i64:64-i128:128-n32:64-S128"
target triple = "riscv64-unknown-linux-gnu"

define void @widget(ptr %arg, <vscale x 2 x i1> %arg1, <vscale x 2 x i64> %arg2, ptr %arg3, i1 %arg4, <vscale x 2 x i1> %arg5) {
bb:
  br label %bb6

bb6:                                              ; preds = %bb6, %bb
  %call = call <vscale x 2 x i64> @llvm.vp.load.nxv2i64.p0(ptr null, <vscale x 2 x i1> %arg1, i32 1)
  %or = or <vscale x 2 x i64> %call, insertelement (<vscale x 2 x i64> poison, i64 1, i64 0)
  %icmp = icmp eq <vscale x 2 x i64> %call, zeroinitializer
  %select = select <vscale x 2 x i1> %icmp, <vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> %arg2
  %xor = xor <vscale x 2 x i64> %or, %select
  call void @llvm.vp.store.nxv2i64.p0(<vscale x 2 x i64> %xor, ptr null, <vscale x 2 x i1> %arg1, i32 1)
  br i1 %arg4, label %bb7, label %bb6

bb7:                                              ; preds = %bb7, %bb6
  %load = load <2 x i64>, ptr %arg, align 8
  %icmp8 = icmp eq <2 x i64> %load, zeroinitializer
  %select9 = select <2 x i1> %icmp8, <2 x i64> zeroinitializer, <2 x i64> splat (i64 2567483615)
  %xor10 = xor <2 x i64> %load, %select9
  store <2 x i64> %xor10, ptr %arg3, align 8
  br label %bb7
}

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(argmem: read)
declare <vscale x 2 x i64> @llvm.vp.load.nxv2i64.p0(ptr captures(none), <vscale x 2 x i1>, i32) #0

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(argmem: write)
declare void @llvm.vp.store.nxv2i64.p0(<vscale x 2 x i64>, ptr captures(none), <vscale x 2 x i1>, i32) #1

attributes #0 = { nocallback nofree nosync nounwind willreturn memory(argmem: read) }
attributes #1 = { nocallback nofree nosync nounwind willreturn memory(argmem: write) }

Which gives:

./tc.baseline/bin/llc -O3 -mtriple=riscv64-linux-gnu -mattr=+rva23u64 < reduced2.ll 
....
...
*** Bad machine code: Virtual register killed in block, but needed live out. ***
- function:    widget
- basic block: %bb.3 bb7 (0x5aa4182ab110)
Virtual register %25 is used after the block.

*** Bad machine code: Virtual register defs don't dominate all uses. ***
- function:    widget
- v. register: %14

*** Bad machine code: Virtual register defs don't dominate all uses. ***
- function:    widget
- v. register: %25
LLVM ERROR: Found 3 machine code errors.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace and instructions to reproduce the bug.
Stack dump:
0.	Program arguments: ./tc.baseline/bin/llc -O3 -mtriple=riscv64-linux-gnu -mattr=+rva23u64
1.	Running pass 'Function Pass Manager' on module '<stdin>'.
2.	Running pass 'Live Interval Analysis' on function '@widget'
 #0 0x00005aa3f20a7096 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (./tc.baseline/bin/llc+0x3fea096)
 #1 0x00005aa3f20a45c5 llvm::sys::RunSignalHandlers() (./tc.baseline/bin/llc+0x3fe75c5)
 #2 0x00005aa3f20a7ea4 SignalHandler(int, siginfo_t*, void*) Signals.cpp:0:0
 #3 0x0000711505c3e4d0 (/usr/lib/libc.so.6+0x3e4d0)
 #4 0x0000711505c9890c (/usr/lib/libc.so.6+0x9890c)
 #5 0x0000711505c3e3a0 raise (/usr/lib/libc.so.6+0x3e3a0)
 #6 0x0000711505c2557a abort (/usr/lib/libc.so.6+0x2557a)
 #7 0x00005aa3f2003cd5 llvm::report_fatal_error(llvm::Twine const&, bool) (./tc.baseline/bin/llc+0x3f46cd5)
 #8 0x00005aa3f110e6a9 (./tc.baseline/bin/llc+0x30516a9)
 #9 0x00005aa3f110edad llvm::MachineFunction::verify(llvm::Pass*, char const*, llvm::raw_ostream*, bool) const (./tc.baseline/bin/llc+0x3051dad)
#10 0x00005aa3f0f1500b llvm::LiveRangeCalc::findReachingDefs(llvm::LiveRange&, llvm::MachineBasicBlock&, llvm::SlotIndex, llvm::Register, llvm::ArrayRef<llvm::SlotIndex>) (./tc.baseline/bin/llc+0x2e5800b)
#11 0x00005aa3f0f14129 llvm::LiveRangeCalc::extend(llvm::LiveRange&, llvm::SlotIndex, llvm::Register, llvm::ArrayRef<llvm::SlotIndex>) (./tc.baseline/bin/llc+0x2e57129)
#12 0x00005aa3f0f17fda llvm::LiveIntervalCalc::extendToUses(llvm::LiveRange&, llvm::Register, llvm::LaneBitmask, llvm::LiveInterval*) (./tc.baseline/bin/llc+0x2e5afda)
#13 0x00005aa3f0f17c8c llvm::LiveIntervalCalc::calculate(llvm::LiveInterval&, bool) (./tc.baseline/bin/llc+0x2e5ac8c)
...

@wangpc-pp (Contributor, Author) commented:

> This commit is breaking all of the RVV buildbots, and #170438 doesn't fix it.

It should be fixed after applying @lukel97's suggestion to restrict `lookThruCopies` to only look through COPYs with a single non-debug use.

@lukel97 (Contributor) commented Dec 3, 2025

@wangpc-pp Should we revert and reapply this later with the fixes in #170438 included? I'm not sure what's causing the "Virtual register defs don't dominate all uses" errors.

@lukel97 (Contributor) commented Dec 3, 2025

> It should be fixed after applying @lukel97's suggestion to restrict `lookThruCopies` to only look through COPYs with a single non-debug use.

Oh, if it's fixed with the latest version of #170438 then maybe we should just go ahead and fix forward. Can you confirm whether the latest version fixes it, @asb?

@asb (Contributor) commented Dec 3, 2025

I'd tested with the latest version of #170438 at the time, 38ad8f6. Trying again with the latest push, I can no longer reproduce the issue on my testcase (I'm rerunning the test-suite build now to check for other issues), so getting that landed seems like a good path forward.

@wangpc-pp (Contributor, Author) commented:

Besides, why didn't I receive any notifications of the RVV buildbot failures? @asb, has it been fully integrated?

wangpc-pp added a commit that referenced this pull request Dec 3, 2025
There are two fixes:

1. Clear kill flags for `FalseReg` in foldVMergeToMask or we can't
pass the MachineVerifier because of using a killed virtual register.
2. Restrict `lookThruCopies` to only look through COPYs with
one non-debug use.

This was found when backporting #170070 to 21.x branch.
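
A hedged C++ sketch of the shape of these two fixes (foldVMergeToMask and lookThruCopies are named in the commit message; the bodies below are assumptions, not the actual patch):

    #include "llvm/CodeGen/MachineRegisterInfo.h"
    using namespace llvm;

    // Fix 2: only look through a COPY whose result has a single non-debug
    // use, so rewriting that one use cannot break other users of the copy.
    static Register lookThruCopies(Register Reg,
                                   const MachineRegisterInfo &MRI) {
      while (Reg.isVirtual() && MRI.hasOneNonDBGUse(Reg)) {
        MachineInstr *Def = MRI.getVRegDef(Reg);
        if (!Def || !Def->isFullCopy())
          break;
        Reg = Def->getOperand(1).getReg();
      }
      return Reg;
    }

    // Fix 1: after foldVMergeToMask extends FalseReg's live range past a
    // use that was marked "killed", the stale kill flags must be dropped:
    //   MRI.clearKillFlags(FalseReg);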
@asb (Contributor) commented Dec 3, 2025

> Besides, why didn't I receive any notifications of the RVV buildbot failures? @asb, has it been fully integrated?

It has; you should have had an email from each bot that failed (I've checked with multiple other people that this is working). What we don't get is a notification on the PR itself, because the integration is set up so that only happens when there's a single candidate commit. Our builders aren't fast enough for that (except maybe the 'gauntlet' builder; it may be worth me switching that over to building every commit even if it means we build a small queue sometimes).

In general these email notifications often get missed because there are so many flaky builders, or instances of a failure that's caused by another commit that was tested alongside yours.

Are you seeing nothing at all in your email?
