[PHIElimination] Reuse existing COPY in predecessor basic block #131837
Conversation
@llvm/pr-subscribers-llvm-globalisel @llvm/pr-subscribers-backend-hexagon

Author: Guy David (guy-david)

Changes: The insertion point of the COPY isn't always optimal and could lead to a worse block layout; see the regression test in the first commit (which needs to be reduced).

Patch is 2.30 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/131837.diff

127 Files Affected:
diff --git a/llvm/lib/CodeGen/PHIElimination.cpp b/llvm/lib/CodeGen/PHIElimination.cpp
index 14f91a87f75b4..cc3d4aac55b9d 100644
--- a/llvm/lib/CodeGen/PHIElimination.cpp
+++ b/llvm/lib/CodeGen/PHIElimination.cpp
@@ -587,6 +587,15 @@ void PHIEliminationImpl::LowerPHINode(MachineBasicBlock &MBB,
MachineBasicBlock::iterator InsertPos =
findPHICopyInsertPoint(&opBlock, &MBB, SrcReg);
+ // Reuse an existing copy in the block if possible.
+ if (MachineInstr *DefMI = MRI->getUniqueVRegDef(SrcReg)) {
+ if (DefMI->isCopy() && DefMI->getParent() == &opBlock &&
+ MRI->use_empty(SrcReg)) {
+ DefMI->getOperand(0).setReg(IncomingReg);
+ continue;
+ }
+ }
+
// Insert the copy.
MachineInstr *NewSrcInstr = nullptr;
if (!reusedIncoming && IncomingReg) {
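Roughly, the effect on the lowered MIR is the following (a schematic sketch distilled from the PHIElimination-reuse-copy.mir test added below; the register names %x, %incoming and %y are invented for illustration and are not part of the patch):

  ; Before this patch: a fresh COPY is inserted in the predecessor to feed the
  ; register that replaces the PHI.
  bb.1:
    %x:gpr32 = COPY $wzr
    %incoming:gpr32 = COPY %x        ; copy inserted by PHI elimination
  bb.2:
    %y:gpr32 = COPY %incoming        ; lowered PHI

  ; With this patch: when %x has no other uses and its only definition is a
  ; COPY in that predecessor, the existing COPY is retargeted instead.
  bb.1:
    %incoming:gpr32 = COPY $wzr
  bb.2:
    %y:gpr32 = COPY %incoming

This avoids the extra back-to-back COPY in the predecessor, which is what later allows a different (and sometimes better) block layout.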
diff --git a/llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-store-outline_atomics.ll b/llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-store-outline_atomics.ll
index c1c5c53aa7df2..6c300b04508b2 100644
--- a/llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-store-outline_atomics.ll
+++ b/llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-store-outline_atomics.ll
@@ -118,8 +118,8 @@ define dso_local void @store_atomic_i64_aligned_seq_cst(i64 %value, ptr %ptr) {
define dso_local void @store_atomic_i128_aligned_unordered(i128 %value, ptr %ptr) {
; -O0-LABEL: store_atomic_i128_aligned_unordered:
; -O0: bl __aarch64_cas16_relax
-; -O0: subs x10, x10, x11
-; -O0: ccmp x8, x9, #0, eq
+; -O0: subs x9, x0, x9
+; -O0: ccmp x1, x8, #0, eq
;
; -O1-LABEL: store_atomic_i128_aligned_unordered:
; -O1: ldxp xzr, x8, [x2]
@@ -131,8 +131,8 @@ define dso_local void @store_atomic_i128_aligned_unordered(i128 %value, ptr %ptr
define dso_local void @store_atomic_i128_aligned_monotonic(i128 %value, ptr %ptr) {
; -O0-LABEL: store_atomic_i128_aligned_monotonic:
; -O0: bl __aarch64_cas16_relax
-; -O0: subs x10, x10, x11
-; -O0: ccmp x8, x9, #0, eq
+; -O0: subs x9, x0, x9
+; -O0: ccmp x1, x8, #0, eq
;
; -O1-LABEL: store_atomic_i128_aligned_monotonic:
; -O1: ldxp xzr, x8, [x2]
@@ -144,8 +144,8 @@ define dso_local void @store_atomic_i128_aligned_monotonic(i128 %value, ptr %ptr
define dso_local void @store_atomic_i128_aligned_release(i128 %value, ptr %ptr) {
; -O0-LABEL: store_atomic_i128_aligned_release:
; -O0: bl __aarch64_cas16_rel
-; -O0: subs x10, x10, x11
-; -O0: ccmp x8, x9, #0, eq
+; -O0: subs x9, x0, x9
+; -O0: ccmp x1, x8, #0, eq
;
; -O1-LABEL: store_atomic_i128_aligned_release:
; -O1: ldxp xzr, x8, [x2]
@@ -157,8 +157,8 @@ define dso_local void @store_atomic_i128_aligned_release(i128 %value, ptr %ptr)
define dso_local void @store_atomic_i128_aligned_seq_cst(i128 %value, ptr %ptr) {
; -O0-LABEL: store_atomic_i128_aligned_seq_cst:
; -O0: bl __aarch64_cas16_acq_rel
-; -O0: subs x10, x10, x11
-; -O0: ccmp x8, x9, #0, eq
+; -O0: subs x9, x0, x9
+; -O0: ccmp x1, x8, #0, eq
;
; -O1-LABEL: store_atomic_i128_aligned_seq_cst:
; -O1: ldaxp xzr, x8, [x2]
diff --git a/llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-store-rcpc.ll b/llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-store-rcpc.ll
index d1047d84e2956..2a7bbad9d6454 100644
--- a/llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-store-rcpc.ll
+++ b/llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-store-rcpc.ll
@@ -117,13 +117,13 @@ define dso_local void @store_atomic_i64_aligned_seq_cst(i64 %value, ptr %ptr) {
define dso_local void @store_atomic_i128_aligned_unordered(i128 %value, ptr %ptr) {
; -O0-LABEL: store_atomic_i128_aligned_unordered:
-; -O0: ldxp x10, x12, [x9]
+; -O0: ldxp x8, x10, [x13]
+; -O0: cmp x8, x9
; -O0: cmp x10, x11
-; -O0: cmp x12, x13
-; -O0: stxp w8, x14, x15, [x9]
-; -O0: stxp w8, x10, x12, [x9]
-; -O0: subs x12, x12, x13
-; -O0: ccmp x10, x11, #0, eq
+; -O0: stxp w12, x14, x15, [x13]
+; -O0: stxp w12, x8, x10, [x13]
+; -O0: subs x10, x10, x11
+; -O0: ccmp x8, x9, #0, eq
;
; -O1-LABEL: store_atomic_i128_aligned_unordered:
; -O1: ldxp xzr, x8, [x2]
@@ -134,13 +134,13 @@ define dso_local void @store_atomic_i128_aligned_unordered(i128 %value, ptr %ptr
define dso_local void @store_atomic_i128_aligned_monotonic(i128 %value, ptr %ptr) {
; -O0-LABEL: store_atomic_i128_aligned_monotonic:
-; -O0: ldxp x10, x12, [x9]
+; -O0: ldxp x8, x10, [x13]
+; -O0: cmp x8, x9
; -O0: cmp x10, x11
-; -O0: cmp x12, x13
-; -O0: stxp w8, x14, x15, [x9]
-; -O0: stxp w8, x10, x12, [x9]
-; -O0: subs x12, x12, x13
-; -O0: ccmp x10, x11, #0, eq
+; -O0: stxp w12, x14, x15, [x13]
+; -O0: stxp w12, x8, x10, [x13]
+; -O0: subs x10, x10, x11
+; -O0: ccmp x8, x9, #0, eq
;
; -O1-LABEL: store_atomic_i128_aligned_monotonic:
; -O1: ldxp xzr, x8, [x2]
@@ -151,13 +151,13 @@ define dso_local void @store_atomic_i128_aligned_monotonic(i128 %value, ptr %ptr
define dso_local void @store_atomic_i128_aligned_release(i128 %value, ptr %ptr) {
; -O0-LABEL: store_atomic_i128_aligned_release:
-; -O0: ldxp x10, x12, [x9]
+; -O0: ldxp x8, x10, [x13]
+; -O0: cmp x8, x9
; -O0: cmp x10, x11
-; -O0: cmp x12, x13
-; -O0: stlxp w8, x14, x15, [x9]
-; -O0: stlxp w8, x10, x12, [x9]
-; -O0: subs x12, x12, x13
-; -O0: ccmp x10, x11, #0, eq
+; -O0: stlxp w12, x14, x15, [x13]
+; -O0: stlxp w12, x8, x10, [x13]
+; -O0: subs x10, x10, x11
+; -O0: ccmp x8, x9, #0, eq
;
; -O1-LABEL: store_atomic_i128_aligned_release:
; -O1: ldxp xzr, x8, [x2]
@@ -168,13 +168,13 @@ define dso_local void @store_atomic_i128_aligned_release(i128 %value, ptr %ptr)
define dso_local void @store_atomic_i128_aligned_seq_cst(i128 %value, ptr %ptr) {
; -O0-LABEL: store_atomic_i128_aligned_seq_cst:
-; -O0: ldaxp x10, x12, [x9]
+; -O0: ldaxp x8, x10, [x13]
+; -O0: cmp x8, x9
; -O0: cmp x10, x11
-; -O0: cmp x12, x13
-; -O0: stlxp w8, x14, x15, [x9]
-; -O0: stlxp w8, x10, x12, [x9]
-; -O0: subs x12, x12, x13
-; -O0: ccmp x10, x11, #0, eq
+; -O0: stlxp w12, x14, x15, [x13]
+; -O0: stlxp w12, x8, x10, [x13]
+; -O0: subs x10, x10, x11
+; -O0: ccmp x8, x9, #0, eq
;
; -O1-LABEL: store_atomic_i128_aligned_seq_cst:
; -O1: ldaxp xzr, x8, [x2]
diff --git a/llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-store-v8a.ll b/llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-store-v8a.ll
index 1a79c73355143..493bc742f7663 100644
--- a/llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-store-v8a.ll
+++ b/llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-store-v8a.ll
@@ -117,13 +117,13 @@ define dso_local void @store_atomic_i64_aligned_seq_cst(i64 %value, ptr %ptr) {
define dso_local void @store_atomic_i128_aligned_unordered(i128 %value, ptr %ptr) {
; -O0-LABEL: store_atomic_i128_aligned_unordered:
-; -O0: ldxp x10, x12, [x9]
+; -O0: ldxp x8, x10, [x13]
+; -O0: cmp x8, x9
; -O0: cmp x10, x11
-; -O0: cmp x12, x13
-; -O0: stxp w8, x14, x15, [x9]
-; -O0: stxp w8, x10, x12, [x9]
-; -O0: subs x12, x12, x13
-; -O0: ccmp x10, x11, #0, eq
+; -O0: stxp w12, x14, x15, [x13]
+; -O0: stxp w12, x8, x10, [x13]
+; -O0: subs x10, x10, x11
+; -O0: ccmp x8, x9, #0, eq
;
; -O1-LABEL: store_atomic_i128_aligned_unordered:
; -O1: ldxp xzr, x8, [x2]
@@ -134,13 +134,13 @@ define dso_local void @store_atomic_i128_aligned_unordered(i128 %value, ptr %ptr
define dso_local void @store_atomic_i128_aligned_monotonic(i128 %value, ptr %ptr) {
; -O0-LABEL: store_atomic_i128_aligned_monotonic:
-; -O0: ldxp x10, x12, [x9]
+; -O0: ldxp x8, x10, [x13]
+; -O0: cmp x8, x9
; -O0: cmp x10, x11
-; -O0: cmp x12, x13
-; -O0: stxp w8, x14, x15, [x9]
-; -O0: stxp w8, x10, x12, [x9]
-; -O0: subs x12, x12, x13
-; -O0: ccmp x10, x11, #0, eq
+; -O0: stxp w12, x14, x15, [x13]
+; -O0: stxp w12, x8, x10, [x13]
+; -O0: subs x10, x10, x11
+; -O0: ccmp x8, x9, #0, eq
;
; -O1-LABEL: store_atomic_i128_aligned_monotonic:
; -O1: ldxp xzr, x8, [x2]
@@ -151,13 +151,13 @@ define dso_local void @store_atomic_i128_aligned_monotonic(i128 %value, ptr %ptr
define dso_local void @store_atomic_i128_aligned_release(i128 %value, ptr %ptr) {
; -O0-LABEL: store_atomic_i128_aligned_release:
-; -O0: ldxp x10, x12, [x9]
+; -O0: ldxp x8, x10, [x13]
+; -O0: cmp x8, x9
; -O0: cmp x10, x11
-; -O0: cmp x12, x13
-; -O0: stlxp w8, x14, x15, [x9]
-; -O0: stlxp w8, x10, x12, [x9]
-; -O0: subs x12, x12, x13
-; -O0: ccmp x10, x11, #0, eq
+; -O0: stlxp w12, x14, x15, [x13]
+; -O0: stlxp w12, x8, x10, [x13]
+; -O0: subs x10, x10, x11
+; -O0: ccmp x8, x9, #0, eq
;
; -O1-LABEL: store_atomic_i128_aligned_release:
; -O1: ldxp xzr, x8, [x2]
@@ -168,13 +168,13 @@ define dso_local void @store_atomic_i128_aligned_release(i128 %value, ptr %ptr)
define dso_local void @store_atomic_i128_aligned_seq_cst(i128 %value, ptr %ptr) {
; -O0-LABEL: store_atomic_i128_aligned_seq_cst:
-; -O0: ldaxp x10, x12, [x9]
+; -O0: ldaxp x8, x10, [x13]
+; -O0: cmp x8, x9
; -O0: cmp x10, x11
-; -O0: cmp x12, x13
-; -O0: stlxp w8, x14, x15, [x9]
-; -O0: stlxp w8, x10, x12, [x9]
-; -O0: subs x12, x12, x13
-; -O0: ccmp x10, x11, #0, eq
+; -O0: stlxp w12, x14, x15, [x13]
+; -O0: stlxp w12, x8, x10, [x13]
+; -O0: subs x10, x10, x11
+; -O0: ccmp x8, x9, #0, eq
;
; -O1-LABEL: store_atomic_i128_aligned_seq_cst:
; -O1: ldaxp xzr, x8, [x2]
diff --git a/llvm/test/CodeGen/AArch64/PHIElimination-debugloc.mir b/llvm/test/CodeGen/AArch64/PHIElimination-debugloc.mir
index 01c44e3f253bb..993d1c1f1b5f0 100644
--- a/llvm/test/CodeGen/AArch64/PHIElimination-debugloc.mir
+++ b/llvm/test/CodeGen/AArch64/PHIElimination-debugloc.mir
@@ -37,7 +37,7 @@ body: |
bb.1:
%x:gpr32 = COPY $wzr
; Test that the debug location is not copied into bb1!
- ; CHECK: %3:gpr32 = COPY killed %x{{$}}
+ ; CHECK: %3:gpr32 = COPY $wzr
; CHECK-LABEL: bb.2:
bb.2:
%y:gpr32 = PHI %x:gpr32, %bb.1, undef %undef:gpr32, %bb.0, debug-location !14
diff --git a/llvm/test/CodeGen/AArch64/PHIElimination-reuse-copy.mir b/llvm/test/CodeGen/AArch64/PHIElimination-reuse-copy.mir
new file mode 100644
index 0000000000000..883d130bfac4e
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/PHIElimination-reuse-copy.mir
@@ -0,0 +1,35 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc -run-pass=phi-node-elimination -mtriple=aarch64-linux-gnu -o - %s | FileCheck %s
+
+# Verify that the original COPY in bb.1 is reappropriated as the PHI source in bb.2,
+# instead of creating a new COPY with the same source register.
+
+---
+name: test
+tracksRegLiveness: true
+body: |
+ ; CHECK-LABEL: name: test
+ ; CHECK: bb.0:
+ ; CHECK-NEXT: successors: %bb.2(0x40000000), %bb.1(0x40000000)
+ ; CHECK-NEXT: liveins: $nzcv, $wzr
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[DEF:%[0-9]+]]:gpr32 = IMPLICIT_DEF
+ ; CHECK-NEXT: Bcc 8, %bb.2, implicit $nzcv
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.1:
+ ; CHECK-NEXT: successors: %bb.2(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[DEF:%[0-9]+]]:gpr32 = COPY $wzr
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.2:
+ ; CHECK-NEXT: %y:gpr32 = COPY [[DEF]]
+ ; CHECK-NEXT: $wzr = COPY %y
+ bb.0:
+ liveins: $nzcv, $wzr
+ Bcc 8, %bb.2, implicit $nzcv
+ bb.1:
+ %x:gpr32 = COPY $wzr
+ bb.2:
+ %y:gpr32 = PHI %x:gpr32, %bb.1, undef %undef:gpr32, %bb.0
+ $wzr = COPY %y:gpr32
+...
diff --git a/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll b/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll
index fb6575cc0ee83..10fc431b07b18 100644
--- a/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll
+++ b/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll
@@ -587,8 +587,8 @@ define i16 @red_mla_dup_ext_u8_s8_s16(ptr noalias nocapture noundef readonly %A,
; CHECK-SD-NEXT: mov w10, w2
; CHECK-SD-NEXT: b.hi .LBB5_4
; CHECK-SD-NEXT: // %bb.2:
-; CHECK-SD-NEXT: mov x11, xzr
; CHECK-SD-NEXT: mov w8, wzr
+; CHECK-SD-NEXT: mov x11, xzr
; CHECK-SD-NEXT: b .LBB5_7
; CHECK-SD-NEXT: .LBB5_3:
; CHECK-SD-NEXT: mov w8, wzr
diff --git a/llvm/test/CodeGen/AArch64/atomicrmw-O0.ll b/llvm/test/CodeGen/AArch64/atomicrmw-O0.ll
index 37a7782caeed9..cab6fba59cbd1 100644
--- a/llvm/test/CodeGen/AArch64/atomicrmw-O0.ll
+++ b/llvm/test/CodeGen/AArch64/atomicrmw-O0.ll
@@ -45,7 +45,7 @@ define i8 @test_rmw_add_8(ptr %dst) {
;
; LSE-LABEL: test_rmw_add_8:
; LSE: // %bb.0: // %entry
-; LSE-NEXT: mov w8, #1
+; LSE-NEXT: mov w8, #1 // =0x1
; LSE-NEXT: ldaddalb w8, w0, [x0]
; LSE-NEXT: ret
entry:
@@ -94,7 +94,7 @@ define i16 @test_rmw_add_16(ptr %dst) {
;
; LSE-LABEL: test_rmw_add_16:
; LSE: // %bb.0: // %entry
-; LSE-NEXT: mov w8, #1
+; LSE-NEXT: mov w8, #1 // =0x1
; LSE-NEXT: ldaddalh w8, w0, [x0]
; LSE-NEXT: ret
entry:
@@ -143,7 +143,7 @@ define i32 @test_rmw_add_32(ptr %dst) {
;
; LSE-LABEL: test_rmw_add_32:
; LSE: // %bb.0: // %entry
-; LSE-NEXT: mov w8, #1
+; LSE-NEXT: mov w8, #1 // =0x1
; LSE-NEXT: ldaddal w8, w0, [x0]
; LSE-NEXT: ret
entry:
@@ -192,7 +192,7 @@ define i64 @test_rmw_add_64(ptr %dst) {
;
; LSE-LABEL: test_rmw_add_64:
; LSE: // %bb.0: // %entry
-; LSE-NEXT: mov w8, #1
+; LSE-NEXT: mov w8, #1 // =0x1
; LSE-NEXT: // kill: def $x8 killed $w8
; LSE-NEXT: ldaddal x8, x0, [x0]
; LSE-NEXT: ret
@@ -207,16 +207,16 @@ define i128 @test_rmw_add_128(ptr %dst) {
; NOLSE-NEXT: sub sp, sp, #48
; NOLSE-NEXT: .cfi_def_cfa_offset 48
; NOLSE-NEXT: str x0, [sp, #24] // 8-byte Folded Spill
-; NOLSE-NEXT: ldr x8, [x0, #8]
-; NOLSE-NEXT: ldr x9, [x0]
+; NOLSE-NEXT: ldr x9, [x0, #8]
+; NOLSE-NEXT: ldr x8, [x0]
; NOLSE-NEXT: str x9, [sp, #32] // 8-byte Folded Spill
; NOLSE-NEXT: str x8, [sp, #40] // 8-byte Folded Spill
; NOLSE-NEXT: b .LBB4_1
; NOLSE-NEXT: .LBB4_1: // %atomicrmw.start
; NOLSE-NEXT: // =>This Loop Header: Depth=1
; NOLSE-NEXT: // Child Loop BB4_2 Depth 2
-; NOLSE-NEXT: ldr x13, [sp, #40] // 8-byte Folded Reload
-; NOLSE-NEXT: ldr x11, [sp, #32] // 8-byte Folded Reload
+; NOLSE-NEXT: ldr x13, [sp, #32] // 8-byte Folded Reload
+; NOLSE-NEXT: ldr x11, [sp, #40] // 8-byte Folded Reload
; NOLSE-NEXT: ldr x9, [sp, #24] // 8-byte Folded Reload
; NOLSE-NEXT: adds x14, x11, #1
; NOLSE-NEXT: cinc x15, x13, hs
@@ -246,8 +246,8 @@ define i128 @test_rmw_add_128(ptr %dst) {
; NOLSE-NEXT: str x9, [sp, #16] // 8-byte Folded Spill
; NOLSE-NEXT: subs x12, x12, x13
; NOLSE-NEXT: ccmp x10, x11, #0, eq
-; NOLSE-NEXT: str x9, [sp, #32] // 8-byte Folded Spill
-; NOLSE-NEXT: str x8, [sp, #40] // 8-byte Folded Spill
+; NOLSE-NEXT: str x9, [sp, #40] // 8-byte Folded Spill
+; NOLSE-NEXT: str x8, [sp, #32] // 8-byte Folded Spill
; NOLSE-NEXT: b.ne .LBB4_1
; NOLSE-NEXT: b .LBB4_6
; NOLSE-NEXT: .LBB4_6: // %atomicrmw.end
@@ -261,15 +261,15 @@ define i128 @test_rmw_add_128(ptr %dst) {
; LSE-NEXT: sub sp, sp, #48
; LSE-NEXT: .cfi_def_cfa_offset 48
; LSE-NEXT: str x0, [sp, #24] // 8-byte Folded Spill
-; LSE-NEXT: ldr x8, [x0, #8]
-; LSE-NEXT: ldr x9, [x0]
+; LSE-NEXT: ldr x9, [x0, #8]
+; LSE-NEXT: ldr x8, [x0]
; LSE-NEXT: str x9, [sp, #32] // 8-byte Folded Spill
; LSE-NEXT: str x8, [sp, #40] // 8-byte Folded Spill
; LSE-NEXT: b .LBB4_1
; LSE-NEXT: .LBB4_1: // %atomicrmw.start
; LSE-NEXT: // =>This Inner Loop Header: Depth=1
-; LSE-NEXT: ldr x11, [sp, #40] // 8-byte Folded Reload
-; LSE-NEXT: ldr x10, [sp, #32] // 8-byte Folded Reload
+; LSE-NEXT: ldr x11, [sp, #32] // 8-byte Folded Reload
+; LSE-NEXT: ldr x10, [sp, #40] // 8-byte Folded Reload
; LSE-NEXT: ldr x8, [sp, #24] // 8-byte Folded Reload
; LSE-NEXT: mov x0, x10
; LSE-NEXT: mov x1, x11
@@ -284,8 +284,8 @@ define i128 @test_rmw_add_128(ptr %dst) {
; LSE-NEXT: str x8, [sp, #16] // 8-byte Folded Spill
; LSE-NEXT: subs x11, x8, x11
; LSE-NEXT: ccmp x9, x10, #0, eq
-; LSE-NEXT: str x9, [sp, #32] // 8-byte Folded Spill
-; LSE-NEXT: str x8, [sp, #40] // 8-byte Folded Spill
+; LSE-NEXT: str x9, [sp, #40] // 8-byte Folded Spill
+; LSE-NEXT: str x8, [sp, #32] // 8-byte Folded Spill
; LSE-NEXT: b.ne .LBB4_1
; LSE-NEXT: b .LBB4_2
; LSE-NEXT: .LBB4_2: // %atomicrmw.end
@@ -597,23 +597,23 @@ define i128 @test_rmw_nand_128(ptr %dst) {
; NOLSE-NEXT: sub sp, sp, #48
; NOLSE-NEXT: .cfi_def_cfa_offset 48
; NOLSE-NEXT: str x0, [sp, #24] // 8-byte Folded Spill
-; NOLSE-NEXT: ldr x8, [x0, #8]
-; NOLSE-NEXT: ldr x9, [x0]
+; NOLSE-NEXT: ldr x9, [x0, #8]
+; NOLSE-NEXT: ldr x8, [x0]
; NOLSE-NEXT: str x9, [sp, #32] // 8-byte Folded Spill
; NOLSE-NEXT: str x8, [sp, #40] // 8-byte Folded Spill
; NOLSE-NEXT: b .LBB9_1
; NOLSE-NEXT: .LBB9_1: // %atomicrmw.start
; NOLSE-NEXT: // =>This Loop Header: Depth=1
; NOLSE-NEXT: // Child Loop BB9_2 Depth 2
-; NOLSE-NEXT: ldr x13, [sp, #40] // 8-byte Folded Reload
-; NOLSE-NEXT: ldr x11, [sp, #32] // 8-byte Folded Reload
+; NOLSE-NEXT: ldr x13, [sp, #32] // 8-byte Folded Reload
+; NOLSE-NEXT: ldr x11, [sp, #40] // 8-byte Folded Reload
; NOLSE-NEXT: ldr x9, [sp, #24] // 8-byte Folded Reload
; NOLSE-NEXT: mov w8, w11
; NOLSE-NEXT: mvn w10, w8
; NOLSE-NEXT: // implicit-def: $x8
; NOLSE-NEXT: mov w8, w10
; NOLSE-NEXT: orr x14, x8, #0xfffffffffffffffe
-; NOLSE-NEXT: mov x15, #-1
+; NOLSE-NEXT: mov x15, #-1 // =0xffffffffffffffff
; NOLSE-NEXT: .LBB9_2: // %atomicrmw.start
; NOLSE-NEXT: // Parent Loop BB9_1 Depth=1
; NOLSE-NEXT: // => This Inner Loop Header: Depth=2
@@ -640,8 +640,8 @@ define i128 @test_rmw_nand_128(ptr %dst) {
; NOLSE-NEXT: str x9, [sp, #16] // 8-byte Folded Spill
; NOLSE-NEXT: subs x12, x12, x13
; NOLSE-NEXT: ccmp x10, x11, #0, eq
-; NOLSE-NEXT: str x9, [sp, #32] // 8-byte Folded Spill
-; NOLSE-NEXT: str x8, [sp, #40] // 8-byte Folded Spill
+; NOLSE-NEXT: str x9, [sp, #40] // 8-byte Folded Spill
+; NOLSE-NEXT: str x8, [sp, #32] // 8-byte Folded Spill
; NOLSE-NEXT: b.ne .LBB9_1
; NOLSE-NEXT: b .LBB9_6
; NOLSE-NEXT: .LBB9_6: // %atomicrmw.end
@@ -655,15 +655,15 @@ define i128 @test_rmw_nand_128(ptr %dst) {
; LSE-NEXT: sub sp, sp, #48
; LSE-NEXT: .cfi_def_cfa_offset 48
; LSE-NEXT: str x0, [sp, #24] // 8-byte Folded Spill
-; LSE-NEXT: ldr x8, [x0, #8]
-; LSE-NEXT: ldr x9, [x0]
+; LSE-NEXT: ldr x9, [x0, #8]
+; LSE-NEXT: ldr x8, [x0]
; LSE-NEXT: str x9, [sp, #32] // 8-byte Folded Spill
; LSE-NEXT: str x8, [sp, #40] // 8-byte Folded Spill
; LSE-NEXT: b .LBB9_1
; LSE-NEXT: .LBB9_1: // %atomicrmw.start
; LSE-NEXT: // =>This Inner Loop Header: Depth=1
-; LSE-NEXT: ldr x11, [sp, #40] // 8-byte Folded Reload
-; LSE-NEXT: ldr x10, [sp, #32] // 8-byte Folded Reload
+; LSE-NEXT: ldr x11, [sp, #32] // 8-byte Folded Reload
+; LSE-NEXT: ldr x10, [sp, #40] // 8-byte Folded Reload
; LSE-NEXT: ldr x8, [sp, #24] // 8-byte Folded Reload
; LSE-NEXT: mov x0, x10
; LSE-NEXT: mov x1, x11
@@ -672,7 +672,7 @@ define i128 @test_rmw_nand_128(ptr %dst) {
; LSE-NEXT: // implicit-def: $x9
; LSE-NEXT: mov w9, w12
; LSE-NEXT: orr x2, x9, #0xfffffffffffffffe
-; LSE-NEXT: mov x9, #-1
+; LSE-NEXT: mov x9, #-1 // =0xffffffffffffffff
; LSE-NEXT: // kill: def $x2 killed $x2 def $x2_x3
; LSE-NEXT: mov x3, x9
; LSE-NEXT: caspal x0, x1, x2, x3, [x8]
@@ -682,8 +682,8 @@ define i128 @test_rmw_nand_128(ptr %dst) {
; LSE-NEXT: str x8, [sp, #16] // 8-byte Folded Spill
; LSE-NEXT: subs x11, x8, x11
; LSE-NEXT: ccmp x9, x10, #0, eq
-; LSE-NEXT: str x9, [sp, #32] // 8-byte Folded Spill
-; LSE-NEXT: str x8, [sp, #40] // 8-byte Folded Spill
+; LSE-NEXT: str x9, [sp, #40] // 8-byte Folded Spill
+; LSE-NEXT: str x8, [sp, #32] // 8-byte Folded Spill
; LSE-NEXT: b.ne .LBB9_1
; LSE-NEXT: b .LBB9_2
; LSE-NEXT: .LBB9_2: // %atomicrmw.end
diff --git a/llvm/test/CodeGen/AArch64/bfis-in-loop.ll b/llvm/test/CodeGen/AArch64/bfis-in-loop.ll
index 43d49da1abd21..b0339222bc2df 100644
--- a/llvm/test/CodeGen/AArch64/bfis-in-loop.ll
+++ b/llvm/test/CodeGen/AArch64/bfis-in-loop.ll
@@ -14,8 +14,8 @@ define i64 @bfi...
[truncated]
ping :)
Sorry for the inconvenience. I was not able to reproduce locally; can you test whether #146337 fixes the issue?
…#131837) The insertion point of COPY isn't always optimal and could eventually lead to a worse block layout, see the regression test in the first commit. This change affects many architectures but the amount of total instructions in the test cases seems to be slightly lower.
PR which introduced the bug: llvm#131837. Fixes a crash around dead registers which started in f5c62ee by verifying that the reused incoming register is also virtual.
…register class, update livevars. (#146337) Follow up to the second bug that llvm/llvm-project#131837 introduced, described in llvm/llvm-project#131837 (comment).
Unfortunately, it doesn't. I don't see any errors on the file with expensive checks enabled, so I'll need to dig in closer to see what the incorrect code generation caused by this change really is...
I haven't pinpointed exactly what goes wrong yet, but I've narrowed it down further. The incorrect code generation happens with this much smaller input source, https://martin.st/temp/y4m_parse_tags-preproc.c, compiled with
I see such instructions among the changed instructions in the output (if diffing the output with

If someone can have a look at what changes in the code generation pipeline this PR triggers on this kinda small input, which may be causing it, that'd be appreciated!
@guy-david This patch is causing a HIP test case to hang. I've attached a reproducer with good and bad asm. Let me know if you have any questions.
The link is broken? It links to this PR.
Sorry about that. Updated original link. Also here: hang-reproducer.tar.gz
I think this helped me identify the bug in my PR, which didn't take into account that an operand to the PHI can appear more than once in the operand list.
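For illustration, here is a schematic MIR shape (hypothetical, not taken from the attached reproducer) in which the same virtual register is the incoming value for more than one predecessor:

  bb.1:
    %x:gpr32 = COPY $wzr
    ; branches to bb.3
  bb.2:
    ; also reaches bb.3 with the same value
  bb.3:
    %y:gpr32 = PHI %x:gpr32, %bb.1, %x:gpr32, %bb.2

In a shape like this, the single COPY that defines %x can only be retargeted once, so the reuse logic has to account for every occurrence of %x in the PHI's operand list, which is the case the original patch missed.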
Thanks! It does seem to fix my issue. (I don’t think I can comment on the fix itself other than that.) BTW, I noticed that the email address on your previous commit, f5c62ee, is a GitHub hidden address - see LLVM Developer Policy and LLVM Discourse - it can be changed for future commits under "Keep my email addresses private".
We also see a miscompile caused by this. We don't have a reproducer yet, but it looks like there are quite a few already. Do we need to revert this?
Also note that there are unsolved debug-info related problems with the patch as mentioned here: |
> We also see a miscompile caused by this. We don't have a reproducer yet, but it looks like there are quite a few already.
> Do we need to revert this?

Would probably make sense to go back to a known-good state for now, but please make sure you share the additional reproducer once it is ready.
#146806 solves the miscompile issue. Thanks
@guy-david I'm still seeing the hang with 0629fffe95.
Reverting because mis-compiles:
- llvm/llvm-project#131837
- llvm/llvm-project#146320
- llvm/llvm-project#146337
The insertion point of the COPY isn't always optimal and could eventually lead to a worse block layout; see the regression test in the first commit.
This change affects many architectures, but the total number of instructions in the test cases seems to be slightly lower.