[AArch64][SME] Fix accessing the emergency spill slot with hazard padding #142190
This patch fixes an issue where, when hazard padding was enabled, locals (including the emergency spill slot) could not always be directly addressed. Generally this is fine: we can materialize the constant offset in a scratch register. But if no register is free we need to spill, and if we cannot even reach the emergency spill slot, we fail to compile. This patch fixes the problem by ensuring that if a function has variable-sized objects and is likely to have hazard padding, we enable the base pointer. Then, if we know a function has hazard padding, we place the emergency spill slot next to the BP/SP, ensuring it can be directly accessed without stepping over any hazard padding.
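For background, the addressing limits in play are the standard AArch64 load/store immediate ranges: unscaled LDUR/STUR take a signed 9-bit byte offset, and scaled LDR/STR take an unsigned 12-bit immediate scaled by the access size. A simplified sketch of the encodability check (a hypothetical helper for illustration, not LLVM's actual addressing-mode logic):

#include <cstdint>

// Rough sketch: can a byte offset from SP/FP/BP be folded directly into an
// AArch64 load/store of AccessSize bytes? Hypothetical helper; the real
// checks live in the AArch64 backend's frame-index resolution.
static bool isDirectlyAddressable(int64_t Offset, unsigned AccessSize) {
  // Unscaled LDUR/STUR: signed 9-bit immediate, range [-256, 255].
  if (Offset >= -256 && Offset <= 255)
    return true;
  // Scaled LDR/STR: unsigned 12-bit immediate scaled by the access size,
  // e.g. an 8-byte access reaches offsets up to 4095 * 8 = 32760.
  if (Offset >= 0 && Offset % AccessSize == 0 && Offset / AccessSize <= 4095)
    return true;
  // Anything else must be materialized in a scratch register, and if no
  // register is free, the register scavenger falls back to the emergency
  // spill slot -- hence that slot itself must stay directly addressable.
  return false;
}

With 1024 bytes of hazard padding between the frame pointer and the GPR locals, the emergency slot's FP-relative offset can fall outside these ranges, which is why the patch anchors it next to the BP/SP instead.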
@llvm/pr-subscribers-backend-aarch64 Author: Benjamin Maxwell (MacDue). Changes: as described above. Patch is 26.60 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/142190.diff. 7 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp b/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
index 8caa49de0af43..da3e33429a87a 100644
--- a/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
@@ -18,6 +18,7 @@
#include "AArch64Subtarget.h"
#include "MCTargetDesc/AArch64AddressingModes.h"
#include "MCTargetDesc/AArch64InstPrinter.h"
+#include "Utils/AArch64SMEAttributes.h"
#include "llvm/ADT/BitVector.h"
#include "llvm/BinaryFormat/Dwarf.h"
#include "llvm/CodeGen/LiveRegMatrix.h"
@@ -632,14 +633,26 @@ bool AArch64RegisterInfo::hasBasePointer(const MachineFunction &MF) const {
return true;
auto &ST = MF.getSubtarget<AArch64Subtarget>();
+ const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
if (ST.hasSVE() || ST.isStreaming()) {
- const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
// Frames that have variable sized objects and scalable SVE objects,
// should always use a basepointer.
if (!AFI->hasCalculatedStackSizeSVE() || AFI->getStackSizeSVE())
return true;
}
+ // Frames with hazard padding can have a large offset between the frame
+ // pointer and GPR locals, which includes the emergency spill slot. If the
+ // emergency spill slot is not within range of the load/store instructions
+ // (which have a signed 9-bit range), we will fail to compile if it is used.
+ // Since hasBasePointer() is called before we know if we have hazard padding
+ // or an emergency spill slot we need to enable the basepointer
+ // conservatively.
+ if (AFI->hasStackHazardSlotIndex() ||
+ !SMEAttrs(MF.getFunction()).hasNonStreamingInterfaceAndBody()) {
+ return true;
+ }
+
// Conservatively estimate whether the negative offset from the frame
// pointer will be sufficient to reach. If a function has a smallish
// frame, it's less likely to have lots of spills and callee saved
@@ -764,7 +777,8 @@ AArch64RegisterInfo::useFPForScavengingIndex(const MachineFunction &MF) const {
assert((!MF.getSubtarget<AArch64Subtarget>().hasSVE() ||
AFI->hasCalculatedStackSizeSVE()) &&
"Expected SVE area to be calculated by this point");
- return TFI.hasFP(MF) && !hasStackRealignment(MF) && !AFI->getStackSizeSVE();
+ return TFI.hasFP(MF) && !hasStackRealignment(MF) && !AFI->getStackSizeSVE() &&
+ !AFI->hasStackHazardSlotIndex();
}
bool AArch64RegisterInfo::requiresFrameIndexScavenging(
diff --git a/llvm/test/CodeGen/AArch64/framelayout-scavengingslot-stack-hazard.mir b/llvm/test/CodeGen/AArch64/framelayout-scavengingslot-stack-hazard.mir
new file mode 100644
index 0000000000000..52ac36f801854
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/framelayout-scavengingslot-stack-hazard.mir
@@ -0,0 +1,99 @@
+# RUN: llc -mtriple=aarch64-none-linux-gnu -aarch64-stack-hazard-size=1024 -run-pass=prologepilog %s -o - | FileCheck %s
+--- |
+
+ define void @stack_hazard_streaming_compat() "aarch64_pstate_sm_compatible" { entry: unreachable }
+ define void @stack_hazard_streaming_compat_emergency_spill_slot() "aarch64_pstate_sm_compatible" { entry: unreachable }
+
+...
+
+# +------------------+
+# | GPR callee-saves |
+# +------------------+ <- FP
+# | <hazard padding> |
+# +------------------+
+# | FPR locals |
+# | %stack.1 |
+# +------------------+
+# | <hazard padding> |
+# +------------------+
+# | GPR locals |
+# | %stack.2 |
+# | <emergency spill>|
+# +------------------+ <- BP
+# | <VLA> |
+# +------------------+ <- SP (can't be used due to VLA)
+
+# In this case without the base pointer we'd need the emergency spill slot to
+# access both %stack.1 and %stack.2. With the base pointer we can reach both
+# without spilling.
+
+name: stack_hazard_streaming_compat
+# CHECK-LABEL: name: stack_hazard_streaming_compat
+# CHECK: bb.0:
+# CHECK: STRDui $d0, $x19, 131
+# CHECK-NEXT: STRXui $x0, $x19, 1
+# CHECK: bb.1:
+tracksRegLiveness: true
+frameInfo:
+ isFrameAddressTaken: true
+stack:
+ - { id: 0, type: variable-sized, alignment: 1 }
+ - { id: 1, size: 8, alignment: 8 }
+ - { id: 2, size: 8, alignment: 8 }
+body: |
+ bb.0:
+ liveins: $x0, $x8, $d0
+ $x9 = LDRXui $x0, 0 :: (load (s64))
+ STRDui $d0, %stack.1, 0 :: (store (s64) into %stack.1)
+ STRXui $x0, %stack.2, 0 :: (store (s64) into %stack.2)
+ B %bb.1
+ bb.1:
+ liveins: $x1, $x2, $x3, $x4, $x5, $x6, $x7, $x8, $x9, $x10, $x11, $x12, $x13, $x14, $x15, $x16, $x17, $x18, $x19, $x20, $x21, $x22, $x23, $x24, $x25, $x26, $x27, $x28, $lr
+ RET_ReallyLR implicit $x19, implicit $x20, implicit $x21, implicit $x22, implicit $x23, implicit $x24, implicit $x25, implicit $x26, implicit $x27, implicit $x28, implicit $lr
+...
+---
+# +------------------+
+# | GPR callee-saves |
+# +------------------+ <- FP
+# | <hazard padding> |
+# +------------------+
+# | FPR locals |
+# | %stack.1 |
+# +------------------+
+# | <hazard padding> |
+# +------------------+
+# | GPR locals |
+# | %stack.2 | (very large)
+# | <emergency spill>|
+# +------------------+ <- BP
+# | <VLA> |
+# +------------------+ <- SP (can't be used due to VLA)
+
+# In this case we need to use the emergency spill slot to access %stack.1 as it
+# is too far from the frame pointer and the base pointer to directly address.
+# Note: This also tests that the <emergency spill> is located near the SP/BP.
+
+name: stack_hazard_streaming_compat_emergency_spill_slot
+# CHECK-LABEL: name: stack_hazard_streaming_compat_emergency_spill_slot
+# CHECK: bb.0:
+# CHECK: STRXui killed $[[SCRATCH:x[0-9]+]], $x19, 0
+# CHECK-NEXT: $[[SCRATCH]] = ADDXri $x19, 1056, 0
+# CHECK-NEXT: STRDui $d0, killed $[[SCRATCH]], 4095
+# CHECK-NEXT: $[[SCRATCH]] = LDRXui $x19, 0
+# CHECK: bb.1:
+tracksRegLiveness: true
+frameInfo:
+ isFrameAddressTaken: true
+stack:
+ - { id: 0, type: variable-sized, alignment: 1 }
+ - { id: 1, size: 8, alignment: 8 }
+ - { id: 2, size: 32761, alignment: 8 }
+body: |
+ bb.0:
+ liveins: $x0, $x8, $d0
+ $x9 = LDRXui $x0, 0 :: (load (s64))
+ STRDui $d0, %stack.1, 0 :: (store (s64) into %stack.1)
+ B %bb.1
+ bb.1:
+ liveins: $x0, $x1, $x2, $x3, $x4, $x5, $x6, $x7, $x8, $x9, $x10, $x11, $x12, $x13, $x14, $x15, $x16, $x17, $x18, $x19, $x20, $x21, $x22, $x23, $x24, $x25, $x26, $x27, $x28, $lr
+ RET_ReallyLR implicit $x19, implicit $x20, implicit $x21, implicit $x22, implicit $x23, implicit $x24, implicit $x25, implicit $x26, implicit $x27, implicit $x28, implicit $lr
diff --git a/llvm/test/CodeGen/AArch64/sme-agnostic-za.ll b/llvm/test/CodeGen/AArch64/sme-agnostic-za.ll
index 1f68815411097..a38bbe00a40f4 100644
--- a/llvm/test/CodeGen/AArch64/sme-agnostic-za.ll
+++ b/llvm/test/CodeGen/AArch64/sme-agnostic-za.ll
@@ -98,6 +98,7 @@ define i64 @streaming_agnostic_caller_nonstreaming_private_za_callee(i64 %v) nou
; CHECK-NEXT: mov x0, x9
; CHECK-NEXT: add x29, sp, #64
; CHECK-NEXT: stp x20, x19, [sp, #96] // 16-byte Folded Spill
+; CHECK-NEXT: mov x19, sp
; CHECK-NEXT: mov x8, x0
; CHECK-NEXT: bl __arm_sme_state_size
; CHECK-NEXT: sub sp, sp, x0
@@ -145,51 +146,53 @@ define i64 @streaming_compatible_agnostic_caller_nonstreaming_private_za_callee(
; CHECK-NEXT: stp d9, d8, [sp, #48] // 16-byte Folded Spill
; CHECK-NEXT: stp x29, x30, [sp, #64] // 16-byte Folded Spill
; CHECK-NEXT: bl __arm_get_current_vg
-; CHECK-NEXT: str x0, [sp, #80] // 8-byte Folded Spill
+; CHECK-NEXT: stp x0, x21, [sp, #80] // 16-byte Folded Spill
; CHECK-NEXT: mov x0, x9
; CHECK-NEXT: add x29, sp, #64
; CHECK-NEXT: stp x20, x19, [sp, #96] // 16-byte Folded Spill
+; CHECK-NEXT: mov x19, sp
; CHECK-NEXT: mov x8, x0
; CHECK-NEXT: bl __arm_sme_state_size
; CHECK-NEXT: sub sp, sp, x0
-; CHECK-NEXT: mov x19, sp
-; CHECK-NEXT: mov x0, x19
+; CHECK-NEXT: mov x20, sp
+; CHECK-NEXT: mov x0, x20
; CHECK-NEXT: bl __arm_sme_save
; CHECK-NEXT: bl __arm_sme_state
-; CHECK-NEXT: and x20, x0, #0x1
-; CHECK-NEXT: tbz w20, #0, .LBB5_2
+; CHECK-NEXT: and x21, x0, #0x1
+; CHECK-NEXT: tbz w21, #0, .LBB5_2
; CHECK-NEXT: // %bb.1:
; CHECK-NEXT: smstop sm
; CHECK-NEXT: .LBB5_2:
; CHECK-NEXT: mov x0, x8
; CHECK-NEXT: bl private_za_decl
; CHECK-NEXT: mov x2, x0
-; CHECK-NEXT: tbz w20, #0, .LBB5_4
+; CHECK-NEXT: tbz w21, #0, .LBB5_4
; CHECK-NEXT: // %bb.3:
; CHECK-NEXT: smstart sm
; CHECK-NEXT: .LBB5_4:
-; CHECK-NEXT: mov x0, x19
+; CHECK-NEXT: mov x0, x20
; CHECK-NEXT: bl __arm_sme_restore
-; CHECK-NEXT: mov x0, x19
+; CHECK-NEXT: mov x0, x20
; CHECK-NEXT: bl __arm_sme_save
; CHECK-NEXT: bl __arm_sme_state
-; CHECK-NEXT: and x20, x0, #0x1
-; CHECK-NEXT: tbz w20, #0, .LBB5_6
+; CHECK-NEXT: and x21, x0, #0x1
+; CHECK-NEXT: tbz w21, #0, .LBB5_6
; CHECK-NEXT: // %bb.5:
; CHECK-NEXT: smstop sm
; CHECK-NEXT: .LBB5_6:
; CHECK-NEXT: mov x0, x2
; CHECK-NEXT: bl private_za_decl
; CHECK-NEXT: mov x1, x0
-; CHECK-NEXT: tbz w20, #0, .LBB5_8
+; CHECK-NEXT: tbz w21, #0, .LBB5_8
; CHECK-NEXT: // %bb.7:
; CHECK-NEXT: smstart sm
; CHECK-NEXT: .LBB5_8:
-; CHECK-NEXT: mov x0, x19
+; CHECK-NEXT: mov x0, x20
; CHECK-NEXT: bl __arm_sme_restore
; CHECK-NEXT: mov x0, x1
; CHECK-NEXT: sub sp, x29, #64
; CHECK-NEXT: ldp x20, x19, [sp, #96] // 16-byte Folded Reload
+; CHECK-NEXT: ldr x21, [sp, #88] // 8-byte Folded Reload
; CHECK-NEXT: ldp x29, x30, [sp, #64] // 16-byte Folded Reload
; CHECK-NEXT: ldp d9, d8, [sp, #48] // 16-byte Folded Reload
; CHECK-NEXT: ldp d11, d10, [sp, #32] // 16-byte Folded Reload
diff --git a/llvm/test/CodeGen/AArch64/sme-framelower-use-bp.ll b/llvm/test/CodeGen/AArch64/sme-framelower-use-bp.ll
index 99c65b090adb0..026852867fdd6 100644
--- a/llvm/test/CodeGen/AArch64/sme-framelower-use-bp.ll
+++ b/llvm/test/CodeGen/AArch64/sme-framelower-use-bp.ll
@@ -19,6 +19,7 @@ define void @quux() #1 {
; CHECK-NEXT: stp x20, x19, [sp, #80] // 16-byte Folded Spill
; CHECK-NEXT: mov x29, sp
; CHECK-NEXT: sub sp, sp, #384
+; CHECK-NEXT: mov x19, sp
; CHECK-NEXT: .cfi_def_cfa w29, 96
; CHECK-NEXT: .cfi_offset w19, -8
; CHECK-NEXT: .cfi_offset w20, -16
@@ -53,93 +54,78 @@ define void @quux() #1 {
; CHECK-NEXT: // implicit-def: $x9
; CHECK-NEXT: mov w9, w0
; CHECK-NEXT: and x14, x9, #0x70
-; CHECK-NEXT: sub x9, x29, #120
-; CHECK-NEXT: stur x14, [x9, #-256] // 8-byte Folded Spill
+; CHECK-NEXT: str x14, [x19, #8] // 8-byte Folded Spill
; CHECK-NEXT: mov x9, sp
; CHECK-NEXT: subs x9, x9, x14
; CHECK-NEXT: mov sp, x9
-; CHECK-NEXT: sub x10, x29, #112
-; CHECK-NEXT: stur x9, [x10, #-256] // 8-byte Folded Spill
+; CHECK-NEXT: str x9, [x19, #16] // 8-byte Folded Spill
; CHECK-NEXT: mov x9, sp
; CHECK-NEXT: subs x9, x9, x14
; CHECK-NEXT: mov sp, x9
-; CHECK-NEXT: sub x10, x29, #104
-; CHECK-NEXT: stur x9, [x10, #-256] // 8-byte Folded Spill
+; CHECK-NEXT: str x9, [x19, #24] // 8-byte Folded Spill
; CHECK-NEXT: mov x9, x8
; CHECK-NEXT: incb x9
; CHECK-NEXT: mov w0, w9
; CHECK-NEXT: // implicit-def: $x9
; CHECK-NEXT: mov w9, w0
; CHECK-NEXT: and x10, x9, #0x3f0
-; CHECK-NEXT: sub x9, x29, #96
-; CHECK-NEXT: stur x10, [x9, #-256] // 8-byte Folded Spill
+; CHECK-NEXT: str x10, [x19, #32] // 8-byte Folded Spill
; CHECK-NEXT: mov x9, sp
; CHECK-NEXT: subs x9, x9, x10
; CHECK-NEXT: mov sp, x9
-; CHECK-NEXT: sub x11, x29, #88
-; CHECK-NEXT: stur x9, [x11, #-256] // 8-byte Folded Spill
+; CHECK-NEXT: str x9, [x19, #40] // 8-byte Folded Spill
; CHECK-NEXT: mov x9, sp
; CHECK-NEXT: subs x9, x9, x10
; CHECK-NEXT: mov sp, x9
-; CHECK-NEXT: sub x11, x29, #80
-; CHECK-NEXT: stur x9, [x11, #-256] // 8-byte Folded Spill
+; CHECK-NEXT: str x9, [x19, #48] // 8-byte Folded Spill
; CHECK-NEXT: mov x9, sp
; CHECK-NEXT: subs x9, x9, x14
; CHECK-NEXT: mov sp, x9
-; CHECK-NEXT: sub x11, x29, #72
-; CHECK-NEXT: stur x9, [x11, #-256] // 8-byte Folded Spill
+; CHECK-NEXT: str x9, [x19, #56] // 8-byte Folded Spill
; CHECK-NEXT: mov x9, sp
; CHECK-NEXT: subs x9, x9, x14
; CHECK-NEXT: mov sp, x9
-; CHECK-NEXT: sub x11, x29, #64
-; CHECK-NEXT: stur x9, [x11, #-256] // 8-byte Folded Spill
+; CHECK-NEXT: str x9, [x19, #64] // 8-byte Folded Spill
; CHECK-NEXT: mov x9, sp
; CHECK-NEXT: subs x9, x9, x10
; CHECK-NEXT: mov sp, x9
-; CHECK-NEXT: sub x11, x29, #56
-; CHECK-NEXT: stur x9, [x11, #-256] // 8-byte Folded Spill
+; CHECK-NEXT: str x9, [x19, #72] // 8-byte Folded Spill
; CHECK-NEXT: mov x9, sp
; CHECK-NEXT: subs x9, x9, x10
; CHECK-NEXT: mov sp, x9
-; CHECK-NEXT: sub x11, x29, #48
-; CHECK-NEXT: stur x9, [x11, #-256] // 8-byte Folded Spill
+; CHECK-NEXT: str x9, [x19, #80] // 8-byte Folded Spill
; CHECK-NEXT: mov x9, sp
; CHECK-NEXT: subs x9, x9, x14
; CHECK-NEXT: mov sp, x9
-; CHECK-NEXT: sub x11, x29, #40
-; CHECK-NEXT: stur x9, [x11, #-256] // 8-byte Folded Spill
+; CHECK-NEXT: str x9, [x19, #88] // 8-byte Folded Spill
; CHECK-NEXT: mov x9, sp
; CHECK-NEXT: subs x9, x9, x14
; CHECK-NEXT: mov sp, x9
-; CHECK-NEXT: sub x11, x29, #32
-; CHECK-NEXT: stur x9, [x11, #-256] // 8-byte Folded Spill
+; CHECK-NEXT: str x9, [x19, #96] // 8-byte Folded Spill
; CHECK-NEXT: mov x9, sp
; CHECK-NEXT: subs x9, x9, x10
; CHECK-NEXT: mov sp, x9
-; CHECK-NEXT: sub x11, x29, #24
-; CHECK-NEXT: stur x9, [x11, #-256] // 8-byte Folded Spill
+; CHECK-NEXT: str x9, [x19, #104] // 8-byte Folded Spill
; CHECK-NEXT: mov x9, sp
; CHECK-NEXT: subs x9, x9, x10
; CHECK-NEXT: mov sp, x9
-; CHECK-NEXT: sub x11, x29, #16
-; CHECK-NEXT: stur x9, [x11, #-256] // 8-byte Folded Spill
+; CHECK-NEXT: str x9, [x19, #112] // 8-byte Folded Spill
; CHECK-NEXT: mov x9, sp
; CHECK-NEXT: subs x9, x9, x14
; CHECK-NEXT: mov sp, x9
-; CHECK-NEXT: sub x11, x29, #8
-; CHECK-NEXT: stur x9, [x11, #-256] // 8-byte Folded Spill
+; CHECK-NEXT: str x9, [x19, #120] // 8-byte Folded Spill
; CHECK-NEXT: mov x9, sp
; CHECK-NEXT: subs x9, x9, x14
; CHECK-NEXT: mov sp, x9
-; CHECK-NEXT: stur x9, [x29, #-256] // 8-byte Folded Spill
+; CHECK-NEXT: str x9, [x19, #128] // 8-byte Folded Spill
; CHECK-NEXT: mov x9, sp
; CHECK-NEXT: subs x9, x9, x10
; CHECK-NEXT: mov sp, x9
-; CHECK-NEXT: stur x9, [x29, #-248] // 8-byte Folded Spill
+; CHECK-NEXT: str x9, [x19, #136] // 8-byte Folded Spill
; CHECK-NEXT: mov x9, sp
; CHECK-NEXT: subs x9, x9, x10
; CHECK-NEXT: mov sp, x9
-; CHECK-NEXT: stur x9, [x29, #-240] // 8-byte Folded Spill
+; CHECK-NEXT: str x9, [x19, #144] // 8-byte Folded Spill
; CHECK-NEXT: mov x9, sp
; CHECK-NEXT: subs x9, x9, #16
; CHECK-NEXT: mov sp, x9
@@ -174,29 +160,29 @@ define void @quux() #1 {
; CHECK-NEXT: mov x2, sp
; CHECK-NEXT: subs x10, x2, #16
; CHECK-NEXT: mov sp, x10
-; CHECK-NEXT: stur x10, [x29, #-232] // 8-byte Folded Spill
+; CHECK-NEXT: str x10, [x19, #152] // 8-byte Folded Spill
; CHECK-NEXT: mov x10, sp
; CHECK-NEXT: subs x11, x10, x14
; CHECK-NEXT: mov sp, x11
; CHECK-NEXT: mov x10, x11
-; CHECK-NEXT: stur x10, [x29, #-224] // 8-byte Folded Spill
+; CHECK-NEXT: str x10, [x19, #160] // 8-byte Folded Spill
; CHECK-NEXT: mov x0, sp
; CHECK-NEXT: subs x10, x0, #16
; CHECK-NEXT: mov sp, x10
-; CHECK-NEXT: stur x10, [x29, #-216] // 8-byte Folded Spill
+; CHECK-NEXT: str x10, [x19, #168] // 8-byte Folded Spill
; CHECK-NEXT: mov x17, sp
; CHECK-NEXT: subs x10, x17, #16
; CHECK-NEXT: mov sp, x10
-; CHECK-NEXT: stur x10, [x29, #-208] // 8-byte Folded Spill
+; CHECK-NEXT: str x10, [x19, #176] // 8-byte Folded Spill
; CHECK-NEXT: mov x10, sp
; CHECK-NEXT: subs x10, x10, x14
; CHECK-NEXT: stur x10, [x29, #-32] // 8-byte Folded Spill
; CHECK-NEXT: mov sp, x10
-; CHECK-NEXT: stur x10, [x29, #-200] // 8-byte Folded Spill
+; CHECK-NEXT: str x10, [x19, #184] // 8-byte Folded Spill
; CHECK-NEXT: mov x15, sp
; CHECK-NEXT: subs x10, x15, #16
; CHECK-NEXT: mov sp, x10
-; CHECK-NEXT: stur x10, [x29, #-192] // 8-byte Folded Spill
+; CHECK-NEXT: str x10, [x19, #192] // 8-byte Folded Spill
; CHECK-NEXT: mov x13, sp
; CHECK-NEXT: subs x10, x13, #16
; CHECK-NEXT: mov sp, x10
@@ -535,46 +521,33 @@ define void @quux() #1 {
; CHECK-NEXT: b .LBB0_3
; CHECK-NEXT: .LBB0_3: // %bb178
; CHECK-NEXT: // =>This Inner Loop Header: Depth=1
-; CHECK-NEXT: ldur x9, [x29, #-232] // 8-byte Folded Reload
-; CHECK-NEXT: sub x8, x29, #80
-; CHECK-NEXT: ldur x8, [x8, #-256] // 8-byte Folded Reload
-; CHECK-NEXT: sub x10, x29, #88
-; CHECK-NEXT: ldur x10, [x10, #-256] // 8-byte Folded Reload
-; CHECK-NEXT: sub x11, x29, #104
-; CHECK-NEXT: ldur x11, [x11, #-256] // 8-byte Folded Reload
-; CHECK-NEXT: sub x12, x29, #112
-; CHECK-NEXT: ldur x12, [x12, #-256] // 8-byte Folded Reload
+; CHECK-NEXT: ldr x9, [x19, #152] // 8-byte Folded Reload
+; CHECK-NEXT: ldr x8, [x19, #48] // 8-byte Folded Reload
+; CHECK-NEXT: ldr x10, [x19, #40] // 8-byte Folded Reload
+; CHECK-NEXT: ldr x11, [x19, #24] // 8-byte Folded Reload
+; CHECK-NEXT: ldr x12, [x19, #16] // 8-byte Folded Reload
; CHECK-NEXT: ldur x13, [x29, #-152] // 8-byte Folded Reload
; CHECK-NEXT: ldur x14, [x29, #-160] // 8-byte Folded Reload
-; CHECK-NEXT: sub x15, x29, #48
-; CHECK-NEXT: ldur x17, [x15, #-256] // 8-byte Folded Reload
-; CHECK-NEXT: sub x15, x29, #56
-; CHECK-NEXT: ldur x18, [x15, #-256] // 8-byte Folded Reload
-; CHECK-NEXT: sub x15, x29, #64
-; CHECK-NEXT: ldur x0, [x15, #-256] // 8-byte Folded Reload
-; CHECK-NEXT: sub x15, x29, #72
-; CHECK-NEXT: ldur x1, [x15, #-256] // 8-byte Folded Reload
+; CHECK-NEXT: ldr x17, [x19, #80] // 8-byte Folded Reload
+; CHECK-NEXT: ldr x18, [x19, #72] // 8-byte Folded Reload
+; CHECK-NEXT: ldr x0, [x19, #64] // 8-byte Folded Reload
+; CHECK-NEXT: ldr x1, [x19, #56] // 8-byte Folded Reload
; CHECK-NEXT: ldur x15, [x29, #-168] // 8-byte Folded Reload
; CHECK-NEXT: ldur x2, [x29, #-176] // 8-byte Folded Reload
-; CHECK-NEXT: sub x16, x29, #16
-; CHECK-NEXT: ldur x3, [x16, #-256] // 8-byte Folded Reload
-; CHECK-NEXT: sub x16, x29, #24
-; CHECK-NEXT: ldur x4, [x16, #-256] // 8-byte Folded Reload
-; CHECK-NEXT: sub x16, x29, #32
-; CHECK-NEXT: ldur x5, [x16, #-256] // 8-byte Folded Reload
-; CHECK-NEXT: sub x16, x29, #40
-; CHECK-NEXT: ldur x6, [x16, #-256] // 8-byte Folded Reload
-; CHECK-NEXT: ldur x16, [x29, #-240] // 8-byte Folded Reload
-; CHECK-NEXT: ldur x7, [x29, #-248] // 8-byte Folded Reload
-; CHECK-NEXT: ldur x20, [x29, #-256] // 8-byte Folded Reload
-; CHECK-NEXT: sub x21, x29, #8
-; CHECK-NEXT: ldur x21, [x21, #-256] // 8-byte Folded Reload
-; CHECK-NEXT: ldur x23, [x29, #-192] // 8-byte Folded Reload
+; CHECK-NEXT: ldr x3, [x19, #112] // 8-byte Folded Reload
+; CHECK-NEXT: ldr x4, [x19, #104] // 8-byte Folded Reload
+; CHECK-NEXT: ldr x5, [x19, #96] // 8-byte Folded Reload
+; CHECK-NEXT: ldr x6, [x19, #88] // 8-byte Folded Reload
+; CHECK-NEXT: ldr x16, [x19, #144] // 8-byte Folded Reload
+; CHECK-NEXT: ldr x7, [x19, #136] // 8-byte Folded Reload
+; CHECK-NEXT: ldr x20, [x19, #128] // 8-byte Folded Reload
+; CHECK-NEXT: ldr x21, [x19, #120] // 8-byte Folded Reload
+; CHECK-NEXT: ldr x23, [x19, #192] // 8-byte Folded Reload
; CHECK-NEXT: ldur x22, [x29, #-184] // 8-byte Folded Reload
-; CHECK-NEXT: ldur x24, [x29, #-200] // 8-byte Folded Reload
-; CHECK-NEXT: ldur x26, [x29, #-216] // 8-byte Folded Reload
-; CHECK-NEXT: ldur x25, [x29, #-208] // 8-byte Folded Reload
-; CHECK-NEXT: ldur x27, [x29, #-224] // 8-byte Folded Reload
+; CHECK-NEXT: ldr x24, [x19, #184] // 8-byte Folded Reload
+; CHECK-NEXT: ldr x26, [x19, #168] // 8-byte Folded Reload
+; CHECK-NEXT: ldr x25, [x19, #176] // 8-byte Folded Reload
+; CHECK-NEXT: ldr x27, [x19, #160] // 8-byte Folded Reload
; CHECK-NEXT: ldr p0, [x27]
; CHECK-NEXT: ldr x27, [x26]
; CHECK-NEXT: mov p8.b, p0.b
diff --git a/ll...
[truncated]
Reduced godbolt case for the crash https://godbolt.org/z/eafscqW1c (using handwritten MIR as it's simpler, but the actual crash was found when marking a real benchmark function as streaming-compatible).
Instead of forcing a base pointer, we could relocate the emergency spill slot so it's part of the callee-save region. But I guess if you do that, it's expensive to refer to local variables, even if it's still technically possible. Not sure. Having two variable-sized regions on the stack makes the stack layout complicated.
// Since hasBasePointer() is called before we know if we have hazard padding
// or an emergency spill slot we need to enable the basepointer
// conservatively.
if (AFI->hasStackHazardSlotIndex() ||
What's the point of the hasStackHazardSlotIndex() check? As the comment notes, we have to be conservative, so if we have a stack hazard, we must have already emitted a base pointer.
It's a much cheaper check than parsing all the SME attributes, and hasBasePointer() is called again repeatedly after we know we have hazard padding (e.g. when resolving frame indexes).
If parsing the SME attributes is expensive, maybe we should store the parsed SME attributes in AFI?
That'd probably be sensible 👍 (the parsed attributes are just a single int), I can do that as a follow-up NFC.
PR: #142362
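For reference, a minimal sketch of what caching the parsed attributes in AArch64FunctionInfo could look like (member and accessor names here are guesses, not necessarily what that PR does):

// Sketch only: cache the parsed SME attributes (internally a single
// bitmask) at function-info construction time, so queries such as
// hasBasePointer() need not re-parse the IR attributes on every call.
class AArch64FunctionInfo : public MachineFunctionInfo {
  SMEAttrs SMEFnAttrs;

public:
  AArch64FunctionInfo(const Function &F, const AArch64Subtarget *STI)
      : SMEFnAttrs(F) {}

  SMEAttrs getSMEFnAttrs() const { return SMEFnAttrs; }
};

// hasBasePointer() could then test the cached copy:
//   if (AFI->hasStackHazardSlotIndex() ||
//       !AFI->getSMEFnAttrs().hasNonStreamingInterfaceAndBody())
//     return true;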
Maybe we should (as some of these cases look better), but this was not intentional; I missed a check.
LGTM
Hmm, that'd probably require a little more rework, and given the base pointer is likely to be beneficial in this case, I don't feel the need to avoid it.
/cherry-pick b5cf030
/pull-request #142741