[AArch64][SME] Support split ZPR and PPR area allocation #142392
base: users/MacDue/prepare_split
Conversation
This is a stacked PR. See previous PRs below:
@llvm/pr-subscribers-backend-aarch64

Author: Benjamin Maxwell (MacDue)

Changes

For a while we have supported the `-aarch64-stack-hazard-size=<size>` option, which adds "hazard padding" between GPRs and FPR/ZPRs. However, there is currently a hole in this mitigation, as PPR and FPR/ZPR accesses to the same area also cause streaming memory hazards (this is noted by `-pass-remarks-analysis=sme -aarch64-stack-hazard-remark-size=<val>`), and the current stack layout places PPRs and ZPRs within the same area.

Which looks like:

  ------------------------------------  Higher address
  | callee-saved gpr registers       |
  |----------------------------------|
  | lr,fp (a.k.a. "frame record")    |
  |----------------------------------| <- fp(=x29)
  | <hazard padding>                 |
  |----------------------------------|
  | callee-saved fp/simd/SVE regs    |
  |----------------------------------|
  | SVE stack objects                |
  |----------------------------------|
  | local variables of fixed size    |
  |   <FPR>                          |
  |   <hazard padding>               |
  |   <GPR>                          |
  ------------------------------------ <- sp
                                         Lower address

With this patch the stack (and hazard padding) is rearranged so that hazard padding is placed between the PPRs and ZPRs rather than within the (fixed size) callee-save region. Which looks something like this:

  ------------------------------------  Higher address
  | callee-saved gpr registers       |
  |----------------------------------|
  | lr,fp (a.k.a. "frame record")    |
  |----------------------------------| <- fp(=x29)
  | callee-saved PPRs                |
  | PPR stack objects                |  (These are SVE predicates)
  |----------------------------------|
  | <hazard padding>                 |
  |----------------------------------|
  | callee-saved ZPR regs            |  (These are SVE vectors)
  | ZPR stack objects                |  Note: FPRs are promoted to ZPRs
  |----------------------------------|
  | local variables of fixed size    |
  |   <FPR>                          |
  |   <hazard padding>               |
  |   <GPR>                          |
  ------------------------------------ <- sp
                                         Lower address

This layout is only enabled if (see the sketch after this list):

- SplitSVEObjects are enabled (`-aarch64-split-sve-objects`)
  - (This may be enabled by default in a later patch)
- Streaming memory hazards are present (`-aarch64-stack-hazard-size=<val>` != 0)
- PPRs and FPRs/ZPRs are on the stack
- There's no stack realignment or variable-sized objects (this is left as a TODO for now)
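A rough sketch of how these conditions combine (a hypothetical helper, not the patch's actual code; the parameter names are illustrative):

// Hypothetical sketch of when the split PPR/ZPR layout described above
// would be used; not code from this patch.
static bool shouldUseSplitSVEObjects(bool SplitSVEObjectsFlag,
                                     unsigned StackHazardSize,
                                     bool HasPPRStackObjects,
                                     bool HasZPRStackObjects,
                                     bool HasRealignmentOrVLAs) {
  if (!SplitSVEObjectsFlag)   // -aarch64-split-sve-objects not enabled
    return false;
  if (StackHazardSize == 0)   // no streaming memory hazards to mitigate
    return false;
  if (!HasPPRStackObjects || !HasZPRStackObjects)
    return false;             // nothing to separate
  if (HasRealignmentOrVLAs)   // unsupported for now (TODO in this patch)
    return false;
  return true;
}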
Additionally, any FPR callee-saves that are present will be promoted to ZPRs. This is to prevent stack hazards between FPRs and GPRs in the fixed-size callee-save area (which would otherwise require more hazard padding, or moving the FPR callee-saves). This layout should resolve the hole in the hazard padding mitigation, and is not intended to change codegen for non-SME code.

Patch is 172.93 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/142392.diff

8 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
index e5592a921e192..36775b8dc05e5 100644
--- a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
@@ -270,6 +270,11 @@ static cl::opt<bool> OrderFrameObjects("aarch64-order-frame-objects",
cl::desc("sort stack allocations"),
cl::init(true), cl::Hidden);
+static cl::opt<bool>
+ SplitSVEObjects("aarch64-split-sve-objects",
+ cl::desc("Split allocation of ZPR & PPR objects"),
+ cl::init(false), cl::Hidden);
+
cl::opt<bool> EnableHomogeneousPrologEpilog(
"homogeneous-prolog-epilog", cl::Hidden,
cl::desc("Emit homogeneous prologue and epilogue for the size "
@@ -458,10 +463,13 @@ static StackOffset getZPRStackSize(const MachineFunction &MF) {
return StackOffset::getScalable(AFI->getStackSizeZPR());
}
-/// Returns the size of the entire PPR stackframe (calleesaves + spills).
+/// Returns the size of the entire PPR stackframe (calleesaves + spills + hazard
+/// padding).
static StackOffset getPPRStackSize(const MachineFunction &MF) {
const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
- return StackOffset::getScalable(AFI->getStackSizePPR());
+ return StackOffset::get(AFI->hasSplitSVEObjects() ? getStackHazardSize(MF)
+ : 0,
+ AFI->getStackSizePPR());
}
/// Returns the size of the entire SVE stackframe (PPRs + ZPRs).
@@ -484,6 +492,10 @@ bool AArch64FrameLowering::canUseRedZone(const MachineFunction &MF) const {
if (!EnableRedZone)
return false;
+ const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
+ if (AFI->hasSplitSVEObjects())
+ return false;
+
// Don't use the red zone if the function explicitly asks us not to.
// This is typically used for kernel code.
const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
@@ -493,7 +505,6 @@ bool AArch64FrameLowering::canUseRedZone(const MachineFunction &MF) const {
return false;
const MachineFrameInfo &MFI = MF.getFrameInfo();
- const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
uint64_t NumBytes = AFI->getLocalStackSize();
// If neither NEON or SVE are available, a COPY from one Q-reg to
@@ -666,9 +677,11 @@ void AArch64FrameLowering::emitCalleeSavedSVELocations(
const TargetRegisterInfo &TRI = *STI.getRegisterInfo();
AArch64FunctionInfo &AFI = *MF.getInfo<AArch64FunctionInfo>();
CFIInstBuilder CFIBuilder(MBB, MBBI, MachineInstr::FrameSetup);
+ StackOffset PPRStackSize = getPPRStackSize(MF);
for (const auto &Info : CSI) {
- if (!MFI.isScalableStackID(Info.getFrameIdx()))
+ int FI = Info.getFrameIdx();
+ if (!MFI.isScalableStackID(FI))
continue;
// Not all unwinders may know about SVE registers, so assume the lowest
@@ -679,9 +692,13 @@ void AArch64FrameLowering::emitCalleeSavedSVELocations(
continue;
StackOffset Offset =
- StackOffset::getScalable(MFI.getObjectOffset(Info.getFrameIdx())) -
+ StackOffset::getScalable(MFI.getObjectOffset(FI)) -
StackOffset::getFixed(AFI.getCalleeSavedStackSize(MFI));
+ if (AFI.hasSplitSVEObjects() &&
+ MFI.getStackID(FI) == TargetStackID::ScalableVector)
+ Offset -= PPRStackSize;
+
CFIBuilder.insertCFIInst(createCFAOffset(TRI, Reg, Offset));
}
}
@@ -2188,33 +2205,68 @@ void AArch64FrameLowering::emitPrologue(MachineFunction &MF,
StackOffset::getFixed((int64_t)MFI.getStackSize() - NumBytes);
StackOffset LocalsSize =
PPRLocalsSize + ZPRLocalsSize + StackOffset::getFixed(NumBytes);
- StackOffset SVECalleeSavesSize = PPRCalleeSavesSize + ZPRCalleeSavesSize;
- MachineBasicBlock::iterator CalleeSavesBegin =
- AFI->getPPRCalleeSavedStackSize() ? PPRCalleeSavesBegin
- : ZPRCalleeSavesBegin;
- allocateStackSpace(MBB, CalleeSavesBegin, 0, SVECalleeSavesSize, false,
- nullptr, EmitAsyncCFI && !HasFP, CFAOffset,
- MFI.hasVarSizedObjects() || LocalsSize);
- CFAOffset += SVECalleeSavesSize;
-
- MachineBasicBlock::iterator CalleeSavesEnd =
- AFI->getZPRCalleeSavedStackSize() ? ZPRCalleeSavesEnd : PPRCalleeSavesEnd;
- if (EmitAsyncCFI)
- emitCalleeSavedSVELocations(MBB, CalleeSavesEnd);
-
- // Allocate space for the rest of the frame including SVE locals. Align the
- // stack as necessary.
- assert(!(canUseRedZone(MF) && NeedsRealignment) &&
- "Cannot use redzone with stack realignment");
- if (!canUseRedZone(MF)) {
- // FIXME: in the case of dynamic re-alignment, NumBytes doesn't have
- // the correct value here, as NumBytes also includes padding bytes,
- // which shouldn't be counted here.
- StackOffset SVELocalsSize = PPRLocalsSize + ZPRLocalsSize;
- allocateStackSpace(MBB, CalleeSavesEnd, RealignmentPadding,
- SVELocalsSize + StackOffset::getFixed(NumBytes),
- NeedsWinCFI, &HasWinCFI, EmitAsyncCFI && !HasFP,
- CFAOffset, MFI.hasVarSizedObjects());
+ if (!AFI->hasSplitSVEObjects()) {
+ StackOffset SVECalleeSavesSize = PPRCalleeSavesSize + ZPRCalleeSavesSize;
+ MachineBasicBlock::iterator CalleeSavesBegin =
+ AFI->getPPRCalleeSavedStackSize() ? PPRCalleeSavesBegin
+ : ZPRCalleeSavesBegin;
+ allocateStackSpace(MBB, CalleeSavesBegin, 0, SVECalleeSavesSize, false,
+ nullptr, EmitAsyncCFI && !HasFP, CFAOffset,
+ MFI.hasVarSizedObjects() || LocalsSize);
+ CFAOffset += SVECalleeSavesSize;
+
+ MachineBasicBlock::iterator CalleeSavesEnd =
+ AFI->getZPRCalleeSavedStackSize() ? ZPRCalleeSavesEnd
+ : PPRCalleeSavesEnd;
+ if (EmitAsyncCFI)
+ emitCalleeSavedSVELocations(MBB, CalleeSavesEnd);
+
+ // Allocate space for the rest of the frame including SVE locals. Align the
+ // stack as necessary.
+ assert(!(canUseRedZone(MF) && NeedsRealignment) &&
+ "Cannot use redzone with stack realignment");
+ if (!canUseRedZone(MF)) {
+ // FIXME: in the case of dynamic re-alignment, NumBytes doesn't have
+ // the correct value here, as NumBytes also includes padding bytes,
+ // which shouldn't be counted here.
+ StackOffset SVELocalsSize = PPRLocalsSize + ZPRLocalsSize;
+ allocateStackSpace(MBB, CalleeSavesEnd, RealignmentPadding,
+ SVELocalsSize + StackOffset::getFixed(NumBytes),
+ NeedsWinCFI, &HasWinCFI, EmitAsyncCFI && !HasFP,
+ CFAOffset, MFI.hasVarSizedObjects());
+ }
+ } else {
+ assert(!canUseRedZone(MF) &&
+ "Cannot use redzone with aarch64-split-sve-objects");
+ // TODO: Handle HasWinCFI/NeedsWinCFI?
+ assert(!NeedsWinCFI &&
+ "WinCFI with aarch64-split-sve-objects is not supported");
+
+ // Split ZPR and PPR allocation.
+ // Allocate PPR callee saves
+ allocateStackSpace(MBB, PPRCalleeSavesBegin, 0, PPRCalleeSavesSize, false,
+ nullptr, EmitAsyncCFI && !HasFP, CFAOffset,
+ MFI.hasVarSizedObjects() || ZPRCalleeSavesSize ||
+ ZPRLocalsSize || PPRLocalsSize);
+ CFAOffset += PPRCalleeSavesSize;
+
+ // Allocate PPR locals + ZPR callee saves
+ assert(PPRCalleeSavesEnd == ZPRCalleeSavesBegin &&
+ "Expected ZPR callee saves after PPR locals");
+ allocateStackSpace(MBB, PPRCalleeSavesEnd, RealignmentPadding,
+ PPRLocalsSize + ZPRCalleeSavesSize, false, nullptr,
+ EmitAsyncCFI && !HasFP, CFAOffset,
+ MFI.hasVarSizedObjects() || ZPRLocalsSize);
+ CFAOffset += PPRLocalsSize + ZPRCalleeSavesSize;
+
+ // Allocate ZPR locals
+ allocateStackSpace(MBB, ZPRCalleeSavesEnd, RealignmentPadding,
+ ZPRLocalsSize + StackOffset::getFixed(NumBytes), false,
+ nullptr, EmitAsyncCFI && !HasFP, CFAOffset,
+ MFI.hasVarSizedObjects());
+
+ if (EmitAsyncCFI)
+ emitCalleeSavedSVELocations(MBB, ZPRCalleeSavesEnd);
}
// If we need a base pointer, set it up here. It's whatever the value of the
@@ -2456,7 +2508,9 @@ void AArch64FrameLowering::emitEpilogue(MachineFunction &MF,
}
}
- StackOffset SVEStackSize = getSVEStackSize(MF);
+ StackOffset ZPRStackSize = getZPRStackSize(MF);
+ StackOffset PPRStackSize = getPPRStackSize(MF);
+ StackOffset SVEStackSize = ZPRStackSize + PPRStackSize;
// If there is a single SP update, insert it before the ret and we're done.
if (CombineSPBump) {
@@ -2477,69 +2531,141 @@ void AArch64FrameLowering::emitEpilogue(MachineFunction &MF,
NumBytes -= PrologueSaveSize;
assert(NumBytes >= 0 && "Negative stack allocation size!?");
- // Process the SVE callee-saves to determine what space needs to be
- // deallocated.
- StackOffset DeallocateBefore = {}, DeallocateAfter = SVEStackSize;
- MachineBasicBlock::iterator RestoreBegin = LastPopI, RestoreEnd = LastPopI;
- int64_t ZPRCalleeSavedSize = AFI->getZPRCalleeSavedStackSize();
- int64_t PPRCalleeSavedSize = AFI->getPPRCalleeSavedStackSize();
- int64_t SVECalleeSavedSize = ZPRCalleeSavedSize + PPRCalleeSavedSize;
-
- if (SVECalleeSavedSize) {
- RestoreBegin = std::prev(RestoreEnd);
- while (RestoreBegin != MBB.begin() &&
- IsSVECalleeSave(std::prev(RestoreBegin)))
- --RestoreBegin;
-
- assert(IsSVECalleeSave(RestoreBegin) &&
- IsSVECalleeSave(std::prev(RestoreEnd)) && "Unexpected instruction");
-
- StackOffset CalleeSavedSizeAsOffset =
- StackOffset::getScalable(SVECalleeSavedSize);
- DeallocateBefore = SVEStackSize - CalleeSavedSizeAsOffset;
- DeallocateAfter = CalleeSavedSizeAsOffset;
- }
-
- // Deallocate the SVE area.
- if (SVEStackSize) {
- // If we have stack realignment or variable sized objects on the stack,
- // restore the stack pointer from the frame pointer prior to SVE CSR
- // restoration.
- if (AFI->isStackRealigned() || MFI.hasVarSizedObjects()) {
- if (SVECalleeSavedSize) {
- // Set SP to start of SVE callee-save area from which they can
- // be reloaded. The code below will deallocate the stack space
- // space by moving FP -> SP.
- emitFrameOffset(MBB, RestoreBegin, DL, AArch64::SP, AArch64::FP,
- StackOffset::getScalable(-SVECalleeSavedSize), TII,
- MachineInstr::FrameDestroy);
- }
- } else {
- if (SVECalleeSavedSize) {
- // Deallocate the non-SVE locals first before we can deallocate (and
- // restore callee saves) from the SVE area.
- emitFrameOffset(
- MBB, RestoreBegin, DL, AArch64::SP, AArch64::SP,
- StackOffset::getFixed(NumBytes), TII, MachineInstr::FrameDestroy,
- false, false, nullptr, EmitCFI && !hasFP(MF),
- SVEStackSize + StackOffset::getFixed(NumBytes + PrologueSaveSize));
- NumBytes = 0;
+ if (!AFI->hasSplitSVEObjects()) {
+ // Process the SVE callee-saves to determine what space needs to be
+ // deallocated.
+ StackOffset DeallocateBefore = {}, DeallocateAfter = SVEStackSize;
+ MachineBasicBlock::iterator RestoreBegin = LastPopI, RestoreEnd = LastPopI;
+ int64_t ZPRCalleeSavedSize = AFI->getZPRCalleeSavedStackSize();
+ int64_t PPRCalleeSavedSize = AFI->getPPRCalleeSavedStackSize();
+ int64_t SVECalleeSavedSize = ZPRCalleeSavedSize + PPRCalleeSavedSize;
+
+ if (SVECalleeSavedSize) {
+ RestoreBegin = std::prev(RestoreEnd);
+ while (RestoreBegin != MBB.begin() &&
+ IsSVECalleeSave(std::prev(RestoreBegin)))
+ --RestoreBegin;
+
+ assert(IsSVECalleeSave(RestoreBegin) &&
+ IsSVECalleeSave(std::prev(RestoreEnd)) &&
+ "Unexpected instruction");
+
+ StackOffset CalleeSavedSizeAsOffset =
+ StackOffset::getScalable(SVECalleeSavedSize);
+ DeallocateBefore = SVEStackSize - CalleeSavedSizeAsOffset;
+ DeallocateAfter = CalleeSavedSizeAsOffset;
+ }
+
+ // Deallocate the SVE area.
+ if (SVEStackSize) {
+ // If we have stack realignment or variable sized objects on the stack,
+ // restore the stack pointer from the frame pointer prior to SVE CSR
+ // restoration.
+ if (AFI->isStackRealigned() || MFI.hasVarSizedObjects()) {
+ if (SVECalleeSavedSize) {
+ // Set SP to start of SVE callee-save area from which they can
+ // be reloaded. The code below will deallocate the stack space
+ // space by moving FP -> SP.
+ emitFrameOffset(MBB, RestoreBegin, DL, AArch64::SP, AArch64::FP,
+ StackOffset::getScalable(-SVECalleeSavedSize), TII,
+ MachineInstr::FrameDestroy);
+ }
+ } else {
+ if (SVECalleeSavedSize) {
+ // Deallocate the non-SVE locals first before we can deallocate (and
+ // restore callee saves) from the SVE area.
+ emitFrameOffset(MBB, RestoreBegin, DL, AArch64::SP, AArch64::SP,
+ StackOffset::getFixed(NumBytes), TII,
+ MachineInstr::FrameDestroy, false, false, nullptr,
+ EmitCFI && !hasFP(MF),
+ SVEStackSize + StackOffset::getFixed(
+ NumBytes + PrologueSaveSize));
+ NumBytes = 0;
+ }
+
+ emitFrameOffset(MBB, RestoreBegin, DL, AArch64::SP, AArch64::SP,
+ DeallocateBefore, TII, MachineInstr::FrameDestroy,
+ false, false, nullptr, EmitCFI && !hasFP(MF),
+ SVEStackSize +
+ StackOffset::getFixed(NumBytes + PrologueSaveSize));
+
+ emitFrameOffset(MBB, RestoreEnd, DL, AArch64::SP, AArch64::SP,
+ DeallocateAfter, TII, MachineInstr::FrameDestroy, false,
+ false, nullptr, EmitCFI && !hasFP(MF),
+ DeallocateAfter +
+ StackOffset::getFixed(NumBytes + PrologueSaveSize));
}
+ if (EmitCFI)
+ emitCalleeSavedSVERestores(MBB, RestoreEnd);
+ }
+ } else if (SVEStackSize) {
+ assert(!AFI->isStackRealigned() && !MFI.hasVarSizedObjects() &&
+ "TODO: Support stack realigment / variable-sized objects");
+ // SplitSVEObjects. Determine the sizes and starts/ends of the ZPR and PPR
+ // areas.
+ auto ZPRCalleeSavedSize =
+ StackOffset::getScalable(AFI->getZPRCalleeSavedStackSize());
+ auto PPRCalleeSavedSize =
+ StackOffset::getScalable(AFI->getPPRCalleeSavedStackSize());
+ StackOffset PPRLocalsSize = PPRStackSize - PPRCalleeSavedSize;
+ StackOffset ZPRLocalsSize = ZPRStackSize - ZPRCalleeSavedSize;
+
+ MachineBasicBlock::iterator PPRRestoreBegin = LastPopI,
+ PPRRestoreEnd = LastPopI;
+ if (PPRCalleeSavedSize) {
+ PPRRestoreBegin = std::prev(PPRRestoreEnd);
+ while (PPRRestoreBegin != MBB.begin() &&
+ IsPPRCalleeSave(std::prev(PPRRestoreBegin)))
+ --PPRRestoreBegin;
+ }
+
+ MachineBasicBlock::iterator ZPRRestoreBegin = PPRRestoreBegin,
+ ZPRRestoreEnd = PPRRestoreBegin;
+ if (ZPRCalleeSavedSize) {
+ ZPRRestoreBegin = std::prev(ZPRRestoreEnd);
+ while (ZPRRestoreBegin != MBB.begin() &&
+ IsZPRCalleeSave(std::prev(ZPRRestoreBegin)))
+ --ZPRRestoreBegin;
+ }
+
+ auto CFAOffset =
+ SVEStackSize + StackOffset::getFixed(NumBytes + PrologueSaveSize);
+ if (PPRCalleeSavedSize || ZPRCalleeSavedSize) {
+ // Deallocate the non-SVE locals first before we can deallocate (and
+ // restore callee saves) from the SVE area.
+ auto NonSVELocals = StackOffset::getFixed(NumBytes);
+ emitFrameOffset(MBB, ZPRRestoreBegin, DL, AArch64::SP, AArch64::SP,
+ NonSVELocals, TII, MachineInstr::FrameDestroy, false,
+ false, nullptr, EmitCFI && !hasFP(MF), CFAOffset);
+ NumBytes = 0;
+ CFAOffset -= NonSVELocals;
+ }
+
+ if (ZPRLocalsSize) {
+ emitFrameOffset(MBB, ZPRRestoreBegin, DL, AArch64::SP, AArch64::SP,
+ ZPRLocalsSize, TII, MachineInstr::FrameDestroy, false,
+ false, nullptr, EmitCFI && !hasFP(MF), CFAOffset);
+ CFAOffset -= ZPRLocalsSize;
+ }
- emitFrameOffset(MBB, RestoreBegin, DL, AArch64::SP, AArch64::SP,
- DeallocateBefore, TII, MachineInstr::FrameDestroy, false,
- false, nullptr, EmitCFI && !hasFP(MF),
- SVEStackSize +
- StackOffset::getFixed(NumBytes + PrologueSaveSize));
-
- emitFrameOffset(MBB, RestoreEnd, DL, AArch64::SP, AArch64::SP,
- DeallocateAfter, TII, MachineInstr::FrameDestroy, false,
- false, nullptr, EmitCFI && !hasFP(MF),
- DeallocateAfter +
- StackOffset::getFixed(NumBytes + PrologueSaveSize));
+ if (PPRLocalsSize || ZPRCalleeSavedSize) {
+ assert(PPRRestoreBegin == ZPRRestoreEnd &&
+ "Expected PPR restores after ZPR");
+ emitFrameOffset(MBB, PPRRestoreBegin, DL, AArch64::SP, AArch64::SP,
+ PPRLocalsSize + ZPRCalleeSavedSize, TII,
+ MachineInstr::FrameDestroy, false, false, nullptr,
+ EmitCFI && !hasFP(MF), CFAOffset);
+ CFAOffset -= PPRLocalsSize + ZPRCalleeSavedSize;
}
+ if (PPRCalleeSavedSize) {
+ emitFrameOffset(MBB, PPRRestoreEnd, DL, AArch64::SP, AArch64::SP,
+ PPRCalleeSavedSize, TII, MachineInstr::FrameDestroy,
+ false, false, nullptr, EmitCFI && !hasFP(MF), CFAOffset);
+ }
+
+ // We only emit CFI information for ZPRs so emit CFI after the ZPR restores.
if (EmitCFI)
- emitCalleeSavedSVERestores(MBB, RestoreEnd);
+ emitCalleeSavedSVERestores(MBB, ZPRRestoreEnd);
}
if (!hasFP(MF)) {
@@ -2660,9 +2786,15 @@ AArch64FrameLowering::getFrameIndexReferenceFromSP(const MachineFunction &MF,
return StackOffset::getFixed(ObjectOffset - getOffsetOfLocalArea());
const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
- if (MFI.isScalableStackID(FI))
- return StackOffset::get(-((int64_t)AFI->getCalleeSavedStackSize()),
+ if (MFI.isScalableStackID(FI)) {
+ StackOffset AccessOffset{};
+ if (AFI->hasSplitSVEObjects() &&
+ MFI.getStackID(FI) == TargetStackID::ScalableVector)
+ AccessOffset = -PPRStackSize;
+ return AccessOffset +
+ StackOffset::get(-((int64_t)AFI->getCalleeSavedStackSize()),
ObjectOffset);
+ }
bool IsFixed = MFI.isFixedObjectIndex(FI);
bool IsCSR =
@@ -2720,12 +2852,12 @@ StackOffset AArch64FrameLowering::resolveFrameIndexReference(
bool isFixed = MFI.isFixedObjectIndex(FI);
bool isSVE = MFI.isScalableStackID(FI);
return resolveFrameOffsetReference(MF, ObjectOffset, isFixed, isSVE, FrameReg,
- PreferFP, ForSimm);
+ PreferFP, ForSimm, FI);
}
StackOffset AArch64FrameLowering::resolveFrameOffsetReference(
const MachineFunction &MF, int64_t ObjectOffset, bool isFixed, bool isSVE,
- Register &FrameReg, bool PreferFP, bool ForSimm) const {
+ Register &FrameReg, bool PreferFP, bool ForSimm, int64_t FI) const {
const auto &MFI = MF.getFrameInfo();
const auto *RegInfo = static_cast<const AArch64RegisterInfo *>(
MF.getSubtarget().getRegisterInfo());
@@ -2737,7 +2869,9 @@ StackOffset AArch64FrameLowering::resolveFrameOffsetReference(
bool isCSR =
!isFixed && ObjectOffset >= -((int)AFI->getCalleeSavedStackSize(MFI));
- const StackOffset SVEStackSize = getSVEStackSize(MF);
+ StackOffset ZPRStackSize = getZPRStackSize(MF);
+ StackOffset PPRStackSize = getPPRStackSize(MF);
+ StackOffset SVEStackSize = ZPRStackSize + PPRStackSize;
// Use frame pointer to reference fixed object...
[truncated]
In the implementation you're interested in, is there not a hazard between PPRs and GPRs? What's the interaction between this and `aarch64-enable-zpr-predicate-spills`?
There are no hazards between PPRs and GPRs (those types of memory accesses can both be considered as occurring on the CPU).
It cannot be enabled at the same time. We intend to remove `aarch64-enable-zpr-predicate-spills`.
Force-pushed from 0b09392 to 42af819.
Force-pushed from 9052fbd to bd86d4a.
Rebased this PR stack on the changes from #138609... which makes things even hairier 😅 It would be nice if all these modes were not so intertwined in the code. I may look into whether there's some way to move some of the logic into some kind of …
If you express the size of the hazard padding between the PPRs and ZPRs as a scalable size, that might simplify some of the logic? You wouldn't need to represent the two areas as separate stacks, at least.
Maybe it's also worth considering "preallocating" some stack slots, along the lines of LocalStackSlotAllocation; anything we can allocate in an earlier pass doesn't have to be part of the core stack layout, and we probably get better code if we have a "base" for ZPR variables.
Do we care at all how stack protectors fit into all of this? I suspect the stack protector ends up in an unfortunate position by default.
HasFPRStackObjects |= SlotTypes[FI] == SlotType::ZPRorFPR;
// For SplitSVEObjects remember that this stack slot is a predicate, this
// will be needed later when determining the frame layout.
if (SlotTypes[FI] == SlotType::PPR) { |
This doesn't look quite right. Anything with scalable size needs to be either ScalablePredVector or ScalableVector. Anything with non-scalable size needs to be on the normal stack, I think?
Which bit does not look right? `SlotType::PPR` (not to be confused with `SlotType::GPR`) is only set if the original stack ID was scalable and all accesses to that slot used predicate loads/stores. The original stack ID could be `ScalableVector`, as the earlier selection of the stack ID is only based on the type size. Therefore, we change it to `ScalablePredVector` here so that it can be sorted into the correct region.
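A minimal sketch of that re-tagging (assuming the `ScalablePredVector` stack ID from the prerequisite patch; not the exact code from this PR):

// Hypothetical sketch: a scalable slot whose accesses were all predicate
// loads/stores is re-tagged so it is laid out in the PPR area.
if (SlotTypes[FI] == SlotType::PPR &&
    MFI.getStackID(FI) == TargetStackID::ScalableVector)
  MFI.setStackID(FI, TargetStackID::ScalablePredVector);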
Oh, I got confused by this bit:
if (IsScalable || AArch64InstrInfo::isFpOrNEON(MI)) {
SlotTypes[*FI] |= IsPPR ? SlotType::PPR : SlotType::ZPRorFPR;
} else {
SlotTypes[*FI] |= SlotType::GPR;
}
Maybe rewrite it as the following, to split the cases a bit more cleanly?
if (IsScalable) {
SlotTypes[*FI] |= isPPRAccess(MI) ? SlotType::PPR : SlotType::ZPRorFPR;
} else {
SlotTypes[*FI] |= AArch64InstrInfo::isFpOrNEON(MI) ? SlotType::ZPRorFPR : SlotType::GPR;
}
Also, I'm not sure isFpOrNEON is quite sufficient here; we probably also want to handle SVE accesses to fixed-width slots.
> Maybe rewrite it as the following, to split the cases a bit more cleanly?
👍 I've tidied up this case.
> Also, I'm not sure isFpOrNEON is quite sufficient here; we probably also want to handle SVE accesses to fixed-width slots.
Hmmm, I'll look into this; these checks are mostly the same as those for the current hazard padding option.
It would, but for the sizes of hazard padding and vscale we're interested in, it would result in a much larger allocation than necessary and likely complicate addressing predicates and vectors more so (due to limited ranges for scalable offsets).
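As a rough illustration of the size concern (assuming a 1024-byte hazard size, and that a scalable byte corresponds to vscale = VL/128 bytes at runtime):

  Fixed hazard padding:  1024 bytes at every vector length
  Same gap expressed in scalable bytes (so it is still >= 1024 bytes at VL=128):
    1024 scalable bytes = 1024 * vscale bytes at runtime
      VL = 128  (vscale = 1)  ->  1024 bytes
      VL = 512  (vscale = 4)  ->  4096 bytes
      VL = 2048 (vscale = 16) -> 16384 bytes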
I'll take a look at `LocalStackSlotAllocation`.
I don't think it's that important. Given that the stack protector can end up in the scalable area, it will result in a hazard if enabled (with this change or the previous hazard padding patch), but I don't think it's something likely to be enabled for library kernels.