Skip to content

[AArch64] Initial compiler support for SVE unwind on Windows. #138609

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Jun 3, 2025

Conversation

efriedma-quic
Copy link
Collaborator

Most bits of this are straightforward: when we emit SVE instructions in the prologue/epilogue, emit corresponding opcodes.

The unfortunately nasty bit is the handling of the frame pointer in functions that use the SVE calling convention. If we have SVE callee saves, and need to restore the stack pointer from the frame pointer, it's impossible to encode callee saves that happen after the frame pointer. So this patch rearranges the stack to put SVE callee saves first. This isn't really that complicated on its own, but it leads to a lot of tricky conditionals (see FPAfterSVECalleeSaves).

@llvmbot
Copy link
Member

llvmbot commented May 5, 2025

@llvm/pr-subscribers-backend-aarch64

Author: Eli Friedman (efriedma-quic)

Changes

Most bits of this are straightforward: when we emit SVE instructions in the prologue/epilogue, emit corresponding opcodes.

The unfortunately nasty bit is the handling of the frame pointer in functions that use the SVE calling convention. If we have SVE callee saves, and need to restore the stack pointer from the frame pointer, it's impossible to encode callee saves that happen after the frame pointer. So this patch rearranges the stack to put SVE callee saves first. This isn't really that complicated on its own, but it leads to a lot of tricky conditionals (see FPAfterSVECalleeSaves).


Patch is 68.40 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/138609.diff

7 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp (+26)
  • (modified) llvm/lib/Target/AArch64/AArch64CallingConvention.td (+8)
  • (modified) llvm/lib/Target/AArch64/AArch64FrameLowering.cpp (+134-19)
  • (modified) llvm/lib/Target/AArch64/AArch64InstrInfo.cpp (+21-11)
  • (modified) llvm/lib/Target/AArch64/AArch64InstrInfo.td (+3)
  • (modified) llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp (+7)
  • (added) llvm/test/CodeGen/AArch64/win-sve.ll (+1034)
diff --git a/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp b/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
index 870df4c387ca4..47bc44dab9d90 100644
--- a/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
+++ b/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
@@ -3294,6 +3294,32 @@ void AArch64AsmPrinter::emitInstruction(const MachineInstr *MI) {
                                      -MI->getOperand(2).getImm());
     return;
 
+  case AArch64::SEH_AllocZ:
+    assert(MI->getOperand(0).getImm() >= 0 &&
+           "AllocZ SEH opcode offset must be non-negative");
+    assert(MI->getOperand(0).getImm() <= 255 &&
+           "AllocZ SEH opcode offset must fit into 8 bits");
+    TS->emitARM64WinCFIAllocZ(MI->getOperand(0).getImm());
+    return;
+
+  case AArch64::SEH_SaveZReg:
+    assert(MI->getOperand(1).getImm() >= 0 &&
+           "SaveZReg SEH opcode offset must be non-negative");
+    assert(MI->getOperand(1).getImm() <= 255 &&
+           "SaveZReg SEH opcode offset must fit into 8 bits");
+    TS->emitARM64WinCFISaveZReg(MI->getOperand(0).getImm(),
+                                MI->getOperand(1).getImm());
+    return;
+
+  case AArch64::SEH_SavePReg:
+    assert(MI->getOperand(1).getImm() >= 0 &&
+           "SavePReg SEH opcode offset must be non-negative");
+    assert(MI->getOperand(1).getImm() <= 255 &&
+           "SavePReg SEH opcode offset must fit into 8 bits");
+    TS->emitARM64WinCFISavePReg(MI->getOperand(0).getImm(),
+                                MI->getOperand(1).getImm());
+    return;
+
   case AArch64::BLR:
   case AArch64::BR: {
     recordIfImportCall(MI);
diff --git a/llvm/lib/Target/AArch64/AArch64CallingConvention.td b/llvm/lib/Target/AArch64/AArch64CallingConvention.td
index 7cca6d9bc6b9c..287bbbce95bd9 100644
--- a/llvm/lib/Target/AArch64/AArch64CallingConvention.td
+++ b/llvm/lib/Target/AArch64/AArch64CallingConvention.td
@@ -606,6 +606,9 @@ def CSR_Win_AArch64_Arm64EC_Thunk : CalleeSavedRegs<(add (sequence "Q%u", 6, 15)
 def CSR_AArch64_AAVPCS : CalleeSavedRegs<(add X19, X20, X21, X22, X23, X24,
                                           X25, X26, X27, X28, LR, FP,
                                           (sequence "Q%u", 8, 23))>;
+def CSR_Win_AArch64_AAVPCS : CalleeSavedRegs<(add X19, X20, X21, X22, X23, X24,
+                                              X25, X26, X27, X28, FP, LR,
+                                              (sequence "Q%u", 8, 23))>;
 
 // Functions taking SVE arguments or returning an SVE type
 // must (additionally) preserve full Z8-Z23 and predicate registers P4-P15
@@ -619,6 +622,11 @@ def CSR_Darwin_AArch64_SVE_AAPCS : CalleeSavedRegs<(add (sequence "Z%u", 8, 23),
                                                         LR, FP, X19, X20, X21, X22,
                                                         X23, X24, X25, X26, X27, X28)>;
 
+def CSR_Win_AArch64_SVE_AAPCS : CalleeSavedRegs<(add (sequence "P%u", 4, 11),
+                                                     (sequence "Z%u", 8, 23),
+                                                     X19, X20, X21, X22, X23, X24,
+                                                     X25, X26, X27, X28, FP, LR)>;
+
 // SME ABI support routines such as __arm_tpidr2_save/restore preserve most registers.
 def CSR_AArch64_SME_ABI_Support_Routines_PreserveMost_From_X0
                           : CalleeSavedRegs<(add (sequence "Z%u", 0, 31),
diff --git a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
index 78ac57e3e92a6..6b7e494b2c59b 100644
--- a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
@@ -1200,7 +1200,25 @@ static MachineBasicBlock::iterator InsertSEH(MachineBasicBlock::iterator MBBI,
 
   switch (Opc) {
   default:
-    llvm_unreachable("No SEH Opcode for this instruction");
+    report_fatal_error("No SEH Opcode for this instruction");
+  case AArch64::STR_ZXI:
+  case AArch64::LDR_ZXI: {
+    unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
+    MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveZReg))
+              .addImm(Reg0)
+              .addImm(Imm)
+              .setMIFlag(Flag);
+    break;
+  }
+  case AArch64::STR_PXI:
+  case AArch64::LDR_PXI: {
+    unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
+    MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SavePReg))
+              .addImm(Reg0)
+              .addImm(Imm)
+              .setMIFlag(Flag);
+    break;
+  }
   case AArch64::LDPDpost:
     Imm = -Imm;
     [[fallthrough]];
@@ -1592,6 +1610,9 @@ static bool IsSVECalleeSave(MachineBasicBlock::iterator I) {
   case AArch64::CMPNE_PPzZI_B:
     return I->getFlag(MachineInstr::FrameSetup) ||
            I->getFlag(MachineInstr::FrameDestroy);
+  case AArch64::SEH_SavePReg:
+  case AArch64::SEH_SaveZReg:
+    return true;
   }
 }
 
@@ -1874,12 +1895,48 @@ void AArch64FrameLowering::emitPrologue(MachineFunction &MF,
   bool IsWin64 = Subtarget.isCallingConvWin64(F.getCallingConv(), F.isVarArg());
   unsigned FixedObject = getFixedObjectSize(MF, AFI, IsWin64, IsFunclet);
 
+  // Windows unwind can't represent the required stack adjustments if we have
+  // both SVE callee-saves and dynamic stack allocations, and the frame
+  // pointer is before the SVE spills.  The allocation of the frame pointer
+  // must be the last instruction in the prologue so the unwinder can restore
+  // the stack pointer correctly. (And there isn't any unwind opcode for
+  // `addvl sp, x29, -17`.)
+  //
+  // Because of this, we do spills in the opposite order on Windows: first SVE,
+  // then GPRs. The main side-effect of this is that it makes accessing
+  // parameters passed on the stack more expensive.
+  //
+  // We could consider rearranging the spills for simpler cases.
+  bool FPAfterSVECalleeSaves =
+      Subtarget.isTargetWindows() && AFI->getSVECalleeSavedStackSize();
+
   auto PrologueSaveSize = AFI->getCalleeSavedStackSize() + FixedObject;
   // All of the remaining stack allocations are for locals.
   AFI->setLocalStackSize(NumBytes - PrologueSaveSize);
   bool CombineSPBump = shouldCombineCSRLocalStackBump(MF, NumBytes);
   bool HomPrologEpilog = homogeneousPrologEpilog(MF);
-  if (CombineSPBump) {
+  if (FPAfterSVECalleeSaves) {
+    // If we're doing SVE saves first, we need to immediately allocate space
+    // for fixed objects, then space for the SVE callee saves.
+    //
+    // Windows unwind requires that the scalable size is a multiple of 16;
+    // that's handled when the callee-saved size is computed.
+    auto SaveSize =
+        StackOffset::getScalable(AFI->getSVECalleeSavedStackSize()) +
+        StackOffset::getFixed(FixedObject);
+    allocateStackSpace(MBB, MBBI, 0, SaveSize, NeedsWinCFI, &HasWinCFI,
+                       /*EmitCFI=*/false, StackOffset{},
+                       /*FollowupAllocs=*/true);
+    NumBytes -= FixedObject;
+
+    // Now allocate space for the GPR callee saves.
+    while (MBBI != End && IsSVECalleeSave(MBBI))
+      ++MBBI;
+    MBBI = convertCalleeSaveRestoreToSPPrePostIncDec(
+        MBB, MBBI, DL, TII, -AFI->getCalleeSavedStackSize(), NeedsWinCFI,
+        &HasWinCFI, EmitAsyncCFI);
+    NumBytes -= AFI->getCalleeSavedStackSize();
+  } else if (CombineSPBump) {
     assert(!SVEStackSize && "Cannot combine SP bump with SVE");
     emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
                     StackOffset::getFixed(-NumBytes), TII,
@@ -1982,6 +2039,8 @@ void AArch64FrameLowering::emitPrologue(MachineFunction &MF,
           : 0;
 
   if (windowsRequiresStackProbe(MF, NumBytes + RealignmentPadding)) {
+    if (AFI->getSVECalleeSavedStackSize())
+      report_fatal_error("SVE callee saves not yet supported");
     uint64_t NumWords = (NumBytes + RealignmentPadding) >> 4;
     if (NeedsWinCFI) {
       HasWinCFI = true;
@@ -2116,9 +2175,11 @@ void AArch64FrameLowering::emitPrologue(MachineFunction &MF,
                       << "\n");
     // Find callee save instructions in frame.
     CalleeSavesBegin = MBBI;
-    assert(IsSVECalleeSave(CalleeSavesBegin) && "Unexpected instruction");
-    while (IsSVECalleeSave(MBBI) && MBBI != MBB.getFirstTerminator())
-      ++MBBI;
+    if (!FPAfterSVECalleeSaves) {
+      assert(IsSVECalleeSave(CalleeSavesBegin) && "Unexpected instruction");
+      while (IsSVECalleeSave(MBBI) && MBBI != MBB.getFirstTerminator())
+        ++MBBI;
+    }
     CalleeSavesEnd = MBBI;
 
     SVECalleeSavesSize = StackOffset::getScalable(CalleeSavedSize);
@@ -2129,9 +2190,11 @@ void AArch64FrameLowering::emitPrologue(MachineFunction &MF,
   StackOffset CFAOffset =
       StackOffset::getFixed((int64_t)MFI.getStackSize() - NumBytes);
   StackOffset LocalsSize = SVELocalsSize + StackOffset::getFixed(NumBytes);
-  allocateStackSpace(MBB, CalleeSavesBegin, 0, SVECalleeSavesSize, false,
-                     nullptr, EmitAsyncCFI && !HasFP, CFAOffset,
-                     MFI.hasVarSizedObjects() || LocalsSize);
+  if (!FPAfterSVECalleeSaves) {
+    allocateStackSpace(MBB, CalleeSavesBegin, 0, SVECalleeSavesSize, false,
+                       nullptr, EmitAsyncCFI && !HasFP, CFAOffset,
+                       MFI.hasVarSizedObjects() || LocalsSize);
+  }
   CFAOffset += SVECalleeSavesSize;
 
   if (EmitAsyncCFI)
@@ -2303,10 +2366,16 @@ void AArch64FrameLowering::emitEpilogue(MachineFunction &MF,
     assert(AfterCSRPopSize == 0);
     return;
   }
+
+  bool FPAfterSVECalleeSaves =
+      Subtarget.isTargetWindows() && AFI->getSVECalleeSavedStackSize();
+
   bool CombineSPBump = shouldCombineCSRLocalStackBumpInEpilogue(MBB, NumBytes);
   // Assume we can't combine the last pop with the sp restore.
   bool CombineAfterCSRBump = false;
-  if (!CombineSPBump && PrologueSaveSize != 0) {
+  if (FPAfterSVECalleeSaves) {
+    AfterCSRPopSize = FixedObject;
+  } else if (!CombineSPBump && PrologueSaveSize != 0) {
     MachineBasicBlock::iterator Pop = std::prev(MBB.getFirstTerminator());
     while (Pop->getOpcode() == TargetOpcode::CFI_INSTRUCTION ||
            AArch64InstrInfo::isSEHInstruction(*Pop))
@@ -2339,7 +2408,7 @@ void AArch64FrameLowering::emitEpilogue(MachineFunction &MF,
   while (LastPopI != Begin) {
     --LastPopI;
     if (!LastPopI->getFlag(MachineInstr::FrameDestroy) ||
-        IsSVECalleeSave(LastPopI)) {
+        (!FPAfterSVECalleeSaves && IsSVECalleeSave(LastPopI))) {
       ++LastPopI;
       break;
     } else if (CombineSPBump)
@@ -2415,6 +2484,9 @@ void AArch64FrameLowering::emitEpilogue(MachineFunction &MF,
   StackOffset DeallocateBefore = {}, DeallocateAfter = SVEStackSize;
   MachineBasicBlock::iterator RestoreBegin = LastPopI, RestoreEnd = LastPopI;
   if (int64_t CalleeSavedSize = AFI->getSVECalleeSavedStackSize()) {
+    if (FPAfterSVECalleeSaves)
+      RestoreEnd = MBB.getFirstTerminator();
+
     RestoreBegin = std::prev(RestoreEnd);
     while (RestoreBegin != MBB.begin() &&
            IsSVECalleeSave(std::prev(RestoreBegin)))
@@ -2430,7 +2502,31 @@ void AArch64FrameLowering::emitEpilogue(MachineFunction &MF,
   }
 
   // Deallocate the SVE area.
-  if (SVEStackSize) {
+  if (FPAfterSVECalleeSaves) {
+    // If the callee-save area is before FP, restoring the FP implicitly
+    // deallocates non-callee-save SVE allocations.  Otherwise, deallocate
+    // them explicitly.
+    if (!AFI->isStackRealigned() && !MFI.hasVarSizedObjects()) {
+      emitFrameOffset(MBB, LastPopI, DL, AArch64::SP, AArch64::SP,
+                      DeallocateBefore, TII, MachineInstr::FrameDestroy, false,
+                      NeedsWinCFI, &HasWinCFI);
+    }
+
+    // Deallocate callee-save non-SVE registers.
+    emitFrameOffset(MBB, RestoreBegin, DL, AArch64::SP, AArch64::SP,
+                    StackOffset::getFixed(AFI->getCalleeSavedStackSize()), TII,
+                    MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI);
+
+    // Deallocate fixed objects.
+    emitFrameOffset(MBB, RestoreEnd, DL, AArch64::SP, AArch64::SP,
+                    StackOffset::getFixed(FixedObject), TII,
+                    MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI);
+
+    // Deallocate callee-save SVE registers.
+    emitFrameOffset(MBB, RestoreEnd, DL, AArch64::SP, AArch64::SP,
+                    DeallocateAfter, TII, MachineInstr::FrameDestroy, false,
+                    NeedsWinCFI, &HasWinCFI);
+  } else if (SVEStackSize) {
     // If we have stack realignment or variable sized objects on the stack,
     // restore the stack pointer from the frame pointer prior to SVE CSR
     // restoration.
@@ -2450,20 +2546,20 @@ void AArch64FrameLowering::emitEpilogue(MachineFunction &MF,
         emitFrameOffset(
             MBB, RestoreBegin, DL, AArch64::SP, AArch64::SP,
             StackOffset::getFixed(NumBytes), TII, MachineInstr::FrameDestroy,
-            false, false, nullptr, EmitCFI && !hasFP(MF),
+            false, NeedsWinCFI, &HasWinCFI, EmitCFI && !hasFP(MF),
             SVEStackSize + StackOffset::getFixed(NumBytes + PrologueSaveSize));
         NumBytes = 0;
       }
 
       emitFrameOffset(MBB, RestoreBegin, DL, AArch64::SP, AArch64::SP,
                       DeallocateBefore, TII, MachineInstr::FrameDestroy, false,
-                      false, nullptr, EmitCFI && !hasFP(MF),
+                      NeedsWinCFI, &HasWinCFI, EmitCFI && !hasFP(MF),
                       SVEStackSize +
                           StackOffset::getFixed(NumBytes + PrologueSaveSize));
 
       emitFrameOffset(MBB, RestoreEnd, DL, AArch64::SP, AArch64::SP,
                       DeallocateAfter, TII, MachineInstr::FrameDestroy, false,
-                      false, nullptr, EmitCFI && !hasFP(MF),
+                      NeedsWinCFI, &HasWinCFI, EmitCFI && !hasFP(MF),
                       DeallocateAfter +
                           StackOffset::getFixed(NumBytes + PrologueSaveSize));
     }
@@ -2757,10 +2853,27 @@ StackOffset AArch64FrameLowering::resolveFrameOffsetReference(
   }
 
   StackOffset ScalableOffset = {};
-  if (UseFP && !(isFixed || isCSR))
-    ScalableOffset = -SVEStackSize;
-  if (!UseFP && (isFixed || isCSR))
-    ScalableOffset = SVEStackSize;
+  bool FPAfterSVECalleeSaves =
+      isTargetWindows(MF) && AFI->getSVECalleeSavedStackSize();
+  if (FPAfterSVECalleeSaves) {
+    // In this stack layout, the FP is in between the callee saves and other
+    // SVE allocations.
+    StackOffset SVECalleeSavedStack =
+        StackOffset::getScalable(AFI->getSVECalleeSavedStackSize());
+    if (UseFP) {
+      if (!(isFixed || isCSR))
+        ScalableOffset = SVECalleeSavedStack - SVEStackSize;
+      else
+        ScalableOffset = SVECalleeSavedStack;
+    } else if (!UseFP && (isFixed || isCSR)) {
+      ScalableOffset = SVEStackSize;
+    }
+  } else {
+    if (UseFP && !(isFixed || isCSR))
+      ScalableOffset = -SVEStackSize;
+    if (!UseFP && (isFixed || isCSR))
+      ScalableOffset = SVEStackSize;
+  }
 
   if (UseFP) {
     FrameReg = RegInfo->getFrameRegister(MF);
@@ -2934,7 +3047,9 @@ static void computeCalleeSaveRegisterPairs(
     RegInc = -1;
     FirstReg = Count - 1;
   }
-  int ScalableByteOffset = AFI->getSVECalleeSavedStackSize();
+  bool FPAfterSVECalleeSaves = IsWindows && AFI->getSVECalleeSavedStackSize();
+  int ScalableByteOffset =
+      FPAfterSVECalleeSaves ? 0 : AFI->getSVECalleeSavedStackSize();
   bool NeedGapToAlignStack = AFI->hasCalleeSaveStackFreeSpace();
   Register LastReg = 0;
 
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
index 1a13adc300d2b..c1ac18fb09180 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
@@ -1176,6 +1176,9 @@ bool AArch64InstrInfo::isSEHInstruction(const MachineInstr &MI) {
     case AArch64::SEH_PACSignLR:
     case AArch64::SEH_SaveAnyRegQP:
     case AArch64::SEH_SaveAnyRegQPX:
+    case AArch64::SEH_AllocZ:
+    case AArch64::SEH_SaveZReg:
+    case AArch64::SEH_SavePReg:
       return true;
   }
 }
@@ -5988,10 +5991,16 @@ static void emitFrameOffsetAdj(MachineBasicBlock &MBB,
     }
 
     if (NeedsWinCFI) {
-      assert(Sign == 1 && "SEH directives should always have a positive sign");
       int Imm = (int)(ThisVal << LocalShiftSize);
-      if ((DestReg == AArch64::FP && SrcReg == AArch64::SP) ||
-          (SrcReg == AArch64::FP && DestReg == AArch64::SP)) {
+      if (VScale != 1 && DestReg == AArch64::SP) {
+        if (HasWinCFI)
+          *HasWinCFI = true;
+        BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_AllocZ))
+            .addImm(ThisVal)
+            .setMIFlag(Flag);
+      } else if ((DestReg == AArch64::FP && SrcReg == AArch64::SP) ||
+                 (SrcReg == AArch64::FP && DestReg == AArch64::SP)) {
+        assert(VScale == 1 && "Expected non-scalable operation");
         if (HasWinCFI)
           *HasWinCFI = true;
         if (Imm == 0)
@@ -6003,6 +6012,7 @@ static void emitFrameOffsetAdj(MachineBasicBlock &MBB,
         assert(Offset == 0 && "Expected remaining offset to be zero to "
                               "emit a single SEH directive");
       } else if (DestReg == AArch64::SP) {
+        assert(VScale == 1 && "Expected non-scalable operation");
         if (HasWinCFI)
           *HasWinCFI = true;
         assert(SrcReg == AArch64::SP && "Unexpected SrcReg for SEH_StackAlloc");
@@ -6057,14 +6067,14 @@ void llvm::emitFrameOffset(MachineBasicBlock &MBB,
 
   assert(!(SetNZCV && (NumPredicateVectors || NumDataVectors)) &&
          "SetNZCV not supported with SVE vectors");
-  assert(!(NeedsWinCFI && (NumPredicateVectors || NumDataVectors)) &&
-         "WinCFI not supported with SVE vectors");
+  assert(!(NeedsWinCFI && NumPredicateVectors) &&
+         "WinCFI can't allocate fractions of an SVE data vector");
 
   if (NumDataVectors) {
     emitFrameOffsetAdj(MBB, MBBI, DL, DestReg, SrcReg, NumDataVectors,
-                       UseSVL ? AArch64::ADDSVL_XXI : AArch64::ADDVL_XXI,
-                       TII, Flag, NeedsWinCFI, nullptr, EmitCFAOffset,
-                       CFAOffset, FrameReg);
+                       UseSVL ? AArch64::ADDSVL_XXI : AArch64::ADDVL_XXI, TII,
+                       Flag, NeedsWinCFI, HasWinCFI, EmitCFAOffset, CFAOffset,
+                       FrameReg);
     CFAOffset += StackOffset::getScalable(-NumDataVectors * 16);
     SrcReg = DestReg;
   }
@@ -6072,9 +6082,9 @@ void llvm::emitFrameOffset(MachineBasicBlock &MBB,
   if (NumPredicateVectors) {
     assert(DestReg != AArch64::SP && "Unaligned access to SP");
     emitFrameOffsetAdj(MBB, MBBI, DL, DestReg, SrcReg, NumPredicateVectors,
-                       UseSVL ? AArch64::ADDSPL_XXI : AArch64::ADDPL_XXI,
-                       TII, Flag, NeedsWinCFI, nullptr, EmitCFAOffset,
-                       CFAOffset, FrameReg);
+                       UseSVL ? AArch64::ADDSPL_XXI : AArch64::ADDPL_XXI, TII,
+                       Flag, NeedsWinCFI, HasWinCFI, EmitCFAOffset, CFAOffset,
+                       FrameReg);
   }
 }
 
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.td b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
index 3962c7eba5833..6165a1ac3e079 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
@@ -5408,6 +5408,9 @@ let isPseudo = 1 in {
   def SEH_PACSignLR : Pseudo<(outs), (ins), []>, Sched<[]>;
   def SEH_SaveAnyRegQP : Pseudo<(outs), (ins i32imm:$reg0, i32imm:$reg1, i32imm:$offs), []>, Sched<[]>;
   def SEH_SaveAnyRegQPX : Pseudo<(outs), (ins i32imm:$reg0, i32imm:$reg1, i32imm:$offs), []>, Sched<[]>;
+  def SEH_AllocZ : Pseudo<(outs), (ins i32imm:$offs), []>, Sched<[]>;
+  def SEH_SaveZReg : Pseudo<(outs), (ins i32imm:$reg, i32imm:$offs), []>, Sched<[]>;
+  def SEH_SavePReg : Pseudo<(outs), (ins i32imm:$reg, i32imm:$offs), []>, Sched<[]>;
 }
 
 // Pseudo instructions for Windows EH
diff --git a/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp b/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
index 52b362875b4ef..e9a1b558b2dfe 100644
--- a/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
@@ -98,6 +98,13 @@ AArch64RegisterInfo::getCalleeSavedRegs(const MachineFunction *MF) const {
       return CSR_Win_AArch64_AAPCS_SwiftError_SaveList;
     if (MF->getFunction().getCallingConv() == CallingConv::SwiftTail)
       return CSR_Win_AArch64_AAPCS_SwiftTail_SaveList;
+    if (MF->getFunction(...
[truncated]

Copy link
Member

@mstorsjo mstorsjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good overall, thanks! A couple minor inline comments.

assert(!(NeedsWinCFI && (NumPredicateVectors || NumDataVectors)) &&
"WinCFI not supported with SVE vectors");
assert(!(NeedsWinCFI && NumPredicateVectors) &&
"WinCFI can't allocate fractions of an SVE data vector");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to this limitation, the change below on lines 6085-6087 for passing HasWinCFI instead of nullptr is essentially impossible to ever be used, right? But it's of course good to pass HasWinCFI in both cases for consistency anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it's not clear what/where is being done, so that when we have one or more predicate vectors to back up, we allocate space for them in the form of full data vectors. For the general calling convention, all of p4-p11 are backed up I presume, but what if we'd be clobbering only one of them in a function where we don't need to back them all up? How is that handled on e.g. Linux today?

Copy link
Member

@MacDue MacDue May 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SVE stack size is always rounded up to a 16-byte multiple (implicitly multiplied by vscale), so we always allocate some number of full SVE vectors (which also ensures the stack stays 16-byte aligned). I don't think predicate-sized allocations can occur in frame lowering.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right... What I'm asking is, is it possible to hit this assert? Looking at decomposeStackOffsetForFrameOffsets it does seem like it would be possible to end up with a nonzero NumPredicateVectors here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In decomposeStackOffsetForFrameOffsets() if NumPredicateVectors % 8 == 0 (which will be the case if the allocation has been aligned to 16-bytes), it'll change the allocation to be NumDataVectors = NumPredicateVectors / 8 (and no predicate vectors). So I don't think this can be hit from frame lowering. I think it may be used for some other stack allocations (but I think NeedsWinCFI is false for those).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I see - thanks, that answers my question!

; CHECK-NEXT: .seh_allocz 17
; CHECK-NEXT: .seh_endepilogue
; CHECK-NEXT: ret
i64 %n5, i64 %n6, i64 %n7, i64 %n8, i64 %n9) personality ptr @__CxxFrameHandler3 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This IR line has been split with the CHECK lines in the middle of the line - this is quite confusing. Is it possible to join the line without the update script turning it back into this form again?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

; CHECK-NEXT: .seh_endproc
call void asm "", "~{d8}"()
ret void
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a testcase with variable arguments (forcing space for homed arguments on the stack) together with SVE registers? If I understood some code comment correctly, this is taken into account somehow, but it would be good to cover it with tests.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added. (This space is allocated by getFixedObjectSize(). f6 actually contains the same kind of allocation, although it's maybe not obvious at first glance.)

@@ -1982,6 +2039,8 @@ void AArch64FrameLowering::emitPrologue(MachineFunction &MF,
: 0;

if (windowsRequiresStackProbe(MF, NumBytes + RealignmentPadding)) {
if (AFI->getSVECalleeSavedStackSize())
report_fatal_error("SVE callee saves not yet supported");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make the error message a little more descriptive here? SVE callee saves are supported, but not in conjunction with stack probes, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to investigate the interaction between stack probing and SVE stack allocation in general. See also #73975 .

Copy link
Member

@MacDue MacDue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few initial comments:

ScalableOffset = -SVEStackSize;
if (!UseFP && (isFixed || isCSR))
ScalableOffset = SVEStackSize;
bool FPAfterSVECalleeSaves =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getFrameIndexReferenceFromSP also needs to be updated to handle this layout (currently -pass-remarks-analysis=stack-frame-layout reports the normal layout).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think resolveFrameOffsetReference will also be incorrect for SVE CSs, not sure if that's an issue, but I think there should at least be an assertion.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed getFrameIndexReferenceFromSP; still need to add a test. Haven't looked at resolveFrameOffsetReference for SVE CSs yet.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a test for stack-frame-layout. Added the assert for resolveFrameOffsetReference (as you note, a frame index should never refer to a CSR in this context).

ScalableOffset = -SVEStackSize;
if (!UseFP && (isFixed || isCSR))
ScalableOffset = SVEStackSize;
bool FPAfterSVECalleeSaves =
Copy link
Member

@MacDue MacDue May 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With "frame-pointer"="all", I think the addresses of SVE locals are incorrect. For example, the f4 test function resolves the offset as:

	sub	x1, x29, #8
	addvl	x1, x1, #-18

Which is subtracting the size of the SVE callee-saves that are above the FP.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Copy link

github-actions bot commented May 8, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@efriedma-quic efriedma-quic force-pushed the aarch64-seh-sve-backend-v2 branch from b1d8fff to 76c64ce Compare May 16, 2025 00:33
if (MFI.getStackID(FI) == TargetStackID::ScalableVector) {
if (FPAfterSVECalleeSaves &&
-ObjectOffset <= (int64_t)AFI->getSVECalleeSavedStackSize())
return StackOffset::get(0, ObjectOffset);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
return StackOffset::get(0, ObjectOffset);
return StackOffset::getScalable(ObjectOffset);

Comment on lines 2882 to 2885
else
ScalableOffset = SVECalleeSavedStack;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC this offset is correct for fixed stack objects (which are still above the SVE callee saves), but I don't think this works for (fixed) callee-saves, which are now below them.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you think of any way to write a test involving callee-saves in resolveFrameOffsetReference? I think maybe debug info calls into this code, but I don't think the result is relevant for the actual debug info. And I don't think anything else refers to callee-saves like this: the prologue/epilogue code doesn't resolve offsets like this, and nothing else has any reason to refer to them.

I guess the fix is to change the isFixed || isCSR to just isFixed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the assert assert(!(AFI->getSVECalleeSavedStackSize() && isCSR)); and it seems this can get hit by AArch64RegisterInfo::eliminateFrameIndex, which seems to need the correct offset. It only fails for one test though (llvm/test/CodeGen/AArch64/stack-hazard.ll, @svecc_csr_d8_allocd).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed a fix.

The frame index in question is not actually a CSR, but there's open space in the CSR range due to alignment, so it gets allocated as part of the "CSR" area. And the calculation was in fact broken in this case.

else
ScalableOffset = SVECalleeSavedStack;
} else if (!UseFP && (isFixed || isCSR)) {
ScalableOffset = SVEStackSize;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, this looks like it could be incorrect for fixed CSRs, which are SVEStackSize - SVECalleeSavedStack above the SP.

@efriedma-quic
Copy link
Collaborator Author

@MacDue ping

@MacDue
Copy link
Member

MacDue commented May 27, 2025

@MacDue ping

I'll take another careful look look tomorrow 👍

Copy link
Member

@MacDue MacDue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this mostly looks okay (I can't spot anything broken 🙂 -- but this is pretty tricky code). It is missing some of the testing we have for the non-Windows code path, e.g., https://github.com/llvm/llvm-project/blob/main/llvm/test/CodeGen/AArch64/framelayout-sve.mir, which includes tests with fixed-stack objects (which I don't think are tested by win-sve.ll). It may be worth making some equivalent tests for Windows.

Most bits of this are straightforward: when we emit SVE instructions in
the prologue/epilogue, emit corresponding opcodes.

The unfortunately nasty bit is the handling of the frame pointer in
functions that use the SVE calling convention.  If we have SVE
callee saves, and need to restore the stack pointer from the frame
pointer, it's impossible to encode callee saves that happen after the
frame pointer. So this patch rearranges the stack to put SVE callee
saves first.  This isn't really that complicated on its own, but it
leads to a lot of tricky conditionals (see FPAfterSVECalleeSaves).
Previous version didn't non-CSR SVE allocations correctly.
Fixed issue lower tailcc and related constructs.

Code style changes.
@efriedma-quic efriedma-quic force-pushed the aarch64-seh-sve-backend-v2 branch from 1c4aaa1 to 1814b32 Compare May 28, 2025 22:49
Add more tests.  Correctly spill p12-p15.  Fix issue spilling an odd
number of predicates.
Copy link
Member

@MacDue MacDue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for addressing my comments 👍. I'm not entirely certain that everything is exactly right here (I don't personally work on Windows targets), but I don't think this should have any impact on SVE for non-Windows targets, so in that sense, I consider this a fairly safe change.

Please wait a few days to see if anyone else has comments before landing.

// We could consider rearranging the spills for simpler cases.
bool FPAfterSVECalleeSaves =
Subtarget.isTargetWindows() && AFI->getSVECalleeSavedStackSize();

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a check that this is not used with SME hazard padding? With this layout the current locations of the hazard padding don't make sense (and I believe would require an extra hazard padding slot).

Suggested change
if (AFI->hasStackHazardSlotIndex())
report_fatal_error("SME hazard padding is not supported on Windows");

@efriedma-quic efriedma-quic merged commit 6f64a60 into llvm:main Jun 3, 2025
11 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Jun 3, 2025

LLVM Buildbot has detected a new failure on builder lldb-arm-ubuntu running on linaro-lldb-arm-ubuntu while building llvm at step 6 "test".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/18/builds/16930

Here is the relevant piece of the build log for the reference
Step 6 (test) failure: build (failure)
...
PASS: lldb-unit :: ValueObject/./LLDBValueObjectTests/4/12 (3223 of 3232)
PASS: lldb-unit :: ValueObject/./LLDBValueObjectTests/5/12 (3224 of 3232)
PASS: lldb-unit :: ValueObject/./LLDBValueObjectTests/6/12 (3225 of 3232)
PASS: lldb-unit :: ValueObject/./LLDBValueObjectTests/7/12 (3226 of 3232)
PASS: lldb-unit :: ValueObject/./LLDBValueObjectTests/8/12 (3227 of 3232)
PASS: lldb-unit :: ValueObject/./LLDBValueObjectTests/9/12 (3228 of 3232)
PASS: lldb-unit :: tools/lldb-server/tests/./LLDBServerTests/0/2 (3229 of 3232)
PASS: lldb-unit :: tools/lldb-server/tests/./LLDBServerTests/1/2 (3230 of 3232)
PASS: lldb-unit :: Process/gdb-remote/./ProcessGdbRemoteTests/8/35 (3231 of 3232)
TIMEOUT: lldb-api :: tools/lldb-dap/module/TestDAP_module.py (3232 of 3232)
******************** TEST 'lldb-api :: tools/lldb-dap/module/TestDAP_module.py' FAILED ********************
Script:
--
/usr/bin/python3.10 /home/tcwg-buildbot/worker/lldb-arm-ubuntu/llvm-project/lldb/test/API/dotest.py -u CXXFLAGS -u CFLAGS --env LLVM_LIBS_DIR=/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./lib --env LLVM_INCLUDE_DIR=/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/include --env LLVM_TOOLS_DIR=/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin --arch armv8l --build-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex --lldb-module-cache-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex/module-cache-lldb/lldb-api --clang-module-cache-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex/module-cache-clang/lldb-api --executable /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin/lldb --compiler /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin/clang --dsymutil /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin/dsymutil --make /usr/bin/gmake --llvm-tools-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin --lldb-obj-root /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/tools/lldb --lldb-libs-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./lib --cmake-build-type Release /home/tcwg-buildbot/worker/lldb-arm-ubuntu/llvm-project/lldb/test/API/tools/lldb-dap/module -p TestDAP_module.py
--
Exit Code: -9
Timeout: Reached timeout of 600 seconds

Command Output (stdout):
--
lldb version 21.0.0git (https://github.com/llvm/llvm-project.git revision 6f64a600649a95141526043b170aa4244d54c94b)
  clang revision 6f64a600649a95141526043b170aa4244d54c94b
  llvm revision 6f64a600649a95141526043b170aa4244d54c94b

--
Command Output (stderr):
--
========= DEBUG ADAPTER PROTOCOL LOGS =========
1748915062.988560677 --> (stdio) {"command":"initialize","type":"request","arguments":{"adapterID":"lldb-native","clientID":"vscode","columnsStartAt1":true,"linesStartAt1":true,"locale":"en-us","pathFormat":"path","supportsRunInTerminalRequest":true,"supportsVariablePaging":true,"supportsVariableType":true,"supportsStartDebuggingRequest":true,"supportsProgressReporting":true,"$__lldb_sourceInitFile":false},"seq":1}
1748915062.992622614 <-- (stdio) {"body":{"$__lldb_version":"lldb version 21.0.0git (https://github.com/llvm/llvm-project.git revision 6f64a600649a95141526043b170aa4244d54c94b)\n  clang revision 6f64a600649a95141526043b170aa4244d54c94b\n  llvm revision 6f64a600649a95141526043b170aa4244d54c94b","completionTriggerCharacters":["."," ","\t"],"exceptionBreakpointFilters":[{"default":false,"filter":"cpp_catch","label":"C++ Catch"},{"default":false,"filter":"cpp_throw","label":"C++ Throw"},{"default":false,"filter":"objc_catch","label":"Objective-C Catch"},{"default":false,"filter":"objc_throw","label":"Objective-C Throw"}],"supportTerminateDebuggee":true,"supportsBreakpointLocationsRequest":true,"supportsCancelRequest":true,"supportsCompletionsRequest":true,"supportsConditionalBreakpoints":true,"supportsConfigurationDoneRequest":true,"supportsDataBreakpoints":true,"supportsDelayedStackTraceLoading":true,"supportsDisassembleRequest":true,"supportsEvaluateForHovers":true,"supportsExceptionInfoRequest":true,"supportsExceptionOptions":true,"supportsFunctionBreakpoints":true,"supportsHitConditionalBreakpoints":true,"supportsInstructionBreakpoints":true,"supportsLogPoints":true,"supportsModulesRequest":true,"supportsReadMemoryRequest":true,"supportsRestartRequest":true,"supportsSetVariable":true,"supportsStepInTargetsRequest":true,"supportsSteppingGranularity":true,"supportsValueFormattingOptions":true},"command":"initialize","request_seq":1,"seq":0,"success":true,"type":"response"}
1748915062.993399858 --> (stdio) {"command":"launch","type":"request","arguments":{"program":"/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex/tools/lldb-dap/module/TestDAP_module.test_compile_units/a.out","initCommands":["settings clear --all","settings set symbols.enable-external-lookup false","settings set target.inherit-tcc true","settings set target.disable-aslr false","settings set target.detach-on-error false","settings set target.auto-apply-fixits false","settings set plugin.process.gdb-remote.packet-timeout 60","settings set symbols.clang-modules-cache-path \"/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex/module-cache-lldb/lldb-api\"","settings set use-color false","settings set show-statusline false"],"disableASLR":false,"enableAutoVariableSummaries":false,"enableSyntheticChildDebugging":false,"displayExtendedBacktrace":false},"seq":2}
1748915062.993858814 <-- (stdio) {"body":{"category":"console","output":"Running initCommands:\n"},"event":"output","seq":0,"type":"event"}
1748915062.993904114 <-- (stdio) {"body":{"category":"console","output":"(lldb) settings clear --all\n"},"event":"output","seq":0,"type":"event"}
1748915062.993918896 <-- (stdio) {"body":{"category":"console","output":"(lldb) settings set symbols.enable-external-lookup false\n"},"event":"output","seq":0,"type":"event"}
1748915062.993931770 <-- (stdio) {"body":{"category":"console","output":"(lldb) settings set target.inherit-tcc true\n"},"event":"output","seq":0,"type":"event"}
1748915062.993944407 <-- (stdio) {"body":{"category":"console","output":"(lldb) settings set target.disable-aslr false\n"},"event":"output","seq":0,"type":"event"}
1748915062.993956089 <-- (stdio) {"body":{"category":"console","output":"(lldb) settings set target.detach-on-error false\n"},"event":"output","seq":0,"type":"event"}
1748915062.993968010 <-- (stdio) {"body":{"category":"console","output":"(lldb) settings set target.auto-apply-fixits false\n"},"event":"output","seq":0,"type":"event"}
1748915062.993980408 <-- (stdio) {"body":{"category":"console","output":"(lldb) settings set plugin.process.gdb-remote.packet-timeout 60\n"},"event":"output","seq":0,"type":"event"}
1748915062.995073557 <-- (stdio) {"body":{"category":"console","output":"(lldb) settings set symbols.clang-modules-cache-path \"/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex/module-cache-lldb/lldb-api\"\n"},"event":"output","seq":0,"type":"event"}
1748915062.995107174 <-- (stdio) {"body":{"category":"console","output":"(lldb) settings set use-color false\n"},"event":"output","seq":0,"type":"event"}
1748915062.995119810 <-- (stdio) {"body":{"category":"console","output":"(lldb) settings set show-statusline false\n"},"event":"output","seq":0,"type":"event"}
1748915063.165354490 <-- (stdio) {"command":"launch","request_seq":2,"seq":0,"success":true,"type":"response"}
1748915063.165432930 <-- (stdio) {"event":"initialized","seq":0,"type":"event"}
1748915063.165459871 <-- (stdio) {"body":{"module":{"addressRange":"0xf77df000","debugInfoSize":"983.3KB","id":"0D794E6C-AF7E-D8CB-B9BA-E385B4F8753F-5A793D65","name":"ld-linux-armhf.so.3","path":"/usr/lib/arm-linux-gnueabihf/ld-linux-armhf.so.3","symbolFilePath":"/usr/lib/arm-linux-gnueabihf/ld-linux-armhf.so.3","symbolStatus":"Symbols loaded."},"reason":"new"},"event":"module","seq":0,"type":"event"}
1748915063.165681839 <-- (stdio) {"body":{"module":{"addressRange":"0x7e0000","debugInfoSize":"1.1KB","id":"F11D97F7","name":"a.out","path":"/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex/tools/lldb-dap/module/TestDAP_module.test_compile_units/a.out","symbolFilePath":"/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex/tools/lldb-dap/module/TestDAP_module.test_compile_units/a.out","symbolStatus":"Symbols loaded."},"reason":"new"},"event":"module","seq":0,"type":"event"}
1748915063.166405439 --> (stdio) {"command":"setBreakpoints","type":"request","arguments":{"source":{"name":"main.cpp","path":"main.cpp"},"sourceModified":false,"lines":[5],"breakpoints":[{"line":5}]},"seq":3}
1748915063.178709984 <-- (stdio) {"body":{"breakpoints":[{"column":3,"id":1,"instructionReference":"0x7F073C","line":5,"source":{"name":"main.cpp","path":"main.cpp"},"verified":true}]},"command":"setBreakpoints","request_seq":3,"seq":0,"success":true,"type":"response"}
1748915063.179032803 --> (stdio) {"command":"configurationDone","type":"request","arguments":{},"seq":4}

sallto pushed a commit to sallto/llvm-project that referenced this pull request Jun 3, 2025
…38609)

Most bits of this are straightforward: when we emit SVE instructions in
the prologue/epilogue, emit corresponding opcodes.

The unfortunately nasty bit is the handling of the frame pointer in
functions that use the SVE calling convention. If we have SVE callee
saves, and need to restore the stack pointer from the frame pointer,
it's impossible to encode callee saves that happen after the frame
pointer. So this patch rearranges the stack to put SVE callee saves
first. This isn't really that complicated on its own, but it leads to a
lot of tricky conditionals (see FPAfterSVECalleeSaves).
@mstorsjo
Copy link
Member

mstorsjo commented Jun 4, 2025

With this merged, I run into issues with projects that detect support for SVE, which previously bailed out.

The main issue is hitting issues with codeview debug info; I saw that you have a hack for this at 5e3fa84. You'd need to extend this to cover the predicate registers too; I'm hitting this with P0 in one repro.

If compiling without optimization, I'm hitting another assert relating to codeview debug info too - I presume this is the other unresolved detail you mentioned in the commit message in that hack.

Repro:

typedef __attribute__((neon_vector_type(16))) signed char int8x16_t;
typedef __SVInt8_t svint8_t;
typedef __SVBool_t svbool_t;
__attribute__((__clang_arm_builtin_alias(__builtin_sve_svdup_n_s8))) svint8_t
svdup_n_s8(char);
void svsel_s8(svbool_t, svint8_t, svint8_t);
__attribute__((__clang_arm_builtin_alias(__builtin_sve_svundef_s8))) svint8_t
svundef_s8();
svbool_t svwhilelt_b8(int, int);
__attribute__((__clang_arm_builtin_alias(__builtin_sve_svset_neonq_s8)))
svint8_t svset_neonq_s8(svint8_t, int8x16_t);
int8x16_t saoCuStatsE1_sve2_endY_in;
void saoCuStatsE1_sve2_endY() {
  svbool_t svpred = svwhilelt_b8(0, 0);
  svsel_s8(svpred, svset_neonq_s8(svundef_s8(), saoCuStatsE1_sve2_endY_in),
           svdup_n_s8(3));
}
$ clang -target aarch64-windows-msvc -c repro.c -march=armv9-a -g -O2
fatal error: error in backend: unknown codeview register P0
$ clang -target aarch64-windows-msvc -c repro.c -march=armv9-a -g
clang: ../lib/CodeGen/AsmPrinter/CodeViewDebug.cpp:1281: void llvm::CodeViewDebug::collectVariableInfoFromMFTable(llvm::DenseSet<std::pair<const llvm::DINode*, const llvm::DILocation*> >&): Assertion `!FrameOffset.getScalable() && "Frame offsets with a scalable component are not supported"' failed.

@efriedma-quic
Copy link
Collaborator Author

Those two crashes are the ones I've seen, yes, and are addressed by the hack.

The issue is that the projects detect SVE without debug info, then try to build with debug info?

So... I have the following options here right now:

  • Add a temporary disabled-by-default flag for SVE on Windows, so projects can't detect the SVE support.
  • Add some form of debug info hack: either intentionally add incorrect mappings, or somehow drop the bits of debug info that would depend on those mappings.

Longer-term, we're reaching out to Microsoft for more information here, but not sure how long that will take.

@mstorsjo
Copy link
Member

mstorsjo commented Jun 4, 2025

Those two crashes are the ones I've seen, yes, and are addressed by the hack.

Oh, right, the other issue is also silenced by the hack - sorry, I had missed that.

The issue is that the projects detect SVE without debug info, then try to build with debug info?

In this case, I think they do apply the same debug info options both when detecting and building, the problem is that the issue with debug info doesn't show up on a trivial sized detection snippet. I did contribute a big enough snippet that did trigger when we don't support generating unwind info, https://aomedia.googlesource.com/aom/+/5ccdc66ab6eb8eb300eda854fab4ff250b2c2f92%5E%21/, but it doesn't trigger the issues with debug info.

So... I have the following options here right now:

  • Add a temporary disabled-by-default flag for SVE on Windows, so projects can't detect the SVE support.
  • Add some form of debug info hack: either intentionally add incorrect mappings, or somehow drop the bits of debug info that would depend on those mappings.

Longer-term, we're reaching out to Microsoft for more information here, but not sure how long that will take.

I can work around it in my test setups temporarily by not enabling codeview debug info for aarch64 windows, but by the time of the 21.x branch (which should be in 5-7 weeks roughly?) I think it would be good to apply either of those kinds of fixes - no strong opinion on which one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants