
[llvm-exegesis] [AArch64] Resolving "snippet crashed while running: Segmentation fault" for Load Instructions #142552


Open
wants to merge 12 commits into main

Conversation

@lakshayk-nv (Contributor) commented Jun 3, 2025

We want to support load and store instructions for AArch64, but currently they cause a segmentation fault: registers are initialized to 0, while a load instruction requires its base register to hold a valid memory address.
This is a WIP patch; it is not meant to be merged in its current state, but to gather feedback.

Loading registers that require a memory address

There are two possible ways to support load instructions (i.e., to set registers to a valid memory address):

1. Use the address of an auxiliary mmap:

The prerequisite for this is supporting --mode=subprocess for AArch64, so this patch also adds --mode=subprocess support and memory annotations for manual snippets:

  • Add the memory setup required by the subprocess executor on AArch64.
  • Implement the auxiliary memory mmap, the manual-snippet mmap, and configurePerfCounter().
  • Add helper functions for syscall generation.

Currently, the implementation generates the syscall for the auxiliary mmap, saves its return value on the stack, and loads the required registers with that memory address, as sketched below.
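A minimal sketch of that flow, using the helpers this patch adds (RegNeedingAddress is a placeholder name, not an identifier from the patch):

    std::vector<MCInst> SetupCode;
    // mmap the auxiliary memory; the syscall's return value (the mapped
    // address) arrives in X0 on AArch64 Linux.
    generateMmapAuxMem(SetupCode);
    // Save the mapped address on the stack so it survives intermediate code
    // that may clobber X0.
    generateRegisterStackPush(AArch64::X0, SetupCode);
    // ... later, pop it into whichever register needs a valid base address.
    generateRegisterStackPop(RegNeedingAddress, SetupCode);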

TODO: How do we differentiate between registers that require an address and those that do not?

For example, the LD1B opcode (ld1b    {z6.b}, p4/z, [x14, x2]) expects the first register to contain an address and the second to hold an offset.
Temporary fix: initialize the first register queried by the instruction with the address, and the rest via setRegTo as done previously (sketched below).
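A minimal sketch of that temporary fix (FirstReg and OtherRegs are illustrative names; the helpers come from this patch and the existing target code):

    // Pop the saved mmap address into the first register the instruction
    // queries (the base-address register, x14 in the LD1B example above).
    generateRegisterStackPop(FirstReg, SetupCode);
    // Initialize the remaining registers (offset, data, predicates) with
    // setRegTo, exactly as done previously.
    for (MCRegister OtherReg : OtherRegs)
      llvm::append_range(SetupCode, setRegTo(STI, OtherReg, APInt(64, 0)));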

2. Utilize fillMemoryOperands

fillMemoryOperands() is used by the x86 implementation and looks relevant for initializing the registers required by load instructions, but the AArch64 implementations of fillMemoryOperands() and getScratchMemoryRegister() are missing.
Additionally, the code path checks IsMemory (i.e. OPERAND_MEMORY), which is never set for AArch64 operands, so IsMemory was experimentally OR'ed with mayLoadOrStore as well (MCInstrDescView.cpp).

TODO: Implement getScratchMemoryRegister() correctly. Two options have been tried (see the sketch after this list):

  • Returning MCRegister() leaves the register invalid, and llvm-exegesis exits with "Infeasible : target does not support memory instructions".
  • Returning a hardcoded register such as X14 causes an illegal instruction to be generated: undefined physical register.
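For reference, a minimal sketch of the hardcoded-register option, mirroring how RISC-V hardcodes a0 (whether X14 is a safe choice, and why it currently leads to an undefined physical register, still needs to be worked out):

    MCRegister
    ExegesisAArch64Target::getScratchMemoryRegister(const Triple &TT) const {
      // X14 is an arbitrary pick for illustration; it must be a register the
      // snippet generator does not otherwise assign, or the generated
      // snippet becomes invalid.
      return AArch64::X14;
    }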

TODO: Implement fillMemoryOperands (one possible shape is sketched below).
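A minimal sketch, assuming the base register and the immediate offset are modeled as separate, adjacent operands (that layout is an assumption to be checked against the real AArch64 instruction descriptions; note the current draft assigns both values to the same operand, so the second assignment overwrites the first):

    void ExegesisAArch64Target::fillMemoryOperands(InstructionTemplate &IT,
                                                   MCRegister Reg,
                                                   unsigned Offset) const {
      const Instruction &I = IT.getInstr();
      auto BaseIt = llvm::find_if(
          I.Operands, [](const Operand &Op) { return Op.isMemory(); });
      assert(BaseIt != I.Operands.end() && "instruction has no memory operand");
      const Operand &BaseOp = *BaseIt;
      // Assumption: the operand following the base register holds the offset.
      const Operand &OffsetOp = I.Operands[BaseOp.getIndex() + 1];
      IT.getValueFor(BaseOp) = MCOperand::createReg(Reg);      // base address
      IT.getValueFor(OffsetOp) = MCOperand::createImm(Offset); // displacement
    }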

Please review: @sjoerdmeijer, @boomanaiden154, @davemgreen
Looking forward to your feedback.
Thanks,

@llvmbot (Member) commented Jun 3, 2025

@llvm/pr-subscribers-tools-llvm-exegesis

Author: Lakshay Kumar (lakshayk-nv)


Patch is 32.69 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/142552.diff

7 Files Affected:

  • (modified) llvm/tools/llvm-exegesis/lib/AArch64/Target.cpp (+504-1)
  • (modified) llvm/tools/llvm-exegesis/lib/Assembler.cpp (+35-1)
  • (modified) llvm/tools/llvm-exegesis/lib/MCInstrDescView.cpp (+15-6)
  • (modified) llvm/tools/llvm-exegesis/lib/MCInstrDescView.h (+1)
  • (modified) llvm/tools/llvm-exegesis/lib/SerialSnippetGenerator.cpp (+6)
  • (modified) llvm/tools/llvm-exegesis/lib/SnippetGenerator.cpp (+8)
  • (modified) llvm/tools/llvm-exegesis/lib/Target.h (+4)
diff --git a/llvm/tools/llvm-exegesis/lib/AArch64/Target.cpp b/llvm/tools/llvm-exegesis/lib/AArch64/Target.cpp
index a1eb5a46f21fc..48a22d011a491 100644
--- a/llvm/tools/llvm-exegesis/lib/AArch64/Target.cpp
+++ b/llvm/tools/llvm-exegesis/lib/AArch64/Target.cpp
@@ -6,10 +6,26 @@
 //
 //===----------------------------------------------------------------------===//
 #include "../Target.h"
+#include "../Error.h"
+#include "../MmapUtils.h"
+#include "../SerialSnippetGenerator.h"
+#include "../SnippetGenerator.h"
+#include "../SubprocessMemory.h"
 #include "AArch64.h"
 #include "AArch64RegisterInfo.h"
+#include "llvm/CodeGen/MachineInstrBuilder.h"
+#include "llvm/MC/MCInstBuilder.h"
+#include "llvm/MC/MCRegisterInfo.h"
+#include <vector>
 
+#define DEBUG_TYPE "exegesis-aarch64-target"
 #if defined(__aarch64__) && defined(__linux__)
+#include <sys/mman.h>
+#include <sys/syscall.h>
+#include <unistd.h> // for getpagesize()
+#ifdef HAVE_LIBPFM
+#include <perfmon/perf_event.h>
+#endif                   // HAVE_LIBPFM
 #include <linux/prctl.h> // For PR_PAC_* constants
 #include <sys/prctl.h>
 #ifndef PR_PAC_SET_ENABLED_KEYS
@@ -120,7 +136,7 @@ static MCInst loadPPRImmediate(MCRegister Reg, unsigned RegBitWidth,
 // Generates instructions to load an immediate value into an FPCR register.
 static std::vector<MCInst>
 loadFPCRImmediate(MCRegister Reg, unsigned RegBitWidth, const APInt &Value) {
-  MCRegister TempReg = AArch64::X8;
+  MCRegister TempReg = AArch64::X16;
   MCInst LoadImm = MCInstBuilder(AArch64::MOVi64imm).addReg(TempReg).addImm(0);
   MCInst MoveToFPCR =
       MCInstBuilder(AArch64::MSR).addImm(AArch64SysReg::FPCR).addReg(TempReg);
@@ -153,6 +169,89 @@ static MCInst loadFPImmediate(MCRegister Reg, unsigned RegBitWidth,
   return Instructions;
 }
 
+static void generateRegisterStackPush(unsigned int RegToPush,
+                                      std::vector<MCInst> &GeneratedCode,
+                                      int imm = -16) {
+  // STR [X|W]t, [SP, #simm]!: SP is decremented by default 16 bytes
+  //                           before the store to maintain 16-bytes alignment.
+  if (AArch64::GPR64RegClass.contains(RegToPush)) {
+    GeneratedCode.push_back(MCInstBuilder(AArch64::STRXpre)
+                                .addReg(AArch64::SP)
+                                .addReg(RegToPush)
+                                .addReg(AArch64::SP)
+                                .addImm(imm));
+  } else if (AArch64::GPR32RegClass.contains(RegToPush)) {
+    GeneratedCode.push_back(MCInstBuilder(AArch64::STRWpre)
+                                .addReg(AArch64::SP)
+                                .addReg(RegToPush)
+                                .addReg(AArch64::SP)
+                                .addImm(imm));
+  } else {
+    llvm_unreachable("Unsupported register class for stack push");
+  }
+}
+
+static void generateRegisterStackPop(unsigned int RegToPopTo,
+                                     std::vector<MCInst> &GeneratedCode,
+                                     int imm = 16) {
+  // LDR Xt, [SP], #simm: SP is incremented by default 16 bytes after the load.
+  if (AArch64::GPR64RegClass.contains(RegToPopTo)) {
+    GeneratedCode.push_back(MCInstBuilder(AArch64::LDRXpost)
+                                .addReg(AArch64::SP)
+                                .addReg(RegToPopTo)
+                                .addReg(AArch64::SP)
+                                .addImm(imm));
+  } else if (AArch64::GPR32RegClass.contains(RegToPopTo)) {
+    GeneratedCode.push_back(MCInstBuilder(AArch64::LDRWpost)
+                                .addReg(AArch64::SP)
+                                .addReg(RegToPopTo)
+                                .addReg(AArch64::SP)
+                                .addImm(imm));
+  } else {
+    llvm_unreachable("Unsupported register class for stack pop");
+  }
+}
+
+void generateSysCall(long SyscallNumber, std::vector<MCInst> &GeneratedCode) {
+  GeneratedCode.push_back(
+      loadImmediate(AArch64::X8, 64, APInt(64, SyscallNumber)));
+  GeneratedCode.push_back(MCInstBuilder(AArch64::SVC).addImm(0));
+}
+
+/// Functions to save/restore system call registers
+#ifdef __linux__
+constexpr std::array<unsigned, 6> SyscallArgumentRegisters{
+    AArch64::X0, AArch64::X1, AArch64::X2,
+    AArch64::X3, AArch64::X4, AArch64::X5,
+};
+
+static void saveSysCallRegisters(std::vector<MCInst> &GeneratedCode,
+                                 unsigned ArgumentCount) {
+  // AArch64 Linux typically uses X0-X5 for the first 6 arguments.
+  // Some syscalls can take up to 8 arguments in X0-X7.
+  assert(ArgumentCount <= 6 &&
+         "This implementation saves up to 6 argument registers (X0-X5)");
+  // generateRegisterStackPush(AArch64::X16, GeneratedCode);
+  // Preserve X8 (used for the syscall number/return value).
+  generateRegisterStackPush(AArch64::X8, GeneratedCode);
+  // Preserve the registers used to pass arguments to the system call.
+  for (unsigned I = 0; I < ArgumentCount; ++I) {
+    generateRegisterStackPush(SyscallArgumentRegisters[I], GeneratedCode);
+  }
+}
+
+static void restoreSysCallRegisters(std::vector<MCInst> &GeneratedCode,
+                                    unsigned ArgumentCount) {
+  assert(ArgumentCount <= 6 &&
+         "This implementation restores up to 6 argument registers (X0-X5)");
+  // Restore argument registers, in opposite order of the way they are saved.
+  for (int I = ArgumentCount - 1; I >= 0; --I) {
+    generateRegisterStackPop(SyscallArgumentRegisters[I], GeneratedCode);
+  }
+  generateRegisterStackPop(AArch64::X8, GeneratedCode);
+  // generateRegisterStackPop(AArch64::X16, GeneratedCode);
+}
+#endif // __linux__
 #include "AArch64GenExegesis.inc"
 
 namespace {
@@ -162,7 +261,44 @@ class ExegesisAArch64Target : public ExegesisTarget {
   ExegesisAArch64Target()
       : ExegesisTarget(AArch64CpuPfmCounters, AArch64_MC::isOpcodeAvailable) {}
 
+  enum ArgumentRegisters {
+    CodeSize = AArch64::X12,
+    AuxiliaryMemoryFD = AArch64::X13
+  };
+
+  std::vector<MCInst> _generateRegisterStackPop(MCRegister Reg,
+                                                int imm = 0) const override {
+    std::vector<MCInst> Insts;
+    if (AArch64::GPR32RegClass.contains(Reg)) {
+      generateRegisterStackPop(Reg, Insts, imm);
+      return Insts;
+    }
+    if (AArch64::GPR64RegClass.contains(Reg)) {
+      generateRegisterStackPop(Reg, Insts, imm);
+      return Insts;
+    }
+    return {};
+  }
+
 private:
+#ifdef __linux__
+  void generateLowerMunmap(std::vector<MCInst> &GeneratedCode) const override;
+  void generateUpperMunmap(std::vector<MCInst> &GeneratedCode) const override;
+  std::vector<MCInst> generateExitSyscall(unsigned ExitCode) const override;
+  std::vector<MCInst>
+  generateMmap(uintptr_t Address, size_t Length,
+               uintptr_t FileDescriptorAddress) const override;
+  void generateMmapAuxMem(std::vector<MCInst> &GeneratedCode) const override;
+  void moveArgumentRegisters(std::vector<MCInst> &GeneratedCode) const override;
+  std::vector<MCInst> generateMemoryInitialSetup() const override;
+  std::vector<MCInst> setStackRegisterToAuxMem() const override;
+  uintptr_t getAuxiliaryMemoryStartAddress() const override;
+  std::vector<MCInst> configurePerfCounter(long Request,
+                                           bool SaveRegisters) const override;
+  std::vector<MCRegister> getArgumentRegisters() const override;
+  std::vector<MCRegister> getRegistersNeedSaving() const override;
+#endif // __linux__
+
   std::vector<MCInst> setRegTo(const MCSubtargetInfo &STI, MCRegister Reg,
                                const APInt &Value) const override {
     if (AArch64::GPR32RegClass.contains(Reg))
@@ -227,10 +363,377 @@ class ExegesisAArch64Target : public ExegesisTarget {
 
     return nullptr;
   }
+  MCRegister getScratchMemoryRegister(const Triple &) const override;
+  void fillMemoryOperands(InstructionTemplate &IT, MCRegister Reg,
+                          unsigned Offset) const override;
 };
 
 } // namespace
 
+// Implementation follows RISCV pattern for memory operand handling.
+// Note: This implementation requires validation for AArch64-specific
+// requirements.
+void ExegesisAArch64Target::fillMemoryOperands(InstructionTemplate &IT,
+                                               MCRegister Reg,
+                                               unsigned Offset) const {
+  LLVM_DEBUG(dbgs() << "Executing fillMemoryOperands");
+  // AArch64 memory operands typically have the following structure:
+  // [base_register, offset]
+  auto &I = IT.getInstr();
+  auto MemOpIt =
+      find_if(I.Operands, [](const Operand &Op) { return Op.isMemory(); });
+  assert(MemOpIt != I.Operands.end() &&
+         "Instruction must have memory operands");
+
+  const Operand &MemOp = *MemOpIt;
+
+  assert(MemOp.isReg() && "Memory operand expected to be register");
+
+  IT.getValueFor(MemOp) = MCOperand::createReg(Reg);
+  IT.getValueFor(MemOp) = MCOperand::createImm(Offset);
+}
+enum ScratchMemoryRegister {
+  Z = AArch64::Z14,
+  X = AArch64::X14,
+  W = AArch64::W14,
+};
+
+MCRegister
+ExegesisAArch64Target::getScratchMemoryRegister(const Triple &TT) const {
+  // return MCRegister();   // Implemented in target.h
+  // return hardcoded scratch memory register, similar to RISCV (uses a0)
+  return ScratchMemoryRegister::X;
+}
+
+#ifdef __linux__
+// true : let use of fixed address to Virtual Address Space Ceiling
+// false: let kernel choose the address of the auxiliary memory
+bool UseFixedAddress = true; // TODO: Remove this later
+
+static constexpr const uintptr_t VAddressSpaceCeiling = 0x0000800000000000;
+
+static void generateRoundToNearestPage(unsigned int TargetRegister,
+                                       std::vector<MCInst> &GeneratedCode) {
+  int PageSizeShift = static_cast<int>(round(log2(getpagesize())));
+  // Round down to the nearest page by getting rid of the least significant bits
+  // representing location in the page.
+
+  // Single instruction using AND with inverted mask (effectively BIC)
+  uint64_t BitsToClearMask = (1ULL << PageSizeShift) - 1; // 0xFFF
+  uint64_t AndMask = ~BitsToClearMask;                    // ...FFFFFFFFFFFF000
+  GeneratedCode.push_back(MCInstBuilder(AArch64::ANDXri)
+                              .addReg(TargetRegister) // Xd
+                              .addReg(TargetRegister) // Xn
+                              .addImm(AndMask)        // imm bitmask
+  );
+}
+static void generateGetInstructionPointer(unsigned int ResultRegister,
+                                          std::vector<MCInst> &GeneratedCode) {
+  // ADR X[ResultRegister], . : loads address of current instruction
+  // ADR : Form PC-relative address
+  // This instruction adds an immediate value to the PC value to form a
+  // PC-relative address, and writes the result to the destination register.
+  GeneratedCode.push_back(MCInstBuilder(AArch64::ADR)
+                              .addReg(ResultRegister) // Xd
+                              .addImm(0));            // Offset
+}
+
+// TODO: This implementation mirrors the x86 version and requires validation.
+// The purpose of this memory unmapping needs to be verified for AArch64
+void ExegesisAArch64Target::generateLowerMunmap(
+    std::vector<MCInst> &GeneratedCode) const {
+  // Unmap starting at address zero
+  GeneratedCode.push_back(loadImmediate(AArch64::X0, 64, APInt(64, 0)));
+  // Get the current instruction pointer so we know where to unmap up to.
+  generateGetInstructionPointer(AArch64::X1, GeneratedCode);
+  generateRoundToNearestPage(AArch64::X1, GeneratedCode);
+  // Subtract a page from the end of the unmap so we don't unmap the currently
+  // executing section.
+  long page_size = getpagesize();
+  // Load page_size into a temporary register (e.g., X16)
+  GeneratedCode.push_back(
+      loadImmediate(AArch64::X16, 64, APInt(64, page_size)));
+  // Subtract X16 (containing page_size) from X1
+  GeneratedCode.push_back(MCInstBuilder(AArch64::SUBXrr)
+                              .addReg(AArch64::X1)    // Dest
+                              .addReg(AArch64::X1)    // Src
+                              .addReg(AArch64::X16)); // page_size
+  generateSysCall(SYS_munmap, GeneratedCode);
+}
+
+// FIXME: This implementation mirrors the x86 version and requires validation.
+// The purpose of this memory unmapping needs to be verified for AArch64
+// The correctness of this implementation needs to be verified.
+void ExegesisAArch64Target::generateUpperMunmap(
+    std::vector<MCInst> &GeneratedCode) const {
+  generateGetInstructionPointer(AArch64::X4, GeneratedCode);
+  // Load the size of the snippet from the argument register into X0
+  // FIXME: Argument register seems not be initialized.
+  GeneratedCode.push_back(MCInstBuilder(AArch64::ORRXrr)
+                              .addReg(AArch64::X0)
+                              .addReg(AArch64::XZR)
+                              .addReg(ArgumentRegisters::CodeSize));
+  // Add the length of the snippet (in X0) to the current instruction pointer
+  // (in X4) to get the address where we should start unmapping at.
+  GeneratedCode.push_back(MCInstBuilder(AArch64::ADDXrr)
+                              .addReg(AArch64::X0)
+                              .addReg(AArch64::X0)
+                              .addReg(AArch64::X4));
+  generateRoundToNearestPage(AArch64::X0, GeneratedCode);
+  // Add one page to the start address to ensure the address is above snippet.
+  // Since the above function rounds down.
+  long page_size = getpagesize();
+  GeneratedCode.push_back(
+      loadImmediate(AArch64::X16, 64, APInt(64, page_size)));
+  GeneratedCode.push_back(MCInstBuilder(AArch64::ADDXrr)
+                              .addReg(AArch64::X0)    // Dest
+                              .addReg(AArch64::X0)    // Src
+                              .addReg(AArch64::X16)); // page_size
+  // Unmap to just one page under the ceiling of the address space.
+  GeneratedCode.push_back(loadImmediate(
+      AArch64::X1, 64, APInt(64, VAddressSpaceCeiling - getpagesize())));
+  GeneratedCode.push_back(MCInstBuilder(AArch64::SUBXrr)
+                              .addReg(AArch64::X1)
+                              .addReg(AArch64::X1)
+                              .addReg(AArch64::X0));
+  generateSysCall(SYS_munmap, GeneratedCode); // SYS_munmap is 215
+}
+
+std::vector<MCInst>
+ExegesisAArch64Target::generateExitSyscall(unsigned ExitCode) const {
+  std::vector<MCInst> ExitCallCode;
+  ExitCallCode.push_back(loadImmediate(AArch64::X0, 64, APInt(64, ExitCode)));
+  generateSysCall(SYS_exit, ExitCallCode); // SYS_exit is 93
+  return ExitCallCode;
+}
+
+// FIXME: This implementation mirrors the x86 version and requires validation.
+// The correctness of this implementation needs to be verified.
+// mmap(address, length, prot, flags, fd, offset=0)
+std::vector<MCInst>
+ExegesisAArch64Target::generateMmap(uintptr_t Address, size_t Length,
+                                    uintptr_t FileDescriptorAddress) const {
+  int flags = MAP_SHARED;
+  if (Address != 0) {
+    flags |= MAP_FIXED_NOREPLACE;
+  }
+  std::vector<MCInst> MmapCode;
+  MmapCode.push_back(
+      loadImmediate(AArch64::X0, 64, APInt(64, Address))); // map adr
+  MmapCode.push_back(
+      loadImmediate(AArch64::X1, 64, APInt(64, Length))); // length
+  MmapCode.push_back(loadImmediate(AArch64::X2, 64,
+                                   APInt(64, PROT_READ | PROT_WRITE))); // prot
+  MmapCode.push_back(loadImmediate(AArch64::X3, 64, APInt(64, flags))); // flags
+  // FIXME: File descriptor address is not initialized.
+  // Copy file descriptor location from aux memory into X4
+  MmapCode.push_back(
+      loadImmediate(AArch64::X4, 64, APInt(64, FileDescriptorAddress))); // fd
+  // // Dereference file descriptor into FD argument register (TODO: Why? &
+  // correct?) MmapCode.push_back(
+  //   MCInstBuilder(AArch64::LDRWui)
+  //       .addReg(AArch64::W4)   // Destination register
+  //       .addReg(AArch64::X4)   // Base register (address)
+  //       .addImm(0)             // Offset (in 4-byte words, so 0 means no
+  //       offset)
+  // );
+  MmapCode.push_back(loadImmediate(AArch64::X5, 64, APInt(64, 0))); // offset
+  generateSysCall(SYS_mmap, MmapCode); // SYS_mmap is 222
+  return MmapCode;
+}
+
+// FIXME: This implementation mirrors the x86 version and requires validation.
+// The correctness of this implementation needs to be verified.
+void ExegesisAArch64Target::generateMmapAuxMem(
+    std::vector<MCInst> &GeneratedCode) const {
+  int fd = -1;
+  int flags = MAP_SHARED;
+  uintptr_t address = getAuxiliaryMemoryStartAddress();
+  if (fd == -1)
+    flags |= MAP_ANONYMOUS;
+  if (address != 0)
+    flags |= MAP_FIXED_NOREPLACE;
+  int prot = PROT_READ | PROT_WRITE;
+
+  GeneratedCode.push_back(
+      loadImmediate(AArch64::X0, 64, APInt(64, address))); // map adr
+  GeneratedCode.push_back(loadImmediate(
+      AArch64::X1, 64,
+      APInt(64, SubprocessMemory::AuxiliaryMemorySize))); // length
+  GeneratedCode.push_back(
+      loadImmediate(AArch64::X2, 64, APInt(64, prot))); // prot
+  GeneratedCode.push_back(
+      loadImmediate(AArch64::X3, 64, APInt(64, flags))); // flags
+  GeneratedCode.push_back(loadImmediate(AArch64::X4, 64, APInt(64, fd))); // fd
+  GeneratedCode.push_back(
+      loadImmediate(AArch64::X5, 64, APInt(64, 0))); // offset
+  generateSysCall(SYS_mmap, GeneratedCode);          // SYS_mmap is 222
+}
+
+void ExegesisAArch64Target::moveArgumentRegisters(
+    std::vector<MCInst> &GeneratedCode) const {
+  GeneratedCode.push_back(MCInstBuilder(AArch64::ORRXrr)
+                              .addReg(ArgumentRegisters::CodeSize)
+                              .addReg(AArch64::XZR)
+                              .addReg(AArch64::X0));
+  GeneratedCode.push_back(MCInstBuilder(AArch64::ORRXrr)
+                              .addReg(ArgumentRegisters::AuxiliaryMemoryFD)
+                              .addReg(AArch64::XZR)
+                              .addReg(AArch64::X1));
+}
+
+std::vector<MCInst> ExegesisAArch64Target::generateMemoryInitialSetup() const {
+  std::vector<MCInst> MemoryInitialSetupCode;
+  // moveArgumentRegisters(MemoryInitialSetupCode);
+  // generateLowerMunmap(MemoryInitialSetupCode);   // TODO: Motivation Unclear
+  // generateUpperMunmap(MemoryInitialSetupCode);   // FIXME: Motivation Unclear
+  // TODO: Revert argument registers value, if munmap is used.
+
+  generateMmapAuxMem(MemoryInitialSetupCode); // FIXME: Uninit file descriptor
+
+  // If using fixed address for auxiliary memory skip this step,
+  // When using dynamic memory allocation (non-fixed address), we must preserve
+  // the mmap return value (X0) which contains the allocated memory address.
+  // This value is saved to the stack to ensure registers requiring memory
+  // access can retrieve the correct address even if X0 is modified by
+  // intermediate code.
+  generateRegisterStackPush(AArch64::X0, MemoryInitialSetupCode);
+  // FIXME: Ensure stack pointer remains stable to prevent loss of saved address
+  return MemoryInitialSetupCode;
+}
+
+// TODO: This implementation mirrors the x86 version and requires validation.
+// The purpose of moving stack pointer to aux memory needs to be verified for
+// AArch64
+std::vector<MCInst> ExegesisAArch64Target::setStackRegisterToAuxMem() const {
+  return std::vector<MCInst>(); // NOP
+
+  // Below is implementation for AArch64 but motivation unclear
+  // std::vector<MCInst> instructions; // NOP
+  // const uint64_t targetSPValue = getAuxiliaryMemoryStartAddress() +
+  //                               SubprocessMemory::AuxiliaryMemorySize;
+  // // sub, stack args and local storage
+  // // Use X16 as a temporary register since it's a scratch register
+  // const MCRegister TempReg = AArch64::X16;
+
+  // // Load the 64-bit immediate into TempReg using MOVZ/MOVK sequence
+  // // MOVZ Xd, #imm16, LSL #(shift_val * 16)
+  // // MOVK Xd, #imm16, LSL #(shift_val * 16) (* 3 times for 64-bit immediate)
+
+  // // 1. MOVZ TmpReg, #(targetSPValue & 0xFFFF), LSL #0
+  // instructions.push_back(
+  //     MCInstBuilder(AArch64::MOVZXi)
+  //         .addReg(TempReg)
+  //         .addImm(static_cast<uint16_t>(targetSPValue & 0xFFFF)) // imm16
+  //         .addImm(0));                               // hw (shift/16) = 0
+  // // 2. MOVK TmpReg,...
[truncated]

@lakshayk-nv lakshayk-nv changed the title [llvm-exegesis] Resolving "snippet crashed while running: Segmentation fault" for Load Instructions [llvm-exegesis] [AArch64] Resolving "snippet crashed while running: Segmentation fault" for Load Instructions Jun 3, 2025
@boomanaiden154 (Contributor) commented

A. Prerequisite for this is to support --mode=subprocess for AArch64.

Why exactly did you need to support --mode=subprocess for this? It gives you some extra flexibility, but the scratch memory register/block is a lot simpler.


github-actions bot commented Jun 3, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.
