[AMDGPU] Update code object metadata for kernarg preload #134666

kerbowa · 2025-04-07T15:06:57Z

Tracks the registers that explicit and hidden arguments are preloaded to
with new code object metadata.

IR arguments may be split across multiple parts by isel, and SGPR tuple
alignment means that an argument may be spread across multiple
registers.

To support this, some of the utilities for hidden kernel arguments are
moved to AMDGPUArgumentUsageInfo.h. Additional bookkeeping is also
needed for tracking purposes.
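
As a rough illustration of the bookkeeping described above, the sketch below formats the registers of one preloaded argument part into a single space-separated string, which is the shape the patch writes into the `.preload_registers` metadata field. `PreloadPart` and `formatPreloadRegisters` are hypothetical stand-ins, and register names are plain strings here rather than `MCRegister` values; this is not the code in the patch.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for KernArgPreloadDescriptor. Register names are
// plain strings here instead of MCRegister values.
struct PreloadPart {
  unsigned OrigArgIdx;           // index of the IR argument in the signature
  unsigned PartIdx;              // position of this part in a split argument
  std::vector<std::string> Regs; // registers this part is preloaded into
};

// Joins the register names of one part with single spaces,
// e.g. {"s8", "s9"} becomes "s8 s9". An empty list yields an empty string,
// which the caller can treat as "not preloaded".
std::string formatPreloadRegisters(const PreloadPart &Part) {
  std::string Out;
  for (std::size_t I = 0; I < Part.Regs.size(); ++I) {
    if (I > 0)
      Out += ' ';
    Out += Part.Regs[I];
  }
  return Out;
}
```

Returning an empty string for a part with no registers mirrors the `if (!PreloadRegisters.empty())` guard in the diff, so arguments that are not preloaded get no extra metadata field.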

llvmbot · 2025-04-07T15:08:24Z

@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-llvm-support

Author: Austin Kerbow (kerbowa)

Patch is 78.25 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/134666.diff

21 Files Affected:

(modified) llvm/include/llvm/Support/AMDGPUMetadata.h (+1-1)
(modified) llvm/lib/Target/AMDGPU/AMDGPUArgumentUsageInfo.cpp (+34)
(modified) llvm/lib/Target/AMDGPU/AMDGPUArgumentUsageInfo.h (+87-4)
(modified) llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp (+312-59)
(modified) llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.h (+26-8)
(modified) llvm/lib/Target/AMDGPU/AMDGPULowerKernelArguments.cpp (+10-59)
(modified) llvm/lib/Target/AMDGPU/SIISelLowering.cpp (+39-7)
(modified) llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp (+8-3)
(modified) llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h (+2-2)
(added) llvm/test/CodeGen/AMDGPU/hsa-metadata-preload-args-v6.ll (+388)
(modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-all-any.ll (+4-3)
(modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-all-not-supported.ll (+4-3)
(modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-all-off.ll (+4-3)
(modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-all-on.ll (+4-3)
(modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-any-off-1.ll (+4-3)
(modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-any-off-2.ll (+4-3)
(modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-any-on-1.ll (+4-3)
(modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-any-on-2.ll (+4-3)
(modified) llvm/test/CodeGen/AMDGPU/tid-one-func-xnack-not-supported.ll (+4-3)
(modified) llvm/test/CodeGen/AMDGPU/tid-one-func-xnack-off.ll (+4-3)
(modified) llvm/test/CodeGen/AMDGPU/tid-one-func-xnack-on.ll (+4-3)

diff --git a/llvm/include/llvm/Support/AMDGPUMetadata.h b/llvm/include/llvm/Support/AMDGPUMetadata.h
index 76ac7ab74a32e..d5e0f4031b0f6 100644
--- a/llvm/include/llvm/Support/AMDGPUMetadata.h
+++ b/llvm/include/llvm/Support/AMDGPUMetadata.h
@@ -47,7 +47,7 @@ constexpr uint32_t VersionMinorV5 = 2;
 /// HSA metadata major version for code object V6.
 constexpr uint32_t VersionMajorV6 = 1;
 /// HSA metadata minor version for code object V6.
-constexpr uint32_t VersionMinorV6 = 2;
+constexpr uint32_t VersionMinorV6 = 3;
 
 /// Old HSA metadata beginning assembler directive for V2. This is only used for
 /// diagnostics now.
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUArgumentUsageInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPUArgumentUsageInfo.cpp
index d158f0f58d711..06504a081e6f6 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUArgumentUsageInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUArgumentUsageInfo.cpp
@@ -16,12 +16,15 @@
 #include "llvm/Support/raw_ostream.h"
 
 using namespace llvm;
+using namespace llvm::KernArgPreload;
 
 #define DEBUG_TYPE "amdgpu-argument-reg-usage-info"
 
 INITIALIZE_PASS(AMDGPUArgumentUsageInfo, DEBUG_TYPE,
                 "Argument Register Usage Information Storage", false, true)
 
+constexpr HiddenArgInfo HiddenArgUtils::HiddenArgs[END_HIDDEN_ARGS];
+
 void ArgDescriptor::print(raw_ostream &OS,
                           const TargetRegisterInfo *TRI) const {
   if (!isSet()) {
@@ -176,6 +179,37 @@ AMDGPUFunctionArgInfo AMDGPUFunctionArgInfo::fixedABILayout() {
   return AI;
 }
 
+SmallVector<const KernArgPreloadDescriptor *, 4>
+AMDGPUFunctionArgInfo::getPreloadDescriptorsForArgIdx(unsigned ArgIdx) const {
+  SmallVector<const KernArgPreloadDescriptor *, 4> Results;
+  for (const auto &KV : PreloadKernArgs) {
+    if (KV.second.OrigArgIdx == ArgIdx)
+      Results.push_back(&KV.second);
+  }
+
+  llvm::stable_sort(Results, [](const KernArgPreloadDescriptor *A,
+                                const KernArgPreloadDescriptor *B) {
+    return A->PartIdx < B->PartIdx;
+  });
+
+  return Results;
+}
+
+std::optional<const KernArgPreloadDescriptor *>
+AMDGPUFunctionArgInfo::getHiddenArgPreloadDescriptor(HiddenArg HA) const {
+  assert(HA < END_HIDDEN_ARGS);
+
+  auto HiddenArgIt = PreloadHiddenArgsIndexMap.find(HA);
+  if (HiddenArgIt == PreloadHiddenArgsIndexMap.end())
+    return std::nullopt;
+
+  auto KernArgIt = PreloadKernArgs.find(HiddenArgIt->second);
+  if (KernArgIt == PreloadKernArgs.end())
+    return std::nullopt;
+
+  return &KernArgIt->second;
+}
+
 const AMDGPUFunctionArgInfo &
 AMDGPUArgumentUsageInfo::lookupFuncArgInfo(const Function &F) const {
   auto I = ArgInfoMap.find(&F);
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUArgumentUsageInfo.h b/llvm/lib/Target/AMDGPU/AMDGPUArgumentUsageInfo.h
index e07d47381ecca..ee4dba31f2617 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUArgumentUsageInfo.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPUArgumentUsageInfo.h
@@ -11,7 +11,10 @@
 
 #include "MCTargetDesc/AMDGPUMCTargetDesc.h"
 #include "llvm/ADT/DenseMap.h"
+#include "llvm/Analysis/ValueTracking.h"
 #include "llvm/CodeGen/Register.h"
+#include "llvm/IR/LLVMContext.h"
+#include "llvm/IR/Type.h"
 #include "llvm/Pass.h"
 
 namespace llvm {
@@ -95,11 +98,78 @@ inline raw_ostream &operator<<(raw_ostream &OS, const ArgDescriptor &Arg) {
   return OS;
 }
 
-struct KernArgPreloadDescriptor : public ArgDescriptor {
-  KernArgPreloadDescriptor() {}
-  SmallVector<MCRegister> Regs;
+namespace KernArgPreload {
+
+enum HiddenArg {
+  HIDDEN_BLOCK_COUNT_X,
+  HIDDEN_BLOCK_COUNT_Y,
+  HIDDEN_BLOCK_COUNT_Z,
+  HIDDEN_GROUP_SIZE_X,
+  HIDDEN_GROUP_SIZE_Y,
+  HIDDEN_GROUP_SIZE_Z,
+  HIDDEN_REMAINDER_X,
+  HIDDEN_REMAINDER_Y,
+  HIDDEN_REMAINDER_Z,
+  END_HIDDEN_ARGS
 };
 
+// Stores information about a specific hidden argument.
+struct HiddenArgInfo {
+  // Offset in bytes from the location in the kernarg segment pointed to by
+  // the implicitarg pointer.
+  uint8_t Offset;
+  // The size of the hidden argument in bytes.
+  uint8_t Size;
+  // The name of the hidden argument in the kernel signature.
+  const char *Name;
+};
+
+struct HiddenArgUtils {
+  static constexpr HiddenArgInfo HiddenArgs[END_HIDDEN_ARGS] = {
+      {0, 4, "_hidden_block_count_x"}, {4, 4, "_hidden_block_count_y"},
+      {8, 4, "_hidden_block_count_z"}, {12, 2, "_hidden_group_size_x"},
+      {14, 2, "_hidden_group_size_y"}, {16, 2, "_hidden_group_size_z"},
+      {18, 2, "_hidden_remainder_x"},  {20, 2, "_hidden_remainder_y"},
+      {22, 2, "_hidden_remainder_z"}};
+
+  static HiddenArg getHiddenArgFromOffset(unsigned Offset) {
+    for (unsigned I = 0; I < END_HIDDEN_ARGS; ++I)
+      if (HiddenArgs[I].Offset == Offset)
+        return static_cast<HiddenArg>(I);
+
+    return END_HIDDEN_ARGS;
+  }
+
+  static Type *getHiddenArgType(LLVMContext &Ctx, HiddenArg HA) {
+    if (HA < END_HIDDEN_ARGS)
+      return static_cast<Type *>(Type::getIntNTy(Ctx, HiddenArgs[HA].Size * 8));
+
+    llvm_unreachable("Unexpected hidden argument.");
+  }
+
+  static const char *getHiddenArgName(HiddenArg HA) {
+    if (HA < END_HIDDEN_ARGS) {
+      return HiddenArgs[HA].Name;
+    }
+    llvm_unreachable("Unexpected hidden argument.");
+  }
+};
+
+struct KernArgPreloadDescriptor {
+  // Id of the original argument in the IR kernel function argument list.
+  unsigned OrigArgIdx = 0;
+
+  // If this IR argument was split into multiple parts, this is the index of the
+  // part in the original argument.
+  unsigned PartIdx = 0;
+
+  // The registers that the argument is preloaded into. The argument may be
+  // split across multiple registers.
+  SmallVector<MCRegister, 2> Regs;
+};
+
+} // namespace KernArgPreload
+
 struct AMDGPUFunctionArgInfo {
   // clang-format off
   enum PreloadedValue {
@@ -161,7 +231,10 @@ struct AMDGPUFunctionArgInfo {
   ArgDescriptor WorkItemIDZ;
 
   // Map the index of preloaded kernel arguments to its descriptor.
-  SmallDenseMap<int, KernArgPreloadDescriptor> PreloadKernArgs{};
+  SmallDenseMap<int, KernArgPreload::KernArgPreloadDescriptor>
+      PreloadKernArgs{};
+  // Map hidden argument to the index of its descriptor.
+  SmallDenseMap<KernArgPreload::HiddenArg, int> PreloadHiddenArgsIndexMap{};
   // The first user SGPR allocated for kernarg preloading.
   Register FirstKernArgPreloadReg;
 
@@ -169,6 +242,16 @@ struct AMDGPUFunctionArgInfo {
   getPreloadedValue(PreloadedValue Value) const;
 
   static AMDGPUFunctionArgInfo fixedABILayout();
+
+  // Returns preload argument descriptors for an IR argument index. Isel may
+  // split IR arguments into multiple parts, the return vector holds all parts
+  // associated with an IR argument in the kernel signature.
+  SmallVector<const KernArgPreload::KernArgPreloadDescriptor *, 4>
+  getPreloadDescriptorsForArgIdx(unsigned ArgIdx) const;
+
+  // Returns the hidden arguments `KernArgPreloadDescriptor` if it is preloaded.
+  std::optional<const KernArgPreload::KernArgPreloadDescriptor *>
+  getHiddenArgPreloadDescriptor(KernArgPreload::HiddenArg HA) const;
 };
 
 class AMDGPUArgumentUsageInfo : public ImmutablePass {
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp b/llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp
index 2991778a1bbc7..f6f71b2d042d3 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp
@@ -15,6 +15,7 @@
 #include "AMDGPUHSAMetadataStreamer.h"
 #include "AMDGPU.h"
 #include "GCNSubtarget.h"
+#include "MCTargetDesc/AMDGPUInstPrinter.h"
 #include "MCTargetDesc/AMDGPUTargetStreamer.h"
 #include "SIMachineFunctionInfo.h"
 #include "SIProgramInfo.h"
@@ -290,7 +291,7 @@ void MetadataStreamerMsgPackV4::emitKernelArgs(const MachineFunction &MF,
     if (Arg.hasAttribute("amdgpu-hidden-argument"))
       continue;
 
-    emitKernelArg(Arg, Offset, Args);
+    emitKernelArg(Arg, Offset, Args, MF);
   }
 
   emitHiddenKernelArgs(MF, Offset, Args);
@@ -300,7 +301,8 @@ void MetadataStreamerMsgPackV4::emitKernelArgs(const MachineFunction &MF,
 
 void MetadataStreamerMsgPackV4::emitKernelArg(const Argument &Arg,
                                               unsigned &Offset,
-                                              msgpack::ArrayDocNode Args) {
+                                              msgpack::ArrayDocNode Args,
+                                              const MachineFunction &MF) {
   const auto *Func = Arg.getParent();
   auto ArgNo = Arg.getArgNo();
   const MDNode *Node;
@@ -357,17 +359,18 @@ void MetadataStreamerMsgPackV4::emitKernelArg(const Argument &Arg,
   Align ArgAlign;
   std::tie(ArgTy, ArgAlign) = getArgumentTypeAlign(Arg, DL);
 
-  emitKernelArg(DL, ArgTy, ArgAlign,
-                getValueKind(ArgTy, TypeQual, BaseTypeName), Offset, Args,
-                PointeeAlign, Name, TypeName, BaseTypeName, ActAccQual,
-                AccQual, TypeQual);
+  emitKernelArgImpl(DL, ArgTy, ArgAlign,
+                    getValueKind(ArgTy, TypeQual, BaseTypeName), Offset, Args,
+                    "" /* PreloadRegisters */, PointeeAlign, Name, TypeName,
+                    BaseTypeName, ActAccQual, AccQual, TypeQual);
 }
 
-void MetadataStreamerMsgPackV4::emitKernelArg(
+void MetadataStreamerMsgPackV4::emitKernelArgImpl(
     const DataLayout &DL, Type *Ty, Align Alignment, StringRef ValueKind,
-    unsigned &Offset, msgpack::ArrayDocNode Args, MaybeAlign PointeeAlign,
-    StringRef Name, StringRef TypeName, StringRef BaseTypeName,
-    StringRef ActAccQual, StringRef AccQual, StringRef TypeQual) {
+    unsigned &Offset, msgpack::ArrayDocNode Args, StringRef PreloadRegisters,
+    MaybeAlign PointeeAlign, StringRef Name, StringRef TypeName,
+    StringRef BaseTypeName, StringRef ActAccQual, StringRef AccQual,
+    StringRef TypeQual) {
   auto Arg = Args.getDocument()->getMapNode();
 
   if (!Name.empty())
@@ -409,6 +412,11 @@ void MetadataStreamerMsgPackV4::emitKernelArg(
       Arg[".is_pipe"] = Arg.getDocument()->getNode(true);
   }
 
+  if (!PreloadRegisters.empty()) {
+    Arg[".preload_registers"] =
+        Arg.getDocument()->getNode(PreloadRegisters, /*Copy=*/true);
+  }
+
   Args.push_back(Arg);
 }
 
@@ -428,14 +436,14 @@ void MetadataStreamerMsgPackV4::emitHiddenKernelArgs(
   Offset = alignTo(Offset, ST.getAlignmentForImplicitArgPtr());
 
   if (HiddenArgNumBytes >= 8)
-    emitKernelArg(DL, Int64Ty, Align(8), "hidden_global_offset_x", Offset,
-                  Args);
+    emitKernelArgImpl(DL, Int64Ty, Align(8), "hidden_global_offset_x", Offset,
+                      Args);
   if (HiddenArgNumBytes >= 16)
-    emitKernelArg(DL, Int64Ty, Align(8), "hidden_global_offset_y", Offset,
-                  Args);
+    emitKernelArgImpl(DL, Int64Ty, Align(8), "hidden_global_offset_y", Offset,
+                      Args);
   if (HiddenArgNumBytes >= 24)
-    emitKernelArg(DL, Int64Ty, Align(8), "hidden_global_offset_z", Offset,
-                  Args);
+    emitKernelArgImpl(DL, Int64Ty, Align(8), "hidden_global_offset_z", Offset,
+                      Args);
 
   auto *Int8PtrTy =
       PointerType::get(Func.getContext(), AMDGPUAS::GLOBAL_ADDRESS);
@@ -445,42 +453,42 @@ void MetadataStreamerMsgPackV4::emitHiddenKernelArgs(
     // before code object V5, which makes the mutual exclusion between the
     // "printf buffer" and "hostcall buffer" here sound.
     if (M->getNamedMetadata("llvm.printf.fmts"))
-      emitKernelArg(DL, Int8PtrTy, Align(8), "hidden_printf_buffer", Offset,
-                    Args);
+      emitKernelArgImpl(DL, Int8PtrTy, Align(8), "hidden_printf_buffer", Offset,
+                        Args);
     else if (!Func.hasFnAttribute("amdgpu-no-hostcall-ptr"))
-      emitKernelArg(DL, Int8PtrTy, Align(8), "hidden_hostcall_buffer", Offset,
-                    Args);
+      emitKernelArgImpl(DL, Int8PtrTy, Align(8), "hidden_hostcall_buffer",
+                        Offset, Args);
     else
-      emitKernelArg(DL, Int8PtrTy, Align(8), "hidden_none", Offset, Args);
+      emitKernelArgImpl(DL, Int8PtrTy, Align(8), "hidden_none", Offset, Args);
   }
 
   // Emit "default queue" and "completion action" arguments if enqueue kernel is
   // used, otherwise emit dummy "none" arguments.
   if (HiddenArgNumBytes >= 40) {
     if (!Func.hasFnAttribute("amdgpu-no-default-queue")) {
-      emitKernelArg(DL, Int8PtrTy, Align(8), "hidden_default_queue", Offset,
-                    Args);
+      emitKernelArgImpl(DL, Int8PtrTy, Align(8), "hidden_default_queue", Offset,
+                        Args);
     } else {
-      emitKernelArg(DL, Int8PtrTy, Align(8), "hidden_none", Offset, Args);
+      emitKernelArgImpl(DL, Int8PtrTy, Align(8), "hidden_none", Offset, Args);
     }
   }
 
   if (HiddenArgNumBytes >= 48) {
     if (!Func.hasFnAttribute("amdgpu-no-completion-action")) {
-      emitKernelArg(DL, Int8PtrTy, Align(8), "hidden_completion_action", Offset,
-                    Args);
+      emitKernelArgImpl(DL, Int8PtrTy, Align(8), "hidden_completion_action",
+                        Offset, Args);
     } else {
-      emitKernelArg(DL, Int8PtrTy, Align(8), "hidden_none", Offset, Args);
+      emitKernelArgImpl(DL, Int8PtrTy, Align(8), "hidden_none", Offset, Args);
     }
   }
 
   // Emit the pointer argument for multi-grid object.
   if (HiddenArgNumBytes >= 56) {
     if (!Func.hasFnAttribute("amdgpu-no-multigrid-sync-arg")) {
-      emitKernelArg(DL, Int8PtrTy, Align(8), "hidden_multigrid_sync_arg", Offset,
-                    Args);
+      emitKernelArgImpl(DL, Int8PtrTy, Align(8), "hidden_multigrid_sync_arg",
+                        Offset, Args);
     } else {
-      emitKernelArg(DL, Int8PtrTy, Align(8), "hidden_none", Offset, Args);
+      emitKernelArgImpl(DL, Int8PtrTy, Align(8), "hidden_none", Offset, Args);
     }
   }
 }
@@ -635,77 +643,83 @@ void MetadataStreamerMsgPackV5::emitHiddenKernelArgs(
   auto *Int16Ty = Type::getInt16Ty(Func.getContext());
 
   Offset = alignTo(Offset, ST.getAlignmentForImplicitArgPtr());
-  emitKernelArg(DL, Int32Ty, Align(4), "hidden_block_count_x", Offset, Args);
-  emitKernelArg(DL, Int32Ty, Align(4), "hidden_block_count_y", Offset, Args);
-  emitKernelArg(DL, Int32Ty, Align(4), "hidden_block_count_z", Offset, Args);
+  emitKernelArgImpl(DL, Int32Ty, Align(4), "hidden_block_count_x", Offset,
+                    Args);
+  emitKernelArgImpl(DL, Int32Ty, Align(4), "hidden_block_count_y", Offset,
+                    Args);
+  emitKernelArgImpl(DL, Int32Ty, Align(4), "hidden_block_count_z", Offset,
+                    Args);
 
-  emitKernelArg(DL, Int16Ty, Align(2), "hidden_group_size_x", Offset, Args);
-  emitKernelArg(DL, Int16Ty, Align(2), "hidden_group_size_y", Offset, Args);
-  emitKernelArg(DL, Int16Ty, Align(2), "hidden_group_size_z", Offset, Args);
+  emitKernelArgImpl(DL, Int16Ty, Align(2), "hidden_group_size_x", Offset, Args);
+  emitKernelArgImpl(DL, Int16Ty, Align(2), "hidden_group_size_y", Offset, Args);
+  emitKernelArgImpl(DL, Int16Ty, Align(2), "hidden_group_size_z", Offset, Args);
 
-  emitKernelArg(DL, Int16Ty, Align(2), "hidden_remainder_x", Offset, Args);
-  emitKernelArg(DL, Int16Ty, Align(2), "hidden_remainder_y", Offset, Args);
-  emitKernelArg(DL, Int16Ty, Align(2), "hidden_remainder_z", Offset, Args);
+  emitKernelArgImpl(DL, Int16Ty, Align(2), "hidden_remainder_x", Offset, Args);
+  emitKernelArgImpl(DL, Int16Ty, Align(2), "hidden_remainder_y", Offset, Args);
+  emitKernelArgImpl(DL, Int16Ty, Align(2), "hidden_remainder_z", Offset, Args);
 
   // Reserved for hidden_tool_correlation_id.
   Offset += 8;
 
   Offset += 8; // Reserved.
 
-  emitKernelArg(DL, Int64Ty, Align(8), "hidden_global_offset_x", Offset, Args);
-  emitKernelArg(DL, Int64Ty, Align(8), "hidden_global_offset_y", Offset, Args);
-  emitKernelArg(DL, Int64Ty, Align(8), "hidden_global_offset_z", Offset, Args);
+  emitKernelArgImpl(DL, Int64Ty, Align(8), "hidden_global_offset_x", Offset,
+                    Args);
+  emitKernelArgImpl(DL, Int64Ty, Align(8), "hidden_global_offset_y", Offset,
+                    Args);
+  emitKernelArgImpl(DL, Int64Ty, Align(8), "hidden_global_offset_z", Offset,
+                    Args);
 
-  emitKernelArg(DL, Int16Ty, Align(2), "hidden_grid_dims", Offset, Args);
+  emitKernelArgImpl(DL, Int16Ty, Align(2), "hidden_grid_dims", Offset, Args);
 
   Offset += 6; // Reserved.
   auto *Int8PtrTy =
       PointerType::get(Func.getContext(), AMDGPUAS::GLOBAL_ADDRESS);
 
   if (M->getNamedMetadata("llvm.printf.fmts")) {
-    emitKernelArg(DL, Int8PtrTy, Align(8), "hidden_printf_buffer", Offset,
-                  Args);
+    emitKernelArgImpl(DL, Int8PtrTy, Align(8), "hidden_printf_buffer", Offset,
+                      Args);
   } else {
     Offset += 8; // Skipped.
   }
 
   if (!Func.hasFnAttribute("amdgpu-no-hostcall-ptr")) {
-    emitKernelArg(DL, Int8PtrTy, Align(8), "hidden_hostcall_buffer", Offset,
-                  Args);
+    emitKernelArgImpl(DL, Int8PtrTy, Align(8), "hidden_hostcall_buffer", Offset,
+                      Args);
   } else {
     Offset += 8; // Skipped.
   }
 
   if (!Func.hasFnAttribute("amdgpu-no-multigrid-sync-arg")) {
-    emitKernelArg(DL, Int8PtrTy, Align(8), "hidden_multigrid_sync_arg", Offset,
-                Args);
+    emitKernelArgImpl(DL, Int8PtrTy, Align(8), "hidden_multigrid_sync_arg",
+                      Offset, Args);
   } else {
     Offset += 8; // Skipped.
   }
 
   if (!Func.hasFnAttribute("amdgpu-no-heap-ptr"))
-    emitKernelArg(DL, Int8PtrTy, Align(8), "hidden_heap_v1", Offset, Args);
+    emitKernelArgImpl(DL, Int8PtrTy, Align(8), "hidden_heap_v1", Offset, Args);
   else
     Offset += 8; // Skipped.
 
   if (!Func.hasFnAttribute("amdgpu-no-default-queue")) {
-    emitKernelArg(DL, Int8PtrTy, Align(8), "hidden_default_queue", Offset,
-                  Args);
+    emitKernelArgImpl(DL, Int8PtrTy, Align(8), "hidden_default_queue", Offset,
+                      Args);
   } else {
     Offset += 8; // Skipped.
   }
 
   if (!Func.hasFnAttribute("amdgpu-no-completion-action")) {
-    emitKernelArg(DL, Int8PtrTy, Align(8), "hidden_completion_action", Offset,
-                  Args);
+    emitKernelArgImpl(DL, Int8PtrTy, Align(8), "hidden_completion_action",
+                      Offset, Args);
   } else {
     Offset += 8; // Skipped.
   }
 
   // Emit argument for hidden dynamic lds size
   if (MFI.isDynamicLDSUsed()) {
-    emitKernelArg(DL, Int32Ty, Align(4), "hidden_dynamic_lds_size", Offset,
-                  Args);
+    emitKernelArgImpl(DL, Int32Ty, Align(4), "hidden_dynamic_lds_size", Offset,
+                      Args);
   } else {
     Offset += 4; // skipped
   }
@@ -715,14 +729,17 @@ void MetadataStreamerMsgPackV5::emitHiddenKernelArgs(
   // hidden_private_base and hidden_shared_base are only when the subtarget has
   // ApertureRegs.
   if (!ST.hasApertureRegs()) {
-    emitKernelArg(DL, Int32Ty, Align(4), "hidden_private_base", Offset, Args);
-    emitKernelArg(DL, Int32Ty, Align(4), "hidden_shared_base", Offset, Args);
+    emitKernelArgImpl(DL, Int32Ty, Align(4), "hidden_private_base", Offset,
+                      Args);
+    emitKernelArgImpl(DL, Int32Ty, Align(4), "hidden_shared_base", Offset,
+                      Args);
   } else {
     Offset += 8; // Skipped.
   }
 
   if (MFI.getUserSGPRInfo().hasQueuePtr())
-    emitKernelArg(DL, Int8PtrTy, Align(8), "hidden_queue_ptr", Offset, Args);
+    emitKernelArgImpl(DL, Int8PtrTy, Align(8), "hidden_queue_ptr", Offset,
+                      Args);
 }
 
 void MetadataStreamerMsgPackV5::emitKernelAttrs(const AMDGPUTargetMachine &TM,
@@ -745,5 +762,241 @@ void MetadataStreamerMsgPackV6::emitVersion() {
   getRootMetadata("amdhsa.version") = Version;
 }
 
+void MetadataStreamerMsgPackV6::emitHiddenKernelArgWithPreload(
+    const DataLayout &DL, Type *ArgTy, Align Alignment,
+    KernArgPreload::HiddenArg HiddenArg, StringRef ArgName, unsigned &Offset,
+    msgpack::ArrayDocNode Args, const AMDGPUFunctionArgInfo &ArgInfo) {
+
+  SmallString<16> PreloadStr;
+  auto PreloadDesc = ArgInfo.getHiddenArgPreloadDescriptor(HiddenArg);
+  if (PreloadDesc) {
+    const auto &Regs = (*PreloadDesc)->Regs;
+    for (unsigned I = 0; I < Regs.size(); ++I) {
+      if (I > 0)
+        PreloadStr += " ";
+      PreloadStr += AMDGPUInstPrinter::getRegisterName(Regs[I]);
+    }
+  }
+  emitKernelArgImpl(DL, ArgTy, Alignment, ArgName, Offset, Args, PreloadStr);
+}
+
+void MetadataStreamerMsgPackV6::emitHiddenKernelArgs(
+    const MachineFunction &MF, unsigne...
[truncated]

Pierre-vh · 2025-04-10T08:11:31Z

llvm/lib/Target/AMDGPU/AMDGPUArgumentUsageInfo.cpp

+      Results.push_back(&KV.second);
+  }
+
+  llvm::stable_sort(Results, [](const KernArgPreloadDescriptor *A,


llvm:: prefix is not necessary

Pierre-vh · 2025-04-10T08:12:04Z

llvm/lib/Target/AMDGPU/AMDGPUArgumentUsageInfo.h

+    for (unsigned I = 0; I < END_HIDDEN_ARGS; ++I)
+      if (HiddenArgs[I].Offset == Offset)
+        return static_cast<HiddenArg>(I);


Suggested change

-    for (unsigned I = 0; I < END_HIDDEN_ARGS; ++I)
-      if (HiddenArgs[I].Offset == Offset)
-        return static_cast<HiddenArg>(I);
+    for (unsigned I = 0; I < END_HIDDEN_ARGS; ++I) {
+      if (HiddenArgs[I].Offset == Offset)
+        return static_cast<HiddenArg>(I);
+    }

Pierre-vh · 2025-04-10T08:13:40Z

llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp

+  // There's no distinction between byval aggregates and raw aggregates.
+  Type *ArgTy;
+  Align ArgAlign;
+  std::tie(ArgTy, ArgAlign) = getArgumentTypeAlign(Arg, DL);


Suggested change

-  std::tie(ArgTy, ArgAlign) = getArgumentTypeAlign(Arg, DL);
+  auto [ArgTy, ArgAlign] = getArgumentTypeAlign(Arg, DL);
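
For readers unfamiliar with the two spellings in this suggestion, here is a minimal sketch comparing `std::tie` with a C++17 structured binding. `getSizeAndAlign` is a hypothetical stand-in for `getArgumentTypeAlign` that returns plain integers instead of a `Type *` and an `Align`.

```cpp
#include <tuple>
#include <utility>

// Hypothetical stand-in for getArgumentTypeAlign: returns
// (size in bytes, alignment in bytes).
std::pair<unsigned, unsigned> getSizeAndAlign(bool IsWide) {
  return IsWide ? std::make_pair(12u, 8u) : std::make_pair(6u, 4u);
}

// std::tie needs both variables declared up front and assigns into them.
unsigned paddedSizeWithTie(bool IsWide) {
  unsigned Size;
  unsigned Alignment;
  std::tie(Size, Alignment) = getSizeAndAlign(IsWide);
  return (Size + Alignment - 1) / Alignment * Alignment;
}

// A structured binding declares and initializes both names in one statement.
unsigned paddedSizeWithBindings(bool IsWide) {
  auto [Size, Alignment] = getSizeAndAlign(IsWide);
  return (Size + Alignment - 1) / Alignment * Alignment;
}
```

Both functions compute the same result; the structured binding only removes the separate declarations. One caveat when applying this kind of change: before C++20, names introduced by a structured binding cannot be captured by a lambda.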

Pierre-vh · 2025-04-10T08:15:48Z

llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp

+  if (M->getNamedMetadata("llvm.printf.fmts")) {
+    emitKernelArgImpl(DL, Int8PtrTy, Align(8), "hidden_printf_buffer", Offset,
+                      Args);
+  } else {


small nit: don't use {} for the elses in this function; they're all one line?

I skipped this suggestion since this is the style elsewhere in the function. Also, I thought that if the if has braces, the matching else should too, even if it's only one line?

Pierre-vh · 2025-04-10T08:17:14Z

llvm/lib/Target/AMDGPU/AMDGPUArgumentUsageInfo.h

+  }
+
+  static const char *getHiddenArgName(HiddenArg HA) {
+    if (HA < END_HIDDEN_ARGS) {


small nit: don't use {} here

arsenm · 2025-04-13T10:03:37Z

llvm/lib/Target/AMDGPU/AMDGPUArgumentUsageInfo.h

+
+  static Type *getHiddenArgType(LLVMContext &Ctx, HiddenArg HA) {
+    if (HA < END_HIDDEN_ARGS)
+      return static_cast<Type *>(Type::getIntNTy(Ctx, HiddenArgs[HA].Size * 8));


Don't need the cast
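
The reason the cast is redundant: `Type::getIntNTy` returns an `IntegerType *`, and `IntegerType` derives from `Type`, so the pointer converts implicitly. A minimal sketch with hypothetical stand-in classes (not the LLVM ones, and with a made-up `Storage` parameter in place of `LLVMContext`):

```cpp
// Hypothetical stand-ins for llvm::Type and llvm::IntegerType.
struct Type {
  unsigned Bits = 0;
};
struct IntegerType : Type {};

// Stand-in for Type::getIntNTy: returns a pointer to the derived class.
IntegerType *getIntNTy(IntegerType &Storage, unsigned Bits) {
  Storage.Bits = Bits;
  return &Storage;
}

// The return statement converts IntegerType* to Type* implicitly
// (derived-to-base pointer conversion), so no static_cast is needed.
Type *getHiddenArgType(IntegerType &Storage, unsigned SizeInBytes) {
  return getIntNTy(Storage, SizeInBytes * 8);
}
```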

arsenm · 2025-04-13T10:04:15Z

llvm/lib/Target/AMDGPU/AMDGPUArgumentUsageInfo.h

+    if (HA < END_HIDDEN_ARGS) {
+      return HiddenArgs[HA].Name;
+    }
+    llvm_unreachable("Unexpected hidden argument.");


Suggested change

-    llvm_unreachable("Unexpected hidden argument.");
+    llvm_unreachable("unexpected hidden argument");

arsenm · 2025-04-13T10:06:39Z

llvm/lib/Target/AMDGPU/AMDGPUArgumentUsageInfo.h

+  getPreloadDescriptorsForArgIdx(unsigned ArgIdx) const;
+
+  // Returns the hidden arguments `KernArgPreloadDescriptor` if it is preloaded.
+  std::optional<const KernArgPreload::KernArgPreloadDescriptor *>


Avoid optional of pointer
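
One way to read this comment: a pointer already has an empty state (`nullptr`), so wrapping it in `std::optional` gives callers two different "absent" values to check. A minimal sketch of the plain-pointer alternative, using `std::map` and a hypothetical `Descriptor` type rather than the LLVM containers in the patch:

```cpp
#include <map>

// Hypothetical stand-in for KernArgPreloadDescriptor.
struct Descriptor {
  unsigned OrigArgIdx = 0;
};

// Returns the descriptor stored under Key, or nullptr when there is none.
// The null pointer is the only "not found" state the caller has to handle.
const Descriptor *findDescriptor(const std::map<int, Descriptor> &Map,
                                 int Key) {
  auto It = Map.find(Key);
  return It == Map.end() ? nullptr : &It->second;
}
```

A caller can then write `if (const Descriptor *D = findDescriptor(Map, Key))` and use `D` directly, instead of testing the optional and dereferencing twice.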

arsenm · 2025-04-13T10:09:28Z

llvm/test/CodeGen/AMDGPU/hsa-metadata-preload-args-v6.ll

+!2 = !{!"2:1:8:%g\5Cn"}
+
+attributes #0 = { optnone noinline }
+attributes #1 = { "amdgpu-agpr-alloc"="0" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" }


Missing end of file newline

arsenm · 2025-04-13T10:10:15Z

llvm/test/CodeGen/AMDGPU/hsa-metadata-preload-args-v6.ll

+  store i32 %add, ptr addrspace(1) %out, align 4
+  ret void
+}
+


Test a preloaded vector? Is inreg supported on other aggregates?

There is no preloading on other aggregates; it is just ignored long before metadata is emitted.

kerbowa · 2025-05-30T17:06:34Z

ping

shiltian

just drive by with some nits

kerbowa · 2025-06-18T05:19:52Z

Any more comments/concerns?

shiltian

Looks good to me with some style nits

arsenm · 2025-06-24T13:13:19Z

llvm/lib/Target/AMDGPU/AMDGPUArgumentUsageInfo.h

@@ -161,14 +232,27 @@ struct AMDGPUFunctionArgInfo {
  ArgDescriptor WorkItemIDZ;

  // Map the index of preloaded kernel arguments to its descriptor.
-  SmallDenseMap<int, KernArgPreloadDescriptor> PreloadKernArgs{};
+  SmallDenseMap<int, KernArgPreload::KernArgPreloadDescriptor>
+      PreloadKernArgs{};


Suggested change

-      PreloadKernArgs{};
+      PreloadKernArgs;

The constructor for SmallDenseMap is explicit, so I need to directly initialize it.
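
A sketch of one situation where the `{}` on the member matters, assuming the enclosing struct is initialized with an empty brace list somewhere (the thread does not say where). `ExplicitMap` and `ArgInfo` are hypothetical stand-ins for `SmallDenseMap` and `AMDGPUFunctionArgInfo`.

```cpp
// Hypothetical stand-in for a container whose default constructor is
// explicit, as the reply above says is the case for SmallDenseMap.
class ExplicitMap {
public:
  explicit ExplicitMap(unsigned NumInitBuckets = 0)
      : Buckets(NumInitBuckets) {}
  unsigned buckets() const { return Buckets; }

private:
  unsigned Buckets;
};

struct ArgInfo {
  // Direct-list-initialization: allowed to call the explicit constructor.
  ExplicitMap Preload{};
  int FirstReg = 0;
};

// Without the `{}` default member initializer, aggregate initialization
// would copy-list-initialize the member from an empty list, which may not
// select an explicit constructor:
//
//   struct Broken { ExplicitMap Preload; int FirstReg = 0; };
//   Broken B{}; // ill-formed
//
ArgInfo makeDefaultArgInfo() { return ArgInfo{}; }
```

If the struct is only ever default-initialized (for example `ArgInfo A;`), both spellings behave the same, so whether the braces are required depends on how the struct is constructed elsewhere.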

arsenm · 2025-06-24T13:18:30Z

llvm/lib/Target/AMDGPU/AMDGPUArgumentUsageInfo.cpp

+      Results.push_back(&KV.second);
+  }
+
+  stable_sort(Results, [](const KernArgPreloadDescriptor *A,


The map is just over an integer index, can you use IndexedMap instead and avoid the sort?

Are you saying this function should return an indexed map? I don't think I can iterate over it efficiently.

If what you mean is that PreloadKernArgs should be indexed by the OrigArgIdx I considered this initially but it would require refactoring the isel code and I didn't think it would be worth much performance since the number of parts per arg is usually very low. I can do the refactor if you think it's worth it though, let me know.
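
To make the trade-off in this thread concrete, here is a sketch of a lookup that needs no sort because the container is keyed by `(OrigArgIdx, PartIdx)`. It uses an ordered `std::map` with a composite key, which is a different technique from the `IndexedMap` suggested above and from the `SmallDenseMap` in the patch; `PreloadPart`, `PreloadMap`, and `partsForArg` are hypothetical names.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for KernArgPreloadDescriptor.
struct PreloadPart {
  std::vector<std::string> Regs;
};

// Key is (OrigArgIdx, PartIdx). std::map orders pairs lexicographically, so
// all parts of one argument are adjacent and already sorted by PartIdx.
using PreloadMap = std::map<std::pair<unsigned, unsigned>, PreloadPart>;

// Collects the parts of one IR argument in PartIdx order without sorting.
std::vector<const PreloadPart *> partsForArg(const PreloadMap &Map,
                                             unsigned ArgIdx) {
  std::vector<const PreloadPart *> Results;
  for (auto It = Map.lower_bound(std::make_pair(ArgIdx, 0u));
       It != Map.end() && It->first.first == ArgIdx; ++It)
    Results.push_back(&It->second);
  return Results;
}
```

This moves the ordering cost to insertion time. As the reply notes, the number of parts per argument is usually small, so the gain over collecting and sorting is likely modest.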

github-actions · 2025-06-29T06:30:59Z

✅ With the latest revision this PR passed the C/C++ code formatter.

kerbowa marked this pull request as ready for review April 7, 2025 15:07

llvmbot added backend:AMDGPU llvm:support labels Apr 7, 2025

kerbowa requested review from arsenm, kzhuravl, Pierre-vh and yxsamliu April 7, 2025 15:14

Pierre-vh reviewed Apr 10, 2025

arsenm reviewed Apr 13, 2025

kerbowa force-pushed the users/kerbowa/preload-kernarg-metadata branch from 62e5168 to f238c3f April 27, 2025 18:19

arsenm reviewed May 5, 2025

kerbowa force-pushed the users/kerbowa/preload-kernarg-metadata branch from f238c3f to 452d27a May 9, 2025 19:53

kerbowa force-pushed the users/kerbowa/preload-kernarg-metadata branch from e130ee1 to 82c8a97 May 29, 2025 03:56

shiltian reviewed May 30, 2025

kerbowa force-pushed the users/kerbowa/preload-kernarg-metadata branch from 82c8a97 to 4f197a1 June 2, 2025 05:35

kerbowa force-pushed the users/kerbowa/preload-kernarg-metadata branch from 4f197a1 to a2e299e June 18, 2025 05:18

kerbowa requested review from Pierre-vh, shiltian and arsenm June 23, 2025 15:10

shiltian reviewed Jun 23, 2025

arsenm reviewed Jun 24, 2025

kerbowa added 4 commits June 28, 2025 23:16

209bf8b Add suggested formatting changes, factor out common parts of emitKernelArg. Update test.

f70a0bd Factor common emit hidden kernel args metadata.

b79ca55 Rebase on changes to move preloading lowering to its own pass.

kerbowa force-pushed the users/kerbowa/preload-kernarg-metadata branch from a2e299e to 9383014 June 29, 2025 06:28

36f2865 Rebase and address review comments.

kerbowa force-pushed the users/kerbowa/preload-kernarg-metadata branch from 9383014 to 36f2865 June 29, 2025 07:09
