AMDGPU: Replace amdgpu-no-agpr with amdgpu-agpr-alloc #129893

Merged

arsenm
Contributor

@arsenm arsenm commented Mar 5, 2025

This performs the minimal replacement of amdgpu-no-agpr with
amdgpu-agpr-alloc=0. Most of the test diffs are due to the new
attribute sorting later alphabetically.

We could do better by trying to perform range merging in the attributor,
and trying to pick non-0 values.
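
For readers unfamiliar with the attribute's encoding, the following is a minimal sketch of how the value is interpreted after this change. `parseAgprAlloc` is a hypothetical stand-in for `AMDGPU::getIntegerPairAttribute`: it works on a plain optional string rather than a `Function`, and it assumes the value has the form `min` or `min,max`, falling back to the `{~0u, ~0u}` default otherwise. Only the `MinNumAGPR != 0` check is taken from the patch itself.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for reading a "min[,max]" string attribute value.
// Returns Default when the attribute is absent (nullopt) or malformed.
std::pair<unsigned, unsigned>
parseAgprAlloc(const std::optional<std::string> &Value,
               std::pair<unsigned, unsigned> Default = {~0u, ~0u}) {
  if (!Value)
    return Default;
  try {
    std::size_t Comma = Value->find(',');
    unsigned Min = static_cast<unsigned>(std::stoul(Value->substr(0, Comma)));
    // Only the first entry is required; the maximum keeps its default
    // when it is omitted.
    unsigned Max =
        Comma == std::string::npos
            ? Default.second
            : static_cast<unsigned>(std::stoul(Value->substr(Comma + 1)));
    return {Min, Max};
  } catch (const std::exception &) {
    return Default;
  }
}

// Mirrors the check in the patch: only an explicit minimum of 0 rules out
// AGPR usage. A missing attribute yields ~0u and therefore "may use AGPRs".
bool mayUseAGPRs(const std::optional<std::string> &Value) {
  return parseAgprAlloc(Value).first != 0u;
}
```

Under these assumptions, a function with no attribute or with `"amdgpu-agpr-alloc"="4"` may use AGPRs, while `"0"` and `"0,0"` both play the role the old `"amdgpu-no-agpr"` marker did.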

Contributor Author

arsenm commented Mar 5, 2025

@arsenm arsenm marked this pull request as ready for review March 5, 2025 15:46
@llvmbot
Member

llvmbot commented Mar 5, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)

Changes

This performs the minimal replacement of amdgpu-no-agpr with
amdgpu-agpr-alloc=0. Most of the test diffs are due to the new
attribute sorting later alphabetically.

We could do better by trying to perform range merging in the attributor,
and trying to pick non-0 values.


Patch is 168.24 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/129893.diff

45 Files Affected:

  • (modified) llvm/docs/AMDGPUUsage.rst (+1-6)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp (+7-2)
  • (modified) llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp (+4-1)
  • (modified) llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp (+1-7)
  • (modified) llvm/test/CodeGen/AMDGPU/addrspacecast-constantexpr.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers-assertion-after-ra-failure.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgpu-attributor-no-agpr.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgpu-no-agprs-violations.ll (+7-6)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgpu-num-agpr.ll (+6-6)
  • (modified) llvm/test/CodeGen/AMDGPU/amdhsa-kernarg-preload-num-sgprs.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa-call.ll (+21-21)
  • (modified) llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa.ll (+13-13)
  • (modified) llvm/test/CodeGen/AMDGPU/annotate-kernel-features.ll (+9-9)
  • (modified) llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit.ll (+6-6)
  • (modified) llvm/test/CodeGen/AMDGPU/captured-frame-index.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/copy-vgpr-clobber-spill-vgpr.mir (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/direct-indirect-call.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/duplicate-attribute-indirect.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/implicitarg-offset-attributes.ll (+13-13)
  • (modified) llvm/test/CodeGen/AMDGPU/indirect-call-set-from-other-function.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/invalid-hidden-kernarg-in-kernel-signature.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/issue120256-annotate-constexpr-addrspacecast.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/mfma-bf16-vgpr-cd-select.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/mfma-cd-select.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/mfma-vgpr-cd-select-gfx942.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/mfma-vgpr-cd-select.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/preload-implicit-kernargs.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/preload-kernargs.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/propagate-flat-work-group-size.ll (+9-9)
  • (modified) llvm/test/CodeGen/AMDGPU/propagate-waves-per-eu.ll (+21-21)
  • (modified) llvm/test/CodeGen/AMDGPU/recursive_global_initializer.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/remove-no-kernel-id-attribute.ll (+5-5)
  • (modified) llvm/test/CodeGen/AMDGPU/simple-indirect-call-2.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/simple-indirect-call.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/smfmac_no_agprs.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/spill-regpressure-less.mir (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/uniform-work-group-attribute-missing.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/uniform-work-group-multistep.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/uniform-work-group-nested-function-calls.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/uniform-work-group-prevent-attribute-propagation.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/uniform-work-group-propagate-attribute.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/uniform-work-group-recursion-test.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/uniform-work-group-test.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/vgpr-agpr-limit-gfx90a.ll (+6-6)
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index c317223f49d7c..def6addd595e8 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -1698,11 +1698,6 @@ The AMDGPU backend supports the following LLVM IR attributes.
                                                       ``amdgpu_max_num_work_groups`` CLANG attribute [CLANG-ATTR]_. Clang only
                                                       emits this attribute when all the three numbers are >= 1.
 
-     "amdgpu-no-agpr"                                 Indicates the function will not require allocating AGPRs. This is only
-                                                      relevant on subtargets with AGPRs. The behavior is undefined if a
-                                                      function which requires AGPRs is reached through any function marked
-                                                      with this attribute.
-
      "amdgpu-hidden-argument"                         This attribute is used internally by the backend to mark function arguments
                                                       as hidden. Hidden arguments are managed by the compiler and are not part of
                                                       the explicit arguments supplied by the user.
@@ -1721,7 +1716,7 @@ The AMDGPU backend supports the following LLVM IR attributes.
                                                       The behavior is undefined if a function which requires more AGPRs than the
                                                       lower bound is reached through any function marked with a higher value of this
                                                       attribute. A minimum value of 0 indicates the function does not require
-                                                      any AGPRs. A minimum of 0 is equivalent to "amdgpu-no-agpr".
+                                                      any AGPRs.
 
                                                       This is only relevant on targets with AGPRs which support accum_offset (gfx90a+).
 
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
index 546db318c17d5..cfff66fa07f98 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
@@ -1235,6 +1235,8 @@ static bool inlineAsmUsesAGPRs(const InlineAsm *IA) {
   return false;
 }
 
+// TODO: Migrate to range merge of amdgpu-agpr-alloc.
+// FIXME: Why is this using Attribute::NoUnwind?
 struct AAAMDGPUNoAGPR
     : public IRAttribute<Attribute::NoUnwind,
                          StateWrapper<BooleanState, AbstractAttribute>,
@@ -1250,7 +1252,10 @@ struct AAAMDGPUNoAGPR
 
   void initialize(Attributor &A) override {
     Function *F = getAssociatedFunction();
-    if (F->hasFnAttribute("amdgpu-no-agpr"))
+    auto [MinNumAGPR, MaxNumAGPR] =
+        AMDGPU::getIntegerPairAttribute(*F, "amdgpu-agpr-alloc", {~0u, ~0u},
+                                        /*OnlyFirstRequired=*/true);
+    if (MinNumAGPR == 0)
       indicateOptimisticFixpoint();
   }
 
@@ -1297,7 +1302,7 @@ struct AAAMDGPUNoAGPR
       return ChangeStatus::UNCHANGED;
     LLVMContext &Ctx = getAssociatedFunction()->getContext();
     return A.manifestAttrs(getIRPosition(),
-                           {Attribute::get(Ctx, "amdgpu-no-agpr")});
+                           {Attribute::get(Ctx, "amdgpu-agpr-alloc", "0")});
   }
 
   const std::string getName() const override { return "AAAMDGPUNoAGPR"; }
diff --git a/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp b/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
index a83fc2d188de2..abd19c988a7eb 100644
--- a/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
@@ -780,5 +780,8 @@ bool SIMachineFunctionInfo::initializeBaseYamlFields(
 }
 
 bool SIMachineFunctionInfo::mayUseAGPRs(const Function &F) const {
-  return !F.hasFnAttribute("amdgpu-no-agpr");
+  auto [MinNumAGPR, MaxNumAGPR] =
+      AMDGPU::getIntegerPairAttribute(F, "amdgpu-agpr-alloc", {~0u, ~0u},
+                                      /*OnlyFirstRequired=*/true);
+  return MinNumAGPR != 0u;
 }
diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
index 669495f1c3185..adadf8e4e4e65 100644
--- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
@@ -571,7 +571,6 @@ MCRegister SIRegisterInfo::reservedPrivateSegmentBufferReg(
 
 std::pair<unsigned, unsigned>
 SIRegisterInfo::getMaxNumVectorRegs(const MachineFunction &MF) const {
-  const SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();
   const unsigned MaxVectorRegs = ST.getMaxNumVGPRs(MF);
 
   unsigned MaxNumVGPRs = MaxVectorRegs;
@@ -592,7 +591,6 @@ SIRegisterInfo::getMaxNumVectorRegs(const MachineFunction &MF) const {
 
     const std::pair<unsigned, unsigned> DefaultNumAGPR = {~0u, ~0u};
 
-    // TODO: Replace amdgpu-no-agpr with amdgpu-agpr-alloc=0
     // TODO: Move this logic into subtarget on IR function
     //
     // TODO: The lower bound should probably force the number of required
@@ -603,11 +601,7 @@ SIRegisterInfo::getMaxNumVectorRegs(const MachineFunction &MF) const {
 
     if (MinNumAGPRs == DefaultNumAGPR.first) {
       // Default to splitting half the registers if AGPRs are required.
-
-      if (MFI->mayNeedAGPRs())
-        MinNumAGPRs = MaxNumAGPRs = MaxVectorRegs / 2;
-      else
-        MinNumAGPRs = 0;
+      MinNumAGPRs = MaxNumAGPRs = MaxVectorRegs / 2;
     } else {
       // Align to accum_offset's allocation granularity.
       MinNumAGPRs = alignTo(MinNumAGPRs, 4);
diff --git a/llvm/test/CodeGen/AMDGPU/addrspacecast-constantexpr.ll b/llvm/test/CodeGen/AMDGPU/addrspacecast-constantexpr.ll
index d316e10037757..0f5028fd82296 100644
--- a/llvm/test/CodeGen/AMDGPU/addrspacecast-constantexpr.ll
+++ b/llvm/test/CodeGen/AMDGPU/addrspacecast-constantexpr.ll
@@ -233,8 +233,8 @@ attributes #1 = { nounwind }
 ; AKF_HSA: attributes #[[ATTR1]] = { nounwind }
 ;.
 ; ATTRIBUTOR_HSA: attributes #[[ATTR0:[0-9]+]] = { nocallback nofree nounwind willreturn memory(argmem: readwrite) }
-; ATTRIBUTOR_HSA: attributes #[[ATTR1]] = { nounwind "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" }
-; ATTRIBUTOR_HSA: attributes #[[ATTR2]] = { nounwind "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" }
+; ATTRIBUTOR_HSA: attributes #[[ATTR1]] = { nounwind "amdgpu-agpr-alloc"="0" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" }
+; ATTRIBUTOR_HSA: attributes #[[ATTR2]] = { nounwind "amdgpu-agpr-alloc"="0" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" }
 ;.
 ; AKF_HSA: [[META0:![0-9]+]] = !{i32 1, !"amdhsa_code_object_version", i32 500}
 ;.
diff --git a/llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers-assertion-after-ra-failure.ll b/llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers-assertion-after-ra-failure.ll
index f3eb7a42cb823..cea1fe49f4d8b 100644
--- a/llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers-assertion-after-ra-failure.ll
+++ b/llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers-assertion-after-ra-failure.ll
@@ -17,6 +17,6 @@ define void @no_free_vgprs_at_agpr_to_agpr_copy(float %v0, float %v1) #0 {
 declare <16 x float> @llvm.amdgcn.mfma.f32.16x16x1f32(float, float, <16 x float>, i32 immarg, i32 immarg, i32 immarg) #1
 declare noundef i32 @llvm.amdgcn.workitem.id.x() #2
 
-attributes #0 = { "amdgpu-no-agpr" "amdgpu-waves-per-eu"="6,6" }
+attributes #0 = { "amdgpu-agpr-alloc"="0" "amdgpu-waves-per-eu"="6,6" }
 attributes #1 = { convergent nocallback nofree nosync nounwind willreturn memory(none) }
 attributes #2 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
diff --git a/llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers.ll b/llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers.ll
index d1b01eeee11a4..e70e34fa0ba5d 100644
--- a/llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers.ll
+++ b/llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers.ll
@@ -1144,6 +1144,6 @@ declare i32 @llvm.amdgcn.workitem.id.x() #2
 attributes #0 = { "amdgpu-waves-per-eu"="6,6" }
 attributes #1 = { convergent nounwind readnone willreturn }
 attributes #2 = { nounwind readnone willreturn }
-attributes #3 = { "amdgpu-waves-per-eu"="7,7" "amdgpu-no-agpr" }
+attributes #3 = { "amdgpu-waves-per-eu"="7,7" "amdgpu-agpr-alloc"="0" }
 attributes #4 = { "amdgpu-waves-per-eu"="6,6" "amdgpu-flat-work-group-size"="1024,1024" }
-attributes #5 = { "amdgpu-waves-per-eu"="6,6" "amdgpu-no-agpr" }
+attributes #5 = { "amdgpu-waves-per-eu"="6,6" "amdgpu-agpr-alloc"="0" }
diff --git a/llvm/test/CodeGen/AMDGPU/amdgpu-attributor-no-agpr.ll b/llvm/test/CodeGen/AMDGPU/amdgpu-attributor-no-agpr.ll
index 33e7e7a7a019e..7e9cb7adf4fc2 100644
--- a/llvm/test/CodeGen/AMDGPU/amdgpu-attributor-no-agpr.ll
+++ b/llvm/test/CodeGen/AMDGPU/amdgpu-attributor-no-agpr.ll
@@ -252,13 +252,13 @@ define amdgpu_kernel void @indirect_calls_none_agpr(i1 %cond) {
 }
 
 
-attributes #0 = { "amdgpu-no-agpr" }
+attributes #0 = { "amdgpu-agpr-alloc"="0" }
 ;.
 ; CHECK: attributes #[[ATTR0]] = { "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR1]] = { "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR1]] = { "amdgpu-agpr-alloc"="0" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
 ; CHECK: attributes #[[ATTR2]] = { "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
 ; CHECK: attributes #[[ATTR3:[0-9]+]] = { convergent nocallback nofree nosync nounwind willreturn memory(none) "target-cpu"="gfx90a" }
 ; CHECK: attributes #[[ATTR4:[0-9]+]] = { nocallback nofree nosync nounwind speculatable willreturn memory(none) "target-cpu"="gfx90a" }
 ; CHECK: attributes #[[ATTR5:[0-9]+]] = { nocallback nofree nounwind willreturn memory(argmem: readwrite) "target-cpu"="gfx90a" }
-; CHECK: attributes #[[ATTR6]] = { "amdgpu-no-agpr" }
+; CHECK: attributes #[[ATTR6]] = { "amdgpu-agpr-alloc"="0" }
 ;.
diff --git a/llvm/test/CodeGen/AMDGPU/amdgpu-no-agprs-violations.ll b/llvm/test/CodeGen/AMDGPU/amdgpu-no-agprs-violations.ll
index d0bf8d3920a98..7bf9a29e9ff44 100644
--- a/llvm/test/CodeGen/AMDGPU/amdgpu-no-agprs-violations.ll
+++ b/llvm/test/CodeGen/AMDGPU/amdgpu-no-agprs-violations.ll
@@ -1,6 +1,6 @@
 ; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx908 < %s | FileCheck -check-prefixes=CHECK,GFX908 %s
 ; RUN: not llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a < %s 2> %t.err | FileCheck -check-prefixes=CHECK,GFX90A %s
-; RUN: FileCheck -check-prefix=ERR < %t.err %s
+; RUN: FileCheck --implicit-check-not=error -check-prefix=ERR < %t.err %s
 
 ; Test undefined behavior where a function ends up needing AGPRs that
 ; was marked with "amdgpu-agpr-alloc="="0". There should be no asserts.
@@ -9,7 +9,6 @@
 
 ; ERR: error: <unknown>:0:0: no registers from class available to allocate in function 'kernel_illegal_agpr_use_asm'
 ; ERR: error: <unknown>:0:0: no registers from class available to allocate in function 'func_illegal_agpr_use_asm'
-; ERR: error: <unknown>:0:0: no registers from class available to allocate in function 'kernel_calls_mfma.f32.32x32x1f32'
 
 ; CHECK: {{^}}kernel_illegal_agpr_use_asm:
 ; CHECK: ; use a0
@@ -32,14 +31,16 @@ define void @func_illegal_agpr_use_asm() #0 {
 }
 
 ; CHECK-LABEL: {{^}}kernel_calls_mfma.f32.32x32x1f32:
-; CHECK: v_accvgpr_write_b32
+; GFX908: v_accvgpr_write_b32
+; GFX90A-NOT: v_accvgpr_write_b32
 
 ; GFX908: NumVgprs: 5
-; GFX90A: NumVgprs: 36
-; CHECK: NumAgprs: 32
+; GFX908: NumAgprs: 32
+; GFX90A: NumVgprs: 35
+; GFX90A: NumAgprs: 0
 
 ; GFX908: TotalNumVgprs: 32
-; GFX90A: TotalNumVgprs: 68
+; GFX90A: TotalNumVgprs: 35
 define amdgpu_kernel void @kernel_calls_mfma.f32.32x32x1f32(ptr addrspace(1) %out, float %a, float %b, <32 x float> %c) #0 {
   %result = call <32 x float> @llvm.amdgcn.mfma.f32.32x32x1f32(float %a, float %b, <32 x float> %c, i32 0, i32 0, i32 0)
   store <32 x float> %result, ptr addrspace(1) %out
diff --git a/llvm/test/CodeGen/AMDGPU/amdgpu-num-agpr.ll b/llvm/test/CodeGen/AMDGPU/amdgpu-num-agpr.ll
index 15a442f85ebca..1f6ffe076822c 100644
--- a/llvm/test/CodeGen/AMDGPU/amdgpu-num-agpr.ll
+++ b/llvm/test/CodeGen/AMDGPU/amdgpu-num-agpr.ll
@@ -15,7 +15,7 @@ define amdgpu_kernel void @min_num_agpr_0_0__amdgpu_no_agpr() #0 {
   ret void
 }
 
-attributes #0 = { "amdgpu-waves-per-eu"="8,8" "amdgpu-flat-work-group-size"="64,64" "amdgpu-agpr-alloc"="0,0" "amdgpu-no-agpr" }
+attributes #0 = { "amdgpu-waves-per-eu"="8,8" "amdgpu-flat-work-group-size"="64,64" "amdgpu-agpr-alloc"="0,0" }
 
 ; Check parse of single entry 0
 
@@ -26,16 +26,16 @@ define amdgpu_kernel void @min_num_agpr_0__amdgpu_no_agpr() #1 {
   ret void
 }
 
-attributes #1 = { "amdgpu-waves-per-eu"="8,8" "amdgpu-flat-work-group-size"="64,64" "amdgpu-agpr-alloc"="0" "amdgpu-no-agpr" }
+attributes #1 = { "amdgpu-waves-per-eu"="8,8" "amdgpu-flat-work-group-size"="64,64" "amdgpu-agpr-alloc"="0" }
 
 
 ; Undefined use
-define amdgpu_kernel void @min_num_agpr_1_1__amdgpu_no_agpr() #2 {
+define amdgpu_kernel void @min_num_agpr_1_1() #2 {
   call void asm sideeffect "; clobber $0","~{a0}"(), !srcloc !{i32 3}
   ret void
 }
 
-attributes #2 = { "amdgpu-waves-per-eu"="8,8" "amdgpu-flat-work-group-size"="64,64" "amdgpu-agpr-alloc"="1,1" "amdgpu-no-agpr" }
+attributes #2 = { "amdgpu-waves-per-eu"="8,8" "amdgpu-flat-work-group-size"="64,64" "amdgpu-agpr-alloc"="1,1" }
 
 ; Check parse of single entry 4, interpreted as the minimum. Total budget is 64.
 ; WARN: warning: <unknown>:0:0: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in 'min_num_agpr_4__amdgpu_no_agpr': desired occupancy was 8, final occupancy is 7
@@ -48,7 +48,7 @@ define amdgpu_kernel void @min_num_agpr_4__amdgpu_no_agpr() #3 {
   ret void
 }
 
-attributes #3 = { "amdgpu-waves-per-eu"="8,8" "amdgpu-flat-work-group-size"="64,64" "amdgpu-agpr-alloc"="4" "amdgpu-no-agpr" }
+attributes #3 = { "amdgpu-waves-per-eu"="8,8" "amdgpu-flat-work-group-size"="64,64" "amdgpu-agpr-alloc"="4" }
 
 
 ; Allocation granularity requires rounding this to use 4 AGPRs, so the
@@ -79,7 +79,7 @@ define amdgpu_kernel void @min_num_agpr_64_64__amdgpu_no_agpr() #5 {
   ret void
 }
 
-attributes #5 = { "amdgpu-waves-per-eu"="8,8" "amdgpu-flat-work-group-size"="64,64" "amdgpu-agpr-alloc"="64,64" "amdgpu-no-agpr" }
+attributes #5 = { "amdgpu-waves-per-eu"="8,8" "amdgpu-flat-work-group-size"="64,64" "amdgpu-agpr-alloc"="64,64" }
 
 ; No free VGPRs
 ; WARN: warning: inline asm clobber list contains reserved registers: v0 at line 7
diff --git a/llvm/test/CodeGen/AMDGPU/amdhsa-kernarg-preload-num-sgprs.ll b/llvm/test/CodeGen/AMDGPU/amdhsa-kernarg-preload-num-sgprs.ll
index 0114de738ce84..dd760c2a215ca 100644
--- a/llvm/test/CodeGen/AMDGPU/amdhsa-kernarg-preload-num-sgprs.ll
+++ b/llvm/test/CodeGen/AMDGPU/amdhsa-kernarg-preload-num-sgprs.ll
@@ -70,4 +70,4 @@ define amdgpu_kernel void @amdhsa_kernarg_preload_1_implicit_2(i32 inreg) #0 { r
 
 define amdgpu_kernel void @amdhsa_kernarg_preload_0_implicit_2(i32) #0 { ret void }
 
-attributes #0 = { "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" }
+attributes #0 = { "amdgpu-agpr-alloc"="0" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" }
diff --git a/llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa-call.ll b/llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa-call.ll
index ea3f08ede2c5d..f7bf0c4448c0f 100644
--- a/llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa-call.ll
+++ b/llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa-call.ll
@@ -1025,31 +1025,31 @@ attributes #6 = { "enqueued-block" }
 ; AKF_HSA: attributes #[[ATTR8]] = { "amdgpu-calls" }
 ;.
 ; ATTRIBUTOR_HSA: attributes #[[ATTR0:[0-9]+]] = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
-; ATTRIBUTOR_HSA: attributes #[[ATTR1]] = { nounwind "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "target-cpu"="fiji" "uniform-work-group-size"="false" }
-; ATTRIBUTOR_HSA: attributes #[[ATTR2]] = { nounwind "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitar...
[truncated]

if (MFI->mayNeedAGPRs())
  MinNumAGPRs = MaxNumAGPRs = MaxVectorRegs / 2;
else
  MinNumAGPRs = 0;
Contributor

I guess the removal of the forced minima yields no functional change?

Contributor Author

mayNeedAGPRs is now a wrapper around the attribute, so this check is redundant.
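
To make the redundancy concrete, here is a simplified model of the minimum-AGPR selection after the patch. It is a sketch only: `minNumAGPRs`, `alignToMultiple`, and `NoValue` are illustrative names, and the real `SIRegisterInfo::getMaxNumVectorRegs` also computes a maximum and the VGPR budget, which the truncated diff does not fully show. The point is that "no AGPRs" now arrives as an explicit minimum of 0, which is distinct from the `~0u` "attribute absent" sentinel, so the default branch no longer needs to ask `mayNeedAGPRs()`.

```cpp
#include <cassert>

// Sentinel for "attribute not present", matching the {~0u, ~0u} default
// used in the patch.
constexpr unsigned NoValue = ~0u;

// Round Value up to the next multiple of Align (Align must be non-zero).
constexpr unsigned alignToMultiple(unsigned Value, unsigned Align) {
  return (Value + Align - 1) / Align * Align;
}

// Simplified model of the minimum AGPR count chosen after this patch.
// MinFromAttr is the parsed minimum of "amdgpu-agpr-alloc", or NoValue
// when the function does not carry the attribute.
constexpr unsigned minNumAGPRs(unsigned MinFromAttr, unsigned MaxVectorRegs) {
  if (MinFromAttr == NoValue)
    return MaxVectorRegs / 2; // default: split the vector registers in half
  // Explicit request, including 0: align to accum_offset's granularity of 4.
  return alignToMultiple(MinFromAttr, 4);
}
```

With a budget of 512 vector registers, an absent attribute gives 256, an explicit 0 stays 0, and a request for 1 rounds up to 4.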

@arsenm arsenm changed the title AMDGPU: Replace amdgpu-no-agpr with amdgpu-num-agpr AMDGPU: Replace amdgpu-no-agpr with amdgpu-agpr-alloc Mar 6, 2025
Contributor Author

arsenm commented Mar 6, 2025

Merge activity

  • Mar 5, 9:11 PM EST: A user started a stack merge that includes this pull request via Graphite.
  • Mar 5, 9:14 PM EST: Graphite rebased this pull request as part of a merge.
  • Mar 5, 9:17 PM EST: A user merged this pull request with Graphite.

@arsenm arsenm force-pushed the users/arsenm/amdgpu/add-amdgpu-num-agpr-attribute branch from f2c130c to 9d313bb Compare March 6, 2025 02:12
Base automatically changed from users/arsenm/amdgpu/add-amdgpu-num-agpr-attribute to main March 6, 2025 02:14
arsenm added 2 commits March 6, 2025 02:14
This performs the minimal replacement of amdgpu-no-agpr with
amdgpu-agpr-alloc=0. Most of the test diffs are due to the new
attribute sorting later alphabetically.

We could do better by trying to perform range merging in the attributor,
and trying to pick non-0 values.
@arsenm arsenm force-pushed the users/arsenm/amdgpu/replace-amdgpu-no-agpr-with-amdgpu-num-agpr branch from 10e9948 to 1478dd1 Compare March 6, 2025 02:14
@arsenm arsenm merged commit a216358 into main Mar 6, 2025
7 of 11 checks passed
@arsenm arsenm deleted the users/arsenm/amdgpu/replace-amdgpu-no-agpr-with-amdgpu-num-agpr branch March 6, 2025 02:17
jph-13 pushed a commit to jph-13/llvm-project that referenced this pull request Mar 21, 2025
This performs the minimal replacement of amdgpu-no-agpr with
amdgpu-agpr-alloc=0. Most of the test diffs are due to the new
attribute sorting later alphabetically.

We could do better by trying to perform range merging in the attributor,
and trying to pick non-0 values.
4 participants