Skip to content

[AMDGPU] Convert AMDGPUResourceUsageAnalysis pass from Module to MF pass #102913

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Sep 30, 2024

Conversation

JanekvO
Copy link
Contributor

@JanekvO JanekvO commented Aug 12, 2024

Converts AMDGPUResourceUsageAnalysis pass from Module to MachineFunction pass. Moves function resource info propagation to to MC layer (through helpers in AMDGPUMCResourceInfo) by generating MCExprs for every function resource which the emitters have been prepped for.

  • Function resource info (finalized) validation added through ValidateMCResourceInfo. Previously (and currently, still) done in getSIProgramInfo. However, MCExprs may not be resolvable during getSIProgramInfo due to propagation of yet-to-be-defined function resource info symbols. Does require some caching of the generated occupancy MCExpr computation in getSIProgramInfo to validate against.
  • Unfortunately, requires module specific max_num_Xgprs symbols to be emitted. Cannot generate a separate MCExpr for finding the max without getting recursive symbol defines.
  • Most test changes are caused by the representation changes from constant/known values to MCExprs.

Fixes #64863

@llvmbot llvmbot added clang Clang issues not falling into any other category backend:AMDGPU mc Machine (object) code llvm:globalisel labels Aug 12, 2024
@llvmbot
Copy link
Member

llvmbot commented Aug 12, 2024

@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-clang

Author: Janek van Oirschot (JanekvO)

Changes

!!! Stacked PR on top of #95951 commit, please only review the latest commit 51f72f115b340a092c2c9f8569911b944a4efb6d !!!!

Converts AMDGPUResourceUsageAnalysis pass from Module to MachineFunction pass. Moves function resource info propagation to to MC layer (through helpers in AMDGPUMCResourceInfo) by generating MCExprs for every function resource which the emitters have been prepped for.

  • Function resource info (finalized) validation added through ValidateMCResourceInfo. Previously (and currently, still) done in getSIProgramInfo. However, MCExprs may not be resolvable during getSIProgramInfo due to propagation of yet-to-be-defined function resource info symbols. Does require some caching of the generated occupancy MCExpr computation in getSIProgramInfo to validate against.
  • Unfortunately, requires module specific max_num_Xgprs symbols to be emitted. Cannot generate a separate MCExpr for finding the max without getting recursive symbol defines.
  • Most test changes are caused by the representation changes from constant/known values to MCExprs.

Patch is 369.85 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/102913.diff

66 Files Affected:

  • (modified) clang/test/Frontend/amdgcn-machine-analysis-remarks.cl (+5-5)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp (+176-42)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h (+14-5)
  • (added) llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp (+220)
  • (added) llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.h (+94)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp (+22-125)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.h (+12-28)
  • (modified) llvm/lib/Target/AMDGPU/CMakeLists.txt (+1)
  • (modified) llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCExpr.cpp (+359)
  • (modified) llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCExpr.h (+11)
  • (modified) llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp (+55-14)
  • (modified) llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.h (+23)
  • (modified) llvm/lib/Target/AMDGPU/Utils/AMDGPUPALMetadata.cpp (+5-5)
  • (modified) llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp (+20-26)
  • (modified) llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.h (+4-1)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement.ll (+84-84)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/flat-scratch-init.ll (+9-7)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.workitem.id.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/non-entry-alloca.ll (+4-4)
  • (modified) llvm/test/CodeGen/AMDGPU/agpr-register-count.ll (+37-28)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgpu.private-memory.ll (+2-1)
  • (modified) llvm/test/CodeGen/AMDGPU/amdpal-metadata-agpr-register-count.ll (+3-1)
  • (modified) llvm/test/CodeGen/AMDGPU/attr-amdgpu-flat-work-group-size-vgpr-limit.ll (+28-23)
  • (modified) llvm/test/CodeGen/AMDGPU/attributor-noopt.ll (+26-4)
  • (modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage-agpr.ll (+10-6)
  • (modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage0.ll (+6-2)
  • (modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage1.ll (+9-2)
  • (modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage2.ll (+9-2)
  • (modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage3.ll (+9-2)
  • (modified) llvm/test/CodeGen/AMDGPU/call-graph-register-usage.ll (+39-11)
  • (modified) llvm/test/CodeGen/AMDGPU/callee-special-input-sgprs-fixed-abi.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/cndmask-no-def-vcc.ll (+1)
  • (modified) llvm/test/CodeGen/AMDGPU/code-object-v3.ll (+24-10)
  • (modified) llvm/test/CodeGen/AMDGPU/codegen-internal-only-func.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/control-flow-fastregalloc.ll (+4-2)
  • (modified) llvm/test/CodeGen/AMDGPU/elf.ll (+4-3)
  • (modified) llvm/test/CodeGen/AMDGPU/enable-scratch-only-dynamic-stack.ll (+9-5)
  • (added) llvm/test/CodeGen/AMDGPU/function-resource-usage.ll (+533)
  • (modified) llvm/test/CodeGen/AMDGPU/gfx11-user-sgpr-init16-bug.ll (+8-8)
  • (modified) llvm/test/CodeGen/AMDGPU/inline-asm-reserved-regs.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/insert-delay-alu-bug.ll (+12-12)
  • (modified) llvm/test/CodeGen/AMDGPU/ipra.ll (+7-7)
  • (modified) llvm/test/CodeGen/AMDGPU/kernarg-size.ll (+3-1)
  • (modified) llvm/test/CodeGen/AMDGPU/kernel_code_t_recurse.ll (+6-1)
  • (modified) llvm/test/CodeGen/AMDGPU/large-alloca-compute.ll (+23-8)
  • (modified) llvm/test/CodeGen/AMDGPU/llc-pipeline.ll (+15-30)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.workitem.id.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/lower-module-lds-offsets.ll (+13-13)
  • (modified) llvm/test/CodeGen/AMDGPU/mesa3d.ll (+8-5)
  • (modified) llvm/test/CodeGen/AMDGPU/module-lds-false-sharing.ll (+49-50)
  • (modified) llvm/test/CodeGen/AMDGPU/non-entry-alloca.ll (+12-8)
  • (modified) llvm/test/CodeGen/AMDGPU/recursion.ll (+18-10)
  • (modified) llvm/test/CodeGen/AMDGPU/resource-optimization-remarks.ll (+32-32)
  • (modified) llvm/test/CodeGen/AMDGPU/resource-usage-dead-function.ll (+7-6)
  • (modified) llvm/test/CodeGen/AMDGPU/stack-realign-kernel.ll (+90-36)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-kd-xnack-any.ll (+2-1)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-kd-xnack-off.ll (+2-1)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-kd-xnack-on.ll (+2-1)
  • (modified) llvm/test/CodeGen/AMDGPU/trap.ll (+21-12)
  • (modified) llvm/test/MC/AMDGPU/amd_kernel_code_t.s (+14-22)
  • (modified) llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx10.s (+32-32)
  • (modified) llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx11.s (+30-30)
  • (modified) llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx12.s (+29-29)
  • (modified) llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx7.s (+27-27)
  • (modified) llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx8.s (+27-27)
  • (modified) llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx90a.s (+23-23)
diff --git a/clang/test/Frontend/amdgcn-machine-analysis-remarks.cl b/clang/test/Frontend/amdgcn-machine-analysis-remarks.cl
index a05e21b37b9127..a2dd59a871904c 100644
--- a/clang/test/Frontend/amdgcn-machine-analysis-remarks.cl
+++ b/clang/test/Frontend/amdgcn-machine-analysis-remarks.cl
@@ -2,12 +2,12 @@
 // RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -target-cpu gfx908 -Rpass-analysis=kernel-resource-usage -S -O0 -verify %s -o /dev/null
 
 // expected-remark@+10 {{Function Name: foo}}
-// expected-remark@+9 {{    SGPRs: 13}}
-// expected-remark@+8 {{    VGPRs: 10}}
-// expected-remark@+7 {{    AGPRs: 12}}
-// expected-remark@+6 {{    ScratchSize [bytes/lane]: 0}}
+// expected-remark@+9 {{    SGPRs: foo.num_sgpr+(extrasgprs(foo.uses_vcc, foo.uses_flat_scratch, 1))}}
+// expected-remark@+8 {{    VGPRs: foo.num_vgpr}}
+// expected-remark@+7 {{    AGPRs: foo.num_agpr}}
+// expected-remark@+6 {{    ScratchSize [bytes/lane]: foo.private_seg_size}}
 // expected-remark@+5 {{    Dynamic Stack: False}}
-// expected-remark@+4 {{    Occupancy [waves/SIMD]: 10}}
+// expected-remark@+4 {{    Occupancy [waves/SIMD]: occupancy(10, 4, 256, 8, 10, max(foo.num_sgpr+(extrasgprs(foo.uses_vcc, foo.uses_flat_scratch, 1)), 1, 0), max(totalnumvgprs(foo.num_agpr, foo.num_vgpr), 1, 0))}}
 // expected-remark@+3 {{    SGPRs Spill: 0}}
 // expected-remark@+2 {{    VGPRs Spill: 0}}
 // expected-remark@+1 {{    LDS Size [bytes/block]: 0}}
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
index e64e28e01d3d18..97a5cb29d51023 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
@@ -18,6 +18,7 @@
 #include "AMDGPUAsmPrinter.h"
 #include "AMDGPU.h"
 #include "AMDGPUHSAMetadataStreamer.h"
+#include "AMDGPUMCResourceInfo.h"
 #include "AMDGPUResourceUsageAnalysis.h"
 #include "GCNSubtarget.h"
 #include "MCTargetDesc/AMDGPUInstPrinter.h"
@@ -92,6 +93,9 @@ AMDGPUAsmPrinter::AMDGPUAsmPrinter(TargetMachine &TM,
                                    std::unique_ptr<MCStreamer> Streamer)
     : AsmPrinter(TM, std::move(Streamer)) {
   assert(OutStreamer && "AsmPrinter constructed without streamer");
+  RI = std::make_unique<MCResourceInfo>(OutContext);
+  OccupancyValidateMap =
+      std::make_unique<DenseMap<const Function *, const MCExpr *>>();
 }
 
 StringRef AMDGPUAsmPrinter::getPassName() const {
@@ -359,6 +363,102 @@ bool AMDGPUAsmPrinter::doInitialization(Module &M) {
   return AsmPrinter::doInitialization(M);
 }
 
+void AMDGPUAsmPrinter::ValidateMCResourceInfo(Function &F) {
+  if (F.isDeclaration() || !AMDGPU::isModuleEntryFunctionCC(F.getCallingConv()))
+    return;
+
+  using RIK = MCResourceInfo::ResourceInfoKind;
+  const GCNSubtarget &STM = TM.getSubtarget<GCNSubtarget>(F);
+
+  auto TryGetMCExprValue = [](const MCExpr *Value, uint64_t &Res) -> bool {
+    int64_t Val;
+    if (Value->evaluateAsAbsolute(Val)) {
+      Res = Val;
+      return true;
+    }
+    return false;
+  };
+
+  const uint64_t MaxScratchPerWorkitem =
+      STM.getMaxWaveScratchSize() / STM.getWavefrontSize();
+  MCSymbol *ScratchSizeSymbol =
+      RI->getSymbol(F.getName(), RIK::RIK_PrivateSegSize);
+  uint64_t ScratchSize;
+  if (ScratchSizeSymbol->isVariable() &&
+      TryGetMCExprValue(ScratchSizeSymbol->getVariableValue(), ScratchSize) &&
+      ScratchSize > MaxScratchPerWorkitem) {
+    DiagnosticInfoStackSize DiagStackSize(F, ScratchSize, MaxScratchPerWorkitem,
+                                          DS_Error);
+    F.getContext().diagnose(DiagStackSize);
+  }
+
+  // Validate addressable scalar registers (i.e., prior to added implicit
+  // SGPRs).
+  MCSymbol *NumSGPRSymbol = RI->getSymbol(F.getName(), RIK::RIK_NumSGPR);
+  if (STM.getGeneration() >= AMDGPUSubtarget::VOLCANIC_ISLANDS &&
+      !STM.hasSGPRInitBug()) {
+    unsigned MaxAddressableNumSGPRs = STM.getAddressableNumSGPRs();
+    uint64_t NumSgpr;
+    if (NumSGPRSymbol->isVariable() &&
+        TryGetMCExprValue(NumSGPRSymbol->getVariableValue(), NumSgpr) &&
+        NumSgpr > MaxAddressableNumSGPRs) {
+      DiagnosticInfoResourceLimit Diag(F, "addressable scalar registers",
+                                       NumSgpr, MaxAddressableNumSGPRs,
+                                       DS_Error, DK_ResourceLimit);
+      F.getContext().diagnose(Diag);
+      return;
+    }
+  }
+
+  MCSymbol *VCCUsedSymbol = RI->getSymbol(F.getName(), RIK::RIK_UsesVCC);
+  MCSymbol *FlatUsedSymbol =
+      RI->getSymbol(F.getName(), RIK::RIK_UsesFlatScratch);
+  uint64_t VCCUsed, FlatUsed, NumSgpr;
+
+  if (NumSGPRSymbol->isVariable() && VCCUsedSymbol->isVariable() &&
+      FlatUsedSymbol->isVariable() &&
+      TryGetMCExprValue(NumSGPRSymbol->getVariableValue(), NumSgpr) &&
+      TryGetMCExprValue(VCCUsedSymbol->getVariableValue(), VCCUsed) &&
+      TryGetMCExprValue(FlatUsedSymbol->getVariableValue(), FlatUsed)) {
+
+    // Recomputes NumSgprs + implicit SGPRs but all symbols should now be
+    // resolvable.
+    NumSgpr += IsaInfo::getNumExtraSGPRs(
+        &STM, VCCUsed, FlatUsed,
+        getTargetStreamer()->getTargetID()->isXnackOnOrAny());
+    if (STM.getGeneration() <= AMDGPUSubtarget::SEA_ISLANDS ||
+        STM.hasSGPRInitBug()) {
+      unsigned MaxAddressableNumSGPRs = STM.getAddressableNumSGPRs();
+      if (NumSgpr > MaxAddressableNumSGPRs) {
+        DiagnosticInfoResourceLimit Diag(F, "scalar registers", NumSgpr,
+                                         MaxAddressableNumSGPRs, DS_Error,
+                                         DK_ResourceLimit);
+        F.getContext().diagnose(Diag);
+        return;
+      }
+    }
+
+    auto I = OccupancyValidateMap->find(&F);
+    if (I != OccupancyValidateMap->end()) {
+      const auto [MinWEU, MaxWEU] = AMDGPU::getIntegerPairAttribute(
+          F, "amdgpu-waves-per-eu", {0, 0}, true);
+      uint64_t Occupancy;
+      const MCExpr *OccupancyExpr = I->getSecond();
+
+      if (TryGetMCExprValue(OccupancyExpr, Occupancy) && Occupancy < MinWEU) {
+        DiagnosticInfoOptimizationFailure Diag(
+            F, F.getSubprogram(),
+            "failed to meet occupancy target given by 'amdgpu-waves-per-eu' in "
+            "'" +
+                F.getName() + "': desired occupancy was " + Twine(MinWEU) +
+                ", final occupancy is " + Twine(Occupancy));
+        F.getContext().diagnose(Diag);
+        return;
+      }
+    }
+  }
+}
+
 bool AMDGPUAsmPrinter::doFinalization(Module &M) {
   // Pad with s_code_end to help tools and guard against instruction prefetch
   // causing stale data in caches. Arguably this should be done by the linker,
@@ -371,39 +471,29 @@ bool AMDGPUAsmPrinter::doFinalization(Module &M) {
     getTargetStreamer()->EmitCodeEnd(STI);
   }
 
-  return AsmPrinter::doFinalization(M);
-}
+  // Assign expressions which can only be resolved when all other functions are
+  // known.
+  RI->Finalize();
+  getTargetStreamer()->EmitMCResourceMaximums(
+      RI->getMaxVGPRSymbol(), RI->getMaxAGPRSymbol(), RI->getMaxSGPRSymbol());
 
-// Print comments that apply to both callable functions and entry points.
-void AMDGPUAsmPrinter::emitCommonFunctionComments(
-    uint32_t NumVGPR, std::optional<uint32_t> NumAGPR, uint32_t TotalNumVGPR,
-    uint32_t NumSGPR, uint64_t ScratchSize, uint64_t CodeSize,
-    const AMDGPUMachineFunction *MFI) {
-  OutStreamer->emitRawComment(" codeLenInByte = " + Twine(CodeSize), false);
-  OutStreamer->emitRawComment(" NumSgprs: " + Twine(NumSGPR), false);
-  OutStreamer->emitRawComment(" NumVgprs: " + Twine(NumVGPR), false);
-  if (NumAGPR) {
-    OutStreamer->emitRawComment(" NumAgprs: " + Twine(*NumAGPR), false);
-    OutStreamer->emitRawComment(" TotalNumVgprs: " + Twine(TotalNumVGPR),
-                                false);
+  for (Function &F : M.functions()) {
+    ValidateMCResourceInfo(F);
   }
-  OutStreamer->emitRawComment(" ScratchSize: " + Twine(ScratchSize), false);
-  OutStreamer->emitRawComment(" MemoryBound: " + Twine(MFI->isMemoryBound()),
-                              false);
+  return AsmPrinter::doFinalization(M);
 }
 
 SmallString<128> AMDGPUAsmPrinter::getMCExprStr(const MCExpr *Value) {
   SmallString<128> Str;
   raw_svector_ostream OSS(Str);
-  int64_t IVal;
-  if (Value->evaluateAsAbsolute(IVal)) {
-    OSS << static_cast<uint64_t>(IVal);
-  } else {
-    Value->print(OSS, MAI);
-  }
+  auto &Streamer = getTargetStreamer()->getStreamer();
+  auto &Context = Streamer.getContext();
+  const MCExpr *New = llvm::TryFold(Value, Context);
+  AMDGPUMCExprPrint(New, OSS, MAI);
   return Str;
 }
 
+// Print comments that apply to both callable functions and entry points.
 void AMDGPUAsmPrinter::emitCommonFunctionComments(
     const MCExpr *NumVGPR, const MCExpr *NumAGPR, const MCExpr *TotalNumVGPR,
     const MCExpr *NumSGPR, const MCExpr *ScratchSize, uint64_t CodeSize,
@@ -573,21 +663,45 @@ bool AMDGPUAsmPrinter::runOnMachineFunction(MachineFunction &MF) {
   emitResourceUsageRemarks(MF, CurrentProgramInfo, MFI->isModuleEntryFunction(),
                            STM.hasMAIInsts());
 
+  {
+    const AMDGPUResourceUsageAnalysis::SIFunctionResourceInfo &Info =
+        ResourceUsage->getResourceInfo();
+    RI->gatherResourceInfo(MF, Info);
+    using RIK = MCResourceInfo::ResourceInfoKind;
+    getTargetStreamer()->EmitMCResourceInfo(
+        RI->getSymbol(MF.getName(), RIK::RIK_NumVGPR),
+        RI->getSymbol(MF.getName(), RIK::RIK_NumAGPR),
+        RI->getSymbol(MF.getName(), RIK::RIK_NumSGPR),
+        RI->getSymbol(MF.getName(), RIK::RIK_PrivateSegSize),
+        RI->getSymbol(MF.getName(), RIK::RIK_UsesVCC),
+        RI->getSymbol(MF.getName(), RIK::RIK_UsesFlatScratch),
+        RI->getSymbol(MF.getName(), RIK::RIK_HasDynSizedStack),
+        RI->getSymbol(MF.getName(), RIK::RIK_HasRecursion),
+        RI->getSymbol(MF.getName(), RIK::RIK_HasIndirectCall));
+  }
+
   if (isVerbose()) {
     MCSectionELF *CommentSection =
         Context.getELFSection(".AMDGPU.csdata", ELF::SHT_PROGBITS, 0);
     OutStreamer->switchSection(CommentSection);
 
     if (!MFI->isEntryFunction()) {
+      using RIK = MCResourceInfo::ResourceInfoKind;
       OutStreamer->emitRawComment(" Function info:", false);
-      const AMDGPUResourceUsageAnalysis::SIFunctionResourceInfo &Info =
-          ResourceUsage->getResourceInfo(&MF.getFunction());
+
       emitCommonFunctionComments(
-          Info.NumVGPR,
-          STM.hasMAIInsts() ? Info.NumAGPR : std::optional<uint32_t>(),
-          Info.getTotalNumVGPRs(STM),
-          Info.getTotalNumSGPRs(MF.getSubtarget<GCNSubtarget>()),
-          Info.PrivateSegmentSize, getFunctionCodeSize(MF), MFI);
+          RI->getSymbol(MF.getName(), RIK::RIK_NumVGPR)->getVariableValue(),
+          STM.hasMAIInsts() ? RI->getSymbol(MF.getName(), RIK::RIK_NumAGPR)
+                                  ->getVariableValue()
+                            : nullptr,
+          RI->createTotalNumVGPRs(MF, Ctx),
+          RI->createTotalNumSGPRs(
+              MF,
+              MF.getSubtarget<GCNSubtarget>().getTargetID().isXnackOnOrAny(),
+              Ctx),
+          RI->getSymbol(MF.getName(), RIK::RIK_PrivateSegSize)
+              ->getVariableValue(),
+          getFunctionCodeSize(MF), MFI);
       return false;
     }
 
@@ -755,8 +869,6 @@ uint64_t AMDGPUAsmPrinter::getFunctionCodeSize(const MachineFunction &MF) const
 
 void AMDGPUAsmPrinter::getSIProgramInfo(SIProgramInfo &ProgInfo,
                                         const MachineFunction &MF) {
-  const AMDGPUResourceUsageAnalysis::SIFunctionResourceInfo &Info =
-      ResourceUsage->getResourceInfo(&MF.getFunction());
   const GCNSubtarget &STM = MF.getSubtarget<GCNSubtarget>();
   MCContext &Ctx = MF.getContext();
 
@@ -773,18 +885,38 @@ void AMDGPUAsmPrinter::getSIProgramInfo(SIProgramInfo &ProgInfo,
     return false;
   };
 
-  ProgInfo.NumArchVGPR = CreateExpr(Info.NumVGPR);
-  ProgInfo.NumAccVGPR = CreateExpr(Info.NumAGPR);
-  ProgInfo.NumVGPR = CreateExpr(Info.getTotalNumVGPRs(STM));
-  ProgInfo.AccumOffset =
-      CreateExpr(alignTo(std::max(1, Info.NumVGPR), 4) / 4 - 1);
+  auto GetSymRefExpr =
+      [&](MCResourceInfo::ResourceInfoKind RIK) -> const MCExpr * {
+    MCSymbol *Sym = RI->getSymbol(MF.getName(), RIK);
+    return MCSymbolRefExpr::create(Sym, Ctx);
+  };
+
+  const MCExpr *ConstFour = MCConstantExpr::create(4, Ctx);
+  const MCExpr *ConstOne = MCConstantExpr::create(1, Ctx);
+
+  using RIK = MCResourceInfo::ResourceInfoKind;
+  ProgInfo.NumArchVGPR = GetSymRefExpr(RIK::RIK_NumVGPR);
+  ProgInfo.NumAccVGPR = GetSymRefExpr(RIK::RIK_NumAGPR);
+  ProgInfo.NumVGPR = AMDGPUMCExpr::createTotalNumVGPR(
+      ProgInfo.NumAccVGPR, ProgInfo.NumArchVGPR, Ctx);
+
+  // AccumOffset computed for the MCExpr equivalent of:
+  // alignTo(std::max(1, Info.NumVGPR), 4) / 4 - 1;
+  ProgInfo.AccumOffset = MCBinaryExpr::createSub(
+      MCBinaryExpr::createDiv(
+          AMDGPUMCExpr::createAlignTo(
+              AMDGPUMCExpr::createMax({ConstOne, ProgInfo.NumArchVGPR}, Ctx),
+              ConstFour, Ctx),
+          ConstFour, Ctx),
+      ConstOne, Ctx);
   ProgInfo.TgSplit = STM.isTgSplitEnabled();
-  ProgInfo.NumSGPR = CreateExpr(Info.NumExplicitSGPR);
-  ProgInfo.ScratchSize = CreateExpr(Info.PrivateSegmentSize);
-  ProgInfo.VCCUsed = CreateExpr(Info.UsesVCC);
-  ProgInfo.FlatUsed = CreateExpr(Info.UsesFlatScratch);
+  ProgInfo.NumSGPR = GetSymRefExpr(RIK::RIK_NumSGPR);
+  ProgInfo.ScratchSize = GetSymRefExpr(RIK::RIK_PrivateSegSize);
+  ProgInfo.VCCUsed = GetSymRefExpr(RIK::RIK_UsesVCC);
+  ProgInfo.FlatUsed = GetSymRefExpr(RIK::RIK_UsesFlatScratch);
   ProgInfo.DynamicCallStack =
-      CreateExpr(Info.HasDynamicallySizedStack || Info.HasRecursion);
+      MCBinaryExpr::createOr(GetSymRefExpr(RIK::RIK_HasDynSizedStack),
+                             GetSymRefExpr(RIK::RIK_HasRecursion), Ctx);
 
   const uint64_t MaxScratchPerWorkitem =
       STM.getMaxWaveScratchSize() / STM.getWavefrontSize();
@@ -1084,6 +1216,8 @@ void AMDGPUAsmPrinter::getSIProgramInfo(SIProgramInfo &ProgInfo,
       STM.computeOccupancy(F, ProgInfo.LDSSize), ProgInfo.NumSGPRsForWavesPerEU,
       ProgInfo.NumVGPRsForWavesPerEU, STM, Ctx);
 
+  OccupancyValidateMap->insert({&MF.getFunction(), ProgInfo.Occupancy});
+
   const auto [MinWEU, MaxWEU] =
       AMDGPU::getIntegerPairAttribute(F, "amdgpu-waves-per-eu", {0, 0}, true);
   uint64_t Occupancy;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h
index f66bbde42ce278..676a4687ee2af7 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h
@@ -24,6 +24,7 @@ struct AMDGPUResourceUsageAnalysis;
 class AMDGPUTargetStreamer;
 class MCCodeEmitter;
 class MCOperand;
+class MCResourceInfo;
 
 namespace AMDGPU {
 struct MCKernelDescriptor;
@@ -40,12 +41,20 @@ class AMDGPUAsmPrinter final : public AsmPrinter {
 
   AMDGPUResourceUsageAnalysis *ResourceUsage;
 
+  std::unique_ptr<MCResourceInfo> RI;
+
   SIProgramInfo CurrentProgramInfo;
 
   std::unique_ptr<AMDGPU::HSAMD::MetadataStreamer> HSAMetadataStream;
 
   MCCodeEmitter *DumpCodeInstEmitter = nullptr;
 
+  // ValidateMCResourceInfo cannot recompute parts of the occupancy as it does
+  // for other metadata to validate (e.g., NumSGPRs) so a map is necessary if we
+  // really want to track and validate the occupancy.
+  std::unique_ptr<DenseMap<const Function *, const MCExpr *>>
+      OccupancyValidateMap;
+
   uint64_t getFunctionCodeSize(const MachineFunction &MF) const;
 
   void getSIProgramInfo(SIProgramInfo &Out, const MachineFunction &MF);
@@ -60,11 +69,6 @@ class AMDGPUAsmPrinter final : public AsmPrinter {
   void EmitPALMetadata(const MachineFunction &MF,
                        const SIProgramInfo &KernelInfo);
   void emitPALFunctionMetadata(const MachineFunction &MF);
-  void emitCommonFunctionComments(uint32_t NumVGPR,
-                                  std::optional<uint32_t> NumAGPR,
-                                  uint32_t TotalNumVGPR, uint32_t NumSGPR,
-                                  uint64_t ScratchSize, uint64_t CodeSize,
-                                  const AMDGPUMachineFunction *MFI);
   void emitCommonFunctionComments(const MCExpr *NumVGPR, const MCExpr *NumAGPR,
                                   const MCExpr *TotalNumVGPR,
                                   const MCExpr *NumSGPR,
@@ -84,6 +88,11 @@ class AMDGPUAsmPrinter final : public AsmPrinter {
 
   SmallString<128> getMCExprStr(const MCExpr *Value);
 
+  /// Attempts to replace the validation that is missed in getSIProgramInfo due
+  /// to MCExpr being unknown. Invoked during doFinalization such that the
+  /// MCResourceInfo symbols are known.
+  void ValidateMCResourceInfo(Function &F);
+
 public:
   explicit AMDGPUAsmPrinter(TargetMachine &TM,
                             std::unique_ptr<MCStreamer> Streamer);
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp
new file mode 100644
index 00000000000000..58383475b312c9
--- /dev/null
+++ b/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp
@@ -0,0 +1,220 @@
+//===- AMDGPUMCResourceInfo.cpp --- MC Resource Info ----------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+/// \file
+/// \brief MC infrastructure to propagate the function level resource usage
+/// info.
+///
+//===----------------------------------------------------------------------===//
+
+#include "AMDGPUMCResourceInfo.h"
+#include "Utils/AMDGPUBaseInfo.h"
+#include "llvm/ADT/SmallSet.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/MC/MCContext.h"
+#include "llvm/MC/MCSymbol.h"
+
+using namespace llvm;
+
+MCSymbol *MCResourceInfo::getSymbol(StringRef FuncName, ResourceInfoKind RIK) {
+  switch (RIK) {
+  case RIK_NumVGPR:
+    return OutContext.getOrCreateSymbol(FuncName + Twine(".num_vgpr"));
+  case RIK_NumAGPR:
+    return OutContext.getOrCreateSymbol(FuncName + Twine(".num_agpr"));
+  case RIK_NumSGPR:
+    return OutContext.getOrCreateSymbol(FuncName + Twine(".num_sgpr"));
+  case RIK_PrivateSegSize:
+    return OutContext.getOrCreateSymbol(FuncName + Twine(".private_seg_size"));
+  case RIK_UsesVCC:
+    return OutContext.getOrCreateSymbol(FuncName + Twine(".uses_vcc"));
+  case RIK_UsesFlatScratch:
+    return OutContext.getOrCreateSymbol(FuncName + Twine(".uses_flat_scratch"));
+  case RIK_HasDynSizedStack:
+    return OutContext.getOrCreateSymbol(FuncName +
+                                        Twine(".has_dyn_sized_stack"));
+  case RIK_HasRecursion:
+    return OutContext.getOrCreateSymbol(FuncName + Twine(".has_recursion"));
+  case RIK_HasIndirectCall:
+    return OutContext.getOrCreateSymbol(FuncName + Twine(".has_indirect_call"));
+  }
+  llvm_unreachable("Unexpected ResourceInfoKind.");
+}
+
+const MCExpr *MCResourceInfo::getSymRefExpr(StringRef FuncName,
+                                            ResourceInfoKind RIK,
+                                            MCContext &Ctx) {
+  return MCSymbolRefExpr::create(getSymbol(FuncName, RIK), Ctx);
+}
+
+void MCResourceInfo::assignMaxRegs() {
+  // Assign expression to get the max register use to the max_num_Xgpr symbol.
+  MCSymbol *MaxVGPRSym = getMaxVGPRSymbol();
+  MCSymbol *MaxAGPRSym = getMaxAGPRSymbol();
+  MCSymbol *MaxSGPRSym = getMaxSGPRSymbol();
+
+  auto assignMaxRegSym = [this](MCSymbol *Sym, int32_t RegCount) {
+    const MCExpr *MaxExpr = MCConstantExpr::create(RegCount, OutContext);
+    Sym->setVariableValue(MaxExpr);
+  };
+
+  assignMaxRegSym(MaxVGPRSym, MaxVGPR);
+  assignMaxRegSym(MaxAGPRSym, MaxAGPR);
+  assignMaxRegSym(MaxSGPRSym, MaxSGPR);
+}
+
+void MCResourceInfo::Finalize() {
+  assert(!finalized && "Cannot finalize ResourceInfo again.");
+  finalized = true;
+  assignMaxRegs();
+}
+
+MCSymbol *MCResourceInfo::getMaxVGPRSymbol() {
+  return OutContext.getOrCreateSymbol("max_num_vgpr");
+}
+
+MCSymbol *MCResourceInfo::getMaxAGPRSymbol() {
+  return OutContext.getOrCreateSymbol("max_num_agpr");
+}
+
+MCSymbol *MCResourceInfo::getMaxSGPRSymbol() {
+  return OutContext.getOrCreateSymbol("max_num_sgpr");
+}
+
+void MCResourceInfo::assignResourceInfoExpr(
+   ...
[truncated]

@llvmbot
Copy link
Member

llvmbot commented Aug 12, 2024

@llvm/pr-subscribers-mc

Author: Janek van Oirschot (JanekvO)

Changes

!!! Stacked PR on top of #95951 commit, please only review the latest commit 51f72f115b340a092c2c9f8569911b944a4efb6d !!!!

Converts AMDGPUResourceUsageAnalysis pass from Module to MachineFunction pass. Moves function resource info propagation to to MC layer (through helpers in AMDGPUMCResourceInfo) by generating MCExprs for every function resource which the emitters have been prepped for.

  • Function resource info (finalized) validation added through ValidateMCResourceInfo. Previously (and currently, still) done in getSIProgramInfo. However, MCExprs may not be resolvable during getSIProgramInfo due to propagation of yet-to-be-defined function resource info symbols. Does require some caching of the generated occupancy MCExpr computation in getSIProgramInfo to validate against.
  • Unfortunately, requires module specific max_num_Xgprs symbols to be emitted. Cannot generate a separate MCExpr for finding the max without getting recursive symbol defines.
  • Most test changes are caused by the representation changes from constant/known values to MCExprs.

Patch is 369.85 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/102913.diff

66 Files Affected:

  • (modified) clang/test/Frontend/amdgcn-machine-analysis-remarks.cl (+5-5)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp (+176-42)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h (+14-5)
  • (added) llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp (+220)
  • (added) llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.h (+94)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp (+22-125)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.h (+12-28)
  • (modified) llvm/lib/Target/AMDGPU/CMakeLists.txt (+1)
  • (modified) llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCExpr.cpp (+359)
  • (modified) llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCExpr.h (+11)
  • (modified) llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp (+55-14)
  • (modified) llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.h (+23)
  • (modified) llvm/lib/Target/AMDGPU/Utils/AMDGPUPALMetadata.cpp (+5-5)
  • (modified) llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp (+20-26)
  • (modified) llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.h (+4-1)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement.ll (+84-84)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/flat-scratch-init.ll (+9-7)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.workitem.id.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/non-entry-alloca.ll (+4-4)
  • (modified) llvm/test/CodeGen/AMDGPU/agpr-register-count.ll (+37-28)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgpu.private-memory.ll (+2-1)
  • (modified) llvm/test/CodeGen/AMDGPU/amdpal-metadata-agpr-register-count.ll (+3-1)
  • (modified) llvm/test/CodeGen/AMDGPU/attr-amdgpu-flat-work-group-size-vgpr-limit.ll (+28-23)
  • (modified) llvm/test/CodeGen/AMDGPU/attributor-noopt.ll (+26-4)
  • (modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage-agpr.ll (+10-6)
  • (modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage0.ll (+6-2)
  • (modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage1.ll (+9-2)
  • (modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage2.ll (+9-2)
  • (modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage3.ll (+9-2)
  • (modified) llvm/test/CodeGen/AMDGPU/call-graph-register-usage.ll (+39-11)
  • (modified) llvm/test/CodeGen/AMDGPU/callee-special-input-sgprs-fixed-abi.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/cndmask-no-def-vcc.ll (+1)
  • (modified) llvm/test/CodeGen/AMDGPU/code-object-v3.ll (+24-10)
  • (modified) llvm/test/CodeGen/AMDGPU/codegen-internal-only-func.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/control-flow-fastregalloc.ll (+4-2)
  • (modified) llvm/test/CodeGen/AMDGPU/elf.ll (+4-3)
  • (modified) llvm/test/CodeGen/AMDGPU/enable-scratch-only-dynamic-stack.ll (+9-5)
  • (added) llvm/test/CodeGen/AMDGPU/function-resource-usage.ll (+533)
  • (modified) llvm/test/CodeGen/AMDGPU/gfx11-user-sgpr-init16-bug.ll (+8-8)
  • (modified) llvm/test/CodeGen/AMDGPU/inline-asm-reserved-regs.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/insert-delay-alu-bug.ll (+12-12)
  • (modified) llvm/test/CodeGen/AMDGPU/ipra.ll (+7-7)
  • (modified) llvm/test/CodeGen/AMDGPU/kernarg-size.ll (+3-1)
  • (modified) llvm/test/CodeGen/AMDGPU/kernel_code_t_recurse.ll (+6-1)
  • (modified) llvm/test/CodeGen/AMDGPU/large-alloca-compute.ll (+23-8)
  • (modified) llvm/test/CodeGen/AMDGPU/llc-pipeline.ll (+15-30)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.workitem.id.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/lower-module-lds-offsets.ll (+13-13)
  • (modified) llvm/test/CodeGen/AMDGPU/mesa3d.ll (+8-5)
  • (modified) llvm/test/CodeGen/AMDGPU/module-lds-false-sharing.ll (+49-50)
  • (modified) llvm/test/CodeGen/AMDGPU/non-entry-alloca.ll (+12-8)
  • (modified) llvm/test/CodeGen/AMDGPU/recursion.ll (+18-10)
  • (modified) llvm/test/CodeGen/AMDGPU/resource-optimization-remarks.ll (+32-32)
  • (modified) llvm/test/CodeGen/AMDGPU/resource-usage-dead-function.ll (+7-6)
  • (modified) llvm/test/CodeGen/AMDGPU/stack-realign-kernel.ll (+90-36)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-kd-xnack-any.ll (+2-1)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-kd-xnack-off.ll (+2-1)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-kd-xnack-on.ll (+2-1)
  • (modified) llvm/test/CodeGen/AMDGPU/trap.ll (+21-12)
  • (modified) llvm/test/MC/AMDGPU/amd_kernel_code_t.s (+14-22)
  • (modified) llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx10.s (+32-32)
  • (modified) llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx11.s (+30-30)
  • (modified) llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx12.s (+29-29)
  • (modified) llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx7.s (+27-27)
  • (modified) llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx8.s (+27-27)
  • (modified) llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx90a.s (+23-23)
diff --git a/clang/test/Frontend/amdgcn-machine-analysis-remarks.cl b/clang/test/Frontend/amdgcn-machine-analysis-remarks.cl
index a05e21b37b9127..a2dd59a871904c 100644
--- a/clang/test/Frontend/amdgcn-machine-analysis-remarks.cl
+++ b/clang/test/Frontend/amdgcn-machine-analysis-remarks.cl
@@ -2,12 +2,12 @@
 // RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -target-cpu gfx908 -Rpass-analysis=kernel-resource-usage -S -O0 -verify %s -o /dev/null
 
 // expected-remark@+10 {{Function Name: foo}}
-// expected-remark@+9 {{    SGPRs: 13}}
-// expected-remark@+8 {{    VGPRs: 10}}
-// expected-remark@+7 {{    AGPRs: 12}}
-// expected-remark@+6 {{    ScratchSize [bytes/lane]: 0}}
+// expected-remark@+9 {{    SGPRs: foo.num_sgpr+(extrasgprs(foo.uses_vcc, foo.uses_flat_scratch, 1))}}
+// expected-remark@+8 {{    VGPRs: foo.num_vgpr}}
+// expected-remark@+7 {{    AGPRs: foo.num_agpr}}
+// expected-remark@+6 {{    ScratchSize [bytes/lane]: foo.private_seg_size}}
 // expected-remark@+5 {{    Dynamic Stack: False}}
-// expected-remark@+4 {{    Occupancy [waves/SIMD]: 10}}
+// expected-remark@+4 {{    Occupancy [waves/SIMD]: occupancy(10, 4, 256, 8, 10, max(foo.num_sgpr+(extrasgprs(foo.uses_vcc, foo.uses_flat_scratch, 1)), 1, 0), max(totalnumvgprs(foo.num_agpr, foo.num_vgpr), 1, 0))}}
 // expected-remark@+3 {{    SGPRs Spill: 0}}
 // expected-remark@+2 {{    VGPRs Spill: 0}}
 // expected-remark@+1 {{    LDS Size [bytes/block]: 0}}
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
index e64e28e01d3d18..97a5cb29d51023 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
@@ -18,6 +18,7 @@
 #include "AMDGPUAsmPrinter.h"
 #include "AMDGPU.h"
 #include "AMDGPUHSAMetadataStreamer.h"
+#include "AMDGPUMCResourceInfo.h"
 #include "AMDGPUResourceUsageAnalysis.h"
 #include "GCNSubtarget.h"
 #include "MCTargetDesc/AMDGPUInstPrinter.h"
@@ -92,6 +93,9 @@ AMDGPUAsmPrinter::AMDGPUAsmPrinter(TargetMachine &TM,
                                    std::unique_ptr<MCStreamer> Streamer)
     : AsmPrinter(TM, std::move(Streamer)) {
   assert(OutStreamer && "AsmPrinter constructed without streamer");
+  RI = std::make_unique<MCResourceInfo>(OutContext);
+  OccupancyValidateMap =
+      std::make_unique<DenseMap<const Function *, const MCExpr *>>();
 }
 
 StringRef AMDGPUAsmPrinter::getPassName() const {
@@ -359,6 +363,102 @@ bool AMDGPUAsmPrinter::doInitialization(Module &M) {
   return AsmPrinter::doInitialization(M);
 }
 
+void AMDGPUAsmPrinter::ValidateMCResourceInfo(Function &F) {
+  if (F.isDeclaration() || !AMDGPU::isModuleEntryFunctionCC(F.getCallingConv()))
+    return;
+
+  using RIK = MCResourceInfo::ResourceInfoKind;
+  const GCNSubtarget &STM = TM.getSubtarget<GCNSubtarget>(F);
+
+  auto TryGetMCExprValue = [](const MCExpr *Value, uint64_t &Res) -> bool {
+    int64_t Val;
+    if (Value->evaluateAsAbsolute(Val)) {
+      Res = Val;
+      return true;
+    }
+    return false;
+  };
+
+  const uint64_t MaxScratchPerWorkitem =
+      STM.getMaxWaveScratchSize() / STM.getWavefrontSize();
+  MCSymbol *ScratchSizeSymbol =
+      RI->getSymbol(F.getName(), RIK::RIK_PrivateSegSize);
+  uint64_t ScratchSize;
+  if (ScratchSizeSymbol->isVariable() &&
+      TryGetMCExprValue(ScratchSizeSymbol->getVariableValue(), ScratchSize) &&
+      ScratchSize > MaxScratchPerWorkitem) {
+    DiagnosticInfoStackSize DiagStackSize(F, ScratchSize, MaxScratchPerWorkitem,
+                                          DS_Error);
+    F.getContext().diagnose(DiagStackSize);
+  }
+
+  // Validate addressable scalar registers (i.e., prior to added implicit
+  // SGPRs).
+  MCSymbol *NumSGPRSymbol = RI->getSymbol(F.getName(), RIK::RIK_NumSGPR);
+  if (STM.getGeneration() >= AMDGPUSubtarget::VOLCANIC_ISLANDS &&
+      !STM.hasSGPRInitBug()) {
+    unsigned MaxAddressableNumSGPRs = STM.getAddressableNumSGPRs();
+    uint64_t NumSgpr;
+    if (NumSGPRSymbol->isVariable() &&
+        TryGetMCExprValue(NumSGPRSymbol->getVariableValue(), NumSgpr) &&
+        NumSgpr > MaxAddressableNumSGPRs) {
+      DiagnosticInfoResourceLimit Diag(F, "addressable scalar registers",
+                                       NumSgpr, MaxAddressableNumSGPRs,
+                                       DS_Error, DK_ResourceLimit);
+      F.getContext().diagnose(Diag);
+      return;
+    }
+  }
+
+  MCSymbol *VCCUsedSymbol = RI->getSymbol(F.getName(), RIK::RIK_UsesVCC);
+  MCSymbol *FlatUsedSymbol =
+      RI->getSymbol(F.getName(), RIK::RIK_UsesFlatScratch);
+  uint64_t VCCUsed, FlatUsed, NumSgpr;
+
+  if (NumSGPRSymbol->isVariable() && VCCUsedSymbol->isVariable() &&
+      FlatUsedSymbol->isVariable() &&
+      TryGetMCExprValue(NumSGPRSymbol->getVariableValue(), NumSgpr) &&
+      TryGetMCExprValue(VCCUsedSymbol->getVariableValue(), VCCUsed) &&
+      TryGetMCExprValue(FlatUsedSymbol->getVariableValue(), FlatUsed)) {
+
+    // Recomputes NumSgprs + implicit SGPRs but all symbols should now be
+    // resolvable.
+    NumSgpr += IsaInfo::getNumExtraSGPRs(
+        &STM, VCCUsed, FlatUsed,
+        getTargetStreamer()->getTargetID()->isXnackOnOrAny());
+    if (STM.getGeneration() <= AMDGPUSubtarget::SEA_ISLANDS ||
+        STM.hasSGPRInitBug()) {
+      unsigned MaxAddressableNumSGPRs = STM.getAddressableNumSGPRs();
+      if (NumSgpr > MaxAddressableNumSGPRs) {
+        DiagnosticInfoResourceLimit Diag(F, "scalar registers", NumSgpr,
+                                         MaxAddressableNumSGPRs, DS_Error,
+                                         DK_ResourceLimit);
+        F.getContext().diagnose(Diag);
+        return;
+      }
+    }
+
+    auto I = OccupancyValidateMap->find(&F);
+    if (I != OccupancyValidateMap->end()) {
+      const auto [MinWEU, MaxWEU] = AMDGPU::getIntegerPairAttribute(
+          F, "amdgpu-waves-per-eu", {0, 0}, true);
+      uint64_t Occupancy;
+      const MCExpr *OccupancyExpr = I->getSecond();
+
+      if (TryGetMCExprValue(OccupancyExpr, Occupancy) && Occupancy < MinWEU) {
+        DiagnosticInfoOptimizationFailure Diag(
+            F, F.getSubprogram(),
+            "failed to meet occupancy target given by 'amdgpu-waves-per-eu' in "
+            "'" +
+                F.getName() + "': desired occupancy was " + Twine(MinWEU) +
+                ", final occupancy is " + Twine(Occupancy));
+        F.getContext().diagnose(Diag);
+        return;
+      }
+    }
+  }
+}
+
 bool AMDGPUAsmPrinter::doFinalization(Module &M) {
   // Pad with s_code_end to help tools and guard against instruction prefetch
   // causing stale data in caches. Arguably this should be done by the linker,
@@ -371,39 +471,29 @@ bool AMDGPUAsmPrinter::doFinalization(Module &M) {
     getTargetStreamer()->EmitCodeEnd(STI);
   }
 
-  return AsmPrinter::doFinalization(M);
-}
+  // Assign expressions which can only be resolved when all other functions are
+  // known.
+  RI->Finalize();
+  getTargetStreamer()->EmitMCResourceMaximums(
+      RI->getMaxVGPRSymbol(), RI->getMaxAGPRSymbol(), RI->getMaxSGPRSymbol());
 
-// Print comments that apply to both callable functions and entry points.
-void AMDGPUAsmPrinter::emitCommonFunctionComments(
-    uint32_t NumVGPR, std::optional<uint32_t> NumAGPR, uint32_t TotalNumVGPR,
-    uint32_t NumSGPR, uint64_t ScratchSize, uint64_t CodeSize,
-    const AMDGPUMachineFunction *MFI) {
-  OutStreamer->emitRawComment(" codeLenInByte = " + Twine(CodeSize), false);
-  OutStreamer->emitRawComment(" NumSgprs: " + Twine(NumSGPR), false);
-  OutStreamer->emitRawComment(" NumVgprs: " + Twine(NumVGPR), false);
-  if (NumAGPR) {
-    OutStreamer->emitRawComment(" NumAgprs: " + Twine(*NumAGPR), false);
-    OutStreamer->emitRawComment(" TotalNumVgprs: " + Twine(TotalNumVGPR),
-                                false);
+  for (Function &F : M.functions()) {
+    ValidateMCResourceInfo(F);
   }
-  OutStreamer->emitRawComment(" ScratchSize: " + Twine(ScratchSize), false);
-  OutStreamer->emitRawComment(" MemoryBound: " + Twine(MFI->isMemoryBound()),
-                              false);
+  return AsmPrinter::doFinalization(M);
 }
 
 SmallString<128> AMDGPUAsmPrinter::getMCExprStr(const MCExpr *Value) {
   SmallString<128> Str;
   raw_svector_ostream OSS(Str);
-  int64_t IVal;
-  if (Value->evaluateAsAbsolute(IVal)) {
-    OSS << static_cast<uint64_t>(IVal);
-  } else {
-    Value->print(OSS, MAI);
-  }
+  auto &Streamer = getTargetStreamer()->getStreamer();
+  auto &Context = Streamer.getContext();
+  const MCExpr *New = llvm::TryFold(Value, Context);
+  AMDGPUMCExprPrint(New, OSS, MAI);
   return Str;
 }
 
+// Print comments that apply to both callable functions and entry points.
 void AMDGPUAsmPrinter::emitCommonFunctionComments(
     const MCExpr *NumVGPR, const MCExpr *NumAGPR, const MCExpr *TotalNumVGPR,
     const MCExpr *NumSGPR, const MCExpr *ScratchSize, uint64_t CodeSize,
@@ -573,21 +663,45 @@ bool AMDGPUAsmPrinter::runOnMachineFunction(MachineFunction &MF) {
   emitResourceUsageRemarks(MF, CurrentProgramInfo, MFI->isModuleEntryFunction(),
                            STM.hasMAIInsts());
 
+  {
+    const AMDGPUResourceUsageAnalysis::SIFunctionResourceInfo &Info =
+        ResourceUsage->getResourceInfo();
+    RI->gatherResourceInfo(MF, Info);
+    using RIK = MCResourceInfo::ResourceInfoKind;
+    getTargetStreamer()->EmitMCResourceInfo(
+        RI->getSymbol(MF.getName(), RIK::RIK_NumVGPR),
+        RI->getSymbol(MF.getName(), RIK::RIK_NumAGPR),
+        RI->getSymbol(MF.getName(), RIK::RIK_NumSGPR),
+        RI->getSymbol(MF.getName(), RIK::RIK_PrivateSegSize),
+        RI->getSymbol(MF.getName(), RIK::RIK_UsesVCC),
+        RI->getSymbol(MF.getName(), RIK::RIK_UsesFlatScratch),
+        RI->getSymbol(MF.getName(), RIK::RIK_HasDynSizedStack),
+        RI->getSymbol(MF.getName(), RIK::RIK_HasRecursion),
+        RI->getSymbol(MF.getName(), RIK::RIK_HasIndirectCall));
+  }
+
   if (isVerbose()) {
     MCSectionELF *CommentSection =
         Context.getELFSection(".AMDGPU.csdata", ELF::SHT_PROGBITS, 0);
     OutStreamer->switchSection(CommentSection);
 
     if (!MFI->isEntryFunction()) {
+      using RIK = MCResourceInfo::ResourceInfoKind;
       OutStreamer->emitRawComment(" Function info:", false);
-      const AMDGPUResourceUsageAnalysis::SIFunctionResourceInfo &Info =
-          ResourceUsage->getResourceInfo(&MF.getFunction());
+
       emitCommonFunctionComments(
-          Info.NumVGPR,
-          STM.hasMAIInsts() ? Info.NumAGPR : std::optional<uint32_t>(),
-          Info.getTotalNumVGPRs(STM),
-          Info.getTotalNumSGPRs(MF.getSubtarget<GCNSubtarget>()),
-          Info.PrivateSegmentSize, getFunctionCodeSize(MF), MFI);
+          RI->getSymbol(MF.getName(), RIK::RIK_NumVGPR)->getVariableValue(),
+          STM.hasMAIInsts() ? RI->getSymbol(MF.getName(), RIK::RIK_NumAGPR)
+                                  ->getVariableValue()
+                            : nullptr,
+          RI->createTotalNumVGPRs(MF, Ctx),
+          RI->createTotalNumSGPRs(
+              MF,
+              MF.getSubtarget<GCNSubtarget>().getTargetID().isXnackOnOrAny(),
+              Ctx),
+          RI->getSymbol(MF.getName(), RIK::RIK_PrivateSegSize)
+              ->getVariableValue(),
+          getFunctionCodeSize(MF), MFI);
       return false;
     }
 
@@ -755,8 +869,6 @@ uint64_t AMDGPUAsmPrinter::getFunctionCodeSize(const MachineFunction &MF) const
 
 void AMDGPUAsmPrinter::getSIProgramInfo(SIProgramInfo &ProgInfo,
                                         const MachineFunction &MF) {
-  const AMDGPUResourceUsageAnalysis::SIFunctionResourceInfo &Info =
-      ResourceUsage->getResourceInfo(&MF.getFunction());
   const GCNSubtarget &STM = MF.getSubtarget<GCNSubtarget>();
   MCContext &Ctx = MF.getContext();
 
@@ -773,18 +885,38 @@ void AMDGPUAsmPrinter::getSIProgramInfo(SIProgramInfo &ProgInfo,
     return false;
   };
 
-  ProgInfo.NumArchVGPR = CreateExpr(Info.NumVGPR);
-  ProgInfo.NumAccVGPR = CreateExpr(Info.NumAGPR);
-  ProgInfo.NumVGPR = CreateExpr(Info.getTotalNumVGPRs(STM));
-  ProgInfo.AccumOffset =
-      CreateExpr(alignTo(std::max(1, Info.NumVGPR), 4) / 4 - 1);
+  auto GetSymRefExpr =
+      [&](MCResourceInfo::ResourceInfoKind RIK) -> const MCExpr * {
+    MCSymbol *Sym = RI->getSymbol(MF.getName(), RIK);
+    return MCSymbolRefExpr::create(Sym, Ctx);
+  };
+
+  const MCExpr *ConstFour = MCConstantExpr::create(4, Ctx);
+  const MCExpr *ConstOne = MCConstantExpr::create(1, Ctx);
+
+  using RIK = MCResourceInfo::ResourceInfoKind;
+  ProgInfo.NumArchVGPR = GetSymRefExpr(RIK::RIK_NumVGPR);
+  ProgInfo.NumAccVGPR = GetSymRefExpr(RIK::RIK_NumAGPR);
+  ProgInfo.NumVGPR = AMDGPUMCExpr::createTotalNumVGPR(
+      ProgInfo.NumAccVGPR, ProgInfo.NumArchVGPR, Ctx);
+
+  // AccumOffset computed for the MCExpr equivalent of:
+  // alignTo(std::max(1, Info.NumVGPR), 4) / 4 - 1;
+  ProgInfo.AccumOffset = MCBinaryExpr::createSub(
+      MCBinaryExpr::createDiv(
+          AMDGPUMCExpr::createAlignTo(
+              AMDGPUMCExpr::createMax({ConstOne, ProgInfo.NumArchVGPR}, Ctx),
+              ConstFour, Ctx),
+          ConstFour, Ctx),
+      ConstOne, Ctx);
   ProgInfo.TgSplit = STM.isTgSplitEnabled();
-  ProgInfo.NumSGPR = CreateExpr(Info.NumExplicitSGPR);
-  ProgInfo.ScratchSize = CreateExpr(Info.PrivateSegmentSize);
-  ProgInfo.VCCUsed = CreateExpr(Info.UsesVCC);
-  ProgInfo.FlatUsed = CreateExpr(Info.UsesFlatScratch);
+  ProgInfo.NumSGPR = GetSymRefExpr(RIK::RIK_NumSGPR);
+  ProgInfo.ScratchSize = GetSymRefExpr(RIK::RIK_PrivateSegSize);
+  ProgInfo.VCCUsed = GetSymRefExpr(RIK::RIK_UsesVCC);
+  ProgInfo.FlatUsed = GetSymRefExpr(RIK::RIK_UsesFlatScratch);
   ProgInfo.DynamicCallStack =
-      CreateExpr(Info.HasDynamicallySizedStack || Info.HasRecursion);
+      MCBinaryExpr::createOr(GetSymRefExpr(RIK::RIK_HasDynSizedStack),
+                             GetSymRefExpr(RIK::RIK_HasRecursion), Ctx);
 
   const uint64_t MaxScratchPerWorkitem =
       STM.getMaxWaveScratchSize() / STM.getWavefrontSize();
@@ -1084,6 +1216,8 @@ void AMDGPUAsmPrinter::getSIProgramInfo(SIProgramInfo &ProgInfo,
       STM.computeOccupancy(F, ProgInfo.LDSSize), ProgInfo.NumSGPRsForWavesPerEU,
       ProgInfo.NumVGPRsForWavesPerEU, STM, Ctx);
 
+  OccupancyValidateMap->insert({&MF.getFunction(), ProgInfo.Occupancy});
+
   const auto [MinWEU, MaxWEU] =
       AMDGPU::getIntegerPairAttribute(F, "amdgpu-waves-per-eu", {0, 0}, true);
   uint64_t Occupancy;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h
index f66bbde42ce278..676a4687ee2af7 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h
@@ -24,6 +24,7 @@ struct AMDGPUResourceUsageAnalysis;
 class AMDGPUTargetStreamer;
 class MCCodeEmitter;
 class MCOperand;
+class MCResourceInfo;
 
 namespace AMDGPU {
 struct MCKernelDescriptor;
@@ -40,12 +41,20 @@ class AMDGPUAsmPrinter final : public AsmPrinter {
 
   AMDGPUResourceUsageAnalysis *ResourceUsage;
 
+  std::unique_ptr<MCResourceInfo> RI;
+
   SIProgramInfo CurrentProgramInfo;
 
   std::unique_ptr<AMDGPU::HSAMD::MetadataStreamer> HSAMetadataStream;
 
   MCCodeEmitter *DumpCodeInstEmitter = nullptr;
 
+  // ValidateMCResourceInfo cannot recompute parts of the occupancy as it does
+  // for other metadata to validate (e.g., NumSGPRs) so a map is necessary if we
+  // really want to track and validate the occupancy.
+  std::unique_ptr<DenseMap<const Function *, const MCExpr *>>
+      OccupancyValidateMap;
+
   uint64_t getFunctionCodeSize(const MachineFunction &MF) const;
 
   void getSIProgramInfo(SIProgramInfo &Out, const MachineFunction &MF);
@@ -60,11 +69,6 @@ class AMDGPUAsmPrinter final : public AsmPrinter {
   void EmitPALMetadata(const MachineFunction &MF,
                        const SIProgramInfo &KernelInfo);
   void emitPALFunctionMetadata(const MachineFunction &MF);
-  void emitCommonFunctionComments(uint32_t NumVGPR,
-                                  std::optional<uint32_t> NumAGPR,
-                                  uint32_t TotalNumVGPR, uint32_t NumSGPR,
-                                  uint64_t ScratchSize, uint64_t CodeSize,
-                                  const AMDGPUMachineFunction *MFI);
   void emitCommonFunctionComments(const MCExpr *NumVGPR, const MCExpr *NumAGPR,
                                   const MCExpr *TotalNumVGPR,
                                   const MCExpr *NumSGPR,
@@ -84,6 +88,11 @@ class AMDGPUAsmPrinter final : public AsmPrinter {
 
   SmallString<128> getMCExprStr(const MCExpr *Value);
 
+  /// Attempts to replace the validation that is missed in getSIProgramInfo due
+  /// to MCExpr being unknown. Invoked during doFinalization such that the
+  /// MCResourceInfo symbols are known.
+  void ValidateMCResourceInfo(Function &F);
+
 public:
   explicit AMDGPUAsmPrinter(TargetMachine &TM,
                             std::unique_ptr<MCStreamer> Streamer);
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp
new file mode 100644
index 00000000000000..58383475b312c9
--- /dev/null
+++ b/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp
@@ -0,0 +1,220 @@
+//===- AMDGPUMCResourceInfo.cpp --- MC Resource Info ----------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+/// \file
+/// \brief MC infrastructure to propagate the function level resource usage
+/// info.
+///
+//===----------------------------------------------------------------------===//
+
+#include "AMDGPUMCResourceInfo.h"
+#include "Utils/AMDGPUBaseInfo.h"
+#include "llvm/ADT/SmallSet.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/MC/MCContext.h"
+#include "llvm/MC/MCSymbol.h"
+
+using namespace llvm;
+
+MCSymbol *MCResourceInfo::getSymbol(StringRef FuncName, ResourceInfoKind RIK) {
+  switch (RIK) {
+  case RIK_NumVGPR:
+    return OutContext.getOrCreateSymbol(FuncName + Twine(".num_vgpr"));
+  case RIK_NumAGPR:
+    return OutContext.getOrCreateSymbol(FuncName + Twine(".num_agpr"));
+  case RIK_NumSGPR:
+    return OutContext.getOrCreateSymbol(FuncName + Twine(".num_sgpr"));
+  case RIK_PrivateSegSize:
+    return OutContext.getOrCreateSymbol(FuncName + Twine(".private_seg_size"));
+  case RIK_UsesVCC:
+    return OutContext.getOrCreateSymbol(FuncName + Twine(".uses_vcc"));
+  case RIK_UsesFlatScratch:
+    return OutContext.getOrCreateSymbol(FuncName + Twine(".uses_flat_scratch"));
+  case RIK_HasDynSizedStack:
+    return OutContext.getOrCreateSymbol(FuncName +
+                                        Twine(".has_dyn_sized_stack"));
+  case RIK_HasRecursion:
+    return OutContext.getOrCreateSymbol(FuncName + Twine(".has_recursion"));
+  case RIK_HasIndirectCall:
+    return OutContext.getOrCreateSymbol(FuncName + Twine(".has_indirect_call"));
+  }
+  llvm_unreachable("Unexpected ResourceInfoKind.");
+}
+
+const MCExpr *MCResourceInfo::getSymRefExpr(StringRef FuncName,
+                                            ResourceInfoKind RIK,
+                                            MCContext &Ctx) {
+  return MCSymbolRefExpr::create(getSymbol(FuncName, RIK), Ctx);
+}
+
+void MCResourceInfo::assignMaxRegs() {
+  // Assign expression to get the max register use to the max_num_Xgpr symbol.
+  MCSymbol *MaxVGPRSym = getMaxVGPRSymbol();
+  MCSymbol *MaxAGPRSym = getMaxAGPRSymbol();
+  MCSymbol *MaxSGPRSym = getMaxSGPRSymbol();
+
+  auto assignMaxRegSym = [this](MCSymbol *Sym, int32_t RegCount) {
+    const MCExpr *MaxExpr = MCConstantExpr::create(RegCount, OutContext);
+    Sym->setVariableValue(MaxExpr);
+  };
+
+  assignMaxRegSym(MaxVGPRSym, MaxVGPR);
+  assignMaxRegSym(MaxAGPRSym, MaxAGPR);
+  assignMaxRegSym(MaxSGPRSym, MaxSGPR);
+}
+
+void MCResourceInfo::Finalize() {
+  assert(!finalized && "Cannot finalize ResourceInfo again.");
+  finalized = true;
+  assignMaxRegs();
+}
+
+MCSymbol *MCResourceInfo::getMaxVGPRSymbol() {
+  return OutContext.getOrCreateSymbol("max_num_vgpr");
+}
+
+MCSymbol *MCResourceInfo::getMaxAGPRSymbol() {
+  return OutContext.getOrCreateSymbol("max_num_agpr");
+}
+
+MCSymbol *MCResourceInfo::getMaxSGPRSymbol() {
+  return OutContext.getOrCreateSymbol("max_num_sgpr");
+}
+
+void MCResourceInfo::assignResourceInfoExpr(
+   ...
[truncated]

@llvmbot
Copy link
Member

llvmbot commented Aug 12, 2024

@llvm/pr-subscribers-llvm-globalisel

Author: Janek van Oirschot (JanekvO)

Changes

!!! Stacked PR on top of #95951 commit, please only review the latest commit 51f72f115b340a092c2c9f8569911b944a4efb6d !!!!

Converts AMDGPUResourceUsageAnalysis pass from Module to MachineFunction pass. Moves function resource info propagation to to MC layer (through helpers in AMDGPUMCResourceInfo) by generating MCExprs for every function resource which the emitters have been prepped for.

  • Function resource info (finalized) validation added through ValidateMCResourceInfo. Previously (and currently, still) done in getSIProgramInfo. However, MCExprs may not be resolvable during getSIProgramInfo due to propagation of yet-to-be-defined function resource info symbols. Does require some caching of the generated occupancy MCExpr computation in getSIProgramInfo to validate against.
  • Unfortunately, requires module specific max_num_Xgprs symbols to be emitted. Cannot generate a separate MCExpr for finding the max without getting recursive symbol defines.
  • Most test changes are caused by the representation changes from constant/known values to MCExprs.

Patch is 369.85 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/102913.diff

66 Files Affected:

  • (modified) clang/test/Frontend/amdgcn-machine-analysis-remarks.cl (+5-5)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp (+176-42)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h (+14-5)
  • (added) llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp (+220)
  • (added) llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.h (+94)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp (+22-125)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.h (+12-28)
  • (modified) llvm/lib/Target/AMDGPU/CMakeLists.txt (+1)
  • (modified) llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCExpr.cpp (+359)
  • (modified) llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCExpr.h (+11)
  • (modified) llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp (+55-14)
  • (modified) llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.h (+23)
  • (modified) llvm/lib/Target/AMDGPU/Utils/AMDGPUPALMetadata.cpp (+5-5)
  • (modified) llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp (+20-26)
  • (modified) llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.h (+4-1)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement.ll (+84-84)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/flat-scratch-init.ll (+9-7)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.workitem.id.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/non-entry-alloca.ll (+4-4)
  • (modified) llvm/test/CodeGen/AMDGPU/agpr-register-count.ll (+37-28)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgpu.private-memory.ll (+2-1)
  • (modified) llvm/test/CodeGen/AMDGPU/amdpal-metadata-agpr-register-count.ll (+3-1)
  • (modified) llvm/test/CodeGen/AMDGPU/attr-amdgpu-flat-work-group-size-vgpr-limit.ll (+28-23)
  • (modified) llvm/test/CodeGen/AMDGPU/attributor-noopt.ll (+26-4)
  • (modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage-agpr.ll (+10-6)
  • (modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage0.ll (+6-2)
  • (modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage1.ll (+9-2)
  • (modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage2.ll (+9-2)
  • (modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage3.ll (+9-2)
  • (modified) llvm/test/CodeGen/AMDGPU/call-graph-register-usage.ll (+39-11)
  • (modified) llvm/test/CodeGen/AMDGPU/callee-special-input-sgprs-fixed-abi.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/cndmask-no-def-vcc.ll (+1)
  • (modified) llvm/test/CodeGen/AMDGPU/code-object-v3.ll (+24-10)
  • (modified) llvm/test/CodeGen/AMDGPU/codegen-internal-only-func.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/control-flow-fastregalloc.ll (+4-2)
  • (modified) llvm/test/CodeGen/AMDGPU/elf.ll (+4-3)
  • (modified) llvm/test/CodeGen/AMDGPU/enable-scratch-only-dynamic-stack.ll (+9-5)
  • (added) llvm/test/CodeGen/AMDGPU/function-resource-usage.ll (+533)
  • (modified) llvm/test/CodeGen/AMDGPU/gfx11-user-sgpr-init16-bug.ll (+8-8)
  • (modified) llvm/test/CodeGen/AMDGPU/inline-asm-reserved-regs.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/insert-delay-alu-bug.ll (+12-12)
  • (modified) llvm/test/CodeGen/AMDGPU/ipra.ll (+7-7)
  • (modified) llvm/test/CodeGen/AMDGPU/kernarg-size.ll (+3-1)
  • (modified) llvm/test/CodeGen/AMDGPU/kernel_code_t_recurse.ll (+6-1)
  • (modified) llvm/test/CodeGen/AMDGPU/large-alloca-compute.ll (+23-8)
  • (modified) llvm/test/CodeGen/AMDGPU/llc-pipeline.ll (+15-30)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.workitem.id.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/lower-module-lds-offsets.ll (+13-13)
  • (modified) llvm/test/CodeGen/AMDGPU/mesa3d.ll (+8-5)
  • (modified) llvm/test/CodeGen/AMDGPU/module-lds-false-sharing.ll (+49-50)
  • (modified) llvm/test/CodeGen/AMDGPU/non-entry-alloca.ll (+12-8)
  • (modified) llvm/test/CodeGen/AMDGPU/recursion.ll (+18-10)
  • (modified) llvm/test/CodeGen/AMDGPU/resource-optimization-remarks.ll (+32-32)
  • (modified) llvm/test/CodeGen/AMDGPU/resource-usage-dead-function.ll (+7-6)
  • (modified) llvm/test/CodeGen/AMDGPU/stack-realign-kernel.ll (+90-36)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-kd-xnack-any.ll (+2-1)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-kd-xnack-off.ll (+2-1)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-kd-xnack-on.ll (+2-1)
  • (modified) llvm/test/CodeGen/AMDGPU/trap.ll (+21-12)
  • (modified) llvm/test/MC/AMDGPU/amd_kernel_code_t.s (+14-22)
  • (modified) llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx10.s (+32-32)
  • (modified) llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx11.s (+30-30)
  • (modified) llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx12.s (+29-29)
  • (modified) llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx7.s (+27-27)
  • (modified) llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx8.s (+27-27)
  • (modified) llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx90a.s (+23-23)
diff --git a/clang/test/Frontend/amdgcn-machine-analysis-remarks.cl b/clang/test/Frontend/amdgcn-machine-analysis-remarks.cl
index a05e21b37b9127..a2dd59a871904c 100644
--- a/clang/test/Frontend/amdgcn-machine-analysis-remarks.cl
+++ b/clang/test/Frontend/amdgcn-machine-analysis-remarks.cl
@@ -2,12 +2,12 @@
 // RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -target-cpu gfx908 -Rpass-analysis=kernel-resource-usage -S -O0 -verify %s -o /dev/null
 
 // expected-remark@+10 {{Function Name: foo}}
-// expected-remark@+9 {{    SGPRs: 13}}
-// expected-remark@+8 {{    VGPRs: 10}}
-// expected-remark@+7 {{    AGPRs: 12}}
-// expected-remark@+6 {{    ScratchSize [bytes/lane]: 0}}
+// expected-remark@+9 {{    SGPRs: foo.num_sgpr+(extrasgprs(foo.uses_vcc, foo.uses_flat_scratch, 1))}}
+// expected-remark@+8 {{    VGPRs: foo.num_vgpr}}
+// expected-remark@+7 {{    AGPRs: foo.num_agpr}}
+// expected-remark@+6 {{    ScratchSize [bytes/lane]: foo.private_seg_size}}
 // expected-remark@+5 {{    Dynamic Stack: False}}
-// expected-remark@+4 {{    Occupancy [waves/SIMD]: 10}}
+// expected-remark@+4 {{    Occupancy [waves/SIMD]: occupancy(10, 4, 256, 8, 10, max(foo.num_sgpr+(extrasgprs(foo.uses_vcc, foo.uses_flat_scratch, 1)), 1, 0), max(totalnumvgprs(foo.num_agpr, foo.num_vgpr), 1, 0))}}
 // expected-remark@+3 {{    SGPRs Spill: 0}}
 // expected-remark@+2 {{    VGPRs Spill: 0}}
 // expected-remark@+1 {{    LDS Size [bytes/block]: 0}}
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
index e64e28e01d3d18..97a5cb29d51023 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
@@ -18,6 +18,7 @@
 #include "AMDGPUAsmPrinter.h"
 #include "AMDGPU.h"
 #include "AMDGPUHSAMetadataStreamer.h"
+#include "AMDGPUMCResourceInfo.h"
 #include "AMDGPUResourceUsageAnalysis.h"
 #include "GCNSubtarget.h"
 #include "MCTargetDesc/AMDGPUInstPrinter.h"
@@ -92,6 +93,9 @@ AMDGPUAsmPrinter::AMDGPUAsmPrinter(TargetMachine &TM,
                                    std::unique_ptr<MCStreamer> Streamer)
     : AsmPrinter(TM, std::move(Streamer)) {
   assert(OutStreamer && "AsmPrinter constructed without streamer");
+  RI = std::make_unique<MCResourceInfo>(OutContext);
+  OccupancyValidateMap =
+      std::make_unique<DenseMap<const Function *, const MCExpr *>>();
 }
 
 StringRef AMDGPUAsmPrinter::getPassName() const {
@@ -359,6 +363,102 @@ bool AMDGPUAsmPrinter::doInitialization(Module &M) {
   return AsmPrinter::doInitialization(M);
 }
 
+void AMDGPUAsmPrinter::ValidateMCResourceInfo(Function &F) {
+  if (F.isDeclaration() || !AMDGPU::isModuleEntryFunctionCC(F.getCallingConv()))
+    return;
+
+  using RIK = MCResourceInfo::ResourceInfoKind;
+  const GCNSubtarget &STM = TM.getSubtarget<GCNSubtarget>(F);
+
+  auto TryGetMCExprValue = [](const MCExpr *Value, uint64_t &Res) -> bool {
+    int64_t Val;
+    if (Value->evaluateAsAbsolute(Val)) {
+      Res = Val;
+      return true;
+    }
+    return false;
+  };
+
+  const uint64_t MaxScratchPerWorkitem =
+      STM.getMaxWaveScratchSize() / STM.getWavefrontSize();
+  MCSymbol *ScratchSizeSymbol =
+      RI->getSymbol(F.getName(), RIK::RIK_PrivateSegSize);
+  uint64_t ScratchSize;
+  if (ScratchSizeSymbol->isVariable() &&
+      TryGetMCExprValue(ScratchSizeSymbol->getVariableValue(), ScratchSize) &&
+      ScratchSize > MaxScratchPerWorkitem) {
+    DiagnosticInfoStackSize DiagStackSize(F, ScratchSize, MaxScratchPerWorkitem,
+                                          DS_Error);
+    F.getContext().diagnose(DiagStackSize);
+  }
+
+  // Validate addressable scalar registers (i.e., prior to added implicit
+  // SGPRs).
+  MCSymbol *NumSGPRSymbol = RI->getSymbol(F.getName(), RIK::RIK_NumSGPR);
+  if (STM.getGeneration() >= AMDGPUSubtarget::VOLCANIC_ISLANDS &&
+      !STM.hasSGPRInitBug()) {
+    unsigned MaxAddressableNumSGPRs = STM.getAddressableNumSGPRs();
+    uint64_t NumSgpr;
+    if (NumSGPRSymbol->isVariable() &&
+        TryGetMCExprValue(NumSGPRSymbol->getVariableValue(), NumSgpr) &&
+        NumSgpr > MaxAddressableNumSGPRs) {
+      DiagnosticInfoResourceLimit Diag(F, "addressable scalar registers",
+                                       NumSgpr, MaxAddressableNumSGPRs,
+                                       DS_Error, DK_ResourceLimit);
+      F.getContext().diagnose(Diag);
+      return;
+    }
+  }
+
+  MCSymbol *VCCUsedSymbol = RI->getSymbol(F.getName(), RIK::RIK_UsesVCC);
+  MCSymbol *FlatUsedSymbol =
+      RI->getSymbol(F.getName(), RIK::RIK_UsesFlatScratch);
+  uint64_t VCCUsed, FlatUsed, NumSgpr;
+
+  if (NumSGPRSymbol->isVariable() && VCCUsedSymbol->isVariable() &&
+      FlatUsedSymbol->isVariable() &&
+      TryGetMCExprValue(NumSGPRSymbol->getVariableValue(), NumSgpr) &&
+      TryGetMCExprValue(VCCUsedSymbol->getVariableValue(), VCCUsed) &&
+      TryGetMCExprValue(FlatUsedSymbol->getVariableValue(), FlatUsed)) {
+
+    // Recomputes NumSgprs + implicit SGPRs but all symbols should now be
+    // resolvable.
+    NumSgpr += IsaInfo::getNumExtraSGPRs(
+        &STM, VCCUsed, FlatUsed,
+        getTargetStreamer()->getTargetID()->isXnackOnOrAny());
+    if (STM.getGeneration() <= AMDGPUSubtarget::SEA_ISLANDS ||
+        STM.hasSGPRInitBug()) {
+      unsigned MaxAddressableNumSGPRs = STM.getAddressableNumSGPRs();
+      if (NumSgpr > MaxAddressableNumSGPRs) {
+        DiagnosticInfoResourceLimit Diag(F, "scalar registers", NumSgpr,
+                                         MaxAddressableNumSGPRs, DS_Error,
+                                         DK_ResourceLimit);
+        F.getContext().diagnose(Diag);
+        return;
+      }
+    }
+
+    auto I = OccupancyValidateMap->find(&F);
+    if (I != OccupancyValidateMap->end()) {
+      const auto [MinWEU, MaxWEU] = AMDGPU::getIntegerPairAttribute(
+          F, "amdgpu-waves-per-eu", {0, 0}, true);
+      uint64_t Occupancy;
+      const MCExpr *OccupancyExpr = I->getSecond();
+
+      if (TryGetMCExprValue(OccupancyExpr, Occupancy) && Occupancy < MinWEU) {
+        DiagnosticInfoOptimizationFailure Diag(
+            F, F.getSubprogram(),
+            "failed to meet occupancy target given by 'amdgpu-waves-per-eu' in "
+            "'" +
+                F.getName() + "': desired occupancy was " + Twine(MinWEU) +
+                ", final occupancy is " + Twine(Occupancy));
+        F.getContext().diagnose(Diag);
+        return;
+      }
+    }
+  }
+}
+
 bool AMDGPUAsmPrinter::doFinalization(Module &M) {
   // Pad with s_code_end to help tools and guard against instruction prefetch
   // causing stale data in caches. Arguably this should be done by the linker,
@@ -371,39 +471,29 @@ bool AMDGPUAsmPrinter::doFinalization(Module &M) {
     getTargetStreamer()->EmitCodeEnd(STI);
   }
 
-  return AsmPrinter::doFinalization(M);
-}
+  // Assign expressions which can only be resolved when all other functions are
+  // known.
+  RI->Finalize();
+  getTargetStreamer()->EmitMCResourceMaximums(
+      RI->getMaxVGPRSymbol(), RI->getMaxAGPRSymbol(), RI->getMaxSGPRSymbol());
 
-// Print comments that apply to both callable functions and entry points.
-void AMDGPUAsmPrinter::emitCommonFunctionComments(
-    uint32_t NumVGPR, std::optional<uint32_t> NumAGPR, uint32_t TotalNumVGPR,
-    uint32_t NumSGPR, uint64_t ScratchSize, uint64_t CodeSize,
-    const AMDGPUMachineFunction *MFI) {
-  OutStreamer->emitRawComment(" codeLenInByte = " + Twine(CodeSize), false);
-  OutStreamer->emitRawComment(" NumSgprs: " + Twine(NumSGPR), false);
-  OutStreamer->emitRawComment(" NumVgprs: " + Twine(NumVGPR), false);
-  if (NumAGPR) {
-    OutStreamer->emitRawComment(" NumAgprs: " + Twine(*NumAGPR), false);
-    OutStreamer->emitRawComment(" TotalNumVgprs: " + Twine(TotalNumVGPR),
-                                false);
+  for (Function &F : M.functions()) {
+    ValidateMCResourceInfo(F);
   }
-  OutStreamer->emitRawComment(" ScratchSize: " + Twine(ScratchSize), false);
-  OutStreamer->emitRawComment(" MemoryBound: " + Twine(MFI->isMemoryBound()),
-                              false);
+  return AsmPrinter::doFinalization(M);
 }
 
 SmallString<128> AMDGPUAsmPrinter::getMCExprStr(const MCExpr *Value) {
   SmallString<128> Str;
   raw_svector_ostream OSS(Str);
-  int64_t IVal;
-  if (Value->evaluateAsAbsolute(IVal)) {
-    OSS << static_cast<uint64_t>(IVal);
-  } else {
-    Value->print(OSS, MAI);
-  }
+  auto &Streamer = getTargetStreamer()->getStreamer();
+  auto &Context = Streamer.getContext();
+  const MCExpr *New = llvm::TryFold(Value, Context);
+  AMDGPUMCExprPrint(New, OSS, MAI);
   return Str;
 }
 
+// Print comments that apply to both callable functions and entry points.
 void AMDGPUAsmPrinter::emitCommonFunctionComments(
     const MCExpr *NumVGPR, const MCExpr *NumAGPR, const MCExpr *TotalNumVGPR,
     const MCExpr *NumSGPR, const MCExpr *ScratchSize, uint64_t CodeSize,
@@ -573,21 +663,45 @@ bool AMDGPUAsmPrinter::runOnMachineFunction(MachineFunction &MF) {
   emitResourceUsageRemarks(MF, CurrentProgramInfo, MFI->isModuleEntryFunction(),
                            STM.hasMAIInsts());
 
+  {
+    const AMDGPUResourceUsageAnalysis::SIFunctionResourceInfo &Info =
+        ResourceUsage->getResourceInfo();
+    RI->gatherResourceInfo(MF, Info);
+    using RIK = MCResourceInfo::ResourceInfoKind;
+    getTargetStreamer()->EmitMCResourceInfo(
+        RI->getSymbol(MF.getName(), RIK::RIK_NumVGPR),
+        RI->getSymbol(MF.getName(), RIK::RIK_NumAGPR),
+        RI->getSymbol(MF.getName(), RIK::RIK_NumSGPR),
+        RI->getSymbol(MF.getName(), RIK::RIK_PrivateSegSize),
+        RI->getSymbol(MF.getName(), RIK::RIK_UsesVCC),
+        RI->getSymbol(MF.getName(), RIK::RIK_UsesFlatScratch),
+        RI->getSymbol(MF.getName(), RIK::RIK_HasDynSizedStack),
+        RI->getSymbol(MF.getName(), RIK::RIK_HasRecursion),
+        RI->getSymbol(MF.getName(), RIK::RIK_HasIndirectCall));
+  }
+
   if (isVerbose()) {
     MCSectionELF *CommentSection =
         Context.getELFSection(".AMDGPU.csdata", ELF::SHT_PROGBITS, 0);
     OutStreamer->switchSection(CommentSection);
 
     if (!MFI->isEntryFunction()) {
+      using RIK = MCResourceInfo::ResourceInfoKind;
       OutStreamer->emitRawComment(" Function info:", false);
-      const AMDGPUResourceUsageAnalysis::SIFunctionResourceInfo &Info =
-          ResourceUsage->getResourceInfo(&MF.getFunction());
+
       emitCommonFunctionComments(
-          Info.NumVGPR,
-          STM.hasMAIInsts() ? Info.NumAGPR : std::optional<uint32_t>(),
-          Info.getTotalNumVGPRs(STM),
-          Info.getTotalNumSGPRs(MF.getSubtarget<GCNSubtarget>()),
-          Info.PrivateSegmentSize, getFunctionCodeSize(MF), MFI);
+          RI->getSymbol(MF.getName(), RIK::RIK_NumVGPR)->getVariableValue(),
+          STM.hasMAIInsts() ? RI->getSymbol(MF.getName(), RIK::RIK_NumAGPR)
+                                  ->getVariableValue()
+                            : nullptr,
+          RI->createTotalNumVGPRs(MF, Ctx),
+          RI->createTotalNumSGPRs(
+              MF,
+              MF.getSubtarget<GCNSubtarget>().getTargetID().isXnackOnOrAny(),
+              Ctx),
+          RI->getSymbol(MF.getName(), RIK::RIK_PrivateSegSize)
+              ->getVariableValue(),
+          getFunctionCodeSize(MF), MFI);
       return false;
     }
 
@@ -755,8 +869,6 @@ uint64_t AMDGPUAsmPrinter::getFunctionCodeSize(const MachineFunction &MF) const
 
 void AMDGPUAsmPrinter::getSIProgramInfo(SIProgramInfo &ProgInfo,
                                         const MachineFunction &MF) {
-  const AMDGPUResourceUsageAnalysis::SIFunctionResourceInfo &Info =
-      ResourceUsage->getResourceInfo(&MF.getFunction());
   const GCNSubtarget &STM = MF.getSubtarget<GCNSubtarget>();
   MCContext &Ctx = MF.getContext();
 
@@ -773,18 +885,38 @@ void AMDGPUAsmPrinter::getSIProgramInfo(SIProgramInfo &ProgInfo,
     return false;
   };
 
-  ProgInfo.NumArchVGPR = CreateExpr(Info.NumVGPR);
-  ProgInfo.NumAccVGPR = CreateExpr(Info.NumAGPR);
-  ProgInfo.NumVGPR = CreateExpr(Info.getTotalNumVGPRs(STM));
-  ProgInfo.AccumOffset =
-      CreateExpr(alignTo(std::max(1, Info.NumVGPR), 4) / 4 - 1);
+  auto GetSymRefExpr =
+      [&](MCResourceInfo::ResourceInfoKind RIK) -> const MCExpr * {
+    MCSymbol *Sym = RI->getSymbol(MF.getName(), RIK);
+    return MCSymbolRefExpr::create(Sym, Ctx);
+  };
+
+  const MCExpr *ConstFour = MCConstantExpr::create(4, Ctx);
+  const MCExpr *ConstOne = MCConstantExpr::create(1, Ctx);
+
+  using RIK = MCResourceInfo::ResourceInfoKind;
+  ProgInfo.NumArchVGPR = GetSymRefExpr(RIK::RIK_NumVGPR);
+  ProgInfo.NumAccVGPR = GetSymRefExpr(RIK::RIK_NumAGPR);
+  ProgInfo.NumVGPR = AMDGPUMCExpr::createTotalNumVGPR(
+      ProgInfo.NumAccVGPR, ProgInfo.NumArchVGPR, Ctx);
+
+  // AccumOffset computed for the MCExpr equivalent of:
+  // alignTo(std::max(1, Info.NumVGPR), 4) / 4 - 1;
+  ProgInfo.AccumOffset = MCBinaryExpr::createSub(
+      MCBinaryExpr::createDiv(
+          AMDGPUMCExpr::createAlignTo(
+              AMDGPUMCExpr::createMax({ConstOne, ProgInfo.NumArchVGPR}, Ctx),
+              ConstFour, Ctx),
+          ConstFour, Ctx),
+      ConstOne, Ctx);
   ProgInfo.TgSplit = STM.isTgSplitEnabled();
-  ProgInfo.NumSGPR = CreateExpr(Info.NumExplicitSGPR);
-  ProgInfo.ScratchSize = CreateExpr(Info.PrivateSegmentSize);
-  ProgInfo.VCCUsed = CreateExpr(Info.UsesVCC);
-  ProgInfo.FlatUsed = CreateExpr(Info.UsesFlatScratch);
+  ProgInfo.NumSGPR = GetSymRefExpr(RIK::RIK_NumSGPR);
+  ProgInfo.ScratchSize = GetSymRefExpr(RIK::RIK_PrivateSegSize);
+  ProgInfo.VCCUsed = GetSymRefExpr(RIK::RIK_UsesVCC);
+  ProgInfo.FlatUsed = GetSymRefExpr(RIK::RIK_UsesFlatScratch);
   ProgInfo.DynamicCallStack =
-      CreateExpr(Info.HasDynamicallySizedStack || Info.HasRecursion);
+      MCBinaryExpr::createOr(GetSymRefExpr(RIK::RIK_HasDynSizedStack),
+                             GetSymRefExpr(RIK::RIK_HasRecursion), Ctx);
 
   const uint64_t MaxScratchPerWorkitem =
       STM.getMaxWaveScratchSize() / STM.getWavefrontSize();
@@ -1084,6 +1216,8 @@ void AMDGPUAsmPrinter::getSIProgramInfo(SIProgramInfo &ProgInfo,
       STM.computeOccupancy(F, ProgInfo.LDSSize), ProgInfo.NumSGPRsForWavesPerEU,
       ProgInfo.NumVGPRsForWavesPerEU, STM, Ctx);
 
+  OccupancyValidateMap->insert({&MF.getFunction(), ProgInfo.Occupancy});
+
   const auto [MinWEU, MaxWEU] =
       AMDGPU::getIntegerPairAttribute(F, "amdgpu-waves-per-eu", {0, 0}, true);
   uint64_t Occupancy;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h
index f66bbde42ce278..676a4687ee2af7 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h
@@ -24,6 +24,7 @@ struct AMDGPUResourceUsageAnalysis;
 class AMDGPUTargetStreamer;
 class MCCodeEmitter;
 class MCOperand;
+class MCResourceInfo;
 
 namespace AMDGPU {
 struct MCKernelDescriptor;
@@ -40,12 +41,20 @@ class AMDGPUAsmPrinter final : public AsmPrinter {
 
   AMDGPUResourceUsageAnalysis *ResourceUsage;
 
+  std::unique_ptr<MCResourceInfo> RI;
+
   SIProgramInfo CurrentProgramInfo;
 
   std::unique_ptr<AMDGPU::HSAMD::MetadataStreamer> HSAMetadataStream;
 
   MCCodeEmitter *DumpCodeInstEmitter = nullptr;
 
+  // ValidateMCResourceInfo cannot recompute parts of the occupancy as it does
+  // for other metadata to validate (e.g., NumSGPRs) so a map is necessary if we
+  // really want to track and validate the occupancy.
+  std::unique_ptr<DenseMap<const Function *, const MCExpr *>>
+      OccupancyValidateMap;
+
   uint64_t getFunctionCodeSize(const MachineFunction &MF) const;
 
   void getSIProgramInfo(SIProgramInfo &Out, const MachineFunction &MF);
@@ -60,11 +69,6 @@ class AMDGPUAsmPrinter final : public AsmPrinter {
   void EmitPALMetadata(const MachineFunction &MF,
                        const SIProgramInfo &KernelInfo);
   void emitPALFunctionMetadata(const MachineFunction &MF);
-  void emitCommonFunctionComments(uint32_t NumVGPR,
-                                  std::optional<uint32_t> NumAGPR,
-                                  uint32_t TotalNumVGPR, uint32_t NumSGPR,
-                                  uint64_t ScratchSize, uint64_t CodeSize,
-                                  const AMDGPUMachineFunction *MFI);
   void emitCommonFunctionComments(const MCExpr *NumVGPR, const MCExpr *NumAGPR,
                                   const MCExpr *TotalNumVGPR,
                                   const MCExpr *NumSGPR,
@@ -84,6 +88,11 @@ class AMDGPUAsmPrinter final : public AsmPrinter {
 
   SmallString<128> getMCExprStr(const MCExpr *Value);
 
+  /// Attempts to replace the validation that is missed in getSIProgramInfo due
+  /// to MCExpr being unknown. Invoked during doFinalization such that the
+  /// MCResourceInfo symbols are known.
+  void ValidateMCResourceInfo(Function &F);
+
 public:
   explicit AMDGPUAsmPrinter(TargetMachine &TM,
                             std::unique_ptr<MCStreamer> Streamer);
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp
new file mode 100644
index 00000000000000..58383475b312c9
--- /dev/null
+++ b/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp
@@ -0,0 +1,220 @@
+//===- AMDGPUMCResourceInfo.cpp --- MC Resource Info ----------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+/// \file
+/// \brief MC infrastructure to propagate the function level resource usage
+/// info.
+///
+//===----------------------------------------------------------------------===//
+
+#include "AMDGPUMCResourceInfo.h"
+#include "Utils/AMDGPUBaseInfo.h"
+#include "llvm/ADT/SmallSet.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/MC/MCContext.h"
+#include "llvm/MC/MCSymbol.h"
+
+using namespace llvm;
+
+MCSymbol *MCResourceInfo::getSymbol(StringRef FuncName, ResourceInfoKind RIK) {
+  switch (RIK) {
+  case RIK_NumVGPR:
+    return OutContext.getOrCreateSymbol(FuncName + Twine(".num_vgpr"));
+  case RIK_NumAGPR:
+    return OutContext.getOrCreateSymbol(FuncName + Twine(".num_agpr"));
+  case RIK_NumSGPR:
+    return OutContext.getOrCreateSymbol(FuncName + Twine(".num_sgpr"));
+  case RIK_PrivateSegSize:
+    return OutContext.getOrCreateSymbol(FuncName + Twine(".private_seg_size"));
+  case RIK_UsesVCC:
+    return OutContext.getOrCreateSymbol(FuncName + Twine(".uses_vcc"));
+  case RIK_UsesFlatScratch:
+    return OutContext.getOrCreateSymbol(FuncName + Twine(".uses_flat_scratch"));
+  case RIK_HasDynSizedStack:
+    return OutContext.getOrCreateSymbol(FuncName +
+                                        Twine(".has_dyn_sized_stack"));
+  case RIK_HasRecursion:
+    return OutContext.getOrCreateSymbol(FuncName + Twine(".has_recursion"));
+  case RIK_HasIndirectCall:
+    return OutContext.getOrCreateSymbol(FuncName + Twine(".has_indirect_call"));
+  }
+  llvm_unreachable("Unexpected ResourceInfoKind.");
+}
+
+const MCExpr *MCResourceInfo::getSymRefExpr(StringRef FuncName,
+                                            ResourceInfoKind RIK,
+                                            MCContext &Ctx) {
+  return MCSymbolRefExpr::create(getSymbol(FuncName, RIK), Ctx);
+}
+
+void MCResourceInfo::assignMaxRegs() {
+  // Assign expression to get the max register use to the max_num_Xgpr symbol.
+  MCSymbol *MaxVGPRSym = getMaxVGPRSymbol();
+  MCSymbol *MaxAGPRSym = getMaxAGPRSymbol();
+  MCSymbol *MaxSGPRSym = getMaxSGPRSymbol();
+
+  auto assignMaxRegSym = [this](MCSymbol *Sym, int32_t RegCount) {
+    const MCExpr *MaxExpr = MCConstantExpr::create(RegCount, OutContext);
+    Sym->setVariableValue(MaxExpr);
+  };
+
+  assignMaxRegSym(MaxVGPRSym, MaxVGPR);
+  assignMaxRegSym(MaxAGPRSym, MaxAGPR);
+  assignMaxRegSym(MaxSGPRSym, MaxSGPR);
+}
+
+void MCResourceInfo::Finalize() {
+  assert(!finalized && "Cannot finalize ResourceInfo again.");
+  finalized = true;
+  assignMaxRegs();
+}
+
+MCSymbol *MCResourceInfo::getMaxVGPRSymbol() {
+  return OutContext.getOrCreateSymbol("max_num_vgpr");
+}
+
+MCSymbol *MCResourceInfo::getMaxAGPRSymbol() {
+  return OutContext.getOrCreateSymbol("max_num_agpr");
+}
+
+MCSymbol *MCResourceInfo::getMaxSGPRSymbol() {
+  return OutContext.getOrCreateSymbol("max_num_sgpr");
+}
+
+void MCResourceInfo::assignResourceInfoExpr(
+   ...
[truncated]

Copy link

github-actions bot commented Aug 12, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@JanekvO
Copy link
Contributor Author

JanekvO commented Aug 12, 2024

Upstream's formatting does not seem to agree with my local git clang-format.

@slinder1
Copy link
Contributor

Is it best to leave comments on the commit from the PR comment, i.e. currently 51f72f1 ?

I can't seem to start a review, so it will be individual comments (I think). Is there another way to handle this?

@JanekvO JanekvO force-pushed the resource-pass-convert branch from 422a81b to 5dff9e2 Compare August 15, 2024 13:21
@JanekvO
Copy link
Contributor Author

JanekvO commented Aug 15, 2024

Is it best to leave comments on the commit from the PR comment, i.e. currently 51f72f1 ?

I can't seem to start a review, so it will be individual comments (I think). Is there another way to handle this?

Apologies, I think some of the additional stacked PR tooling might've been better to use rather than this manual stacked PR but I just haven't looked into any of said tooling yet (Graphite?). The PR that this one depended on has been pushed and I've rebased so it should be good to go. Thanks!

@@ -40,12 +41,20 @@ class AMDGPUAsmPrinter final : public AsmPrinter {

AMDGPUResourceUsageAnalysis *ResourceUsage;

std::unique_ptr<MCResourceInfo> RI;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this need unique_ptr instead of just a plain member?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It requires a class inherited member for its constructor (in this case, OutContext).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But OutContext is a reference in the parent already, so you can use it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only was I was able to do so was by explicitly adding the MCContext as argument for every public method in AMDGPUResourceInfo. Let me know if there's a way I missed without being so verbose with MCContext arguments in AMDGPUResourceInfo.

@@ -3025,8 +3025,8 @@ define amdgpu_kernel void @dyn_extract_v5f64_s_s(ptr addrspace(1) %out, i32 %sel
; GPRIDX-NEXT: amd_machine_version_stepping = 0
; GPRIDX-NEXT: kernel_code_entry_byte_offset = 256
; GPRIDX-NEXT: kernel_code_prefetch_byte_size = 0
; GPRIDX-NEXT: granulated_workitem_vgpr_count = 0
; GPRIDX-NEXT: granulated_wavefront_sgpr_count = 1
; GPRIDX-NEXT: granulated_workitem_vgpr_count = (11468800|(((((alignto(max(max(totalnumvgprs(dyn_extract_v5f64_s_s.num_agpr, dyn_extract_v5f64_s_s.num_vgpr), 1, 0), 1), 4))/4)-1)&63)|(((((alignto(max(max(dyn_extract_v5f64_s_s.num_sgpr+(extrasgprs(dyn_extract_v5f64_s_s.uses_vcc, dyn_extract_v5f64_s_s.uses_flat_scratch, 1)), 1, 0), 1), 8))/8)-1)&15)<<6)))&63
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This turned into a mess for what should be a simple function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If any of the the sub-expressions of a MCExpr is unknown/unresolvable at this point in time (i.e., asm printing for a particular MachineFunction) it will print out the MCExpr in its most verbose way possible. It doesn't help that both the granulated_workitem_vgpr_count and granulated_wavefront_sgpr_count are basically the same MCExpr, but masked for the only the relevant bits (i.e., compute_pgm_resource1_registers masked for whatever we want to retrieve).

I was thinking of explicitly splitting all of the components that compose any of the compute_pgm_resourceX registers into their own MCExpr and leave computation of the compute_pgm_resourceX register for when they're used/necessary. However, this wouldn't help resolving the unknowns/unresolvables at the time of printing the amd_kernel_code_t metadata. Do let me know if splitting the composed registers is still desired, though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But why wasn't this resolvable? This function has no calls or anything complicated

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, I had to move the AMDGPUResourceInfo gathering function prior to amdhsa kernel descriptor asm printer to make deduce the values.

; DEFAULTSIZE: ; ScratchSize: 16

; ASSUME1024: .amdhsa_private_segment_fixed_size 1040
; ASSUME1024: .amdhsa_private_segment_fixed_size kernel_non_entry_block_static_alloca_uniformly_reached_align4.private_seg_size
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why doesn't this print as a simple size anymore?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a function's private_segment_size depends on itself, and all of its callees' propagated private_segment_size it may be the case that any of the callees' private_segment_size has yet to be analysed through AMDGPUResourceUsageAnalysis.
For example:

void foo() {
  ...
  call bar
  ...
}

void bar() {
  ...
  ...
}

Where for foo, we know and can construct the calculation for its private_segment_size as

[foo's own private segment required size] + max(bar.private_segment_size, [any other of foo's called function's private_segment_size])

But because we are currently printing foo's MachineFunction in AMDGPUAsmPrinter, we haven't analysed bar's private_segment_size yet. This will eventually be known but as we need to print metadata for foo, it's already printed with bar's private_segment_size placeholder.

TL;DR: cannot compute constant value yet, need to print symbolic representation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point was more this specific test case does not have calls and should be trivially resolved

@jhuber6
Copy link
Contributor

jhuber6 commented Aug 16, 2024

I applied this locally and it resolved #64863 so I'm looking forward to this landing.

Copy link
Contributor

@slinder1 slinder1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, with some nits

The expressions look pretty reasonable to me, although I didn't really go through validating that they are correct

MCSymbol *MCResourceInfo::getSymbol(StringRef FuncName, ResourceInfoKind RIK) {
switch (RIK) {
case RIK_NumVGPR:
return OutContext.getOrCreateSymbol(FuncName + Twine(".num_vgpr"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these names guaranteed to be unique? Do we reserve any symbol with a '.' in it?

Copy link
Contributor Author

@JanekvO JanekvO Aug 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these names guaranteed to be unique?

From what I could tell, function names have to be unique at this point (e.g., name mangling already happened)

Do we reserve any symbol with a '.' in it?

I'll have a look at this, I didn't see any reserved symbols while I was working on this but I haven't explicitly searched for it either.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To follow up on the '.' symbols: there's a few cases of symbols including dots like local symbols getting prefixed with .L or some possibly emitted symbols like .note.GNU-stack but the dots don't seem to be exclusive. I can, however, use another delimiter (or remove delimiter altogether) if the dot might be something we'd like to keep for other cases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The names seem fine to me, but we should write down which symbols are "reserved" or "well known" or however we want to define it relative to how they are intended to be used. Can you document them in AMDGPUUsage maybe?

My original question was more around how a user could technically contrive a conflict with an __asm__(("foo.num_vgpr")) definition, which is outlandish and easy enough for us to say "don't do that". I just want to make sure we write down the spirit of "don't do that", in a similar way to how C/C++ (I believe?) define symbols starting with "__" as reserved for implementation use.

MCContext &OutContext) {
const MCConstantExpr *LocalConstExpr =
MCConstantExpr::create(LocalValue, OutContext);
const MCExpr *SymVal = LocalConstExpr;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume we don't take advantage of indirect calls with a known set of callees

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As in, finding the module level maximum?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we can know when an indirect call has a known possible range of callees. Whether from !callees metadata or from seeing the possible values of the call target

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had to look into this a bit as I wasn't aware of the !callees metadata: but yes, we currently don't take advantage of this. I was able to create an example that emits the !callees metadata for an indirect call but it will always fall back on the worst case (i.e., module level worst case values).

@@ -3,6 +3,18 @@

declare i32 @llvm.amdgcn.workitem.id.x()

define <2 x i64> @f1() #0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated function appeared?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There were some function order changes in some tests. This function is the same one as the function that is removed a couple of lines below.

@@ -0,0 +1,533 @@
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -enable-ipra=0 -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s

; SGPR use may not seem equal to the sgpr use provided in comments as the latter includes extra sgprs (e.g., for vcc use).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should either fix the naming of these fields to say it's only the numbered SGPRs, or just always include the + vcc part?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I applied the following renames:
the function level number of sgpr symbol went from <function name>.num_sgpr to <function name>.numbered_sgpr
remark went from SGPRs to TotalSGPRs
comment went from NumSgprs to TotalNumSgprs

}

; GCN-LABEL: {{^}}indirect_use_vcc:
; GCN: .set indirect_use_vcc.num_vgpr, max(41, use_vcc.num_vgpr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you reassemble the output, do the resources always resolve to constants?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we were to go from function-resource-usage.ll to funcion-resource-usage.s and try to assemble function-resource-usage.s through llvm-mc, it would resolve whatever it can at the time of emitting again. In the specific case you've highlighted it would resolve into a constant as use_vcc.num_vgpr is defined before its use.

However, cases which use/depend on the module level maximum (e.g., max_num_vgpr) wouldn't be able to resolve to a constant as these module level maximums are defined at the bottom of the file.

TL;DR: order of symbol define/use dependent

@JanekvO JanekvO requested review from arsenm and slinder1 August 27, 2024 19:05
Comment on lines 80 to 90
MCSymbol *MCResourceInfo::getMaxVGPRSymbol(MCContext &OutContext) {
return OutContext.getOrCreateSymbol("max_num_vgpr");
}

MCSymbol *MCResourceInfo::getMaxAGPRSymbol(MCContext &OutContext) {
return OutContext.getOrCreateSymbol("max_num_agpr");
}

MCSymbol *MCResourceInfo::getMaxSGPRSymbol(MCContext &OutContext) {
return OutContext.getOrCreateSymbol("max_num_sgpr");
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't convinced myself of the naming/approach I've used here. These are module scope maxima but every module is going to re-defined these for their own scope.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if these should be placed in a custom section. In any case, we will eventually need custom linker logic to deal with this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've put these in their own section for now, albeit without any module-unique identifiers (.AMDGPU.gpr_maximums).

@JanekvO
Copy link
Contributor Author

JanekvO commented Sep 6, 2024

Ping

Comment on lines 53 to 55
// validateMCResourceInfo cannot recompute parts of the occupancy as it does
// for other metadata to validate (e.g., NumSGPRs) so a map is necessary if we
// really want to track and validate the occupancy.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why this is the case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I thought I needed access to the MachineFunction for SIMachineFunctionInfo but I can reconstruct it using just the Function and GCNSubTarget. It now recomputes the occupancy instead of this caching.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You do need the MachineFunction to get its SIMachineFunctionInfo. Constructing a new one won't give the same result?

Comment on lines 80 to 90
MCSymbol *MCResourceInfo::getMaxVGPRSymbol(MCContext &OutContext) {
return OutContext.getOrCreateSymbol("max_num_vgpr");
}

MCSymbol *MCResourceInfo::getMaxAGPRSymbol(MCContext &OutContext) {
return OutContext.getOrCreateSymbol("max_num_agpr");
}

MCSymbol *MCResourceInfo::getMaxSGPRSymbol(MCContext &OutContext) {
return OutContext.getOrCreateSymbol("max_num_sgpr");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if these should be placed in a custom section. In any case, we will eventually need custom linker logic to deal with this

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need more thought about how the ABI for this will work, but we need to start somewhere

@JanekvO JanekvO force-pushed the resource-pass-convert branch from 65b98a8 to 9aa2721 Compare September 25, 2024 16:56
@JanekvO
Copy link
Contributor Author

JanekvO commented Sep 25, 2024

Rebase

@JanekvO JanekvO force-pushed the resource-pass-convert branch from 9aa2721 to 8cfbc1b Compare September 26, 2024 15:56
@JanekvO
Copy link
Contributor Author

JanekvO commented Sep 26, 2024

Rebase, hopefully pull in fixes for unrelated test failures

@JanekvO JanekvO merged commit c897c13 into llvm:main Sep 30, 2024
9 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Sep 30, 2024

LLVM Buildbot has detected a new failure on builder bolt-x86_64-ubuntu-shared running on bolt-worker while building clang,llvm at step 6 "test-build-bolt-check-bolt".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/151/builds/2543

Here is the relevant piece of the build log for the reference
Step 6 (test-build-bolt-check-bolt) failure: test (failure)
******************** TEST 'BOLT :: perf2bolt/perf_test.test' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 5: /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/bin/clang /home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/Inputs/perf_test.c -fuse-ld=lld -Wl,--script=/home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/Inputs/perf_test.lds -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp
+ /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/bin/clang /home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/Inputs/perf_test.c -fuse-ld=lld -Wl,--script=/home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/Inputs/perf_test.lds -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp
RUN: at line 6: perf record -Fmax -e cycles:u -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp2 -- /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp
+ perf record -Fmax -e cycles:u -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp2 -- /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp
info: Using a maximum frequency rate of 2000 Hz
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.004 MB /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp2 (44 samples) ]
RUN: at line 7: /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/bin/perf2bolt /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp -p=/home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp2 -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp3 -nl -ignore-build-id 2>&1 | /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/bin/FileCheck /home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/perf_test.test
+ /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/bin/perf2bolt /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp -p=/home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp2 -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp3 -nl -ignore-build-id
+ /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/bin/FileCheck /home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/perf_test.test
RUN: at line 12: /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/bin/clang /home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/Inputs/perf_test.c -no-pie -fuse-ld=lld -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp4
+ /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/bin/clang /home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/Inputs/perf_test.c -no-pie -fuse-ld=lld -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp4
RUN: at line 13: perf record -Fmax -e cycles:u -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp5 -- /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp4
+ perf record -Fmax -e cycles:u -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp5 -- /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp4
info: Using a maximum frequency rate of 2000 Hz
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.002 MB /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp5 (9 samples) ]
RUN: at line 14: /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/bin/perf2bolt /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp4 -p=/home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp5 -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp6 -nl -ignore-build-id 2>&1 | /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/bin/FileCheck /home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/perf_test.test
+ /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/bin/FileCheck /home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/perf_test.test
+ /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/bin/perf2bolt /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp4 -p=/home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp5 -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp6 -nl -ignore-build-id
/home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/perf_test.test:10:12: error: CHECK-NOT: excluded string found in input
CHECK-NOT: !! WARNING !! This high mismatch ratio indicates the input binary is probably not the same binary used during profiling collection.
           ^
<stdin>:26:2: note: found here
 !! WARNING !! This high mismatch ratio indicates the input binary is probably not the same binary used during profiling collection. The generated data may be ineffective for improving performance.
 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Input file: <stdin>
Check file: /home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/perf_test.test

-dump-input=help explains the following input dump.

Input was:
<<<<<<
        .
        .
        .
       21: BOLT-WARNING: Running parallel work of 0 estimated cost, will switch to trivial scheduling. 
       22: PERF2BOLT: processing basic events (without LBR)... 
       23: PERF2BOLT: read 9 samples 
       24: PERF2BOLT: out of range samples recorded in unknown regions: 9 (100.0%) 
       25:  
       26:  !! WARNING !! This high mismatch ratio indicates the input binary is probably not the same binary used during profiling collection. The generated data may be ineffective for improving performance. 
not:10      !~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                                                   error: no match expected
       27:  
...

@llvm-ci
Copy link
Collaborator

llvm-ci commented Sep 30, 2024

LLVM Buildbot has detected a new failure on builder sanitizer-x86_64-linux-android running on sanitizer-buildbot-android while building clang,llvm at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/186/builds/2796

Here is the relevant piece of the build log for the reference
Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure)
...
[4577/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/PthreadLockChecker.cpp.o
[4578/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/PointerSortingChecker.cpp.o
[4579/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/PointerSubChecker.cpp.o
[4580/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/RetainCountChecker/RetainCountChecker.cpp.o
[4581/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/ReturnUndefChecker.cpp.o
[4582/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/RetainCountChecker/RetainCountDiagnostics.cpp.o
[4583/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/ReturnPointerRangeChecker.cpp.o
[4584/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/ReturnValueChecker.cpp.o
[4585/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/RunLoopAutoreleaseLeakChecker.cpp.o
[4586/5265] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o
FAILED: lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o 
CCACHE_CPP2=yes CCACHE_HASHDIR=yes /usr/bin/ccache /var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/llvm_build0/bin/clang++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/llvm_build64/lib/Target/AMDGPU -I/var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/llvm-project/llvm/lib/Target/AMDGPU -I/var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/llvm_build64/include -I/var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fvisibility=hidden  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o -MF lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o.d -o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o -c /var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp
/var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp:26:16: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
   26 |   auto GOCS = [this, FuncName, &OutContext](StringRef Suffix) {
      |                ^~~~~
/var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp:64:27: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
   64 |   auto assignMaxRegSym = [this, &OutContext](MCSymbol *Sym, int32_t RegCount) {
      |                           ^~~~~
2 errors generated.
[4587/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/SmartPtrChecker.cpp.o
[4588/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/SetgidSetuidOrderChecker.cpp.o
[4589/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/SimpleStreamChecker.cpp.o
[4590/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/SmartPtrModeling.cpp.o
[4591/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/StackAddrEscapeChecker.cpp.o
[4592/5265] Building CXX object lib/Target/AMDGPU/Utils/CMakeFiles/LLVMAMDGPUUtils.dir/AMDGPUPALMetadata.cpp.o
[4593/5265] Building CXX object lib/Target/AMDGPU/MCTargetDesc/CMakeFiles/LLVMAMDGPUDesc.dir/AMDGPUTargetStreamer.cpp.o
[4594/5265] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCInstLower.cpp.o
[4595/5265] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUResourceUsageAnalysis.cpp.o
[4596/5265] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUHSAMetadataStreamer.cpp.o
[4597/5265] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUAsmPrinter.cpp.o
[4598/5265] Building CXX object lib/Target/AMDGPU/MCTargetDesc/CMakeFiles/LLVMAMDGPUDesc.dir/AMDGPUMCTargetDesc.cpp.o
[4599/5265] Building CXX object lib/Target/AMDGPU/AsmParser/CMakeFiles/LLVMAMDGPUAsmParser.dir/AMDGPUAsmParser.cpp.o
ninja: build stopped: subcommand failed.

How to reproduce locally: https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild

@@@STEP_FAILURE@@@
Step 8 (bootstrap clang) failure: bootstrap clang (failure)
...
[4577/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/PthreadLockChecker.cpp.o
[4578/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/PointerSortingChecker.cpp.o
[4579/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/PointerSubChecker.cpp.o
[4580/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/RetainCountChecker/RetainCountChecker.cpp.o
[4581/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/ReturnUndefChecker.cpp.o
[4582/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/RetainCountChecker/RetainCountDiagnostics.cpp.o
[4583/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/ReturnPointerRangeChecker.cpp.o
[4584/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/ReturnValueChecker.cpp.o
[4585/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/RunLoopAutoreleaseLeakChecker.cpp.o
[4586/5265] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o
FAILED: lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o 
CCACHE_CPP2=yes CCACHE_HASHDIR=yes /usr/bin/ccache /var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/llvm_build0/bin/clang++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/llvm_build64/lib/Target/AMDGPU -I/var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/llvm-project/llvm/lib/Target/AMDGPU -I/var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/llvm_build64/include -I/var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fvisibility=hidden  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o -MF lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o.d -o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o -c /var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp
/var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp:26:16: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
   26 |   auto GOCS = [this, FuncName, &OutContext](StringRef Suffix) {
      |                ^~~~~
/var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp:64:27: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
   64 |   auto assignMaxRegSym = [this, &OutContext](MCSymbol *Sym, int32_t RegCount) {
      |                           ^~~~~
2 errors generated.
[4587/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/SmartPtrChecker.cpp.o
[4588/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/SetgidSetuidOrderChecker.cpp.o
[4589/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/SimpleStreamChecker.cpp.o
[4590/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/SmartPtrModeling.cpp.o
[4591/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/StackAddrEscapeChecker.cpp.o
[4592/5265] Building CXX object lib/Target/AMDGPU/Utils/CMakeFiles/LLVMAMDGPUUtils.dir/AMDGPUPALMetadata.cpp.o
[4593/5265] Building CXX object lib/Target/AMDGPU/MCTargetDesc/CMakeFiles/LLVMAMDGPUDesc.dir/AMDGPUTargetStreamer.cpp.o
[4594/5265] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCInstLower.cpp.o
[4595/5265] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUResourceUsageAnalysis.cpp.o
[4596/5265] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUHSAMetadataStreamer.cpp.o
[4597/5265] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUAsmPrinter.cpp.o
[4598/5265] Building CXX object lib/Target/AMDGPU/MCTargetDesc/CMakeFiles/LLVMAMDGPUDesc.dir/AMDGPUMCTargetDesc.cpp.o
[4599/5265] Building CXX object lib/Target/AMDGPU/AsmParser/CMakeFiles/LLVMAMDGPUAsmParser.dir/AMDGPUAsmParser.cpp.o
ninja: build stopped: subcommand failed.

How to reproduce locally: https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild
program finished with exit code 2
elapsedTime=287.340031

@llvm-ci
Copy link
Collaborator

llvm-ci commented Sep 30, 2024

LLVM Buildbot has detected a new failure on builder ppc64le-lld-multistage-test running on ppc64le-lld-multistage-test while building clang,llvm at step 12 "build-stage2-unified-tree".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/168/builds/3916

Here is the relevant piece of the build log for the reference
Step 12 (build-stage2-unified-tree) failure: build (failure)
...
588.747 [150/502/5621] Building CXX object unittests/DebugInfo/DWARF/CMakeFiles/DebugInfoDWARFTests.dir/DWARFLocationExpressionTest.cpp.o
588.749 [150/501/5622] Building CXX object tools/bugpoint/CMakeFiles/bugpoint.dir/BugDriver.cpp.o
588.757 [150/500/5623] Building CXX object tools/lld/COFF/CMakeFiles/lldCOFF.dir/ICF.cpp.o
588.775 [150/499/5624] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUAliasAnalysis.cpp.o
588.780 [150/498/5625] Building CXX object unittests/MC/CMakeFiles/MCTests.dir/MCInstPrinter.cpp.o
588.814 [150/497/5626] Building CXX object tools/llvm-reduce/CMakeFiles/llvm-reduce.dir/deltas/ReduceInstructions.cpp.o
588.987 [150/496/5627] Building CXX object tools/llvm-objdump/CMakeFiles/llvm-objdump.dir/XCOFFDump.cpp.o
589.004 [150/495/5628] Building CXX object tools/llvm-reduce/CMakeFiles/llvm-reduce.dir/deltas/ReduceInvokes.cpp.o
589.245 [150/494/5629] Building CXX object unittests/CGData/CMakeFiles/CGDataTests.dir/OutlinedHashTreeRecordTest.cpp.o
589.273 [150/493/5630] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o
FAILED: lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o 
ccache /home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/install/stage1/bin/clang++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/build/stage2/lib/Target/AMDGPU -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/llvm-project/llvm/lib/Target/AMDGPU -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/build/stage2/include -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fvisibility=hidden  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o -MF lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o.d -o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o -c /home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp
/home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp:26:16: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
   26 |   auto GOCS = [this, FuncName, &OutContext](StringRef Suffix) {
      |                ^~~~~
/home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp:64:27: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
   64 |   auto assignMaxRegSym = [this, &OutContext](MCSymbol *Sym, int32_t RegCount) {
      |                           ^~~~~
2 errors generated.
589.281 [150/492/5631] Building CXX object lib/Target/AMDGPU/MCTargetDesc/CMakeFiles/LLVMAMDGPUDesc.dir/AMDGPUELFObjectWriter.cpp.o
589.360 [150/491/5632] Building CXX object tools/lld/ELF/CMakeFiles/lldELF.dir/DriverUtils.cpp.o
589.605 [150/490/5633] Building CXX object tools/bugpoint/CMakeFiles/bugpoint.dir/ExecutionDriver.cpp.o
589.745 [150/489/5634] Building CXX object unittests/MC/AMDGPU/CMakeFiles/AMDGPUMCTests.dir/DwarfRegMappings.cpp.o
589.875 [150/488/5635] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPULibFunc.cpp.o
589.877 [150/487/5636] Building CXX object tools/llvm-mca/CMakeFiles/llvm-mca.dir/CodeRegionGenerator.cpp.o
589.922 [150/486/5637] Building C object tools/clang/tools/clang-fuzzer/dictionary/CMakeFiles/clang-fuzzer-dictionary.dir/dictionary.c.o
590.154 [150/485/5638] Building CXX object tools/lld/COFF/CMakeFiles/lldCOFF.dir/CallGraphSort.cpp.o
590.167 [150/484/5639] Building CXX object tools/clang/tools/clang-offload-packager/CMakeFiles/clang-offload-packager.dir/ClangOffloadPackager.cpp.o
590.377 [150/483/5640] Building CXX object tools/llvm-mca/CMakeFiles/llvm-mca.dir/Views/BottleneckAnalysis.cpp.o
590.476 [150/482/5641] Building CXX object tools/llvm-reduce/CMakeFiles/llvm-reduce.dir/deltas/ReduceAttributes.cpp.o
590.721 [150/481/5642] Building CXX object tools/llvm-reduce/CMakeFiles/llvm-reduce.dir/deltas/ReduceInstructionFlags.cpp.o
590.863 [150/480/5643] Building CXX object unittests/CodeGen/CMakeFiles/CodeGenTests.dir/LowLevelTypeTest.cpp.o
590.874 [150/479/5644] Building CXX object tools/lld/MachO/CMakeFiles/lldMachO.dir/ICF.cpp.o
590.906 [150/478/5645] Building CXX object tools/lld/MachO/CMakeFiles/lldMachO.dir/OutputSegment.cpp.o
591.052 [150/477/5646] Building CXX object unittests/CodeGen/CMakeFiles/CodeGenTests.dir/AMDGPUMetadataTest.cpp.o
591.055 [150/476/5647] Building CXX object tools/llvm-reduce/CMakeFiles/llvm-reduce.dir/deltas/ReduceDistinctMetadata.cpp.o
591.206 [150/475/5648] Building CXX object unittests/CodeGen/GlobalISel/CMakeFiles/GlobalISelTests.dir/GIMatchTableExecutorTest.cpp.o
591.353 [150/474/5649] Building CXX object tools/lld/ELF/CMakeFiles/lldELF.dir/Arch/X86.cpp.o
591.374 [150/473/5650] Building CXX object lib/Target/AMDGPU/Utils/CMakeFiles/LLVMAMDGPUUtils.dir/AMDGPUPALMetadata.cpp.o
591.377 [150/472/5651] Building CXX object tools/llvm-reduce/CMakeFiles/llvm-reduce.dir/deltas/ReduceBasicBlocks.cpp.o
591.385 [150/471/5652] Building CXX object tools/lld/MachO/CMakeFiles/lldMachO.dir/InputSection.cpp.o
591.455 [150/470/5653] Building CXX object unittests/DebugInfo/DWARF/CMakeFiles/DebugInfoDWARFTests.dir/DWARFListTableTest.cpp.o
591.457 [150/469/5654] Building CXX object tools/bugpoint/CMakeFiles/bugpoint.dir/ToolRunner.cpp.o
591.465 [150/468/5655] Building CXX object tools/clang/tools/driver/CMakeFiles/clang.dir/cc1gen_reproducer_main.cpp.o
591.474 [150/467/5656] Building CXX object tools/dsymutil/CMakeFiles/dsymutil.dir/BinaryHolder.cpp.o
591.660 [150/466/5657] Building CXX object tools/llvm-reduce/CMakeFiles/llvm-reduce.dir/deltas/ReduceOperands.cpp.o
591.662 [150/465/5658] Building CXX object unittests/CodeGen/GlobalISel/CMakeFiles/GlobalISelTests.dir/CallLowering.cpp.o
591.665 [150/464/5659] Building CXX object tools/llvm-reduce/CMakeFiles/llvm-reduce.dir/llvm-reduce.cpp.o
591.697 [150/463/5660] Building CXX object lib/Target/AMDGPU/Utils/CMakeFiles/LLVMAMDGPUUtils.dir/AMDKernelCodeTUtils.cpp.o

@llvm-ci
Copy link
Collaborator

llvm-ci commented Sep 30, 2024

LLVM Buildbot has detected a new failure on builder sanitizer-x86_64-linux running on sanitizer-buildbot2 while building clang,llvm at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/66/builds/4341

Here is the relevant piece of the build log for the reference
Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure)
...
[5187/5294] Building CXX object tools/clang/tools/libclang/CMakeFiles/libclang.dir/CXIndexDataConsumer.cpp.o
[5188/5294] Building CXX object tools/clang/tools/clang-extdef-mapping/CMakeFiles/clang-extdef-mapping.dir/ClangExtDefMapGen.cpp.o
[5189/5294] Building CXX object tools/clang/tools/clang-scan-deps/CMakeFiles/clang-scan-deps.dir/ClangScanDeps.cpp.o
[5190/5294] Building CXX object tools/clang/tools/clang-check/CMakeFiles/clang-check.dir/ClangCheck.cpp.o
[5191/5294] Building CXX object tools/clang/tools/libclang/CMakeFiles/libclang.dir/CIndexCodeCompletion.cpp.o
[5192/5294] Building CXX object tools/clang/tools/driver/CMakeFiles/clang.dir/cc1_main.cpp.o
[5193/5294] Building CXX object tools/clang/tools/c-index-test/CMakeFiles/c-index-test.dir/core_main.cpp.o
[5194/5294] Building CXX object tools/clang/tools/libclang/CMakeFiles/libclang.dir/CXExtractAPI.cpp.o
[5195/5294] Building CXX object tools/clang/tools/libclang/CMakeFiles/libclang.dir/Indexing.cpp.o
[5196/5294] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o
FAILED: lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o 
CCACHE_CPP2=yes CCACHE_HASHDIR=yes /usr/bin/ccache /home/b/sanitizer-x86_64-linux/build/llvm_build0/bin/clang++ -DGTEST_HAS_RTTI=0 -DLLVM_EXPORTS -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/b/sanitizer-x86_64-linux/build/build_default/lib/Target/AMDGPU -I/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU -I/home/b/sanitizer-x86_64-linux/build/build_default/include -I/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fvisibility=hidden  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o -MF lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o.d -o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o -c /home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp
/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp:26:16: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
   26 |   auto GOCS = [this, FuncName, &OutContext](StringRef Suffix) {
      |                ^~~~~
/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp:64:27: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
   64 |   auto assignMaxRegSym = [this, &OutContext](MCSymbol *Sym, int32_t RegCount) {
      |                           ^~~~~
2 errors generated.
[5197/5294] Building CXX object lib/Target/AMDGPU/MCTargetDesc/CMakeFiles/LLVMAMDGPUDesc.dir/AMDGPUTargetStreamer.cpp.o
[5198/5294] Building CXX object lib/Target/AMDGPU/Utils/CMakeFiles/LLVMAMDGPUUtils.dir/AMDGPUPALMetadata.cpp.o
[5199/5294] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCInstLower.cpp.o
[5200/5294] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUResourceUsageAnalysis.cpp.o
[5201/5294] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUHSAMetadataStreamer.cpp.o
[5202/5294] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUAsmPrinter.cpp.o
[5203/5294] Building CXX object lib/Target/AMDGPU/MCTargetDesc/CMakeFiles/LLVMAMDGPUDesc.dir/AMDGPUMCTargetDesc.cpp.o
[5204/5294] Building CXX object lib/Target/AMDGPU/AsmParser/CMakeFiles/LLVMAMDGPUAsmParser.dir/AMDGPUAsmParser.cpp.o
ninja: build stopped: subcommand failed.

How to reproduce locally: https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild

@@@STEP_FAILURE@@@
@@@BUILD_STEP test compiler-rt symbolizer@@@
ninja: Entering directory `build_default'
[1/40] Linking CXX static library lib/libLLVMAMDGPUUtils.a
[2/40] Linking CXX static library lib/libLLVMAMDGPUDesc.a
[3/40] Linking CXX static library lib/libLLVMAMDGPUDisassembler.a
[4/40] Linking CXX static library lib/libLLVMAMDGPUAsmParser.a
[5/40] Linking CXX executable bin/llvm-objdump
[6/40] Linking CXX executable bin/llvm-ar
[7/40] Linking CXX executable bin/sancov
[8/40] Generating ../../bin/llvm-ranlib
[9/40] Linking CXX executable bin/llvm-nm
[10/40] Linking CXX executable bin/llvm-jitlink
[11/40] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o
FAILED: lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o 
CCACHE_CPP2=yes CCACHE_HASHDIR=yes /usr/bin/ccache /home/b/sanitizer-x86_64-linux/build/llvm_build0/bin/clang++ -DGTEST_HAS_RTTI=0 -DLLVM_EXPORTS -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/b/sanitizer-x86_64-linux/build/build_default/lib/Target/AMDGPU -I/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU -I/home/b/sanitizer-x86_64-linux/build/build_default/include -I/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fvisibility=hidden  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o -MF lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o.d -o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o -c /home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp
/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp:26:16: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
   26 |   auto GOCS = [this, FuncName, &OutContext](StringRef Suffix) {
Step 8 (build compiler-rt symbolizer) failure: build compiler-rt symbolizer (failure)
...
[5187/5294] Building CXX object tools/clang/tools/libclang/CMakeFiles/libclang.dir/CXIndexDataConsumer.cpp.o
[5188/5294] Building CXX object tools/clang/tools/clang-extdef-mapping/CMakeFiles/clang-extdef-mapping.dir/ClangExtDefMapGen.cpp.o
[5189/5294] Building CXX object tools/clang/tools/clang-scan-deps/CMakeFiles/clang-scan-deps.dir/ClangScanDeps.cpp.o
[5190/5294] Building CXX object tools/clang/tools/clang-check/CMakeFiles/clang-check.dir/ClangCheck.cpp.o
[5191/5294] Building CXX object tools/clang/tools/libclang/CMakeFiles/libclang.dir/CIndexCodeCompletion.cpp.o
[5192/5294] Building CXX object tools/clang/tools/driver/CMakeFiles/clang.dir/cc1_main.cpp.o
[5193/5294] Building CXX object tools/clang/tools/c-index-test/CMakeFiles/c-index-test.dir/core_main.cpp.o
[5194/5294] Building CXX object tools/clang/tools/libclang/CMakeFiles/libclang.dir/CXExtractAPI.cpp.o
[5195/5294] Building CXX object tools/clang/tools/libclang/CMakeFiles/libclang.dir/Indexing.cpp.o
[5196/5294] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o
FAILED: lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o 
CCACHE_CPP2=yes CCACHE_HASHDIR=yes /usr/bin/ccache /home/b/sanitizer-x86_64-linux/build/llvm_build0/bin/clang++ -DGTEST_HAS_RTTI=0 -DLLVM_EXPORTS -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/b/sanitizer-x86_64-linux/build/build_default/lib/Target/AMDGPU -I/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU -I/home/b/sanitizer-x86_64-linux/build/build_default/include -I/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fvisibility=hidden  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o -MF lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o.d -o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o -c /home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp
/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp:26:16: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
   26 |   auto GOCS = [this, FuncName, &OutContext](StringRef Suffix) {
      |                ^~~~~
/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp:64:27: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
   64 |   auto assignMaxRegSym = [this, &OutContext](MCSymbol *Sym, int32_t RegCount) {
      |                           ^~~~~
2 errors generated.
[5197/5294] Building CXX object lib/Target/AMDGPU/MCTargetDesc/CMakeFiles/LLVMAMDGPUDesc.dir/AMDGPUTargetStreamer.cpp.o
[5198/5294] Building CXX object lib/Target/AMDGPU/Utils/CMakeFiles/LLVMAMDGPUUtils.dir/AMDGPUPALMetadata.cpp.o
[5199/5294] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCInstLower.cpp.o
[5200/5294] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUResourceUsageAnalysis.cpp.o
[5201/5294] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUHSAMetadataStreamer.cpp.o
[5202/5294] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUAsmPrinter.cpp.o
[5203/5294] Building CXX object lib/Target/AMDGPU/MCTargetDesc/CMakeFiles/LLVMAMDGPUDesc.dir/AMDGPUMCTargetDesc.cpp.o
[5204/5294] Building CXX object lib/Target/AMDGPU/AsmParser/CMakeFiles/LLVMAMDGPUAsmParser.dir/AMDGPUAsmParser.cpp.o
ninja: build stopped: subcommand failed.

How to reproduce locally: https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild
Step 9 (test compiler-rt symbolizer) failure: test compiler-rt symbolizer (failure)
...
[2/40] Linking CXX static library lib/libLLVMAMDGPUDesc.a
[3/40] Linking CXX static library lib/libLLVMAMDGPUDisassembler.a
[4/40] Linking CXX static library lib/libLLVMAMDGPUAsmParser.a
[5/40] Linking CXX executable bin/llvm-objdump
[6/40] Linking CXX executable bin/llvm-ar
[7/40] Linking CXX executable bin/sancov
[8/40] Generating ../../bin/llvm-ranlib
[9/40] Linking CXX executable bin/llvm-nm
[10/40] Linking CXX executable bin/llvm-jitlink
[11/40] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o
FAILED: lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o 
CCACHE_CPP2=yes CCACHE_HASHDIR=yes /usr/bin/ccache /home/b/sanitizer-x86_64-linux/build/llvm_build0/bin/clang++ -DGTEST_HAS_RTTI=0 -DLLVM_EXPORTS -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/b/sanitizer-x86_64-linux/build/build_default/lib/Target/AMDGPU -I/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU -I/home/b/sanitizer-x86_64-linux/build/build_default/include -I/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fvisibility=hidden  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o -MF lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o.d -o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o -c /home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp
/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp:26:16: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
   26 |   auto GOCS = [this, FuncName, &OutContext](StringRef Suffix) {
      |                ^~~~~
/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp:64:27: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
   64 |   auto assignMaxRegSym = [this, &OutContext](MCSymbol *Sym, int32_t RegCount) {
      |                           ^~~~~
2 errors generated.
ninja: build stopped: subcommand failed.

How to reproduce locally: https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild
Step 10 (build compiler-rt debug) failure: build compiler-rt debug (failure)
...
[5218/5294] Building CXX object tools/clang/tools/libclang/CMakeFiles/libclang.dir/CXSourceLocation.cpp.o
[5219/5294] Building CXX object tools/clang/tools/driver/CMakeFiles/clang.dir/cc1_main.cpp.o
[5220/5294] Building CXX object tools/clang/tools/c-index-test/CMakeFiles/c-index-test.dir/core_main.cpp.o
[5221/5294] Building CXX object tools/clang/tools/libclang/CMakeFiles/libclang.dir/CIndexInclusionStack.cpp.o
[5222/5294] Building CXX object tools/clang/tools/libclang/CMakeFiles/libclang.dir/CIndexUSRs.cpp.o
[5223/5294] Building CXX object tools/clang/tools/libclang/CMakeFiles/libclang.dir/Indexing.cpp.o
[5224/5294] Building CXX object tools/clang/tools/libclang/CMakeFiles/libclang.dir/CIndex.cpp.o
[5225/5294] Building CXX object tools/clang/tools/libclang/CMakeFiles/libclang.dir/CXExtractAPI.cpp.o
[5226/5294] Building CXX object tools/clang/tools/clang-scan-deps/CMakeFiles/clang-scan-deps.dir/ClangScanDeps.cpp.o
[5227/5294] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o
FAILED: lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o 
CCACHE_CPP2=yes CCACHE_HASHDIR=yes /usr/bin/ccache /home/b/sanitizer-x86_64-linux/build/llvm_build0/bin/clang++ -DGTEST_HAS_RTTI=0 -DLLVM_EXPORTS -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/b/sanitizer-x86_64-linux/build/build_default/lib/Target/AMDGPU -I/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU -I/home/b/sanitizer-x86_64-linux/build/build_default/include -I/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fvisibility=hidden  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o -MF lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o.d -o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o -c /home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp
/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp:26:16: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
   26 |   auto GOCS = [this, FuncName, &OutContext](StringRef Suffix) {
      |                ^~~~~
/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp:64:27: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
   64 |   auto assignMaxRegSym = [this, &OutContext](MCSymbol *Sym, int32_t RegCount) {
      |                           ^~~~~
2 errors generated.
ninja: build stopped: subcommand failed.

How to reproduce locally: https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild
Step 11 (test compiler-rt debug) failure: test compiler-rt debug (failure)
@@@BUILD_STEP test compiler-rt debug@@@
ninja: Entering directory `build_default'
[1/30] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o
FAILED: lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o 
CCACHE_CPP2=yes CCACHE_HASHDIR=yes /usr/bin/ccache /home/b/sanitizer-x86_64-linux/build/llvm_build0/bin/clang++ -DGTEST_HAS_RTTI=0 -DLLVM_EXPORTS -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/b/sanitizer-x86_64-linux/build/build_default/lib/Target/AMDGPU -I/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU -I/home/b/sanitizer-x86_64-linux/build/build_default/include -I/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fvisibility=hidden  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o -MF lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o.d -o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o -c /home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp
/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp:26:16: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
   26 |   auto GOCS = [this, FuncName, &OutContext](StringRef Suffix) {
      |                ^~~~~
/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp:64:27: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
   64 |   auto assignMaxRegSym = [this, &OutContext](MCSymbol *Sym, int32_t RegCount) {
      |                           ^~~~~
2 errors generated.
ninja: build stopped: subcommand failed.

How to reproduce locally: https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild
Step 12 (build compiler-rt tsan_debug) failure: build compiler-rt tsan_debug (failure)
...
[5199/5275] Generating ../../bin/llvm-ranlib
[5200/5275] Linking CXX executable bin/llvm-rtdyld
[5201/5275] Linking CXX executable bin/sancov
[5202/5275] Linking CXX executable bin/llvm-objdump
[5203/5275] Generating ../../bin/llvm-otool
[5204/5275] Linking CXX executable bin/llvm-nm
[5205/5275] Building CXX object tools/clang/tools/libclang/CMakeFiles/libclang.dir/CXExtractAPI.cpp.o
[5206/5275] Linking CXX executable bin/llvm-jitlink
[5207/5275] Linking CXX executable bin/llvm-profgen
[5208/5275] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o
FAILED: lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o 
CCACHE_CPP2=yes CCACHE_HASHDIR=yes /usr/bin/ccache /home/b/sanitizer-x86_64-linux/build/llvm_build0/bin/clang++ -DGTEST_HAS_RTTI=0 -DLLVM_EXPORTS -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/b/sanitizer-x86_64-linux/build/build_default/lib/Target/AMDGPU -I/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU -I/home/b/sanitizer-x86_64-linux/build/build_default/include -I/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fvisibility=hidden  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o -MF lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o.d -o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o -c /home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp
/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp:26:16: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
   26 |   auto GOCS = [this, FuncName, &OutContext](StringRef Suffix) {
      |                ^~~~~
/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp:64:27: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
   64 |   auto assignMaxRegSym = [this, &OutContext](MCSymbol *Sym, int32_t RegCount) {
      |                           ^~~~~
2 errors generated.
ninja: build stopped: subcommand failed.

How to reproduce locally: https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild
Step 13 (build compiler-rt default) failure: build compiler-rt default (failure)
...
[5218/5294] Building CXX object tools/clang/tools/libclang/CMakeFiles/libclang.dir/CIndexHigh.cpp.o
[5219/5294] Building CXX object tools/clang/tools/clang-repl/CMakeFiles/clang-repl.dir/ClangRepl.cpp.o
[5220/5294] Building CXX object tools/clang/tools/c-index-test/CMakeFiles/c-index-test.dir/core_main.cpp.o
[5221/5294] Building CXX object tools/clang/tools/clang-scan-deps/CMakeFiles/clang-scan-deps.dir/ClangScanDeps.cpp.o
[5222/5294] Building CXX object tools/clang/tools/libclang/CMakeFiles/libclang.dir/CIndex.cpp.o
[5223/5294] Building CXX object tools/clang/tools/libclang/CMakeFiles/libclang.dir/CXCursor.cpp.o
[5224/5294] Building CXX object tools/clang/tools/libclang/CMakeFiles/libclang.dir/CXIndexDataConsumer.cpp.o
[5225/5294] Building CXX object tools/clang/tools/driver/CMakeFiles/clang.dir/cc1_main.cpp.o
[5226/5294] Building CXX object tools/clang/tools/libclang/CMakeFiles/libclang.dir/CXExtractAPI.cpp.o
[5227/5294] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o
FAILED: lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o 
CCACHE_CPP2=yes CCACHE_HASHDIR=yes /usr/bin/ccache /home/b/sanitizer-x86_64-linux/build/llvm_build0/bin/clang++ -DGTEST_HAS_RTTI=0 -DLLVM_EXPORTS -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/b/sanitizer-x86_64-linux/build/build_default/lib/Target/AMDGPU -I/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU -I/home/b/sanitizer-x86_64-linux/build/build_default/include -I/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fvisibility=hidden  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o -MF lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o.d -o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o -c /home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp
/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp:26:16: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
   26 |   auto GOCS = [this, FuncName, &OutContext](StringRef Suffix) {
      |                ^~~~~
/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp:64:27: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
   64 |   auto assignMaxRegSym = [this, &OutContext](MCSymbol *Sym, int32_t RegCount) {
      |                           ^~~~~
2 errors generated.
ninja: build stopped: subcommand failed.

How to reproduce locally: https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild
Step 14 (test compiler-rt default) failure: test compiler-rt default (failure)
@@@BUILD_STEP test compiler-rt default@@@
ninja: Entering directory `build_default'
[1/30] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o
FAILED: lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o 
CCACHE_CPP2=yes CCACHE_HASHDIR=yes /usr/bin/ccache /home/b/sanitizer-x86_64-linux/build/llvm_build0/bin/clang++ -DGTEST_HAS_RTTI=0 -DLLVM_EXPORTS -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/b/sanitizer-x86_64-linux/build/build_default/lib/Target/AMDGPU -I/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU -I/home/b/sanitizer-x86_64-linux/build/build_default/include -I/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fvisibility=hidden  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o -MF lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o.d -o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o -c /home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp
/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp:26:16: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
   26 |   auto GOCS = [this, FuncName, &OutContext](StringRef Suffix) {
      |                ^~~~~
/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp:64:27: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
   64 |   auto assignMaxRegSym = [this, &OutContext](MCSymbol *Sym, int32_t RegCount) {
      |                           ^~~~~
2 errors generated.
ninja: build stopped: subcommand failed.

How to reproduce locally: https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild
Step 15 (build standalone compiler-rt) failure: build standalone compiler-rt (failure)
...
  of CMake.

  The cmake-policies(7) manual explains that the OLD behaviors of all
  policies are deprecated and that a policy should be set to OLD only under
  specific short-term circumstances.  Projects should be ported to the NEW
  behavior and not rely on setting a policy to OLD.
Call Stack (most recent call first):
  CMakeLists.txt:12 (include)
-- The C compiler identification is unknown
-- The CXX compiler identification is unknown
-- The ASM compiler identification is unknown
-- Didn't find assembler
CMake Error at CMakeLists.txt:17 (project):
  The CMAKE_C_COMPILER:

    /home/b/sanitizer-x86_64-linux/build/build_default/bin/clang

  is not a full path to an existing compiler tool.

  Tell CMake where to find the compiler by setting either the environment
  variable "CC" or the CMake cache entry CMAKE_C_COMPILER to the full path to
  the compiler, or to the compiler name if it is in the PATH.


CMake Error at CMakeLists.txt:17 (project):
  The CMAKE_CXX_COMPILER:

    /home/b/sanitizer-x86_64-linux/build/build_default/bin/clang++

  is not a full path to an existing compiler tool.

  Tell CMake where to find the compiler by setting either the environment
  variable "CXX" or the CMake cache entry CMAKE_CXX_COMPILER to the full path
  to the compiler, or to the compiler name if it is in the PATH.


CMake Error at CMakeLists.txt:17 (project):
  No CMAKE_ASM_COMPILER could be found.

  Tell CMake where to find the compiler by setting either the environment
  variable "ASM" or the CMake cache entry CMAKE_ASM_COMPILER to the full path
  to the compiler, or to the compiler name if it is in the PATH.
-- Warning: Did not find file Compiler/-ASM
-- Configuring incomplete, errors occurred!

How to reproduce locally: https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild
ninja: Entering directory `compiler_rt_build'
ninja: error: loading 'build.ninja': No such file or directory

How to reproduce locally: https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild

@llvm-ci
Copy link
Collaborator

llvm-ci commented Sep 30, 2024

LLVM Buildbot has detected a new failure on builder clang-ppc64le-rhel running on ppc64le-clang-rhel-test while building clang,llvm at step 5 "build-unified-tree".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/145/builds/2167

Here is the relevant piece of the build log for the reference
Step 5 (build-unified-tree) failure: build (failure)
...
424.444 [204/139/5919] Building CXX object tools/llvm-mca/CMakeFiles/llvm-mca.dir/Views/BottleneckAnalysis.cpp.o
424.489 [204/138/5920] Building CXX object tools/llvm-reduce/CMakeFiles/llvm-reduce.dir/deltas/ReduceGlobalVarInitializers.cpp.o
424.523 [204/137/5921] Building CXX object tools/llvm-ml/CMakeFiles/llvm-ml.dir/Disassembler.cpp.o
424.532 [204/136/5922] Building CXX object tools/lld/COFF/CMakeFiles/lldCOFF.dir/Symbols.cpp.o
424.545 [204/135/5923] Building CXX object tools/llvm-reduce/CMakeFiles/llvm-reduce.dir/deltas/SimplifyInstructions.cpp.o
424.596 [204/134/5924] Building CXX object tools/bugpoint/CMakeFiles/bugpoint.dir/ToolRunner.cpp.o
424.618 [204/133/5925] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUOpenCLEnqueuedBlockLowering.cpp.o
424.792 [204/132/5926] Building CXX object tools/lld/MachO/CMakeFiles/lldMachO.dir/Relocations.cpp.o
424.923 [204/131/5927] Building CXX object tools/lld/COFF/CMakeFiles/lldCOFF.dir/COFFLinkerContext.cpp.o
424.967 [204/130/5928] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o
FAILED: lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o 
ccache /home/docker/llvm-external-buildbots/clang.17.0.6/bin/clang++ --gcc-toolchain=/gcc-toolchain/usr -DGTEST_HAS_RTTI=0 -DLLVM_EXPORTS -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-rhel-test/clang-ppc64le-rhel/build/lib/Target/AMDGPU -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-rhel-test/clang-ppc64le-rhel/llvm-project/llvm/lib/Target/AMDGPU -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-rhel-test/clang-ppc64le-rhel/build/include -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-rhel-test/clang-ppc64le-rhel/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fPIC  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o -MF lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o.d -o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o -c /home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-rhel-test/clang-ppc64le-rhel/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp
/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-rhel-test/clang-ppc64le-rhel/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp:26:16: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
   26 |   auto GOCS = [this, FuncName, &OutContext](StringRef Suffix) {
      |                ^~~~~
/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-rhel-test/clang-ppc64le-rhel/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp:64:27: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
   64 |   auto assignMaxRegSym = [this, &OutContext](MCSymbol *Sym, int32_t RegCount) {
      |                           ^~~~~
2 errors generated.
425.060 [204/129/5929] Building CXX object tools/llvm-objdump/CMakeFiles/llvm-objdump.dir/OffloadDump.cpp.o
425.073 [204/128/5930] Building CXX object tools/llvm-objdump/CMakeFiles/llvm-objdump.dir/XCOFFDump.cpp.o
425.201 [204/127/5931] Building CXX object tools/bugpoint/CMakeFiles/bugpoint.dir/BugDriver.cpp.o
425.404 [204/126/5932] Building CXX object tools/clang/tools/libclang/CMakeFiles/libclang.dir/CIndexer.cpp.o
425.460 [204/125/5933] Building CXX object tools/clang/tools/clang-offload-packager/CMakeFiles/clang-offload-packager.dir/ClangOffloadPackager.cpp.o
425.471 [204/124/5934] Building CXX object tools/lld/MinGW/CMakeFiles/lldMinGW.dir/Driver.cpp.o
425.552 [204/123/5935] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPURewriteUndefForPHI.cpp.o
425.672 [204/122/5936] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUAliasAnalysis.cpp.o
425.835 [204/121/5937] Building CXX object tools/lld/MachO/CMakeFiles/lldMachO.dir/MarkLive.cpp.o
425.869 [204/120/5938] Building CXX object tools/llvm-mca/CMakeFiles/llvm-mca.dir/CodeRegionGenerator.cpp.o
425.916 [204/119/5939] Building CXX object tools/llvm-reduce/CMakeFiles/llvm-reduce.dir/deltas/ReduceGlobalVars.cpp.o
425.936 [204/118/5940] Building CXX object tools/bugpoint/CMakeFiles/bugpoint.dir/ExecutionDriver.cpp.o
426.050 [204/117/5941] Building CXX object tools/lld/MachO/CMakeFiles/lldMachO.dir/Arch/ARM64.cpp.o
426.153 [204/116/5942] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPULibFunc.cpp.o
426.200 [204/115/5943] Building CXX object tools/dsymutil/CMakeFiles/dsymutil.dir/BinaryHolder.cpp.o
426.241 [204/114/5944] Building CXX object tools/llvm-reduce/CMakeFiles/llvm-reduce.dir/deltas/ReduceInstructions.cpp.o
426.336 [204/113/5945] Building CXX object tools/lld/COFF/CMakeFiles/lldCOFF.dir/MinGW.cpp.o
426.352 [204/112/5946] Building CXX object tools/lld/COFF/CMakeFiles/lldCOFF.dir/ICF.cpp.o
426.434 [204/111/5947] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUCtorDtorLowering.cpp.o
426.446 [204/110/5948] Building CXX object tools/lld/COFF/CMakeFiles/lldCOFF.dir/CallGraphSort.cpp.o
426.681 [204/109/5949] Building CXX object tools/lld/COFF/CMakeFiles/lldCOFF.dir/LLDMapFile.cpp.o
426.782 [204/108/5950] Building CXX object tools/llvm-reduce/CMakeFiles/llvm-reduce.dir/deltas/ReduceInstructionFlags.cpp.o
426.845 [204/107/5951] Building CXX object lib/Target/AMDGPU/Utils/CMakeFiles/LLVMAMDGPUUtils.dir/AMDGPUPALMetadata.cpp.o
427.074 [204/106/5952] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUPromoteKernelArguments.cpp.o
427.281 [204/105/5953] Building CXX object tools/lld/COFF/CMakeFiles/lldCOFF.dir/MapFile.cpp.o
427.283 [204/104/5954] Building CXX object tools/llvm-reduce/CMakeFiles/llvm-reduce.dir/deltas/ReduceInvokes.cpp.o
427.368 [204/103/5955] Building CXX object tools/lld/MachO/CMakeFiles/lldMachO.dir/ICF.cpp.o
427.412 [204/102/5956] Building CXX object tools/llvm-reduce/CMakeFiles/llvm-reduce.dir/deltas/ReduceDistinctMetadata.cpp.o
427.610 [204/101/5957] Building CXX object tools/bugpoint/CMakeFiles/bugpoint.dir/OptimizerDriver.cpp.o
427.621 [204/100/5958] Building CXX object tools/llvm-objdump/CMakeFiles/llvm-objdump.dir/COFFDump.cpp.o

@llvm-ci
Copy link
Collaborator

llvm-ci commented Sep 30, 2024

LLVM Buildbot has detected a new failure on builder lld-x86_64-win running on as-worker-93 while building clang,llvm at step 7 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/146/builds/1275

Here is the relevant piece of the build log for the reference
Step 7 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM-Unit :: Support/./SupportTests.exe/35/87' FAILED ********************
Script(shard):
--
GTEST_OUTPUT=json:C:\a\lld-x86_64-win\build\unittests\Support\.\SupportTests.exe-LLVM-Unit-19660-35-87.json GTEST_SHUFFLE=0 GTEST_TOTAL_SHARDS=87 GTEST_SHARD_INDEX=35 C:\a\lld-x86_64-win\build\unittests\Support\.\SupportTests.exe
--

Script:
--
C:\a\lld-x86_64-win\build\unittests\Support\.\SupportTests.exe --gtest_filter=ProgramEnvTest.CreateProcessLongPath
--
C:\a\lld-x86_64-win\llvm-project\llvm\unittests\Support\ProgramTest.cpp(160): error: Expected equality of these values:
  0
  RC
    Which is: -2

C:\a\lld-x86_64-win\llvm-project\llvm\unittests\Support\ProgramTest.cpp(163): error: fs::remove(Twine(LongPath)): did not return errc::success.
error number: 13
error message: permission denied



C:\a\lld-x86_64-win\llvm-project\llvm\unittests\Support\ProgramTest.cpp:160
Expected equality of these values:
  0
  RC
    Which is: -2

C:\a\lld-x86_64-win\llvm-project\llvm\unittests\Support\ProgramTest.cpp:163
fs::remove(Twine(LongPath)): did not return errc::success.
error number: 13
error message: permission denied




********************


searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Oct 2, 2024
…ass (llvm#102913)

Reland:

Converts AMDGPUResourceUsageAnalysis pass from Module to MachineFunction
pass. Moves function resource info propagation to to MC layer (through
helpers in AMDGPUMCResourceInfo) by generating MCExprs for every
function resource which the emitters have been prepped for.

Fixes llvm#64863

Change-Id: I180c941a1535be646144960ff62e4cf24a5aa1da
abidh pushed a commit to abidh/llvm-project that referenced this pull request Feb 4, 2025
…ass (llvm#102913)

Converts AMDGPUResourceUsageAnalysis pass from Module to MachineFunction
pass. Moves function resource info propagation to to MC layer (through
helpers in AMDGPUMCResourceInfo) by generating MCExprs for every
function resource which the emitters have been prepped for.

Fixes llvm#64863

[AMDGPU] Fix stack size metadata for functions with direct and indirect calls (llvm#110828)

When a function has an external call, it should still use the stack
sizes of direct, known, calls to calculate its own stack size

[AMDGPU] Fix resource usage information for unnamed functions (llvm#115320)

Resource usage information would try to overwrite unnamed functions if
there are multiple within the same compilation unit. This aims to either
use the `MCSymbol` assigned to the unnamed function (i.e.,
`CurrentFnSym`), or, rematerialize the `MCSymbol` for the unnamed
function.

Reapply [AMDGPU] Avoid resource propagation for recursion through multiple functions (llvm#112251)

I was wrong last patch. I viewed the `Visited` set purely as a possible
recursion deterrent where functions calling a callee multiple times are
handled elsewhere. This doesn't consider cases where a function is
called multiple times by different callers still part of the same call
graph. New test shows the aforementioned case.

Reapplies llvm#111004, fixes llvm#115562.

[AMDGPU] Newly added test modified for recent SGPR use change (llvm#116427)

Mistimed rebase for llvm#112251 which added new tests which did not consider
the changes introduced in llvm#112403 yet

Change-Id: I4dfe6a1b679137e080a6d2b44016347ea704b014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:AMDGPU clang Clang issues not falling into any other category llvm:globalisel mc Machine (object) code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[AMDGPU] Backend code generation fails on internal function at O0
6 participants