[AMDGPU] Convert AMDGPUResourceUsageAnalysis pass from Module to MF pass #102913

JanekvO · 2024-08-12T14:44:39Z

Converts AMDGPUResourceUsageAnalysis pass from Module to MachineFunction pass. Moves function resource info propagation to to MC layer (through helpers in AMDGPUMCResourceInfo) by generating MCExprs for every function resource which the emitters have been prepped for.

Function resource info (finalized) validation added through ValidateMCResourceInfo. Previously (and currently, still) done in getSIProgramInfo. However, MCExprs may not be resolvable during getSIProgramInfo due to propagation of yet-to-be-defined function resource info symbols. Does require some caching of the generated occupancy MCExpr computation in getSIProgramInfo to validate against.
Unfortunately, requires module specific max_num_Xgprs symbols to be emitted. Cannot generate a separate MCExpr for finding the max without getting recursive symbol defines.
Most test changes are caused by the representation changes from constant/known values to MCExprs.

Fixes #64863

llvmbot · 2024-08-12T14:45:11Z

@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-clang

Author: Janek van Oirschot (JanekvO)

Changes

!!! Stacked PR on top of #95951 commit, please only review the latest commit 51f72f115b340a092c2c9f8569911b944a4efb6d !!!!

Converts AMDGPUResourceUsageAnalysis pass from Module to MachineFunction pass. Moves function resource info propagation to to MC layer (through helpers in AMDGPUMCResourceInfo) by generating MCExprs for every function resource which the emitters have been prepped for.

Function resource info (finalized) validation added through ValidateMCResourceInfo. Previously (and currently, still) done in getSIProgramInfo. However, MCExprs may not be resolvable during getSIProgramInfo due to propagation of yet-to-be-defined function resource info symbols. Does require some caching of the generated occupancy MCExpr computation in getSIProgramInfo to validate against.
Unfortunately, requires module specific max_num_Xgprs symbols to be emitted. Cannot generate a separate MCExpr for finding the max without getting recursive symbol defines.
Most test changes are caused by the representation changes from constant/known values to MCExprs.

Patch is 369.85 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/102913.diff

66 Files Affected:

(modified) clang/test/Frontend/amdgcn-machine-analysis-remarks.cl (+5-5)
(modified) llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp (+176-42)
(modified) llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h (+14-5)
(added) llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp (+220)
(added) llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.h (+94)
(modified) llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp (+22-125)
(modified) llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.h (+12-28)
(modified) llvm/lib/Target/AMDGPU/CMakeLists.txt (+1)
(modified) llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCExpr.cpp (+359)
(modified) llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCExpr.h (+11)
(modified) llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp (+55-14)
(modified) llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.h (+23)
(modified) llvm/lib/Target/AMDGPU/Utils/AMDGPUPALMetadata.cpp (+5-5)
(modified) llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp (+20-26)
(modified) llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.h (+4-1)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement.ll (+84-84)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/flat-scratch-init.ll (+9-7)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.workitem.id.ll (+3-3)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/non-entry-alloca.ll (+4-4)
(modified) llvm/test/CodeGen/AMDGPU/agpr-register-count.ll (+37-28)
(modified) llvm/test/CodeGen/AMDGPU/amdgpu.private-memory.ll (+2-1)
(modified) llvm/test/CodeGen/AMDGPU/amdpal-metadata-agpr-register-count.ll (+3-1)
(modified) llvm/test/CodeGen/AMDGPU/attr-amdgpu-flat-work-group-size-vgpr-limit.ll (+28-23)
(modified) llvm/test/CodeGen/AMDGPU/attributor-noopt.ll (+26-4)
(modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage-agpr.ll (+10-6)
(modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage0.ll (+6-2)
(modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage1.ll (+9-2)
(modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage2.ll (+9-2)
(modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage3.ll (+9-2)
(modified) llvm/test/CodeGen/AMDGPU/call-graph-register-usage.ll (+39-11)
(modified) llvm/test/CodeGen/AMDGPU/callee-special-input-sgprs-fixed-abi.ll (+2-2)
(modified) llvm/test/CodeGen/AMDGPU/cndmask-no-def-vcc.ll (+1)
(modified) llvm/test/CodeGen/AMDGPU/code-object-v3.ll (+24-10)
(modified) llvm/test/CodeGen/AMDGPU/codegen-internal-only-func.ll (+3-3)
(modified) llvm/test/CodeGen/AMDGPU/control-flow-fastregalloc.ll (+4-2)
(modified) llvm/test/CodeGen/AMDGPU/elf.ll (+4-3)
(modified) llvm/test/CodeGen/AMDGPU/enable-scratch-only-dynamic-stack.ll (+9-5)
(added) llvm/test/CodeGen/AMDGPU/function-resource-usage.ll (+533)
(modified) llvm/test/CodeGen/AMDGPU/gfx11-user-sgpr-init16-bug.ll (+8-8)
(modified) llvm/test/CodeGen/AMDGPU/inline-asm-reserved-regs.ll (+1-1)
(modified) llvm/test/CodeGen/AMDGPU/insert-delay-alu-bug.ll (+12-12)
(modified) llvm/test/CodeGen/AMDGPU/ipra.ll (+7-7)
(modified) llvm/test/CodeGen/AMDGPU/kernarg-size.ll (+3-1)
(modified) llvm/test/CodeGen/AMDGPU/kernel_code_t_recurse.ll (+6-1)
(modified) llvm/test/CodeGen/AMDGPU/large-alloca-compute.ll (+23-8)
(modified) llvm/test/CodeGen/AMDGPU/llc-pipeline.ll (+15-30)
(modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.workitem.id.ll (+3-3)
(modified) llvm/test/CodeGen/AMDGPU/lower-module-lds-offsets.ll (+13-13)
(modified) llvm/test/CodeGen/AMDGPU/mesa3d.ll (+8-5)
(modified) llvm/test/CodeGen/AMDGPU/module-lds-false-sharing.ll (+49-50)
(modified) llvm/test/CodeGen/AMDGPU/non-entry-alloca.ll (+12-8)
(modified) llvm/test/CodeGen/AMDGPU/recursion.ll (+18-10)
(modified) llvm/test/CodeGen/AMDGPU/resource-optimization-remarks.ll (+32-32)
(modified) llvm/test/CodeGen/AMDGPU/resource-usage-dead-function.ll (+7-6)
(modified) llvm/test/CodeGen/AMDGPU/stack-realign-kernel.ll (+90-36)
(modified) llvm/test/CodeGen/AMDGPU/tid-kd-xnack-any.ll (+2-1)
(modified) llvm/test/CodeGen/AMDGPU/tid-kd-xnack-off.ll (+2-1)
(modified) llvm/test/CodeGen/AMDGPU/tid-kd-xnack-on.ll (+2-1)
(modified) llvm/test/CodeGen/AMDGPU/trap.ll (+21-12)
(modified) llvm/test/MC/AMDGPU/amd_kernel_code_t.s (+14-22)
(modified) llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx10.s (+32-32)
(modified) llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx11.s (+30-30)
(modified) llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx12.s (+29-29)
(modified) llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx7.s (+27-27)
(modified) llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx8.s (+27-27)
(modified) llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx90a.s (+23-23)

diff --git a/clang/test/Frontend/amdgcn-machine-analysis-remarks.cl b/clang/test/Frontend/amdgcn-machine-analysis-remarks.cl
index a05e21b37b9127..a2dd59a871904c 100644
--- a/clang/test/Frontend/amdgcn-machine-analysis-remarks.cl
+++ b/clang/test/Frontend/amdgcn-machine-analysis-remarks.cl
@@ -2,12 +2,12 @@
 // RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -target-cpu gfx908 -Rpass-analysis=kernel-resource-usage -S -O0 -verify %s -o /dev/null
 
 // expected-remark@+10 {{Function Name: foo}}
-// expected-remark@+9 {{    SGPRs: 13}}
-// expected-remark@+8 {{    VGPRs: 10}}
-// expected-remark@+7 {{    AGPRs: 12}}
-// expected-remark@+6 {{    ScratchSize [bytes/lane]: 0}}
+// expected-remark@+9 {{    SGPRs: foo.num_sgpr+(extrasgprs(foo.uses_vcc, foo.uses_flat_scratch, 1))}}
+// expected-remark@+8 {{    VGPRs: foo.num_vgpr}}
+// expected-remark@+7 {{    AGPRs: foo.num_agpr}}
+// expected-remark@+6 {{    ScratchSize [bytes/lane]: foo.private_seg_size}}
 // expected-remark@+5 {{    Dynamic Stack: False}}
-// expected-remark@+4 {{    Occupancy [waves/SIMD]: 10}}
+// expected-remark@+4 {{    Occupancy [waves/SIMD]: occupancy(10, 4, 256, 8, 10, max(foo.num_sgpr+(extrasgprs(foo.uses_vcc, foo.uses_flat_scratch, 1)), 1, 0), max(totalnumvgprs(foo.num_agpr, foo.num_vgpr), 1, 0))}}
 // expected-remark@+3 {{    SGPRs Spill: 0}}
 // expected-remark@+2 {{    VGPRs Spill: 0}}
 // expected-remark@+1 {{    LDS Size [bytes/block]: 0}}
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
index e64e28e01d3d18..97a5cb29d51023 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
@@ -18,6 +18,7 @@
 #include "AMDGPUAsmPrinter.h"
 #include "AMDGPU.h"
 #include "AMDGPUHSAMetadataStreamer.h"
+#include "AMDGPUMCResourceInfo.h"
 #include "AMDGPUResourceUsageAnalysis.h"
 #include "GCNSubtarget.h"
 #include "MCTargetDesc/AMDGPUInstPrinter.h"
@@ -92,6 +93,9 @@ AMDGPUAsmPrinter::AMDGPUAsmPrinter(TargetMachine &TM,
                                    std::unique_ptr<MCStreamer> Streamer)
     : AsmPrinter(TM, std::move(Streamer)) {
   assert(OutStreamer && "AsmPrinter constructed without streamer");
+  RI = std::make_unique<MCResourceInfo>(OutContext);
+  OccupancyValidateMap =
+      std::make_unique<DenseMap<const Function *, const MCExpr *>>();
 }
 
 StringRef AMDGPUAsmPrinter::getPassName() const {
@@ -359,6 +363,102 @@ bool AMDGPUAsmPrinter::doInitialization(Module &M) {
   return AsmPrinter::doInitialization(M);
 }
 
+void AMDGPUAsmPrinter::ValidateMCResourceInfo(Function &F) {
+  if (F.isDeclaration() || !AMDGPU::isModuleEntryFunctionCC(F.getCallingConv()))
+    return;
+
+  using RIK = MCResourceInfo::ResourceInfoKind;
+  const GCNSubtarget &STM = TM.getSubtarget<GCNSubtarget>(F);
+
+  auto TryGetMCExprValue = [](const MCExpr *Value, uint64_t &Res) -> bool {
+    int64_t Val;
+    if (Value->evaluateAsAbsolute(Val)) {
+      Res = Val;
+      return true;
+    }
+    return false;
+  };
+
+  const uint64_t MaxScratchPerWorkitem =
+      STM.getMaxWaveScratchSize() / STM.getWavefrontSize();
+  MCSymbol *ScratchSizeSymbol =
+      RI->getSymbol(F.getName(), RIK::RIK_PrivateSegSize);
+  uint64_t ScratchSize;
+  if (ScratchSizeSymbol->isVariable() &&
+      TryGetMCExprValue(ScratchSizeSymbol->getVariableValue(), ScratchSize) &&
+      ScratchSize > MaxScratchPerWorkitem) {
+    DiagnosticInfoStackSize DiagStackSize(F, ScratchSize, MaxScratchPerWorkitem,
+                                          DS_Error);
+    F.getContext().diagnose(DiagStackSize);
+  }
+
+  // Validate addressable scalar registers (i.e., prior to added implicit
+  // SGPRs).
+  MCSymbol *NumSGPRSymbol = RI->getSymbol(F.getName(), RIK::RIK_NumSGPR);
+  if (STM.getGeneration() >= AMDGPUSubtarget::VOLCANIC_ISLANDS &&
+      !STM.hasSGPRInitBug()) {
+    unsigned MaxAddressableNumSGPRs = STM.getAddressableNumSGPRs();
+    uint64_t NumSgpr;
+    if (NumSGPRSymbol->isVariable() &&
+        TryGetMCExprValue(NumSGPRSymbol->getVariableValue(), NumSgpr) &&
+        NumSgpr > MaxAddressableNumSGPRs) {
+      DiagnosticInfoResourceLimit Diag(F, "addressable scalar registers",
+                                       NumSgpr, MaxAddressableNumSGPRs,
+                                       DS_Error, DK_ResourceLimit);
+      F.getContext().diagnose(Diag);
+      return;
+    }
+  }
+
+  MCSymbol *VCCUsedSymbol = RI->getSymbol(F.getName(), RIK::RIK_UsesVCC);
+  MCSymbol *FlatUsedSymbol =
+      RI->getSymbol(F.getName(), RIK::RIK_UsesFlatScratch);
+  uint64_t VCCUsed, FlatUsed, NumSgpr;
+
+  if (NumSGPRSymbol->isVariable() && VCCUsedSymbol->isVariable() &&
+      FlatUsedSymbol->isVariable() &&
+      TryGetMCExprValue(NumSGPRSymbol->getVariableValue(), NumSgpr) &&
+      TryGetMCExprValue(VCCUsedSymbol->getVariableValue(), VCCUsed) &&
+      TryGetMCExprValue(FlatUsedSymbol->getVariableValue(), FlatUsed)) {
+
+    // Recomputes NumSgprs + implicit SGPRs but all symbols should now be
+    // resolvable.
+    NumSgpr += IsaInfo::getNumExtraSGPRs(
+        &STM, VCCUsed, FlatUsed,
+        getTargetStreamer()->getTargetID()->isXnackOnOrAny());
+    if (STM.getGeneration() <= AMDGPUSubtarget::SEA_ISLANDS ||
+        STM.hasSGPRInitBug()) {
+      unsigned MaxAddressableNumSGPRs = STM.getAddressableNumSGPRs();
+      if (NumSgpr > MaxAddressableNumSGPRs) {
+        DiagnosticInfoResourceLimit Diag(F, "scalar registers", NumSgpr,
+                                         MaxAddressableNumSGPRs, DS_Error,
+                                         DK_ResourceLimit);
+        F.getContext().diagnose(Diag);
+        return;
+      }
+    }
+
+    auto I = OccupancyValidateMap->find(&F);
+    if (I != OccupancyValidateMap->end()) {
+      const auto [MinWEU, MaxWEU] = AMDGPU::getIntegerPairAttribute(
+          F, "amdgpu-waves-per-eu", {0, 0}, true);
+      uint64_t Occupancy;
+      const MCExpr *OccupancyExpr = I->getSecond();
+
+      if (TryGetMCExprValue(OccupancyExpr, Occupancy) && Occupancy < MinWEU) {
+        DiagnosticInfoOptimizationFailure Diag(
+            F, F.getSubprogram(),
+            "failed to meet occupancy target given by 'amdgpu-waves-per-eu' in "
+            "'" +
+                F.getName() + "': desired occupancy was " + Twine(MinWEU) +
+                ", final occupancy is " + Twine(Occupancy));
+        F.getContext().diagnose(Diag);
+        return;
+      }
+    }
+  }
+}
+
 bool AMDGPUAsmPrinter::doFinalization(Module &M) {
   // Pad with s_code_end to help tools and guard against instruction prefetch
   // causing stale data in caches. Arguably this should be done by the linker,
@@ -371,39 +471,29 @@ bool AMDGPUAsmPrinter::doFinalization(Module &M) {
     getTargetStreamer()->EmitCodeEnd(STI);
   }
 
-  return AsmPrinter::doFinalization(M);
-}
+  // Assign expressions which can only be resolved when all other functions are
+  // known.
+  RI->Finalize();
+  getTargetStreamer()->EmitMCResourceMaximums(
+      RI->getMaxVGPRSymbol(), RI->getMaxAGPRSymbol(), RI->getMaxSGPRSymbol());
 
-// Print comments that apply to both callable functions and entry points.
-void AMDGPUAsmPrinter::emitCommonFunctionComments(
-    uint32_t NumVGPR, std::optional<uint32_t> NumAGPR, uint32_t TotalNumVGPR,
-    uint32_t NumSGPR, uint64_t ScratchSize, uint64_t CodeSize,
-    const AMDGPUMachineFunction *MFI) {
-  OutStreamer->emitRawComment(" codeLenInByte = " + Twine(CodeSize), false);
-  OutStreamer->emitRawComment(" NumSgprs: " + Twine(NumSGPR), false);
-  OutStreamer->emitRawComment(" NumVgprs: " + Twine(NumVGPR), false);
-  if (NumAGPR) {
-    OutStreamer->emitRawComment(" NumAgprs: " + Twine(*NumAGPR), false);
-    OutStreamer->emitRawComment(" TotalNumVgprs: " + Twine(TotalNumVGPR),
-                                false);
+  for (Function &F : M.functions()) {
+    ValidateMCResourceInfo(F);
   }
-  OutStreamer->emitRawComment(" ScratchSize: " + Twine(ScratchSize), false);
-  OutStreamer->emitRawComment(" MemoryBound: " + Twine(MFI->isMemoryBound()),
-                              false);
+  return AsmPrinter::doFinalization(M);
 }
 
 SmallString<128> AMDGPUAsmPrinter::getMCExprStr(const MCExpr *Value) {
   SmallString<128> Str;
   raw_svector_ostream OSS(Str);
-  int64_t IVal;
-  if (Value->evaluateAsAbsolute(IVal)) {
-    OSS << static_cast<uint64_t>(IVal);
-  } else {
-    Value->print(OSS, MAI);
-  }
+  auto &Streamer = getTargetStreamer()->getStreamer();
+  auto &Context = Streamer.getContext();
+  const MCExpr *New = llvm::TryFold(Value, Context);
+  AMDGPUMCExprPrint(New, OSS, MAI);
   return Str;
 }
 
+// Print comments that apply to both callable functions and entry points.
 void AMDGPUAsmPrinter::emitCommonFunctionComments(
     const MCExpr *NumVGPR, const MCExpr *NumAGPR, const MCExpr *TotalNumVGPR,
     const MCExpr *NumSGPR, const MCExpr *ScratchSize, uint64_t CodeSize,
@@ -573,21 +663,45 @@ bool AMDGPUAsmPrinter::runOnMachineFunction(MachineFunction &MF) {
   emitResourceUsageRemarks(MF, CurrentProgramInfo, MFI->isModuleEntryFunction(),
                            STM.hasMAIInsts());
 
+  {
+    const AMDGPUResourceUsageAnalysis::SIFunctionResourceInfo &Info =
+        ResourceUsage->getResourceInfo();
+    RI->gatherResourceInfo(MF, Info);
+    using RIK = MCResourceInfo::ResourceInfoKind;
+    getTargetStreamer()->EmitMCResourceInfo(
+        RI->getSymbol(MF.getName(), RIK::RIK_NumVGPR),
+        RI->getSymbol(MF.getName(), RIK::RIK_NumAGPR),
+        RI->getSymbol(MF.getName(), RIK::RIK_NumSGPR),
+        RI->getSymbol(MF.getName(), RIK::RIK_PrivateSegSize),
+        RI->getSymbol(MF.getName(), RIK::RIK_UsesVCC),
+        RI->getSymbol(MF.getName(), RIK::RIK_UsesFlatScratch),
+        RI->getSymbol(MF.getName(), RIK::RIK_HasDynSizedStack),
+        RI->getSymbol(MF.getName(), RIK::RIK_HasRecursion),
+        RI->getSymbol(MF.getName(), RIK::RIK_HasIndirectCall));
+  }
+
   if (isVerbose()) {
     MCSectionELF *CommentSection =
         Context.getELFSection(".AMDGPU.csdata", ELF::SHT_PROGBITS, 0);
     OutStreamer->switchSection(CommentSection);
 
     if (!MFI->isEntryFunction()) {
+      using RIK = MCResourceInfo::ResourceInfoKind;
       OutStreamer->emitRawComment(" Function info:", false);
-      const AMDGPUResourceUsageAnalysis::SIFunctionResourceInfo &Info =
-          ResourceUsage->getResourceInfo(&MF.getFunction());
+
       emitCommonFunctionComments(
-          Info.NumVGPR,
-          STM.hasMAIInsts() ? Info.NumAGPR : std::optional<uint32_t>(),
-          Info.getTotalNumVGPRs(STM),
-          Info.getTotalNumSGPRs(MF.getSubtarget<GCNSubtarget>()),
-          Info.PrivateSegmentSize, getFunctionCodeSize(MF), MFI);
+          RI->getSymbol(MF.getName(), RIK::RIK_NumVGPR)->getVariableValue(),
+          STM.hasMAIInsts() ? RI->getSymbol(MF.getName(), RIK::RIK_NumAGPR)
+                                  ->getVariableValue()
+                            : nullptr,
+          RI->createTotalNumVGPRs(MF, Ctx),
+          RI->createTotalNumSGPRs(
+              MF,
+              MF.getSubtarget<GCNSubtarget>().getTargetID().isXnackOnOrAny(),
+              Ctx),
+          RI->getSymbol(MF.getName(), RIK::RIK_PrivateSegSize)
+              ->getVariableValue(),
+          getFunctionCodeSize(MF), MFI);
       return false;
     }
 
@@ -755,8 +869,6 @@ uint64_t AMDGPUAsmPrinter::getFunctionCodeSize(const MachineFunction &MF) const
 
 void AMDGPUAsmPrinter::getSIProgramInfo(SIProgramInfo &ProgInfo,
                                         const MachineFunction &MF) {
-  const AMDGPUResourceUsageAnalysis::SIFunctionResourceInfo &Info =
-      ResourceUsage->getResourceInfo(&MF.getFunction());
   const GCNSubtarget &STM = MF.getSubtarget<GCNSubtarget>();
   MCContext &Ctx = MF.getContext();
 
@@ -773,18 +885,38 @@ void AMDGPUAsmPrinter::getSIProgramInfo(SIProgramInfo &ProgInfo,
     return false;
   };
 
-  ProgInfo.NumArchVGPR = CreateExpr(Info.NumVGPR);
-  ProgInfo.NumAccVGPR = CreateExpr(Info.NumAGPR);
-  ProgInfo.NumVGPR = CreateExpr(Info.getTotalNumVGPRs(STM));
-  ProgInfo.AccumOffset =
-      CreateExpr(alignTo(std::max(1, Info.NumVGPR), 4) / 4 - 1);
+  auto GetSymRefExpr =
+      [&](MCResourceInfo::ResourceInfoKind RIK) -> const MCExpr * {
+    MCSymbol *Sym = RI->getSymbol(MF.getName(), RIK);
+    return MCSymbolRefExpr::create(Sym, Ctx);
+  };
+
+  const MCExpr *ConstFour = MCConstantExpr::create(4, Ctx);
+  const MCExpr *ConstOne = MCConstantExpr::create(1, Ctx);
+
+  using RIK = MCResourceInfo::ResourceInfoKind;
+  ProgInfo.NumArchVGPR = GetSymRefExpr(RIK::RIK_NumVGPR);
+  ProgInfo.NumAccVGPR = GetSymRefExpr(RIK::RIK_NumAGPR);
+  ProgInfo.NumVGPR = AMDGPUMCExpr::createTotalNumVGPR(
+      ProgInfo.NumAccVGPR, ProgInfo.NumArchVGPR, Ctx);
+
+  // AccumOffset computed for the MCExpr equivalent of:
+  // alignTo(std::max(1, Info.NumVGPR), 4) / 4 - 1;
+  ProgInfo.AccumOffset = MCBinaryExpr::createSub(
+      MCBinaryExpr::createDiv(
+          AMDGPUMCExpr::createAlignTo(
+              AMDGPUMCExpr::createMax({ConstOne, ProgInfo.NumArchVGPR}, Ctx),
+              ConstFour, Ctx),
+          ConstFour, Ctx),
+      ConstOne, Ctx);
   ProgInfo.TgSplit = STM.isTgSplitEnabled();
-  ProgInfo.NumSGPR = CreateExpr(Info.NumExplicitSGPR);
-  ProgInfo.ScratchSize = CreateExpr(Info.PrivateSegmentSize);
-  ProgInfo.VCCUsed = CreateExpr(Info.UsesVCC);
-  ProgInfo.FlatUsed = CreateExpr(Info.UsesFlatScratch);
+  ProgInfo.NumSGPR = GetSymRefExpr(RIK::RIK_NumSGPR);
+  ProgInfo.ScratchSize = GetSymRefExpr(RIK::RIK_PrivateSegSize);
+  ProgInfo.VCCUsed = GetSymRefExpr(RIK::RIK_UsesVCC);
+  ProgInfo.FlatUsed = GetSymRefExpr(RIK::RIK_UsesFlatScratch);
   ProgInfo.DynamicCallStack =
-      CreateExpr(Info.HasDynamicallySizedStack || Info.HasRecursion);
+      MCBinaryExpr::createOr(GetSymRefExpr(RIK::RIK_HasDynSizedStack),
+                             GetSymRefExpr(RIK::RIK_HasRecursion), Ctx);
 
   const uint64_t MaxScratchPerWorkitem =
       STM.getMaxWaveScratchSize() / STM.getWavefrontSize();
@@ -1084,6 +1216,8 @@ void AMDGPUAsmPrinter::getSIProgramInfo(SIProgramInfo &ProgInfo,
       STM.computeOccupancy(F, ProgInfo.LDSSize), ProgInfo.NumSGPRsForWavesPerEU,
       ProgInfo.NumVGPRsForWavesPerEU, STM, Ctx);
 
+  OccupancyValidateMap->insert({&MF.getFunction(), ProgInfo.Occupancy});
+
   const auto [MinWEU, MaxWEU] =
       AMDGPU::getIntegerPairAttribute(F, "amdgpu-waves-per-eu", {0, 0}, true);
   uint64_t Occupancy;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h
index f66bbde42ce278..676a4687ee2af7 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h
@@ -24,6 +24,7 @@ struct AMDGPUResourceUsageAnalysis;
 class AMDGPUTargetStreamer;
 class MCCodeEmitter;
 class MCOperand;
+class MCResourceInfo;
 
 namespace AMDGPU {
 struct MCKernelDescriptor;
@@ -40,12 +41,20 @@ class AMDGPUAsmPrinter final : public AsmPrinter {
 
   AMDGPUResourceUsageAnalysis *ResourceUsage;
 
+  std::unique_ptr<MCResourceInfo> RI;
+
   SIProgramInfo CurrentProgramInfo;
 
   std::unique_ptr<AMDGPU::HSAMD::MetadataStreamer> HSAMetadataStream;
 
   MCCodeEmitter *DumpCodeInstEmitter = nullptr;
 
+  // ValidateMCResourceInfo cannot recompute parts of the occupancy as it does
+  // for other metadata to validate (e.g., NumSGPRs) so a map is necessary if we
+  // really want to track and validate the occupancy.
+  std::unique_ptr<DenseMap<const Function *, const MCExpr *>>
+      OccupancyValidateMap;
+
   uint64_t getFunctionCodeSize(const MachineFunction &MF) const;
 
   void getSIProgramInfo(SIProgramInfo &Out, const MachineFunction &MF);
@@ -60,11 +69,6 @@ class AMDGPUAsmPrinter final : public AsmPrinter {
   void EmitPALMetadata(const MachineFunction &MF,
                        const SIProgramInfo &KernelInfo);
   void emitPALFunctionMetadata(const MachineFunction &MF);
-  void emitCommonFunctionComments(uint32_t NumVGPR,
-                                  std::optional<uint32_t> NumAGPR,
-                                  uint32_t TotalNumVGPR, uint32_t NumSGPR,
-                                  uint64_t ScratchSize, uint64_t CodeSize,
-                                  const AMDGPUMachineFunction *MFI);
   void emitCommonFunctionComments(const MCExpr *NumVGPR, const MCExpr *NumAGPR,
                                   const MCExpr *TotalNumVGPR,
                                   const MCExpr *NumSGPR,
@@ -84,6 +88,11 @@ class AMDGPUAsmPrinter final : public AsmPrinter {
 
   SmallString<128> getMCExprStr(const MCExpr *Value);
 
+  /// Attempts to replace the validation that is missed in getSIProgramInfo due
+  /// to MCExpr being unknown. Invoked during doFinalization such that the
+  /// MCResourceInfo symbols are known.
+  void ValidateMCResourceInfo(Function &F);
+
 public:
   explicit AMDGPUAsmPrinter(TargetMachine &TM,
                             std::unique_ptr<MCStreamer> Streamer);
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp
new file mode 100644
index 00000000000000..58383475b312c9
--- /dev/null
+++ b/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp
@@ -0,0 +1,220 @@
+//===- AMDGPUMCResourceInfo.cpp --- MC Resource Info ----------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+/// \file
+/// \brief MC infrastructure to propagate the function level resource usage
+/// info.
+///
+//===----------------------------------------------------------------------===//
+
+#include "AMDGPUMCResourceInfo.h"
+#include "Utils/AMDGPUBaseInfo.h"
+#include "llvm/ADT/SmallSet.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/MC/MCContext.h"
+#include "llvm/MC/MCSymbol.h"
+
+using namespace llvm;
+
+MCSymbol *MCResourceInfo::getSymbol(StringRef FuncName, ResourceInfoKind RIK) {
+  switch (RIK) {
+  case RIK_NumVGPR:
+    return OutContext.getOrCreateSymbol(FuncName + Twine(".num_vgpr"));
+  case RIK_NumAGPR:
+    return OutContext.getOrCreateSymbol(FuncName + Twine(".num_agpr"));
+  case RIK_NumSGPR:
+    return OutContext.getOrCreateSymbol(FuncName + Twine(".num_sgpr"));
+  case RIK_PrivateSegSize:
+    return OutContext.getOrCreateSymbol(FuncName + Twine(".private_seg_size"));
+  case RIK_UsesVCC:
+    return OutContext.getOrCreateSymbol(FuncName + Twine(".uses_vcc"));
+  case RIK_UsesFlatScratch:
+    return OutContext.getOrCreateSymbol(FuncName + Twine(".uses_flat_scratch"));
+  case RIK_HasDynSizedStack:
+    return OutContext.getOrCreateSymbol(FuncName +
+                                        Twine(".has_dyn_sized_stack"));
+  case RIK_HasRecursion:
+    return OutContext.getOrCreateSymbol(FuncName + Twine(".has_recursion"));
+  case RIK_HasIndirectCall:
+    return OutContext.getOrCreateSymbol(FuncName + Twine(".has_indirect_call"));
+  }
+  llvm_unreachable("Unexpected ResourceInfoKind.");
+}
+
+const MCExpr *MCResourceInfo::getSymRefExpr(StringRef FuncName,
+                                            ResourceInfoKind RIK,
+                                            MCContext &Ctx) {
+  return MCSymbolRefExpr::create(getSymbol(FuncName, RIK), Ctx);
+}
+
+void MCResourceInfo::assignMaxRegs() {
+  // Assign expression to get the max register use to the max_num_Xgpr symbol.
+  MCSymbol *MaxVGPRSym = getMaxVGPRSymbol();
+  MCSymbol *MaxAGPRSym = getMaxAGPRSymbol();
+  MCSymbol *MaxSGPRSym = getMaxSGPRSymbol();
+
+  auto assignMaxRegSym = [this](MCSymbol *Sym, int32_t RegCount) {
+    const MCExpr *MaxExpr = MCConstantExpr::create(RegCount, OutContext);
+    Sym->setVariableValue(MaxExpr);
+  };
+
+  assignMaxRegSym(MaxVGPRSym, MaxVGPR);
+  assignMaxRegSym(MaxAGPRSym, MaxAGPR);
+  assignMaxRegSym(MaxSGPRSym, MaxSGPR);
+}
+
+void MCResourceInfo::Finalize() {
+  assert(!finalized && "Cannot finalize ResourceInfo again.");
+  finalized = true;
+  assignMaxRegs();
+}
+
+MCSymbol *MCResourceInfo::getMaxVGPRSymbol() {
+  return OutContext.getOrCreateSymbol("max_num_vgpr");
+}
+
+MCSymbol *MCResourceInfo::getMaxAGPRSymbol() {
+  return OutContext.getOrCreateSymbol("max_num_agpr");
+}
+
+MCSymbol *MCResourceInfo::getMaxSGPRSymbol() {
+  return OutContext.getOrCreateSymbol("max_num_sgpr");
+}
+
+void MCResourceInfo::assignResourceInfoExpr(
+   ...
[truncated]

llvmbot · 2024-08-12T14:45:11Z

@llvm/pr-subscribers-mc

Author: Janek van Oirschot (JanekvO)

Changes

!!! Stacked PR on top of #95951 commit, please only review the latest commit 51f72f115b340a092c2c9f8569911b944a4efb6d !!!!

Converts AMDGPUResourceUsageAnalysis pass from Module to MachineFunction pass. Moves function resource info propagation to to MC layer (through helpers in AMDGPUMCResourceInfo) by generating MCExprs for every function resource which the emitters have been prepped for.

Function resource info (finalized) validation added through ValidateMCResourceInfo. Previously (and currently, still) done in getSIProgramInfo. However, MCExprs may not be resolvable during getSIProgramInfo due to propagation of yet-to-be-defined function resource info symbols. Does require some caching of the generated occupancy MCExpr computation in getSIProgramInfo to validate against.
Unfortunately, requires module specific max_num_Xgprs symbols to be emitted. Cannot generate a separate MCExpr for finding the max without getting recursive symbol defines.
Most test changes are caused by the representation changes from constant/known values to MCExprs.

Patch is 369.85 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/102913.diff

66 Files Affected:

(modified) clang/test/Frontend/amdgcn-machine-analysis-remarks.cl (+5-5)
(modified) llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp (+176-42)
(modified) llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h (+14-5)
(added) llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp (+220)
(added) llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.h (+94)
(modified) llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp (+22-125)
(modified) llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.h (+12-28)
(modified) llvm/lib/Target/AMDGPU/CMakeLists.txt (+1)
(modified) llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCExpr.cpp (+359)
(modified) llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCExpr.h (+11)
(modified) llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp (+55-14)
(modified) llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.h (+23)
(modified) llvm/lib/Target/AMDGPU/Utils/AMDGPUPALMetadata.cpp (+5-5)
(modified) llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp (+20-26)
(modified) llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.h (+4-1)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement.ll (+84-84)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/flat-scratch-init.ll (+9-7)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.workitem.id.ll (+3-3)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/non-entry-alloca.ll (+4-4)
(modified) llvm/test/CodeGen/AMDGPU/agpr-register-count.ll (+37-28)
(modified) llvm/test/CodeGen/AMDGPU/amdgpu.private-memory.ll (+2-1)
(modified) llvm/test/CodeGen/AMDGPU/amdpal-metadata-agpr-register-count.ll (+3-1)
(modified) llvm/test/CodeGen/AMDGPU/attr-amdgpu-flat-work-group-size-vgpr-limit.ll (+28-23)
(modified) llvm/test/CodeGen/AMDGPU/attributor-noopt.ll (+26-4)
(modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage-agpr.ll (+10-6)
(modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage0.ll (+6-2)
(modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage1.ll (+9-2)
(modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage2.ll (+9-2)
(modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage3.ll (+9-2)
(modified) llvm/test/CodeGen/AMDGPU/call-graph-register-usage.ll (+39-11)
(modified) llvm/test/CodeGen/AMDGPU/callee-special-input-sgprs-fixed-abi.ll (+2-2)
(modified) llvm/test/CodeGen/AMDGPU/cndmask-no-def-vcc.ll (+1)
(modified) llvm/test/CodeGen/AMDGPU/code-object-v3.ll (+24-10)
(modified) llvm/test/CodeGen/AMDGPU/codegen-internal-only-func.ll (+3-3)
(modified) llvm/test/CodeGen/AMDGPU/control-flow-fastregalloc.ll (+4-2)
(modified) llvm/test/CodeGen/AMDGPU/elf.ll (+4-3)
(modified) llvm/test/CodeGen/AMDGPU/enable-scratch-only-dynamic-stack.ll (+9-5)
(added) llvm/test/CodeGen/AMDGPU/function-resource-usage.ll (+533)
(modified) llvm/test/CodeGen/AMDGPU/gfx11-user-sgpr-init16-bug.ll (+8-8)
(modified) llvm/test/CodeGen/AMDGPU/inline-asm-reserved-regs.ll (+1-1)
(modified) llvm/test/CodeGen/AMDGPU/insert-delay-alu-bug.ll (+12-12)
(modified) llvm/test/CodeGen/AMDGPU/ipra.ll (+7-7)
(modified) llvm/test/CodeGen/AMDGPU/kernarg-size.ll (+3-1)
(modified) llvm/test/CodeGen/AMDGPU/kernel_code_t_recurse.ll (+6-1)
(modified) llvm/test/CodeGen/AMDGPU/large-alloca-compute.ll (+23-8)
(modified) llvm/test/CodeGen/AMDGPU/llc-pipeline.ll (+15-30)
(modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.workitem.id.ll (+3-3)
(modified) llvm/test/CodeGen/AMDGPU/lower-module-lds-offsets.ll (+13-13)
(modified) llvm/test/CodeGen/AMDGPU/mesa3d.ll (+8-5)
(modified) llvm/test/CodeGen/AMDGPU/module-lds-false-sharing.ll (+49-50)
(modified) llvm/test/CodeGen/AMDGPU/non-entry-alloca.ll (+12-8)
(modified) llvm/test/CodeGen/AMDGPU/recursion.ll (+18-10)
(modified) llvm/test/CodeGen/AMDGPU/resource-optimization-remarks.ll (+32-32)
(modified) llvm/test/CodeGen/AMDGPU/resource-usage-dead-function.ll (+7-6)
(modified) llvm/test/CodeGen/AMDGPU/stack-realign-kernel.ll (+90-36)
(modified) llvm/test/CodeGen/AMDGPU/tid-kd-xnack-any.ll (+2-1)
(modified) llvm/test/CodeGen/AMDGPU/tid-kd-xnack-off.ll (+2-1)
(modified) llvm/test/CodeGen/AMDGPU/tid-kd-xnack-on.ll (+2-1)
(modified) llvm/test/CodeGen/AMDGPU/trap.ll (+21-12)
(modified) llvm/test/MC/AMDGPU/amd_kernel_code_t.s (+14-22)
(modified) llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx10.s (+32-32)
(modified) llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx11.s (+30-30)
(modified) llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx12.s (+29-29)
(modified) llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx7.s (+27-27)
(modified) llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx8.s (+27-27)
(modified) llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx90a.s (+23-23)

diff --git a/clang/test/Frontend/amdgcn-machine-analysis-remarks.cl b/clang/test/Frontend/amdgcn-machine-analysis-remarks.cl
index a05e21b37b9127..a2dd59a871904c 100644
--- a/clang/test/Frontend/amdgcn-machine-analysis-remarks.cl
+++ b/clang/test/Frontend/amdgcn-machine-analysis-remarks.cl
@@ -2,12 +2,12 @@
 // RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -target-cpu gfx908 -Rpass-analysis=kernel-resource-usage -S -O0 -verify %s -o /dev/null
 
 // expected-remark@+10 {{Function Name: foo}}
-// expected-remark@+9 {{    SGPRs: 13}}
-// expected-remark@+8 {{    VGPRs: 10}}
-// expected-remark@+7 {{    AGPRs: 12}}
-// expected-remark@+6 {{    ScratchSize [bytes/lane]: 0}}
+// expected-remark@+9 {{    SGPRs: foo.num_sgpr+(extrasgprs(foo.uses_vcc, foo.uses_flat_scratch, 1))}}
+// expected-remark@+8 {{    VGPRs: foo.num_vgpr}}
+// expected-remark@+7 {{    AGPRs: foo.num_agpr}}
+// expected-remark@+6 {{    ScratchSize [bytes/lane]: foo.private_seg_size}}
 // expected-remark@+5 {{    Dynamic Stack: False}}
-// expected-remark@+4 {{    Occupancy [waves/SIMD]: 10}}
+// expected-remark@+4 {{    Occupancy [waves/SIMD]: occupancy(10, 4, 256, 8, 10, max(foo.num_sgpr+(extrasgprs(foo.uses_vcc, foo.uses_flat_scratch, 1)), 1, 0), max(totalnumvgprs(foo.num_agpr, foo.num_vgpr), 1, 0))}}
 // expected-remark@+3 {{    SGPRs Spill: 0}}
 // expected-remark@+2 {{    VGPRs Spill: 0}}
 // expected-remark@+1 {{    LDS Size [bytes/block]: 0}}
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
index e64e28e01d3d18..97a5cb29d51023 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
@@ -18,6 +18,7 @@
 #include "AMDGPUAsmPrinter.h"
 #include "AMDGPU.h"
 #include "AMDGPUHSAMetadataStreamer.h"
+#include "AMDGPUMCResourceInfo.h"
 #include "AMDGPUResourceUsageAnalysis.h"
 #include "GCNSubtarget.h"
 #include "MCTargetDesc/AMDGPUInstPrinter.h"
@@ -92,6 +93,9 @@ AMDGPUAsmPrinter::AMDGPUAsmPrinter(TargetMachine &TM,
                                    std::unique_ptr<MCStreamer> Streamer)
     : AsmPrinter(TM, std::move(Streamer)) {
   assert(OutStreamer && "AsmPrinter constructed without streamer");
+  RI = std::make_unique<MCResourceInfo>(OutContext);
+  OccupancyValidateMap =
+      std::make_unique<DenseMap<const Function *, const MCExpr *>>();
 }
 
 StringRef AMDGPUAsmPrinter::getPassName() const {
@@ -359,6 +363,102 @@ bool AMDGPUAsmPrinter::doInitialization(Module &M) {
   return AsmPrinter::doInitialization(M);
 }
 
+void AMDGPUAsmPrinter::ValidateMCResourceInfo(Function &F) {
+  if (F.isDeclaration() || !AMDGPU::isModuleEntryFunctionCC(F.getCallingConv()))
+    return;
+
+  using RIK = MCResourceInfo::ResourceInfoKind;
+  const GCNSubtarget &STM = TM.getSubtarget<GCNSubtarget>(F);
+
+  auto TryGetMCExprValue = [](const MCExpr *Value, uint64_t &Res) -> bool {
+    int64_t Val;
+    if (Value->evaluateAsAbsolute(Val)) {
+      Res = Val;
+      return true;
+    }
+    return false;
+  };
+
+  const uint64_t MaxScratchPerWorkitem =
+      STM.getMaxWaveScratchSize() / STM.getWavefrontSize();
+  MCSymbol *ScratchSizeSymbol =
+      RI->getSymbol(F.getName(), RIK::RIK_PrivateSegSize);
+  uint64_t ScratchSize;
+  if (ScratchSizeSymbol->isVariable() &&
+      TryGetMCExprValue(ScratchSizeSymbol->getVariableValue(), ScratchSize) &&
+      ScratchSize > MaxScratchPerWorkitem) {
+    DiagnosticInfoStackSize DiagStackSize(F, ScratchSize, MaxScratchPerWorkitem,
+                                          DS_Error);
+    F.getContext().diagnose(DiagStackSize);
+  }
+
+  // Validate addressable scalar registers (i.e., prior to added implicit
+  // SGPRs).
+  MCSymbol *NumSGPRSymbol = RI->getSymbol(F.getName(), RIK::RIK_NumSGPR);
+  if (STM.getGeneration() >= AMDGPUSubtarget::VOLCANIC_ISLANDS &&
+      !STM.hasSGPRInitBug()) {
+    unsigned MaxAddressableNumSGPRs = STM.getAddressableNumSGPRs();
+    uint64_t NumSgpr;
+    if (NumSGPRSymbol->isVariable() &&
+        TryGetMCExprValue(NumSGPRSymbol->getVariableValue(), NumSgpr) &&
+        NumSgpr > MaxAddressableNumSGPRs) {
+      DiagnosticInfoResourceLimit Diag(F, "addressable scalar registers",
+                                       NumSgpr, MaxAddressableNumSGPRs,
+                                       DS_Error, DK_ResourceLimit);
+      F.getContext().diagnose(Diag);
+      return;
+    }
+  }
+
+  MCSymbol *VCCUsedSymbol = RI->getSymbol(F.getName(), RIK::RIK_UsesVCC);
+  MCSymbol *FlatUsedSymbol =
+      RI->getSymbol(F.getName(), RIK::RIK_UsesFlatScratch);
+  uint64_t VCCUsed, FlatUsed, NumSgpr;
+
+  if (NumSGPRSymbol->isVariable() && VCCUsedSymbol->isVariable() &&
+      FlatUsedSymbol->isVariable() &&
+      TryGetMCExprValue(NumSGPRSymbol->getVariableValue(), NumSgpr) &&
+      TryGetMCExprValue(VCCUsedSymbol->getVariableValue(), VCCUsed) &&
+      TryGetMCExprValue(FlatUsedSymbol->getVariableValue(), FlatUsed)) {
+
+    // Recomputes NumSgprs + implicit SGPRs but all symbols should now be
+    // resolvable.
+    NumSgpr += IsaInfo::getNumExtraSGPRs(
+        &STM, VCCUsed, FlatUsed,
+        getTargetStreamer()->getTargetID()->isXnackOnOrAny());
+    if (STM.getGeneration() <= AMDGPUSubtarget::SEA_ISLANDS ||
+        STM.hasSGPRInitBug()) {
+      unsigned MaxAddressableNumSGPRs = STM.getAddressableNumSGPRs();
+      if (NumSgpr > MaxAddressableNumSGPRs) {
+        DiagnosticInfoResourceLimit Diag(F, "scalar registers", NumSgpr,
+                                         MaxAddressableNumSGPRs, DS_Error,
+                                         DK_ResourceLimit);
+        F.getContext().diagnose(Diag);
+        return;
+      }
+    }
+
+    auto I = OccupancyValidateMap->find(&F);
+    if (I != OccupancyValidateMap->end()) {
+      const auto [MinWEU, MaxWEU] = AMDGPU::getIntegerPairAttribute(
+          F, "amdgpu-waves-per-eu", {0, 0}, true);
+      uint64_t Occupancy;
+      const MCExpr *OccupancyExpr = I->getSecond();
+
+      if (TryGetMCExprValue(OccupancyExpr, Occupancy) && Occupancy < MinWEU) {
+        DiagnosticInfoOptimizationFailure Diag(
+            F, F.getSubprogram(),
+            "failed to meet occupancy target given by 'amdgpu-waves-per-eu' in "
+            "'" +
+                F.getName() + "': desired occupancy was " + Twine(MinWEU) +
+                ", final occupancy is " + Twine(Occupancy));
+        F.getContext().diagnose(Diag);
+        return;
+      }
+    }
+  }
+}
+
 bool AMDGPUAsmPrinter::doFinalization(Module &M) {
   // Pad with s_code_end to help tools and guard against instruction prefetch
   // causing stale data in caches. Arguably this should be done by the linker,
@@ -371,39 +471,29 @@ bool AMDGPUAsmPrinter::doFinalization(Module &M) {
     getTargetStreamer()->EmitCodeEnd(STI);
   }
 
-  return AsmPrinter::doFinalization(M);
-}
+  // Assign expressions which can only be resolved when all other functions are
+  // known.
+  RI->Finalize();
+  getTargetStreamer()->EmitMCResourceMaximums(
+      RI->getMaxVGPRSymbol(), RI->getMaxAGPRSymbol(), RI->getMaxSGPRSymbol());
 
-// Print comments that apply to both callable functions and entry points.
-void AMDGPUAsmPrinter::emitCommonFunctionComments(
-    uint32_t NumVGPR, std::optional<uint32_t> NumAGPR, uint32_t TotalNumVGPR,
-    uint32_t NumSGPR, uint64_t ScratchSize, uint64_t CodeSize,
-    const AMDGPUMachineFunction *MFI) {
-  OutStreamer->emitRawComment(" codeLenInByte = " + Twine(CodeSize), false);
-  OutStreamer->emitRawComment(" NumSgprs: " + Twine(NumSGPR), false);
-  OutStreamer->emitRawComment(" NumVgprs: " + Twine(NumVGPR), false);
-  if (NumAGPR) {
-    OutStreamer->emitRawComment(" NumAgprs: " + Twine(*NumAGPR), false);
-    OutStreamer->emitRawComment(" TotalNumVgprs: " + Twine(TotalNumVGPR),
-                                false);
+  for (Function &F : M.functions()) {
+    ValidateMCResourceInfo(F);
   }
-  OutStreamer->emitRawComment(" ScratchSize: " + Twine(ScratchSize), false);
-  OutStreamer->emitRawComment(" MemoryBound: " + Twine(MFI->isMemoryBound()),
-                              false);
+  return AsmPrinter::doFinalization(M);
 }
 
 SmallString<128> AMDGPUAsmPrinter::getMCExprStr(const MCExpr *Value) {
   SmallString<128> Str;
   raw_svector_ostream OSS(Str);
-  int64_t IVal;
-  if (Value->evaluateAsAbsolute(IVal)) {
-    OSS << static_cast<uint64_t>(IVal);
-  } else {
-    Value->print(OSS, MAI);
-  }
+  auto &Streamer = getTargetStreamer()->getStreamer();
+  auto &Context = Streamer.getContext();
+  const MCExpr *New = llvm::TryFold(Value, Context);
+  AMDGPUMCExprPrint(New, OSS, MAI);
   return Str;
 }
 
+// Print comments that apply to both callable functions and entry points.
 void AMDGPUAsmPrinter::emitCommonFunctionComments(
     const MCExpr *NumVGPR, const MCExpr *NumAGPR, const MCExpr *TotalNumVGPR,
     const MCExpr *NumSGPR, const MCExpr *ScratchSize, uint64_t CodeSize,
@@ -573,21 +663,45 @@ bool AMDGPUAsmPrinter::runOnMachineFunction(MachineFunction &MF) {
   emitResourceUsageRemarks(MF, CurrentProgramInfo, MFI->isModuleEntryFunction(),
                            STM.hasMAIInsts());
 
+  {
+    const AMDGPUResourceUsageAnalysis::SIFunctionResourceInfo &Info =
+        ResourceUsage->getResourceInfo();
+    RI->gatherResourceInfo(MF, Info);
+    using RIK = MCResourceInfo::ResourceInfoKind;
+    getTargetStreamer()->EmitMCResourceInfo(
+        RI->getSymbol(MF.getName(), RIK::RIK_NumVGPR),
+        RI->getSymbol(MF.getName(), RIK::RIK_NumAGPR),
+        RI->getSymbol(MF.getName(), RIK::RIK_NumSGPR),
+        RI->getSymbol(MF.getName(), RIK::RIK_PrivateSegSize),
+        RI->getSymbol(MF.getName(), RIK::RIK_UsesVCC),
+        RI->getSymbol(MF.getName(), RIK::RIK_UsesFlatScratch),
+        RI->getSymbol(MF.getName(), RIK::RIK_HasDynSizedStack),
+        RI->getSymbol(MF.getName(), RIK::RIK_HasRecursion),
+        RI->getSymbol(MF.getName(), RIK::RIK_HasIndirectCall));
+  }
+
   if (isVerbose()) {
     MCSectionELF *CommentSection =
         Context.getELFSection(".AMDGPU.csdata", ELF::SHT_PROGBITS, 0);
     OutStreamer->switchSection(CommentSection);
 
     if (!MFI->isEntryFunction()) {
+      using RIK = MCResourceInfo::ResourceInfoKind;
       OutStreamer->emitRawComment(" Function info:", false);
-      const AMDGPUResourceUsageAnalysis::SIFunctionResourceInfo &Info =
-          ResourceUsage->getResourceInfo(&MF.getFunction());
+
       emitCommonFunctionComments(
-          Info.NumVGPR,
-          STM.hasMAIInsts() ? Info.NumAGPR : std::optional<uint32_t>(),
-          Info.getTotalNumVGPRs(STM),
-          Info.getTotalNumSGPRs(MF.getSubtarget<GCNSubtarget>()),
-          Info.PrivateSegmentSize, getFunctionCodeSize(MF), MFI);
+          RI->getSymbol(MF.getName(), RIK::RIK_NumVGPR)->getVariableValue(),
+          STM.hasMAIInsts() ? RI->getSymbol(MF.getName(), RIK::RIK_NumAGPR)
+                                  ->getVariableValue()
+                            : nullptr,
+          RI->createTotalNumVGPRs(MF, Ctx),
+          RI->createTotalNumSGPRs(
+              MF,
+              MF.getSubtarget<GCNSubtarget>().getTargetID().isXnackOnOrAny(),
+              Ctx),
+          RI->getSymbol(MF.getName(), RIK::RIK_PrivateSegSize)
+              ->getVariableValue(),
+          getFunctionCodeSize(MF), MFI);
       return false;
     }
 
@@ -755,8 +869,6 @@ uint64_t AMDGPUAsmPrinter::getFunctionCodeSize(const MachineFunction &MF) const
 
 void AMDGPUAsmPrinter::getSIProgramInfo(SIProgramInfo &ProgInfo,
                                         const MachineFunction &MF) {
-  const AMDGPUResourceUsageAnalysis::SIFunctionResourceInfo &Info =
-      ResourceUsage->getResourceInfo(&MF.getFunction());
   const GCNSubtarget &STM = MF.getSubtarget<GCNSubtarget>();
   MCContext &Ctx = MF.getContext();
 
@@ -773,18 +885,38 @@ void AMDGPUAsmPrinter::getSIProgramInfo(SIProgramInfo &ProgInfo,
     return false;
   };
 
-  ProgInfo.NumArchVGPR = CreateExpr(Info.NumVGPR);
-  ProgInfo.NumAccVGPR = CreateExpr(Info.NumAGPR);
-  ProgInfo.NumVGPR = CreateExpr(Info.getTotalNumVGPRs(STM));
-  ProgInfo.AccumOffset =
-      CreateExpr(alignTo(std::max(1, Info.NumVGPR), 4) / 4 - 1);
+  auto GetSymRefExpr =
+      [&](MCResourceInfo::ResourceInfoKind RIK) -> const MCExpr * {
+    MCSymbol *Sym = RI->getSymbol(MF.getName(), RIK);
+    return MCSymbolRefExpr::create(Sym, Ctx);
+  };
+
+  const MCExpr *ConstFour = MCConstantExpr::create(4, Ctx);
+  const MCExpr *ConstOne = MCConstantExpr::create(1, Ctx);
+
+  using RIK = MCResourceInfo::ResourceInfoKind;
+  ProgInfo.NumArchVGPR = GetSymRefExpr(RIK::RIK_NumVGPR);
+  ProgInfo.NumAccVGPR = GetSymRefExpr(RIK::RIK_NumAGPR);
+  ProgInfo.NumVGPR = AMDGPUMCExpr::createTotalNumVGPR(
+      ProgInfo.NumAccVGPR, ProgInfo.NumArchVGPR, Ctx);
+
+  // AccumOffset computed for the MCExpr equivalent of:
+  // alignTo(std::max(1, Info.NumVGPR), 4) / 4 - 1;
+  ProgInfo.AccumOffset = MCBinaryExpr::createSub(
+      MCBinaryExpr::createDiv(
+          AMDGPUMCExpr::createAlignTo(
+              AMDGPUMCExpr::createMax({ConstOne, ProgInfo.NumArchVGPR}, Ctx),
+              ConstFour, Ctx),
+          ConstFour, Ctx),
+      ConstOne, Ctx);
   ProgInfo.TgSplit = STM.isTgSplitEnabled();
-  ProgInfo.NumSGPR = CreateExpr(Info.NumExplicitSGPR);
-  ProgInfo.ScratchSize = CreateExpr(Info.PrivateSegmentSize);
-  ProgInfo.VCCUsed = CreateExpr(Info.UsesVCC);
-  ProgInfo.FlatUsed = CreateExpr(Info.UsesFlatScratch);
+  ProgInfo.NumSGPR = GetSymRefExpr(RIK::RIK_NumSGPR);
+  ProgInfo.ScratchSize = GetSymRefExpr(RIK::RIK_PrivateSegSize);
+  ProgInfo.VCCUsed = GetSymRefExpr(RIK::RIK_UsesVCC);
+  ProgInfo.FlatUsed = GetSymRefExpr(RIK::RIK_UsesFlatScratch);
   ProgInfo.DynamicCallStack =
-      CreateExpr(Info.HasDynamicallySizedStack || Info.HasRecursion);
+      MCBinaryExpr::createOr(GetSymRefExpr(RIK::RIK_HasDynSizedStack),
+                             GetSymRefExpr(RIK::RIK_HasRecursion), Ctx);
 
   const uint64_t MaxScratchPerWorkitem =
       STM.getMaxWaveScratchSize() / STM.getWavefrontSize();
@@ -1084,6 +1216,8 @@ void AMDGPUAsmPrinter::getSIProgramInfo(SIProgramInfo &ProgInfo,
       STM.computeOccupancy(F, ProgInfo.LDSSize), ProgInfo.NumSGPRsForWavesPerEU,
       ProgInfo.NumVGPRsForWavesPerEU, STM, Ctx);
 
+  OccupancyValidateMap->insert({&MF.getFunction(), ProgInfo.Occupancy});
+
   const auto [MinWEU, MaxWEU] =
       AMDGPU::getIntegerPairAttribute(F, "amdgpu-waves-per-eu", {0, 0}, true);
   uint64_t Occupancy;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h
index f66bbde42ce278..676a4687ee2af7 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h
@@ -24,6 +24,7 @@ struct AMDGPUResourceUsageAnalysis;
 class AMDGPUTargetStreamer;
 class MCCodeEmitter;
 class MCOperand;
+class MCResourceInfo;
 
 namespace AMDGPU {
 struct MCKernelDescriptor;
@@ -40,12 +41,20 @@ class AMDGPUAsmPrinter final : public AsmPrinter {
 
   AMDGPUResourceUsageAnalysis *ResourceUsage;
 
+  std::unique_ptr<MCResourceInfo> RI;
+
   SIProgramInfo CurrentProgramInfo;
 
   std::unique_ptr<AMDGPU::HSAMD::MetadataStreamer> HSAMetadataStream;
 
   MCCodeEmitter *DumpCodeInstEmitter = nullptr;
 
+  // ValidateMCResourceInfo cannot recompute parts of the occupancy as it does
+  // for other metadata to validate (e.g., NumSGPRs) so a map is necessary if we
+  // really want to track and validate the occupancy.
+  std::unique_ptr<DenseMap<const Function *, const MCExpr *>>
+      OccupancyValidateMap;
+
   uint64_t getFunctionCodeSize(const MachineFunction &MF) const;
 
   void getSIProgramInfo(SIProgramInfo &Out, const MachineFunction &MF);
@@ -60,11 +69,6 @@ class AMDGPUAsmPrinter final : public AsmPrinter {
   void EmitPALMetadata(const MachineFunction &MF,
                        const SIProgramInfo &KernelInfo);
   void emitPALFunctionMetadata(const MachineFunction &MF);
-  void emitCommonFunctionComments(uint32_t NumVGPR,
-                                  std::optional<uint32_t> NumAGPR,
-                                  uint32_t TotalNumVGPR, uint32_t NumSGPR,
-                                  uint64_t ScratchSize, uint64_t CodeSize,
-                                  const AMDGPUMachineFunction *MFI);
   void emitCommonFunctionComments(const MCExpr *NumVGPR, const MCExpr *NumAGPR,
                                   const MCExpr *TotalNumVGPR,
                                   const MCExpr *NumSGPR,
@@ -84,6 +88,11 @@ class AMDGPUAsmPrinter final : public AsmPrinter {
 
   SmallString<128> getMCExprStr(const MCExpr *Value);
 
+  /// Attempts to replace the validation that is missed in getSIProgramInfo due
+  /// to MCExpr being unknown. Invoked during doFinalization such that the
+  /// MCResourceInfo symbols are known.
+  void ValidateMCResourceInfo(Function &F);
+
 public:
   explicit AMDGPUAsmPrinter(TargetMachine &TM,
                             std::unique_ptr<MCStreamer> Streamer);
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp
new file mode 100644
index 00000000000000..58383475b312c9
--- /dev/null
+++ b/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp
@@ -0,0 +1,220 @@
+//===- AMDGPUMCResourceInfo.cpp --- MC Resource Info ----------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+/// \file
+/// \brief MC infrastructure to propagate the function level resource usage
+/// info.
+///
+//===----------------------------------------------------------------------===//
+
+#include "AMDGPUMCResourceInfo.h"
+#include "Utils/AMDGPUBaseInfo.h"
+#include "llvm/ADT/SmallSet.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/MC/MCContext.h"
+#include "llvm/MC/MCSymbol.h"
+
+using namespace llvm;
+
+MCSymbol *MCResourceInfo::getSymbol(StringRef FuncName, ResourceInfoKind RIK) {
+  switch (RIK) {
+  case RIK_NumVGPR:
+    return OutContext.getOrCreateSymbol(FuncName + Twine(".num_vgpr"));
+  case RIK_NumAGPR:
+    return OutContext.getOrCreateSymbol(FuncName + Twine(".num_agpr"));
+  case RIK_NumSGPR:
+    return OutContext.getOrCreateSymbol(FuncName + Twine(".num_sgpr"));
+  case RIK_PrivateSegSize:
+    return OutContext.getOrCreateSymbol(FuncName + Twine(".private_seg_size"));
+  case RIK_UsesVCC:
+    return OutContext.getOrCreateSymbol(FuncName + Twine(".uses_vcc"));
+  case RIK_UsesFlatScratch:
+    return OutContext.getOrCreateSymbol(FuncName + Twine(".uses_flat_scratch"));
+  case RIK_HasDynSizedStack:
+    return OutContext.getOrCreateSymbol(FuncName +
+                                        Twine(".has_dyn_sized_stack"));
+  case RIK_HasRecursion:
+    return OutContext.getOrCreateSymbol(FuncName + Twine(".has_recursion"));
+  case RIK_HasIndirectCall:
+    return OutContext.getOrCreateSymbol(FuncName + Twine(".has_indirect_call"));
+  }
+  llvm_unreachable("Unexpected ResourceInfoKind.");
+}
+
+const MCExpr *MCResourceInfo::getSymRefExpr(StringRef FuncName,
+                                            ResourceInfoKind RIK,
+                                            MCContext &Ctx) {
+  return MCSymbolRefExpr::create(getSymbol(FuncName, RIK), Ctx);
+}
+
+void MCResourceInfo::assignMaxRegs() {
+  // Assign expression to get the max register use to the max_num_Xgpr symbol.
+  MCSymbol *MaxVGPRSym = getMaxVGPRSymbol();
+  MCSymbol *MaxAGPRSym = getMaxAGPRSymbol();
+  MCSymbol *MaxSGPRSym = getMaxSGPRSymbol();
+
+  auto assignMaxRegSym = [this](MCSymbol *Sym, int32_t RegCount) {
+    const MCExpr *MaxExpr = MCConstantExpr::create(RegCount, OutContext);
+    Sym->setVariableValue(MaxExpr);
+  };
+
+  assignMaxRegSym(MaxVGPRSym, MaxVGPR);
+  assignMaxRegSym(MaxAGPRSym, MaxAGPR);
+  assignMaxRegSym(MaxSGPRSym, MaxSGPR);
+}
+
+void MCResourceInfo::Finalize() {
+  assert(!finalized && "Cannot finalize ResourceInfo again.");
+  finalized = true;
+  assignMaxRegs();
+}
+
+MCSymbol *MCResourceInfo::getMaxVGPRSymbol() {
+  return OutContext.getOrCreateSymbol("max_num_vgpr");
+}
+
+MCSymbol *MCResourceInfo::getMaxAGPRSymbol() {
+  return OutContext.getOrCreateSymbol("max_num_agpr");
+}
+
+MCSymbol *MCResourceInfo::getMaxSGPRSymbol() {
+  return OutContext.getOrCreateSymbol("max_num_sgpr");
+}
+
+void MCResourceInfo::assignResourceInfoExpr(
+   ...
[truncated]

llvmbot · 2024-08-12T14:45:12Z

@llvm/pr-subscribers-llvm-globalisel

Author: Janek van Oirschot (JanekvO)

Changes

!!! Stacked PR on top of #95951 commit, please only review the latest commit 51f72f115b340a092c2c9f8569911b944a4efb6d !!!!

Converts AMDGPUResourceUsageAnalysis pass from Module to MachineFunction pass. Moves function resource info propagation to to MC layer (through helpers in AMDGPUMCResourceInfo) by generating MCExprs for every function resource which the emitters have been prepped for.

Function resource info (finalized) validation added through ValidateMCResourceInfo. Previously (and currently, still) done in getSIProgramInfo. However, MCExprs may not be resolvable during getSIProgramInfo due to propagation of yet-to-be-defined function resource info symbols. Does require some caching of the generated occupancy MCExpr computation in getSIProgramInfo to validate against.
Unfortunately, requires module specific max_num_Xgprs symbols to be emitted. Cannot generate a separate MCExpr for finding the max without getting recursive symbol defines.
Most test changes are caused by the representation changes from constant/known values to MCExprs.

Patch is 369.85 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/102913.diff

66 Files Affected:

(modified) clang/test/Frontend/amdgcn-machine-analysis-remarks.cl (+5-5)
(modified) llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp (+176-42)
(modified) llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h (+14-5)
(added) llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp (+220)
(added) llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.h (+94)
(modified) llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp (+22-125)
(modified) llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.h (+12-28)
(modified) llvm/lib/Target/AMDGPU/CMakeLists.txt (+1)
(modified) llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCExpr.cpp (+359)
(modified) llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCExpr.h (+11)
(modified) llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp (+55-14)
(modified) llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.h (+23)
(modified) llvm/lib/Target/AMDGPU/Utils/AMDGPUPALMetadata.cpp (+5-5)
(modified) llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp (+20-26)
(modified) llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.h (+4-1)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement.ll (+84-84)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/flat-scratch-init.ll (+9-7)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.workitem.id.ll (+3-3)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/non-entry-alloca.ll (+4-4)
(modified) llvm/test/CodeGen/AMDGPU/agpr-register-count.ll (+37-28)
(modified) llvm/test/CodeGen/AMDGPU/amdgpu.private-memory.ll (+2-1)
(modified) llvm/test/CodeGen/AMDGPU/amdpal-metadata-agpr-register-count.ll (+3-1)
(modified) llvm/test/CodeGen/AMDGPU/attr-amdgpu-flat-work-group-size-vgpr-limit.ll (+28-23)
(modified) llvm/test/CodeGen/AMDGPU/attributor-noopt.ll (+26-4)
(modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage-agpr.ll (+10-6)
(modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage0.ll (+6-2)
(modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage1.ll (+9-2)
(modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage2.ll (+9-2)
(modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage3.ll (+9-2)
(modified) llvm/test/CodeGen/AMDGPU/call-graph-register-usage.ll (+39-11)
(modified) llvm/test/CodeGen/AMDGPU/callee-special-input-sgprs-fixed-abi.ll (+2-2)
(modified) llvm/test/CodeGen/AMDGPU/cndmask-no-def-vcc.ll (+1)
(modified) llvm/test/CodeGen/AMDGPU/code-object-v3.ll (+24-10)
(modified) llvm/test/CodeGen/AMDGPU/codegen-internal-only-func.ll (+3-3)
(modified) llvm/test/CodeGen/AMDGPU/control-flow-fastregalloc.ll (+4-2)
(modified) llvm/test/CodeGen/AMDGPU/elf.ll (+4-3)
(modified) llvm/test/CodeGen/AMDGPU/enable-scratch-only-dynamic-stack.ll (+9-5)
(added) llvm/test/CodeGen/AMDGPU/function-resource-usage.ll (+533)
(modified) llvm/test/CodeGen/AMDGPU/gfx11-user-sgpr-init16-bug.ll (+8-8)
(modified) llvm/test/CodeGen/AMDGPU/inline-asm-reserved-regs.ll (+1-1)
(modified) llvm/test/CodeGen/AMDGPU/insert-delay-alu-bug.ll (+12-12)
(modified) llvm/test/CodeGen/AMDGPU/ipra.ll (+7-7)
(modified) llvm/test/CodeGen/AMDGPU/kernarg-size.ll (+3-1)
(modified) llvm/test/CodeGen/AMDGPU/kernel_code_t_recurse.ll (+6-1)
(modified) llvm/test/CodeGen/AMDGPU/large-alloca-compute.ll (+23-8)
(modified) llvm/test/CodeGen/AMDGPU/llc-pipeline.ll (+15-30)
(modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.workitem.id.ll (+3-3)
(modified) llvm/test/CodeGen/AMDGPU/lower-module-lds-offsets.ll (+13-13)
(modified) llvm/test/CodeGen/AMDGPU/mesa3d.ll (+8-5)
(modified) llvm/test/CodeGen/AMDGPU/module-lds-false-sharing.ll (+49-50)
(modified) llvm/test/CodeGen/AMDGPU/non-entry-alloca.ll (+12-8)
(modified) llvm/test/CodeGen/AMDGPU/recursion.ll (+18-10)
(modified) llvm/test/CodeGen/AMDGPU/resource-optimization-remarks.ll (+32-32)
(modified) llvm/test/CodeGen/AMDGPU/resource-usage-dead-function.ll (+7-6)
(modified) llvm/test/CodeGen/AMDGPU/stack-realign-kernel.ll (+90-36)
(modified) llvm/test/CodeGen/AMDGPU/tid-kd-xnack-any.ll (+2-1)
(modified) llvm/test/CodeGen/AMDGPU/tid-kd-xnack-off.ll (+2-1)
(modified) llvm/test/CodeGen/AMDGPU/tid-kd-xnack-on.ll (+2-1)
(modified) llvm/test/CodeGen/AMDGPU/trap.ll (+21-12)
(modified) llvm/test/MC/AMDGPU/amd_kernel_code_t.s (+14-22)
(modified) llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx10.s (+32-32)
(modified) llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx11.s (+30-30)
(modified) llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx12.s (+29-29)
(modified) llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx7.s (+27-27)
(modified) llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx8.s (+27-27)
(modified) llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx90a.s (+23-23)

diff --git a/clang/test/Frontend/amdgcn-machine-analysis-remarks.cl b/clang/test/Frontend/amdgcn-machine-analysis-remarks.cl
index a05e21b37b9127..a2dd59a871904c 100644
--- a/clang/test/Frontend/amdgcn-machine-analysis-remarks.cl
+++ b/clang/test/Frontend/amdgcn-machine-analysis-remarks.cl
@@ -2,12 +2,12 @@
 // RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -target-cpu gfx908 -Rpass-analysis=kernel-resource-usage -S -O0 -verify %s -o /dev/null
 
 // expected-remark@+10 {{Function Name: foo}}
-// expected-remark@+9 {{    SGPRs: 13}}
-// expected-remark@+8 {{    VGPRs: 10}}
-// expected-remark@+7 {{    AGPRs: 12}}
-// expected-remark@+6 {{    ScratchSize [bytes/lane]: 0}}
+// expected-remark@+9 {{    SGPRs: foo.num_sgpr+(extrasgprs(foo.uses_vcc, foo.uses_flat_scratch, 1))}}
+// expected-remark@+8 {{    VGPRs: foo.num_vgpr}}
+// expected-remark@+7 {{    AGPRs: foo.num_agpr}}
+// expected-remark@+6 {{    ScratchSize [bytes/lane]: foo.private_seg_size}}
 // expected-remark@+5 {{    Dynamic Stack: False}}
-// expected-remark@+4 {{    Occupancy [waves/SIMD]: 10}}
+// expected-remark@+4 {{    Occupancy [waves/SIMD]: occupancy(10, 4, 256, 8, 10, max(foo.num_sgpr+(extrasgprs(foo.uses_vcc, foo.uses_flat_scratch, 1)), 1, 0), max(totalnumvgprs(foo.num_agpr, foo.num_vgpr), 1, 0))}}
 // expected-remark@+3 {{    SGPRs Spill: 0}}
 // expected-remark@+2 {{    VGPRs Spill: 0}}
 // expected-remark@+1 {{    LDS Size [bytes/block]: 0}}
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
index e64e28e01d3d18..97a5cb29d51023 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
@@ -18,6 +18,7 @@
 #include "AMDGPUAsmPrinter.h"
 #include "AMDGPU.h"
 #include "AMDGPUHSAMetadataStreamer.h"
+#include "AMDGPUMCResourceInfo.h"
 #include "AMDGPUResourceUsageAnalysis.h"
 #include "GCNSubtarget.h"
 #include "MCTargetDesc/AMDGPUInstPrinter.h"
@@ -92,6 +93,9 @@ AMDGPUAsmPrinter::AMDGPUAsmPrinter(TargetMachine &TM,
                                    std::unique_ptr<MCStreamer> Streamer)
     : AsmPrinter(TM, std::move(Streamer)) {
   assert(OutStreamer && "AsmPrinter constructed without streamer");
+  RI = std::make_unique<MCResourceInfo>(OutContext);
+  OccupancyValidateMap =
+      std::make_unique<DenseMap<const Function *, const MCExpr *>>();
 }
 
 StringRef AMDGPUAsmPrinter::getPassName() const {
@@ -359,6 +363,102 @@ bool AMDGPUAsmPrinter::doInitialization(Module &M) {
   return AsmPrinter::doInitialization(M);
 }
 
+void AMDGPUAsmPrinter::ValidateMCResourceInfo(Function &F) {
+  if (F.isDeclaration() || !AMDGPU::isModuleEntryFunctionCC(F.getCallingConv()))
+    return;
+
+  using RIK = MCResourceInfo::ResourceInfoKind;
+  const GCNSubtarget &STM = TM.getSubtarget<GCNSubtarget>(F);
+
+  auto TryGetMCExprValue = [](const MCExpr *Value, uint64_t &Res) -> bool {
+    int64_t Val;
+    if (Value->evaluateAsAbsolute(Val)) {
+      Res = Val;
+      return true;
+    }
+    return false;
+  };
+
+  const uint64_t MaxScratchPerWorkitem =
+      STM.getMaxWaveScratchSize() / STM.getWavefrontSize();
+  MCSymbol *ScratchSizeSymbol =
+      RI->getSymbol(F.getName(), RIK::RIK_PrivateSegSize);
+  uint64_t ScratchSize;
+  if (ScratchSizeSymbol->isVariable() &&
+      TryGetMCExprValue(ScratchSizeSymbol->getVariableValue(), ScratchSize) &&
+      ScratchSize > MaxScratchPerWorkitem) {
+    DiagnosticInfoStackSize DiagStackSize(F, ScratchSize, MaxScratchPerWorkitem,
+                                          DS_Error);
+    F.getContext().diagnose(DiagStackSize);
+  }
+
+  // Validate addressable scalar registers (i.e., prior to added implicit
+  // SGPRs).
+  MCSymbol *NumSGPRSymbol = RI->getSymbol(F.getName(), RIK::RIK_NumSGPR);
+  if (STM.getGeneration() >= AMDGPUSubtarget::VOLCANIC_ISLANDS &&
+      !STM.hasSGPRInitBug()) {
+    unsigned MaxAddressableNumSGPRs = STM.getAddressableNumSGPRs();
+    uint64_t NumSgpr;
+    if (NumSGPRSymbol->isVariable() &&
+        TryGetMCExprValue(NumSGPRSymbol->getVariableValue(), NumSgpr) &&
+        NumSgpr > MaxAddressableNumSGPRs) {
+      DiagnosticInfoResourceLimit Diag(F, "addressable scalar registers",
+                                       NumSgpr, MaxAddressableNumSGPRs,
+                                       DS_Error, DK_ResourceLimit);
+      F.getContext().diagnose(Diag);
+      return;
+    }
+  }
+
+  MCSymbol *VCCUsedSymbol = RI->getSymbol(F.getName(), RIK::RIK_UsesVCC);
+  MCSymbol *FlatUsedSymbol =
+      RI->getSymbol(F.getName(), RIK::RIK_UsesFlatScratch);
+  uint64_t VCCUsed, FlatUsed, NumSgpr;
+
+  if (NumSGPRSymbol->isVariable() && VCCUsedSymbol->isVariable() &&
+      FlatUsedSymbol->isVariable() &&
+      TryGetMCExprValue(NumSGPRSymbol->getVariableValue(), NumSgpr) &&
+      TryGetMCExprValue(VCCUsedSymbol->getVariableValue(), VCCUsed) &&
+      TryGetMCExprValue(FlatUsedSymbol->getVariableValue(), FlatUsed)) {
+
+    // Recomputes NumSgprs + implicit SGPRs but all symbols should now be
+    // resolvable.
+    NumSgpr += IsaInfo::getNumExtraSGPRs(
+        &STM, VCCUsed, FlatUsed,
+        getTargetStreamer()->getTargetID()->isXnackOnOrAny());
+    if (STM.getGeneration() <= AMDGPUSubtarget::SEA_ISLANDS ||
+        STM.hasSGPRInitBug()) {
+      unsigned MaxAddressableNumSGPRs = STM.getAddressableNumSGPRs();
+      if (NumSgpr > MaxAddressableNumSGPRs) {
+        DiagnosticInfoResourceLimit Diag(F, "scalar registers", NumSgpr,
+                                         MaxAddressableNumSGPRs, DS_Error,
+                                         DK_ResourceLimit);
+        F.getContext().diagnose(Diag);
+        return;
+      }
+    }
+
+    auto I = OccupancyValidateMap->find(&F);
+    if (I != OccupancyValidateMap->end()) {
+      const auto [MinWEU, MaxWEU] = AMDGPU::getIntegerPairAttribute(
+          F, "amdgpu-waves-per-eu", {0, 0}, true);
+      uint64_t Occupancy;
+      const MCExpr *OccupancyExpr = I->getSecond();
+
+      if (TryGetMCExprValue(OccupancyExpr, Occupancy) && Occupancy < MinWEU) {
+        DiagnosticInfoOptimizationFailure Diag(
+            F, F.getSubprogram(),
+            "failed to meet occupancy target given by 'amdgpu-waves-per-eu' in "
+            "'" +
+                F.getName() + "': desired occupancy was " + Twine(MinWEU) +
+                ", final occupancy is " + Twine(Occupancy));
+        F.getContext().diagnose(Diag);
+        return;
+      }
+    }
+  }
+}
+
 bool AMDGPUAsmPrinter::doFinalization(Module &M) {
   // Pad with s_code_end to help tools and guard against instruction prefetch
   // causing stale data in caches. Arguably this should be done by the linker,
@@ -371,39 +471,29 @@ bool AMDGPUAsmPrinter::doFinalization(Module &M) {
     getTargetStreamer()->EmitCodeEnd(STI);
   }
 
-  return AsmPrinter::doFinalization(M);
-}
+  // Assign expressions which can only be resolved when all other functions are
+  // known.
+  RI->Finalize();
+  getTargetStreamer()->EmitMCResourceMaximums(
+      RI->getMaxVGPRSymbol(), RI->getMaxAGPRSymbol(), RI->getMaxSGPRSymbol());
 
-// Print comments that apply to both callable functions and entry points.
-void AMDGPUAsmPrinter::emitCommonFunctionComments(
-    uint32_t NumVGPR, std::optional<uint32_t> NumAGPR, uint32_t TotalNumVGPR,
-    uint32_t NumSGPR, uint64_t ScratchSize, uint64_t CodeSize,
-    const AMDGPUMachineFunction *MFI) {
-  OutStreamer->emitRawComment(" codeLenInByte = " + Twine(CodeSize), false);
-  OutStreamer->emitRawComment(" NumSgprs: " + Twine(NumSGPR), false);
-  OutStreamer->emitRawComment(" NumVgprs: " + Twine(NumVGPR), false);
-  if (NumAGPR) {
-    OutStreamer->emitRawComment(" NumAgprs: " + Twine(*NumAGPR), false);
-    OutStreamer->emitRawComment(" TotalNumVgprs: " + Twine(TotalNumVGPR),
-                                false);
+  for (Function &F : M.functions()) {
+    ValidateMCResourceInfo(F);
   }
-  OutStreamer->emitRawComment(" ScratchSize: " + Twine(ScratchSize), false);
-  OutStreamer->emitRawComment(" MemoryBound: " + Twine(MFI->isMemoryBound()),
-                              false);
+  return AsmPrinter::doFinalization(M);
 }
 
 SmallString<128> AMDGPUAsmPrinter::getMCExprStr(const MCExpr *Value) {
   SmallString<128> Str;
   raw_svector_ostream OSS(Str);
-  int64_t IVal;
-  if (Value->evaluateAsAbsolute(IVal)) {
-    OSS << static_cast<uint64_t>(IVal);
-  } else {
-    Value->print(OSS, MAI);
-  }
+  auto &Streamer = getTargetStreamer()->getStreamer();
+  auto &Context = Streamer.getContext();
+  const MCExpr *New = llvm::TryFold(Value, Context);
+  AMDGPUMCExprPrint(New, OSS, MAI);
   return Str;
 }
 
+// Print comments that apply to both callable functions and entry points.
 void AMDGPUAsmPrinter::emitCommonFunctionComments(
     const MCExpr *NumVGPR, const MCExpr *NumAGPR, const MCExpr *TotalNumVGPR,
     const MCExpr *NumSGPR, const MCExpr *ScratchSize, uint64_t CodeSize,
@@ -573,21 +663,45 @@ bool AMDGPUAsmPrinter::runOnMachineFunction(MachineFunction &MF) {
   emitResourceUsageRemarks(MF, CurrentProgramInfo, MFI->isModuleEntryFunction(),
                            STM.hasMAIInsts());
 
+  {
+    const AMDGPUResourceUsageAnalysis::SIFunctionResourceInfo &Info =
+        ResourceUsage->getResourceInfo();
+    RI->gatherResourceInfo(MF, Info);
+    using RIK = MCResourceInfo::ResourceInfoKind;
+    getTargetStreamer()->EmitMCResourceInfo(
+        RI->getSymbol(MF.getName(), RIK::RIK_NumVGPR),
+        RI->getSymbol(MF.getName(), RIK::RIK_NumAGPR),
+        RI->getSymbol(MF.getName(), RIK::RIK_NumSGPR),
+        RI->getSymbol(MF.getName(), RIK::RIK_PrivateSegSize),
+        RI->getSymbol(MF.getName(), RIK::RIK_UsesVCC),
+        RI->getSymbol(MF.getName(), RIK::RIK_UsesFlatScratch),
+        RI->getSymbol(MF.getName(), RIK::RIK_HasDynSizedStack),
+        RI->getSymbol(MF.getName(), RIK::RIK_HasRecursion),
+        RI->getSymbol(MF.getName(), RIK::RIK_HasIndirectCall));
+  }
+
   if (isVerbose()) {
     MCSectionELF *CommentSection =
         Context.getELFSection(".AMDGPU.csdata", ELF::SHT_PROGBITS, 0);
     OutStreamer->switchSection(CommentSection);
 
     if (!MFI->isEntryFunction()) {
+      using RIK = MCResourceInfo::ResourceInfoKind;
       OutStreamer->emitRawComment(" Function info:", false);
-      const AMDGPUResourceUsageAnalysis::SIFunctionResourceInfo &Info =
-          ResourceUsage->getResourceInfo(&MF.getFunction());
+
       emitCommonFunctionComments(
-          Info.NumVGPR,
-          STM.hasMAIInsts() ? Info.NumAGPR : std::optional<uint32_t>(),
-          Info.getTotalNumVGPRs(STM),
-          Info.getTotalNumSGPRs(MF.getSubtarget<GCNSubtarget>()),
-          Info.PrivateSegmentSize, getFunctionCodeSize(MF), MFI);
+          RI->getSymbol(MF.getName(), RIK::RIK_NumVGPR)->getVariableValue(),
+          STM.hasMAIInsts() ? RI->getSymbol(MF.getName(), RIK::RIK_NumAGPR)
+                                  ->getVariableValue()
+                            : nullptr,
+          RI->createTotalNumVGPRs(MF, Ctx),
+          RI->createTotalNumSGPRs(
+              MF,
+              MF.getSubtarget<GCNSubtarget>().getTargetID().isXnackOnOrAny(),
+              Ctx),
+          RI->getSymbol(MF.getName(), RIK::RIK_PrivateSegSize)
+              ->getVariableValue(),
+          getFunctionCodeSize(MF), MFI);
       return false;
     }
 
@@ -755,8 +869,6 @@ uint64_t AMDGPUAsmPrinter::getFunctionCodeSize(const MachineFunction &MF) const
 
 void AMDGPUAsmPrinter::getSIProgramInfo(SIProgramInfo &ProgInfo,
                                         const MachineFunction &MF) {
-  const AMDGPUResourceUsageAnalysis::SIFunctionResourceInfo &Info =
-      ResourceUsage->getResourceInfo(&MF.getFunction());
   const GCNSubtarget &STM = MF.getSubtarget<GCNSubtarget>();
   MCContext &Ctx = MF.getContext();
 
@@ -773,18 +885,38 @@ void AMDGPUAsmPrinter::getSIProgramInfo(SIProgramInfo &ProgInfo,
     return false;
   };
 
-  ProgInfo.NumArchVGPR = CreateExpr(Info.NumVGPR);
-  ProgInfo.NumAccVGPR = CreateExpr(Info.NumAGPR);
-  ProgInfo.NumVGPR = CreateExpr(Info.getTotalNumVGPRs(STM));
-  ProgInfo.AccumOffset =
-      CreateExpr(alignTo(std::max(1, Info.NumVGPR), 4) / 4 - 1);
+  auto GetSymRefExpr =
+      [&](MCResourceInfo::ResourceInfoKind RIK) -> const MCExpr * {
+    MCSymbol *Sym = RI->getSymbol(MF.getName(), RIK);
+    return MCSymbolRefExpr::create(Sym, Ctx);
+  };
+
+  const MCExpr *ConstFour = MCConstantExpr::create(4, Ctx);
+  const MCExpr *ConstOne = MCConstantExpr::create(1, Ctx);
+
+  using RIK = MCResourceInfo::ResourceInfoKind;
+  ProgInfo.NumArchVGPR = GetSymRefExpr(RIK::RIK_NumVGPR);
+  ProgInfo.NumAccVGPR = GetSymRefExpr(RIK::RIK_NumAGPR);
+  ProgInfo.NumVGPR = AMDGPUMCExpr::createTotalNumVGPR(
+      ProgInfo.NumAccVGPR, ProgInfo.NumArchVGPR, Ctx);
+
+  // AccumOffset computed for the MCExpr equivalent of:
+  // alignTo(std::max(1, Info.NumVGPR), 4) / 4 - 1;
+  ProgInfo.AccumOffset = MCBinaryExpr::createSub(
+      MCBinaryExpr::createDiv(
+          AMDGPUMCExpr::createAlignTo(
+              AMDGPUMCExpr::createMax({ConstOne, ProgInfo.NumArchVGPR}, Ctx),
+              ConstFour, Ctx),
+          ConstFour, Ctx),
+      ConstOne, Ctx);
   ProgInfo.TgSplit = STM.isTgSplitEnabled();
-  ProgInfo.NumSGPR = CreateExpr(Info.NumExplicitSGPR);
-  ProgInfo.ScratchSize = CreateExpr(Info.PrivateSegmentSize);
-  ProgInfo.VCCUsed = CreateExpr(Info.UsesVCC);
-  ProgInfo.FlatUsed = CreateExpr(Info.UsesFlatScratch);
+  ProgInfo.NumSGPR = GetSymRefExpr(RIK::RIK_NumSGPR);
+  ProgInfo.ScratchSize = GetSymRefExpr(RIK::RIK_PrivateSegSize);
+  ProgInfo.VCCUsed = GetSymRefExpr(RIK::RIK_UsesVCC);
+  ProgInfo.FlatUsed = GetSymRefExpr(RIK::RIK_UsesFlatScratch);
   ProgInfo.DynamicCallStack =
-      CreateExpr(Info.HasDynamicallySizedStack || Info.HasRecursion);
+      MCBinaryExpr::createOr(GetSymRefExpr(RIK::RIK_HasDynSizedStack),
+                             GetSymRefExpr(RIK::RIK_HasRecursion), Ctx);
 
   const uint64_t MaxScratchPerWorkitem =
       STM.getMaxWaveScratchSize() / STM.getWavefrontSize();
@@ -1084,6 +1216,8 @@ void AMDGPUAsmPrinter::getSIProgramInfo(SIProgramInfo &ProgInfo,
       STM.computeOccupancy(F, ProgInfo.LDSSize), ProgInfo.NumSGPRsForWavesPerEU,
       ProgInfo.NumVGPRsForWavesPerEU, STM, Ctx);
 
+  OccupancyValidateMap->insert({&MF.getFunction(), ProgInfo.Occupancy});
+
   const auto [MinWEU, MaxWEU] =
       AMDGPU::getIntegerPairAttribute(F, "amdgpu-waves-per-eu", {0, 0}, true);
   uint64_t Occupancy;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h
index f66bbde42ce278..676a4687ee2af7 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h
@@ -24,6 +24,7 @@ struct AMDGPUResourceUsageAnalysis;
 class AMDGPUTargetStreamer;
 class MCCodeEmitter;
 class MCOperand;
+class MCResourceInfo;
 
 namespace AMDGPU {
 struct MCKernelDescriptor;
@@ -40,12 +41,20 @@ class AMDGPUAsmPrinter final : public AsmPrinter {
 
   AMDGPUResourceUsageAnalysis *ResourceUsage;
 
+  std::unique_ptr<MCResourceInfo> RI;
+
   SIProgramInfo CurrentProgramInfo;
 
   std::unique_ptr<AMDGPU::HSAMD::MetadataStreamer> HSAMetadataStream;
 
   MCCodeEmitter *DumpCodeInstEmitter = nullptr;
 
+  // ValidateMCResourceInfo cannot recompute parts of the occupancy as it does
+  // for other metadata to validate (e.g., NumSGPRs) so a map is necessary if we
+  // really want to track and validate the occupancy.
+  std::unique_ptr<DenseMap<const Function *, const MCExpr *>>
+      OccupancyValidateMap;
+
   uint64_t getFunctionCodeSize(const MachineFunction &MF) const;
 
   void getSIProgramInfo(SIProgramInfo &Out, const MachineFunction &MF);
@@ -60,11 +69,6 @@ class AMDGPUAsmPrinter final : public AsmPrinter {
   void EmitPALMetadata(const MachineFunction &MF,
                        const SIProgramInfo &KernelInfo);
   void emitPALFunctionMetadata(const MachineFunction &MF);
-  void emitCommonFunctionComments(uint32_t NumVGPR,
-                                  std::optional<uint32_t> NumAGPR,
-                                  uint32_t TotalNumVGPR, uint32_t NumSGPR,
-                                  uint64_t ScratchSize, uint64_t CodeSize,
-                                  const AMDGPUMachineFunction *MFI);
   void emitCommonFunctionComments(const MCExpr *NumVGPR, const MCExpr *NumAGPR,
                                   const MCExpr *TotalNumVGPR,
                                   const MCExpr *NumSGPR,
@@ -84,6 +88,11 @@ class AMDGPUAsmPrinter final : public AsmPrinter {
 
   SmallString<128> getMCExprStr(const MCExpr *Value);
 
+  /// Attempts to replace the validation that is missed in getSIProgramInfo due
+  /// to MCExpr being unknown. Invoked during doFinalization such that the
+  /// MCResourceInfo symbols are known.
+  void ValidateMCResourceInfo(Function &F);
+
 public:
   explicit AMDGPUAsmPrinter(TargetMachine &TM,
                             std::unique_ptr<MCStreamer> Streamer);
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp
new file mode 100644
index 00000000000000..58383475b312c9
--- /dev/null
+++ b/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp
@@ -0,0 +1,220 @@
+//===- AMDGPUMCResourceInfo.cpp --- MC Resource Info ----------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+/// \file
+/// \brief MC infrastructure to propagate the function level resource usage
+/// info.
+///
+//===----------------------------------------------------------------------===//
+
+#include "AMDGPUMCResourceInfo.h"
+#include "Utils/AMDGPUBaseInfo.h"
+#include "llvm/ADT/SmallSet.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/MC/MCContext.h"
+#include "llvm/MC/MCSymbol.h"
+
+using namespace llvm;
+
+MCSymbol *MCResourceInfo::getSymbol(StringRef FuncName, ResourceInfoKind RIK) {
+  switch (RIK) {
+  case RIK_NumVGPR:
+    return OutContext.getOrCreateSymbol(FuncName + Twine(".num_vgpr"));
+  case RIK_NumAGPR:
+    return OutContext.getOrCreateSymbol(FuncName + Twine(".num_agpr"));
+  case RIK_NumSGPR:
+    return OutContext.getOrCreateSymbol(FuncName + Twine(".num_sgpr"));
+  case RIK_PrivateSegSize:
+    return OutContext.getOrCreateSymbol(FuncName + Twine(".private_seg_size"));
+  case RIK_UsesVCC:
+    return OutContext.getOrCreateSymbol(FuncName + Twine(".uses_vcc"));
+  case RIK_UsesFlatScratch:
+    return OutContext.getOrCreateSymbol(FuncName + Twine(".uses_flat_scratch"));
+  case RIK_HasDynSizedStack:
+    return OutContext.getOrCreateSymbol(FuncName +
+                                        Twine(".has_dyn_sized_stack"));
+  case RIK_HasRecursion:
+    return OutContext.getOrCreateSymbol(FuncName + Twine(".has_recursion"));
+  case RIK_HasIndirectCall:
+    return OutContext.getOrCreateSymbol(FuncName + Twine(".has_indirect_call"));
+  }
+  llvm_unreachable("Unexpected ResourceInfoKind.");
+}
+
+const MCExpr *MCResourceInfo::getSymRefExpr(StringRef FuncName,
+                                            ResourceInfoKind RIK,
+                                            MCContext &Ctx) {
+  return MCSymbolRefExpr::create(getSymbol(FuncName, RIK), Ctx);
+}
+
+void MCResourceInfo::assignMaxRegs() {
+  // Assign expression to get the max register use to the max_num_Xgpr symbol.
+  MCSymbol *MaxVGPRSym = getMaxVGPRSymbol();
+  MCSymbol *MaxAGPRSym = getMaxAGPRSymbol();
+  MCSymbol *MaxSGPRSym = getMaxSGPRSymbol();
+
+  auto assignMaxRegSym = [this](MCSymbol *Sym, int32_t RegCount) {
+    const MCExpr *MaxExpr = MCConstantExpr::create(RegCount, OutContext);
+    Sym->setVariableValue(MaxExpr);
+  };
+
+  assignMaxRegSym(MaxVGPRSym, MaxVGPR);
+  assignMaxRegSym(MaxAGPRSym, MaxAGPR);
+  assignMaxRegSym(MaxSGPRSym, MaxSGPR);
+}
+
+void MCResourceInfo::Finalize() {
+  assert(!finalized && "Cannot finalize ResourceInfo again.");
+  finalized = true;
+  assignMaxRegs();
+}
+
+MCSymbol *MCResourceInfo::getMaxVGPRSymbol() {
+  return OutContext.getOrCreateSymbol("max_num_vgpr");
+}
+
+MCSymbol *MCResourceInfo::getMaxAGPRSymbol() {
+  return OutContext.getOrCreateSymbol("max_num_agpr");
+}
+
+MCSymbol *MCResourceInfo::getMaxSGPRSymbol() {
+  return OutContext.getOrCreateSymbol("max_num_sgpr");
+}
+
+void MCResourceInfo::assignResourceInfoExpr(
+   ...
[truncated]

github-actions · 2024-08-12T14:48:43Z

✅ With the latest revision this PR passed the C/C++ code formatter.

JanekvO · 2024-08-12T14:58:19Z

Upstream's formatting does not seem to agree with my local git clang-format.

slinder1 · 2024-08-14T20:59:35Z

Is it best to leave comments on the commit from the PR comment, i.e. currently 51f72f1 ?

I can't seem to start a review, so it will be individual comments (I think). Is there another way to handle this?

JanekvO · 2024-08-15T13:24:39Z

Is it best to leave comments on the commit from the PR comment, i.e. currently 51f72f1 ?

I can't seem to start a review, so it will be individual comments (I think). Is there another way to handle this?

Apologies, I think some of the additional stacked PR tooling might've been better to use rather than this manual stacked PR but I just haven't looked into any of said tooling yet (Graphite?). The PR that this one depended on has been pushed and I've rebased so it should be good to go. Thanks!

arsenm · 2024-08-15T14:06:24Z

llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h

@@ -40,12 +41,20 @@ class AMDGPUAsmPrinter final : public AsmPrinter {

  AMDGPUResourceUsageAnalysis *ResourceUsage;

+  std::unique_ptr<MCResourceInfo> RI;


Why does this need unique_ptr instead of just a plain member?

It requires a class inherited member for its constructor (in this case, OutContext).

But OutContext is a reference in the parent already, so you can use it?

Only was I was able to do so was by explicitly adding the MCContext as argument for every public method in AMDGPUResourceInfo. Let me know if there's a way I missed without being so verbose with MCContext arguments in AMDGPUResourceInfo.

llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h

llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp

arsenm · 2024-08-15T14:14:37Z

llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement.ll

@@ -3025,8 +3025,8 @@ define amdgpu_kernel void @dyn_extract_v5f64_s_s(ptr addrspace(1) %out, i32 %sel
 ; GPRIDX-NEXT:     amd_machine_version_stepping = 0
 ; GPRIDX-NEXT:     kernel_code_entry_byte_offset = 256
 ; GPRIDX-NEXT:     kernel_code_prefetch_byte_size = 0
-; GPRIDX-NEXT:     granulated_workitem_vgpr_count = 0
-; GPRIDX-NEXT:     granulated_wavefront_sgpr_count = 1
+; GPRIDX-NEXT:     granulated_workitem_vgpr_count = (11468800|(((((alignto(max(max(totalnumvgprs(dyn_extract_v5f64_s_s.num_agpr, dyn_extract_v5f64_s_s.num_vgpr), 1, 0), 1), 4))/4)-1)&63)|(((((alignto(max(max(dyn_extract_v5f64_s_s.num_sgpr+(extrasgprs(dyn_extract_v5f64_s_s.uses_vcc, dyn_extract_v5f64_s_s.uses_flat_scratch, 1)), 1, 0), 1), 8))/8)-1)&15)<<6)))&63


This turned into a mess for what should be a simple function

If any of the the sub-expressions of a MCExpr is unknown/unresolvable at this point in time (i.e., asm printing for a particular MachineFunction) it will print out the MCExpr in its most verbose way possible. It doesn't help that both the granulated_workitem_vgpr_count and granulated_wavefront_sgpr_count are basically the same MCExpr, but masked for the only the relevant bits (i.e., compute_pgm_resource1_registers masked for whatever we want to retrieve).

I was thinking of explicitly splitting all of the components that compose any of the compute_pgm_resourceX registers into their own MCExpr and leave computation of the compute_pgm_resourceX register for when they're used/necessary. However, this wouldn't help resolving the unknowns/unresolvables at the time of printing the amd_kernel_code_t metadata. Do let me know if splitting the composed registers is still desired, though.

But why wasn't this resolvable? This function has no calls or anything complicated

Correct, I had to move the AMDGPUResourceInfo gathering function prior to amdhsa kernel descriptor asm printer to make deduce the values.

arsenm · 2024-08-15T14:16:24Z

llvm/test/CodeGen/AMDGPU/GlobalISel/non-entry-alloca.ll

 ; DEFAULTSIZE: ; ScratchSize: 16

-; ASSUME1024: .amdhsa_private_segment_fixed_size 1040
+; ASSUME1024: .amdhsa_private_segment_fixed_size kernel_non_entry_block_static_alloca_uniformly_reached_align4.private_seg_size


Why doesn't this print as a simple size anymore?

As a function's private_segment_size depends on itself, and all of its callees' propagated private_segment_size it may be the case that any of the callees' private_segment_size has yet to be analysed through AMDGPUResourceUsageAnalysis.
For example:

void foo() { ... call bar ... } void bar() { ... ... }

Where for foo, we know and can construct the calculation for its private_segment_size as

[foo's own private segment required size] + max(bar.private_segment_size, [any other of foo's called function's private_segment_size])

But because we are currently printing foo's MachineFunction in AMDGPUAsmPrinter, we haven't analysed bar's private_segment_size yet. This will eventually be known but as we need to print metadata for foo, it's already printed with bar's private_segment_size placeholder.

TL;DR: cannot compute constant value yet, need to print symbolic representation.

My point was more this specific test case does not have calls and should be trivially resolved

llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.h

llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp

llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp

jhuber6 · 2024-08-16T17:19:32Z

I applied this locally and it resolved #64863 so I'm looking forward to this landing.

slinder1

LGTM, with some nits

The expressions look pretty reasonable to me, although I didn't really go through validating that they are correct

slinder1 · 2024-08-16T18:43:04Z

llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp

+MCSymbol *MCResourceInfo::getSymbol(StringRef FuncName, ResourceInfoKind RIK) {
+  switch (RIK) {
+  case RIK_NumVGPR:
+    return OutContext.getOrCreateSymbol(FuncName + Twine(".num_vgpr"));


Are these names guaranteed to be unique? Do we reserve any symbol with a '.' in it?

Are these names guaranteed to be unique?

From what I could tell, function names have to be unique at this point (e.g., name mangling already happened)

Do we reserve any symbol with a '.' in it?

I'll have a look at this, I didn't see any reserved symbols while I was working on this but I haven't explicitly searched for it either.

To follow up on the '.' symbols: there's a few cases of symbols including dots like local symbols getting prefixed with .L or some possibly emitted symbols like .note.GNU-stack but the dots don't seem to be exclusive. I can, however, use another delimiter (or remove delimiter altogether) if the dot might be something we'd like to keep for other cases.

The names seem fine to me, but we should write down which symbols are "reserved" or "well known" or however we want to define it relative to how they are intended to be used. Can you document them in AMDGPUUsage maybe?

My original question was more around how a user could technically contrive a conflict with an __asm__(("foo.num_vgpr")) definition, which is outlandish and easy enough for us to say "don't do that". I just want to make sure we write down the spirit of "don't do that", in a similar way to how C/C++ (I believe?) define symbols starting with "__" as reserved for implementation use.

llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp

arsenm · 2024-08-19T15:14:46Z

llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp

+    MCContext &OutContext) {
+  const MCConstantExpr *LocalConstExpr =
+      MCConstantExpr::create(LocalValue, OutContext);
+  const MCExpr *SymVal = LocalConstExpr;


I assume we don't take advantage of indirect calls with a known set of callees

As in, finding the module level maximum?

No, we can know when an indirect call has a known possible range of callees. Whether from !callees metadata or from seeing the possible values of the call target

Had to look into this a bit as I wasn't aware of the !callees metadata: but yes, we currently don't take advantage of this. I was able to create an example that emits the !callees metadata for an indirect call but it will always fall back on the worst case (i.e., module level worst case values).

llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.h

clang/test/Frontend/amdgcn-machine-analysis-remarks.cl

llvm/test/CodeGen/AMDGPU/gfx11-user-sgpr-init16-bug.ll

arsenm · 2024-08-20T17:06:35Z

llvm/test/CodeGen/AMDGPU/insert-delay-alu-bug.ll

@@ -3,6 +3,18 @@

 declare i32 @llvm.amdgcn.workitem.id.x()

+define <2 x i64> @f1() #0 {


Unrelated function appeared?

There were some function order changes in some tests. This function is the same one as the function that is removed a couple of lines below.

llvm/test/CodeGen/AMDGPU/codegen-internal-only-func.ll

llvm/test/CodeGen/AMDGPU/function-resource-usage.ll

arsenm · 2024-08-21T18:46:53Z

llvm/test/CodeGen/AMDGPU/function-resource-usage.ll

@@ -0,0 +1,533 @@
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -enable-ipra=0 -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s
+
+; SGPR use may not seem equal to the sgpr use provided in comments as the latter includes extra sgprs (e.g., for vcc use).


We should either fix the naming of these fields to say it's only the numbered SGPRs, or just always include the + vcc part?

I applied the following renames:
the function level number of sgpr symbol went from <function name>.num_sgpr to <function name>.numbered_sgpr
remark went from SGPRs to TotalSGPRs
comment went from NumSgprs to TotalNumSgprs

arsenm · 2024-08-21T18:47:35Z

llvm/test/CodeGen/AMDGPU/function-resource-usage.ll

+}
+
+; GCN-LABEL: {{^}}indirect_use_vcc:
+; GCN: .set indirect_use_vcc.num_vgpr, max(41, use_vcc.num_vgpr)


If you reassemble the output, do the resources always resolve to constants?

If we were to go from function-resource-usage.ll to funcion-resource-usage.s and try to assemble function-resource-usage.s through llvm-mc, it would resolve whatever it can at the time of emitting again. In the specific case you've highlighted it would resolve into a constant as use_vcc.num_vgpr is defined before its use.

However, cases which use/depend on the module level maximum (e.g., max_num_vgpr) wouldn't be able to resolve to a constant as these module level maximums are defined at the bottom of the file.

TL;DR: order of symbol define/use dependent

JanekvO · 2024-08-29T12:49:04Z

llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp

+MCSymbol *MCResourceInfo::getMaxVGPRSymbol(MCContext &OutContext) {
+  return OutContext.getOrCreateSymbol("max_num_vgpr");
+}
+
+MCSymbol *MCResourceInfo::getMaxAGPRSymbol(MCContext &OutContext) {
+  return OutContext.getOrCreateSymbol("max_num_agpr");
+}
+
+MCSymbol *MCResourceInfo::getMaxSGPRSymbol(MCContext &OutContext) {
+  return OutContext.getOrCreateSymbol("max_num_sgpr");
+}


I haven't convinced myself of the naming/approach I've used here. These are module scope maxima but every module is going to re-defined these for their own scope.

I wonder if these should be placed in a custom section. In any case, we will eventually need custom linker logic to deal with this

I've put these in their own section for now, albeit without any module-unique identifiers (.AMDGPU.gpr_maximums).

JanekvO · 2024-09-06T12:03:41Z

Ping

arsenm · 2024-09-15T10:35:05Z

llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h

+  // validateMCResourceInfo cannot recompute parts of the occupancy as it does
+  // for other metadata to validate (e.g., NumSGPRs) so a map is necessary if we
+  // really want to track and validate the occupancy.


I don't see why this is the case

Yeah, I thought I needed access to the MachineFunction for SIMachineFunctionInfo but I can reconstruct it using just the Function and GCNSubTarget. It now recomputes the occupancy instead of this caching.

You do need the MachineFunction to get its SIMachineFunctionInfo. Constructing a new one won't give the same result?

arsenm · 2024-09-15T10:45:46Z

llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp

+MCSymbol *MCResourceInfo::getMaxVGPRSymbol(MCContext &OutContext) {
+  return OutContext.getOrCreateSymbol("max_num_vgpr");
+}
+
+MCSymbol *MCResourceInfo::getMaxAGPRSymbol(MCContext &OutContext) {
+  return OutContext.getOrCreateSymbol("max_num_agpr");
+}
+
+MCSymbol *MCResourceInfo::getMaxSGPRSymbol(MCContext &OutContext) {
+  return OutContext.getOrCreateSymbol("max_num_sgpr");
+}


I wonder if these should be placed in a custom section. In any case, we will eventually need custom linker logic to deal with this

llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp

llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp

llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp

arsenm

I think we need more thought about how the ABI for this will work, but we need to start somewhere

JanekvO · 2024-09-25T16:56:55Z

Rebase

…on pass and move metadata propagation logic to MC layer

… amdhsa kernel descriptor emit, remove duplicate validation for stack size

…re verbose in what's emitted

…unction

JanekvO · 2024-09-26T15:57:04Z

Rebase, hopefully pull in fixes for unrelated test failures

llvm-ci · 2024-09-30T11:43:36Z

LLVM Buildbot has detected a new failure on builder bolt-x86_64-ubuntu-shared running on bolt-worker while building clang,llvm at step 6 "test-build-bolt-check-bolt".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/151/builds/2543

Here is the relevant piece of the build log for the reference

Step 6 (test-build-bolt-check-bolt) failure: test (failure)
******************** TEST 'BOLT :: perf2bolt/perf_test.test' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 5: /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/bin/clang /home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/Inputs/perf_test.c -fuse-ld=lld -Wl,--script=/home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/Inputs/perf_test.lds -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp
+ /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/bin/clang /home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/Inputs/perf_test.c -fuse-ld=lld -Wl,--script=/home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/Inputs/perf_test.lds -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp
RUN: at line 6: perf record -Fmax -e cycles:u -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp2 -- /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp
+ perf record -Fmax -e cycles:u -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp2 -- /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp
info: Using a maximum frequency rate of 2000 Hz
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.004 MB /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp2 (44 samples) ]
RUN: at line 7: /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/bin/perf2bolt /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp -p=/home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp2 -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp3 -nl -ignore-build-id 2>&1 | /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/bin/FileCheck /home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/perf_test.test
+ /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/bin/perf2bolt /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp -p=/home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp2 -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp3 -nl -ignore-build-id
+ /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/bin/FileCheck /home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/perf_test.test
RUN: at line 12: /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/bin/clang /home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/Inputs/perf_test.c -no-pie -fuse-ld=lld -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp4
+ /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/bin/clang /home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/Inputs/perf_test.c -no-pie -fuse-ld=lld -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp4
RUN: at line 13: perf record -Fmax -e cycles:u -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp5 -- /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp4
+ perf record -Fmax -e cycles:u -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp5 -- /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp4
info: Using a maximum frequency rate of 2000 Hz
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.002 MB /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp5 (9 samples) ]
RUN: at line 14: /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/bin/perf2bolt /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp4 -p=/home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp5 -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp6 -nl -ignore-build-id 2>&1 | /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/bin/FileCheck /home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/perf_test.test
+ /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/bin/FileCheck /home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/perf_test.test
+ /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/bin/perf2bolt /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp4 -p=/home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp5 -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp6 -nl -ignore-build-id
/home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/perf_test.test:10:12: error: CHECK-NOT: excluded string found in input
CHECK-NOT: !! WARNING !! This high mismatch ratio indicates the input binary is probably not the same binary used during profiling collection.
           ^
<stdin>:26:2: note: found here
 !! WARNING !! This high mismatch ratio indicates the input binary is probably not the same binary used during profiling collection. The generated data may be ineffective for improving performance.
 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Input file: <stdin>
Check file: /home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/perf_test.test

-dump-input=help explains the following input dump.

Input was:
<<<<<<
        .
        .
        .
       21: BOLT-WARNING: Running parallel work of 0 estimated cost, will switch to trivial scheduling. 
       22: PERF2BOLT: processing basic events (without LBR)... 
       23: PERF2BOLT: read 9 samples 
       24: PERF2BOLT: out of range samples recorded in unknown regions: 9 (100.0%) 
       25:  
       26:  !! WARNING !! This high mismatch ratio indicates the input binary is probably not the same binary used during profiling collection. The generated data may be ineffective for improving performance. 
not:10      !~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                                                   error: no match expected
       27:  
...

llvm-ci · 2024-09-30T12:01:00Z

LLVM Buildbot has detected a new failure on builder sanitizer-x86_64-linux-android running on sanitizer-buildbot-android while building clang,llvm at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/186/builds/2796

Here is the relevant piece of the build log for the reference

Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure)
...
[4577/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/PthreadLockChecker.cpp.o
[4578/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/PointerSortingChecker.cpp.o
[4579/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/PointerSubChecker.cpp.o
[4580/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/RetainCountChecker/RetainCountChecker.cpp.o
[4581/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/ReturnUndefChecker.cpp.o
[4582/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/RetainCountChecker/RetainCountDiagnostics.cpp.o
[4583/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/ReturnPointerRangeChecker.cpp.o
[4584/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/ReturnValueChecker.cpp.o
[4585/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/RunLoopAutoreleaseLeakChecker.cpp.o
[4586/5265] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o
FAILED: lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o 
CCACHE_CPP2=yes CCACHE_HASHDIR=yes /usr/bin/ccache /var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/llvm_build0/bin/clang++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/llvm_build64/lib/Target/AMDGPU -I/var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/llvm-project/llvm/lib/Target/AMDGPU -I/var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/llvm_build64/include -I/var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fvisibility=hidden  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o -MF lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o.d -o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o -c /var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp
/var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp:26:16: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
   26 |   auto GOCS = [this, FuncName, &OutContext](StringRef Suffix) {
      |                ^~~~~
/var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp:64:27: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
   64 |   auto assignMaxRegSym = [this, &OutContext](MCSymbol *Sym, int32_t RegCount) {
      |                           ^~~~~
2 errors generated.
[4587/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/SmartPtrChecker.cpp.o
[4588/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/SetgidSetuidOrderChecker.cpp.o
[4589/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/SimpleStreamChecker.cpp.o
[4590/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/SmartPtrModeling.cpp.o
[4591/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/StackAddrEscapeChecker.cpp.o
[4592/5265] Building CXX object lib/Target/AMDGPU/Utils/CMakeFiles/LLVMAMDGPUUtils.dir/AMDGPUPALMetadata.cpp.o
[4593/5265] Building CXX object lib/Target/AMDGPU/MCTargetDesc/CMakeFiles/LLVMAMDGPUDesc.dir/AMDGPUTargetStreamer.cpp.o
[4594/5265] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCInstLower.cpp.o
[4595/5265] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUResourceUsageAnalysis.cpp.o
[4596/5265] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUHSAMetadataStreamer.cpp.o
[4597/5265] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUAsmPrinter.cpp.o
[4598/5265] Building CXX object lib/Target/AMDGPU/MCTargetDesc/CMakeFiles/LLVMAMDGPUDesc.dir/AMDGPUMCTargetDesc.cpp.o
[4599/5265] Building CXX object lib/Target/AMDGPU/AsmParser/CMakeFiles/LLVMAMDGPUAsmParser.dir/AMDGPUAsmParser.cpp.o
ninja: build stopped: subcommand failed.

How to reproduce locally: https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild

@@@STEP_FAILURE@@@
Step 8 (bootstrap clang) failure: bootstrap clang (failure)
...
[4577/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/PthreadLockChecker.cpp.o
[4578/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/PointerSortingChecker.cpp.o
[4579/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/PointerSubChecker.cpp.o
[4580/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/RetainCountChecker/RetainCountChecker.cpp.o
[4581/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/ReturnUndefChecker.cpp.o
[4582/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/RetainCountChecker/RetainCountDiagnostics.cpp.o
[4583/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/ReturnPointerRangeChecker.cpp.o
[4584/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/ReturnValueChecker.cpp.o
[4585/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/RunLoopAutoreleaseLeakChecker.cpp.o
[4586/5265] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o
FAILED: lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o 
CCACHE_CPP2=yes CCACHE_HASHDIR=yes /usr/bin/ccache /var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/llvm_build0/bin/clang++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/llvm_build64/lib/Target/AMDGPU -I/var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/llvm-project/llvm/lib/Target/AMDGPU -I/var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/llvm_build64/include -I/var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fvisibility=hidden  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o -MF lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o.d -o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o -c /var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp
/var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp:26:16: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
   26 |   auto GOCS = [this, FuncName, &OutContext](StringRef Suffix) {
      |                ^~~~~
/var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp:64:27: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
   64 |   auto assignMaxRegSym = [this, &OutContext](MCSymbol *Sym, int32_t RegCount) {
      |                           ^~~~~
2 errors generated.
[4587/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/SmartPtrChecker.cpp.o
[4588/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/SetgidSetuidOrderChecker.cpp.o
[4589/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/SimpleStreamChecker.cpp.o
[4590/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/SmartPtrModeling.cpp.o
[4591/5265] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/StackAddrEscapeChecker.cpp.o
[4592/5265] Building CXX object lib/Target/AMDGPU/Utils/CMakeFiles/LLVMAMDGPUUtils.dir/AMDGPUPALMetadata.cpp.o
[4593/5265] Building CXX object lib/Target/AMDGPU/MCTargetDesc/CMakeFiles/LLVMAMDGPUDesc.dir/AMDGPUTargetStreamer.cpp.o
[4594/5265] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCInstLower.cpp.o
[4595/5265] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUResourceUsageAnalysis.cpp.o
[4596/5265] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUHSAMetadataStreamer.cpp.o
[4597/5265] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUAsmPrinter.cpp.o
[4598/5265] Building CXX object lib/Target/AMDGPU/MCTargetDesc/CMakeFiles/LLVMAMDGPUDesc.dir/AMDGPUMCTargetDesc.cpp.o
[4599/5265] Building CXX object lib/Target/AMDGPU/AsmParser/CMakeFiles/LLVMAMDGPUAsmParser.dir/AMDGPUAsmParser.cpp.o
ninja: build stopped: subcommand failed.

How to reproduce locally: https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild
program finished with exit code 2
elapsedTime=287.340031

llvm-ci · 2024-09-30T12:01:23Z

LLVM Buildbot has detected a new failure on builder ppc64le-lld-multistage-test running on ppc64le-lld-multistage-test while building clang,llvm at step 12 "build-stage2-unified-tree".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/168/builds/3916

Here is the relevant piece of the build log for the reference

Step 12 (build-stage2-unified-tree) failure: build (failure)
...
588.747 [150/502/5621] Building CXX object unittests/DebugInfo/DWARF/CMakeFiles/DebugInfoDWARFTests.dir/DWARFLocationExpressionTest.cpp.o
588.749 [150/501/5622] Building CXX object tools/bugpoint/CMakeFiles/bugpoint.dir/BugDriver.cpp.o
588.757 [150/500/5623] Building CXX object tools/lld/COFF/CMakeFiles/lldCOFF.dir/ICF.cpp.o
588.775 [150/499/5624] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUAliasAnalysis.cpp.o
588.780 [150/498/5625] Building CXX object unittests/MC/CMakeFiles/MCTests.dir/MCInstPrinter.cpp.o
588.814 [150/497/5626] Building CXX object tools/llvm-reduce/CMakeFiles/llvm-reduce.dir/deltas/ReduceInstructions.cpp.o
588.987 [150/496/5627] Building CXX object tools/llvm-objdump/CMakeFiles/llvm-objdump.dir/XCOFFDump.cpp.o
589.004 [150/495/5628] Building CXX object tools/llvm-reduce/CMakeFiles/llvm-reduce.dir/deltas/ReduceInvokes.cpp.o
589.245 [150/494/5629] Building CXX object unittests/CGData/CMakeFiles/CGDataTests.dir/OutlinedHashTreeRecordTest.cpp.o
589.273 [150/493/5630] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o
FAILED: lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o 
ccache /home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/install/stage1/bin/clang++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/build/stage2/lib/Target/AMDGPU -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/llvm-project/llvm/lib/Target/AMDGPU -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/build/stage2/include -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fvisibility=hidden  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o -MF lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o.d -o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o -c /home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp
/home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp:26:16: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
   26 |   auto GOCS = [this, FuncName, &OutContext](StringRef Suffix) {
      |                ^~~~~
/home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp:64:27: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
   64 |   auto assignMaxRegSym = [this, &OutContext](MCSymbol *Sym, int32_t RegCount) {
      |                           ^~~~~
2 errors generated.
589.281 [150/492/5631] Building CXX object lib/Target/AMDGPU/MCTargetDesc/CMakeFiles/LLVMAMDGPUDesc.dir/AMDGPUELFObjectWriter.cpp.o
589.360 [150/491/5632] Building CXX object tools/lld/ELF/CMakeFiles/lldELF.dir/DriverUtils.cpp.o
589.605 [150/490/5633] Building CXX object tools/bugpoint/CMakeFiles/bugpoint.dir/ExecutionDriver.cpp.o
589.745 [150/489/5634] Building CXX object unittests/MC/AMDGPU/CMakeFiles/AMDGPUMCTests.dir/DwarfRegMappings.cpp.o
589.875 [150/488/5635] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPULibFunc.cpp.o
589.877 [150/487/5636] Building CXX object tools/llvm-mca/CMakeFiles/llvm-mca.dir/CodeRegionGenerator.cpp.o
589.922 [150/486/5637] Building C object tools/clang/tools/clang-fuzzer/dictionary/CMakeFiles/clang-fuzzer-dictionary.dir/dictionary.c.o
590.154 [150/485/5638] Building CXX object tools/lld/COFF/CMakeFiles/lldCOFF.dir/CallGraphSort.cpp.o
590.167 [150/484/5639] Building CXX object tools/clang/tools/clang-offload-packager/CMakeFiles/clang-offload-packager.dir/ClangOffloadPackager.cpp.o
590.377 [150/483/5640] Building CXX object tools/llvm-mca/CMakeFiles/llvm-mca.dir/Views/BottleneckAnalysis.cpp.o
590.476 [150/482/5641] Building CXX object tools/llvm-reduce/CMakeFiles/llvm-reduce.dir/deltas/ReduceAttributes.cpp.o
590.721 [150/481/5642] Building CXX object tools/llvm-reduce/CMakeFiles/llvm-reduce.dir/deltas/ReduceInstructionFlags.cpp.o
590.863 [150/480/5643] Building CXX object unittests/CodeGen/CMakeFiles/CodeGenTests.dir/LowLevelTypeTest.cpp.o
590.874 [150/479/5644] Building CXX object tools/lld/MachO/CMakeFiles/lldMachO.dir/ICF.cpp.o
590.906 [150/478/5645] Building CXX object tools/lld/MachO/CMakeFiles/lldMachO.dir/OutputSegment.cpp.o
591.052 [150/477/5646] Building CXX object unittests/CodeGen/CMakeFiles/CodeGenTests.dir/AMDGPUMetadataTest.cpp.o
591.055 [150/476/5647] Building CXX object tools/llvm-reduce/CMakeFiles/llvm-reduce.dir/deltas/ReduceDistinctMetadata.cpp.o
591.206 [150/475/5648] Building CXX object unittests/CodeGen/GlobalISel/CMakeFiles/GlobalISelTests.dir/GIMatchTableExecutorTest.cpp.o
591.353 [150/474/5649] Building CXX object tools/lld/ELF/CMakeFiles/lldELF.dir/Arch/X86.cpp.o
591.374 [150/473/5650] Building CXX object lib/Target/AMDGPU/Utils/CMakeFiles/LLVMAMDGPUUtils.dir/AMDGPUPALMetadata.cpp.o
591.377 [150/472/5651] Building CXX object tools/llvm-reduce/CMakeFiles/llvm-reduce.dir/deltas/ReduceBasicBlocks.cpp.o
591.385 [150/471/5652] Building CXX object tools/lld/MachO/CMakeFiles/lldMachO.dir/InputSection.cpp.o
591.455 [150/470/5653] Building CXX object unittests/DebugInfo/DWARF/CMakeFiles/DebugInfoDWARFTests.dir/DWARFListTableTest.cpp.o
591.457 [150/469/5654] Building CXX object tools/bugpoint/CMakeFiles/bugpoint.dir/ToolRunner.cpp.o
591.465 [150/468/5655] Building CXX object tools/clang/tools/driver/CMakeFiles/clang.dir/cc1gen_reproducer_main.cpp.o
591.474 [150/467/5656] Building CXX object tools/dsymutil/CMakeFiles/dsymutil.dir/BinaryHolder.cpp.o
591.660 [150/466/5657] Building CXX object tools/llvm-reduce/CMakeFiles/llvm-reduce.dir/deltas/ReduceOperands.cpp.o
591.662 [150/465/5658] Building CXX object unittests/CodeGen/GlobalISel/CMakeFiles/GlobalISelTests.dir/CallLowering.cpp.o
591.665 [150/464/5659] Building CXX object tools/llvm-reduce/CMakeFiles/llvm-reduce.dir/llvm-reduce.cpp.o
591.697 [150/463/5660] Building CXX object lib/Target/AMDGPU/Utils/CMakeFiles/LLVMAMDGPUUtils.dir/AMDKernelCodeTUtils.cpp.o

llvm-ci · 2024-09-30T12:24:33Z

LLVM Buildbot has detected a new failure on builder sanitizer-x86_64-linux running on sanitizer-buildbot2 while building clang,llvm at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/66/builds/4341

Here is the relevant piece of the build log for the reference

Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure)
...
[5187/5294] Building CXX object tools/clang/tools/libclang/CMakeFiles/libclang.dir/CXIndexDataConsumer.cpp.o
[5188/5294] Building CXX object tools/clang/tools/clang-extdef-mapping/CMakeFiles/clang-extdef-mapping.dir/ClangExtDefMapGen.cpp.o
[5189/5294] Building CXX object tools/clang/tools/clang-scan-deps/CMakeFiles/clang-scan-deps.dir/ClangScanDeps.cpp.o
[5190/5294] Building CXX object tools/clang/tools/clang-check/CMakeFiles/clang-check.dir/ClangCheck.cpp.o
[5191/5294] Building CXX object tools/clang/tools/libclang/CMakeFiles/libclang.dir/CIndexCodeCompletion.cpp.o
[5192/5294] Building CXX object tools/clang/tools/driver/CMakeFiles/clang.dir/cc1_main.cpp.o
[5193/5294] Building CXX object tools/clang/tools/c-index-test/CMakeFiles/c-index-test.dir/core_main.cpp.o
[5194/5294] Building CXX object tools/clang/tools/libclang/CMakeFiles/libclang.dir/CXExtractAPI.cpp.o
[5195/5294] Building CXX object tools/clang/tools/libclang/CMakeFiles/libclang.dir/Indexing.cpp.o
[5196/5294] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o
FAILED: lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o 
CCACHE_CPP2=yes CCACHE_HASHDIR=yes /usr/bin/ccache /home/b/sanitizer-x86_64-linux/build/llvm_build0/bin/clang++ -DGTEST_HAS_RTTI=0 -DLLVM_EXPORTS -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/b/sanitizer-x86_64-linux/build/build_default/lib/Target/AMDGPU -I/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU -I/home/b/sanitizer-x86_64-linux/build/build_default/include -I/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fvisibility=hidden  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o -MF lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o.d -o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o -c /home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp
/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp:26:16: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
   26 |   auto GOCS = [this, FuncName, &OutContext](StringRef Suffix) {
      |                ^~~~~
/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp:64:27: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
   64 |   auto assignMaxRegSym = [this, &OutContext](MCSymbol *Sym, int32_t RegCount) {
      |                           ^~~~~
2 errors generated.
[5197/5294] Building CXX object lib/Target/AMDGPU/MCTargetDesc/CMakeFiles/LLVMAMDGPUDesc.dir/AMDGPUTargetStreamer.cpp.o
[5198/5294] Building CXX object lib/Target/AMDGPU/Utils/CMakeFiles/LLVMAMDGPUUtils.dir/AMDGPUPALMetadata.cpp.o
[5199/5294] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCInstLower.cpp.o
[5200/5294] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUResourceUsageAnalysis.cpp.o
[5201/5294] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUHSAMetadataStreamer.cpp.o
[5202/5294] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUAsmPrinter.cpp.o
[5203/5294] Building CXX object lib/Target/AMDGPU/MCTargetDesc/CMakeFiles/LLVMAMDGPUDesc.dir/AMDGPUMCTargetDesc.cpp.o
[5204/5294] Building CXX object lib/Target/AMDGPU/AsmParser/CMakeFiles/LLVMAMDGPUAsmParser.dir/AMDGPUAsmParser.cpp.o
ninja: build stopped: subcommand failed.

How to reproduce locally: https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild

@@@STEP_FAILURE@@@
@@@BUILD_STEP test compiler-rt symbolizer@@@
ninja: Entering directory `build_default'
[1/40] Linking CXX static library lib/libLLVMAMDGPUUtils.a
[2/40] Linking CXX static library lib/libLLVMAMDGPUDesc.a
[3/40] Linking CXX static library lib/libLLVMAMDGPUDisassembler.a
[4/40] Linking CXX static library lib/libLLVMAMDGPUAsmParser.a
[5/40] Linking CXX executable bin/llvm-objdump
[6/40] Linking CXX executable bin/llvm-ar
[7/40] Linking CXX executable bin/sancov
[8/40] Generating ../../bin/llvm-ranlib
[9/40] Linking CXX executable bin/llvm-nm
[10/40] Linking CXX executable bin/llvm-jitlink
[11/40] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o
FAILED: lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o 
CCACHE_CPP2=yes CCACHE_HASHDIR=yes /usr/bin/ccache /home/b/sanitizer-x86_64-linux/build/llvm_build0/bin/clang++ -DGTEST_HAS_RTTI=0 -DLLVM_EXPORTS -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/b/sanitizer-x86_64-linux/build/build_default/lib/Target/AMDGPU -I/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU -I/home/b/sanitizer-x86_64-linux/build/build_default/include -I/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fvisibility=hidden  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o -MF lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o.d -o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o -c /home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp
/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp:26:16: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
   26 |   auto GOCS = [this, FuncName, &OutContext](StringRef Suffix) {
Step 8 (build compiler-rt symbolizer) failure: build compiler-rt symbolizer (failure)
...
[5187/5294] Building CXX object tools/clang/tools/libclang/CMakeFiles/libclang.dir/CXIndexDataConsumer.cpp.o
[5188/5294] Building CXX object tools/clang/tools/clang-extdef-mapping/CMakeFiles/clang-extdef-mapping.dir/ClangExtDefMapGen.cpp.o
[5189/5294] Building CXX object tools/clang/tools/clang-scan-deps/CMakeFiles/clang-scan-deps.dir/ClangScanDeps.cpp.o
[5190/5294] Building CXX object tools/clang/tools/clang-check/CMakeFiles/clang-check.dir/ClangCheck.cpp.o
[5191/5294] Building CXX object tools/clang/tools/libclang/CMakeFiles/libclang.dir/CIndexCodeCompletion.cpp.o
[5192/5294] Building CXX object tools/clang/tools/driver/CMakeFiles/clang.dir/cc1_main.cpp.o
[5193/5294] Building CXX object tools/clang/tools/c-index-test/CMakeFiles/c-index-test.dir/core_main.cpp.o
[5194/5294] Building CXX object tools/clang/tools/libclang/CMakeFiles/libclang.dir/CXExtractAPI.cpp.o
[5195/5294] Building CXX object tools/clang/tools/libclang/CMakeFiles/libclang.dir/Indexing.cpp.o
[5196/5294] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o
FAILED: lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o 
CCACHE_CPP2=yes CCACHE_HASHDIR=yes /usr/bin/ccache /home/b/sanitizer-x86_64-linux/build/llvm_build0/bin/clang++ -DGTEST_HAS_RTTI=0 -DLLVM_EXPORTS -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/b/sanitizer-x86_64-linux/build/build_default/lib/Target/AMDGPU -I/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU -I/home/b/sanitizer-x86_64-linux/build/build_default/include -I/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fvisibility=hidden  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o -MF lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o.d -o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o -c /home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp
/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp:26:16: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
   26 |   auto GOCS = [this, FuncName, &OutContext](StringRef Suffix) {
      |                ^~~~~
/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp:64:27: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
   64 |   auto assignMaxRegSym = [this, &OutContext](MCSymbol *Sym, int32_t RegCount) {
      |                           ^~~~~
2 errors generated.
[5197/5294] Building CXX object lib/Target/AMDGPU/MCTargetDesc/CMakeFiles/LLVMAMDGPUDesc.dir/AMDGPUTargetStreamer.cpp.o
[5198/5294] Building CXX object lib/Target/AMDGPU/Utils/CMakeFiles/LLVMAMDGPUUtils.dir/AMDGPUPALMetadata.cpp.o
[5199/5294] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCInstLower.cpp.o
[5200/5294] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUResourceUsageAnalysis.cpp.o
[5201/5294] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUHSAMetadataStreamer.cpp.o
[5202/5294] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUAsmPrinter.cpp.o
[5203/5294] Building CXX object lib/Target/AMDGPU/MCTargetDesc/CMakeFiles/LLVMAMDGPUDesc.dir/AMDGPUMCTargetDesc.cpp.o
[5204/5294] Building CXX object lib/Target/AMDGPU/AsmParser/CMakeFiles/LLVMAMDGPUAsmParser.dir/AMDGPUAsmParser.cpp.o
ninja: build stopped: subcommand failed.

How to reproduce locally: https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild
Step 9 (test compiler-rt symbolizer) failure: test compiler-rt symbolizer (failure)
...
[2/40] Linking CXX static library lib/libLLVMAMDGPUDesc.a
[3/40] Linking CXX static library lib/libLLVMAMDGPUDisassembler.a
[4/40] Linking CXX static library lib/libLLVMAMDGPUAsmParser.a
[5/40] Linking CXX executable bin/llvm-objdump
[6/40] Linking CXX executable bin/llvm-ar
[7/40] Linking CXX executable bin/sancov
[8/40] Generating ../../bin/llvm-ranlib
[9/40] Linking CXX executable bin/llvm-nm
[10/40] Linking CXX executable bin/llvm-jitlink
[11/40] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o
FAILED: lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o 
CCACHE_CPP2=yes CCACHE_HASHDIR=yes /usr/bin/ccache /home/b/sanitizer-x86_64-linux/build/llvm_build0/bin/clang++ -DGTEST_HAS_RTTI=0 -DLLVM_EXPORTS -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/b/sanitizer-x86_64-linux/build/build_default/lib/Target/AMDGPU -I/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU -I/home/b/sanitizer-x86_64-linux/build/build_default/include -I/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fvisibility=hidden  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o -MF lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o.d -o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o -c /home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp
/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp:26:16: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
   26 |   auto GOCS = [this, FuncName, &OutContext](StringRef Suffix) {
      |                ^~~~~
/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp:64:27: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
   64 |   auto assignMaxRegSym = [this, &OutContext](MCSymbol *Sym, int32_t RegCount) {
      |                           ^~~~~
2 errors generated.
ninja: build stopped: subcommand failed.

How to reproduce locally: https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild
Step 10 (build compiler-rt debug) failure: build compiler-rt debug (failure)
...
[5218/5294] Building CXX object tools/clang/tools/libclang/CMakeFiles/libclang.dir/CXSourceLocation.cpp.o
[5219/5294] Building CXX object tools/clang/tools/driver/CMakeFiles/clang.dir/cc1_main.cpp.o
[5220/5294] Building CXX object tools/clang/tools/c-index-test/CMakeFiles/c-index-test.dir/core_main.cpp.o
[5221/5294] Building CXX object tools/clang/tools/libclang/CMakeFiles/libclang.dir/CIndexInclusionStack.cpp.o
[5222/5294] Building CXX object tools/clang/tools/libclang/CMakeFiles/libclang.dir/CIndexUSRs.cpp.o
[5223/5294] Building CXX object tools/clang/tools/libclang/CMakeFiles/libclang.dir/Indexing.cpp.o
[5224/5294] Building CXX object tools/clang/tools/libclang/CMakeFiles/libclang.dir/CIndex.cpp.o
[5225/5294] Building CXX object tools/clang/tools/libclang/CMakeFiles/libclang.dir/CXExtractAPI.cpp.o
[5226/5294] Building CXX object tools/clang/tools/clang-scan-deps/CMakeFiles/clang-scan-deps.dir/ClangScanDeps.cpp.o
[5227/5294] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o
FAILED: lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o 
CCACHE_CPP2=yes CCACHE_HASHDIR=yes /usr/bin/ccache /home/b/sanitizer-x86_64-linux/build/llvm_build0/bin/clang++ -DGTEST_HAS_RTTI=0 -DLLVM_EXPORTS -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/b/sanitizer-x86_64-linux/build/build_default/lib/Target/AMDGPU -I/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU -I/home/b/sanitizer-x86_64-linux/build/build_default/include -I/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fvisibility=hidden  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o -MF lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o.d -o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o -c /home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp
/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp:26:16: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
   26 |   auto GOCS = [this, FuncName, &OutContext](StringRef Suffix) {
      |                ^~~~~
/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp:64:27: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
   64 |   auto assignMaxRegSym = [this, &OutContext](MCSymbol *Sym, int32_t RegCount) {
      |                           ^~~~~
2 errors generated.
ninja: build stopped: subcommand failed.

How to reproduce locally: https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild
Step 11 (test compiler-rt debug) failure: test compiler-rt debug (failure)
@@@BUILD_STEP test compiler-rt debug@@@
ninja: Entering directory `build_default'
[1/30] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o
FAILED: lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o 
CCACHE_CPP2=yes CCACHE_HASHDIR=yes /usr/bin/ccache /home/b/sanitizer-x86_64-linux/build/llvm_build0/bin/clang++ -DGTEST_HAS_RTTI=0 -DLLVM_EXPORTS -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/b/sanitizer-x86_64-linux/build/build_default/lib/Target/AMDGPU -I/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU -I/home/b/sanitizer-x86_64-linux/build/build_default/include -I/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fvisibility=hidden  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o -MF lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o.d -o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o -c /home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp
/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp:26:16: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
   26 |   auto GOCS = [this, FuncName, &OutContext](StringRef Suffix) {
      |                ^~~~~
/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp:64:27: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
   64 |   auto assignMaxRegSym = [this, &OutContext](MCSymbol *Sym, int32_t RegCount) {
      |                           ^~~~~
2 errors generated.
ninja: build stopped: subcommand failed.

How to reproduce locally: https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild
Step 12 (build compiler-rt tsan_debug) failure: build compiler-rt tsan_debug (failure)
...
[5199/5275] Generating ../../bin/llvm-ranlib
[5200/5275] Linking CXX executable bin/llvm-rtdyld
[5201/5275] Linking CXX executable bin/sancov
[5202/5275] Linking CXX executable bin/llvm-objdump
[5203/5275] Generating ../../bin/llvm-otool
[5204/5275] Linking CXX executable bin/llvm-nm
[5205/5275] Building CXX object tools/clang/tools/libclang/CMakeFiles/libclang.dir/CXExtractAPI.cpp.o
[5206/5275] Linking CXX executable bin/llvm-jitlink
[5207/5275] Linking CXX executable bin/llvm-profgen
[5208/5275] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o
FAILED: lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o 
CCACHE_CPP2=yes CCACHE_HASHDIR=yes /usr/bin/ccache /home/b/sanitizer-x86_64-linux/build/llvm_build0/bin/clang++ -DGTEST_HAS_RTTI=0 -DLLVM_EXPORTS -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/b/sanitizer-x86_64-linux/build/build_default/lib/Target/AMDGPU -I/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU -I/home/b/sanitizer-x86_64-linux/build/build_default/include -I/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fvisibility=hidden  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o -MF lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o.d -o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o -c /home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp
/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp:26:16: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
   26 |   auto GOCS = [this, FuncName, &OutContext](StringRef Suffix) {
      |                ^~~~~
/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp:64:27: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
   64 |   auto assignMaxRegSym = [this, &OutContext](MCSymbol *Sym, int32_t RegCount) {
      |                           ^~~~~
2 errors generated.
ninja: build stopped: subcommand failed.

How to reproduce locally: https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild
Step 13 (build compiler-rt default) failure: build compiler-rt default (failure)
...
[5218/5294] Building CXX object tools/clang/tools/libclang/CMakeFiles/libclang.dir/CIndexHigh.cpp.o
[5219/5294] Building CXX object tools/clang/tools/clang-repl/CMakeFiles/clang-repl.dir/ClangRepl.cpp.o
[5220/5294] Building CXX object tools/clang/tools/c-index-test/CMakeFiles/c-index-test.dir/core_main.cpp.o
[5221/5294] Building CXX object tools/clang/tools/clang-scan-deps/CMakeFiles/clang-scan-deps.dir/ClangScanDeps.cpp.o
[5222/5294] Building CXX object tools/clang/tools/libclang/CMakeFiles/libclang.dir/CIndex.cpp.o
[5223/5294] Building CXX object tools/clang/tools/libclang/CMakeFiles/libclang.dir/CXCursor.cpp.o
[5224/5294] Building CXX object tools/clang/tools/libclang/CMakeFiles/libclang.dir/CXIndexDataConsumer.cpp.o
[5225/5294] Building CXX object tools/clang/tools/driver/CMakeFiles/clang.dir/cc1_main.cpp.o
[5226/5294] Building CXX object tools/clang/tools/libclang/CMakeFiles/libclang.dir/CXExtractAPI.cpp.o
[5227/5294] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o
FAILED: lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o 
CCACHE_CPP2=yes CCACHE_HASHDIR=yes /usr/bin/ccache /home/b/sanitizer-x86_64-linux/build/llvm_build0/bin/clang++ -DGTEST_HAS_RTTI=0 -DLLVM_EXPORTS -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/b/sanitizer-x86_64-linux/build/build_default/lib/Target/AMDGPU -I/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU -I/home/b/sanitizer-x86_64-linux/build/build_default/include -I/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fvisibility=hidden  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o -MF lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o.d -o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o -c /home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp
/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp:26:16: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
   26 |   auto GOCS = [this, FuncName, &OutContext](StringRef Suffix) {
      |                ^~~~~
/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp:64:27: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
   64 |   auto assignMaxRegSym = [this, &OutContext](MCSymbol *Sym, int32_t RegCount) {
      |                           ^~~~~
2 errors generated.
ninja: build stopped: subcommand failed.

How to reproduce locally: https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild
Step 14 (test compiler-rt default) failure: test compiler-rt default (failure)
@@@BUILD_STEP test compiler-rt default@@@
ninja: Entering directory `build_default'
[1/30] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o
FAILED: lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o 
CCACHE_CPP2=yes CCACHE_HASHDIR=yes /usr/bin/ccache /home/b/sanitizer-x86_64-linux/build/llvm_build0/bin/clang++ -DGTEST_HAS_RTTI=0 -DLLVM_EXPORTS -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/b/sanitizer-x86_64-linux/build/build_default/lib/Target/AMDGPU -I/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU -I/home/b/sanitizer-x86_64-linux/build/build_default/include -I/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fvisibility=hidden  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o -MF lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o.d -o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o -c /home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp
/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp:26:16: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
   26 |   auto GOCS = [this, FuncName, &OutContext](StringRef Suffix) {
      |                ^~~~~
/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp:64:27: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
   64 |   auto assignMaxRegSym = [this, &OutContext](MCSymbol *Sym, int32_t RegCount) {
      |                           ^~~~~
2 errors generated.
ninja: build stopped: subcommand failed.

How to reproduce locally: https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild
Step 15 (build standalone compiler-rt) failure: build standalone compiler-rt (failure)
...
  of CMake.

  The cmake-policies(7) manual explains that the OLD behaviors of all
  policies are deprecated and that a policy should be set to OLD only under
  specific short-term circumstances.  Projects should be ported to the NEW
  behavior and not rely on setting a policy to OLD.
Call Stack (most recent call first):
  CMakeLists.txt:12 (include)
-- The C compiler identification is unknown
-- The CXX compiler identification is unknown
-- The ASM compiler identification is unknown
-- Didn't find assembler
CMake Error at CMakeLists.txt:17 (project):
  The CMAKE_C_COMPILER:

    /home/b/sanitizer-x86_64-linux/build/build_default/bin/clang

  is not a full path to an existing compiler tool.

  Tell CMake where to find the compiler by setting either the environment
  variable "CC" or the CMake cache entry CMAKE_C_COMPILER to the full path to
  the compiler, or to the compiler name if it is in the PATH.


CMake Error at CMakeLists.txt:17 (project):
  The CMAKE_CXX_COMPILER:

    /home/b/sanitizer-x86_64-linux/build/build_default/bin/clang++

  is not a full path to an existing compiler tool.

  Tell CMake where to find the compiler by setting either the environment
  variable "CXX" or the CMake cache entry CMAKE_CXX_COMPILER to the full path
  to the compiler, or to the compiler name if it is in the PATH.


CMake Error at CMakeLists.txt:17 (project):
  No CMAKE_ASM_COMPILER could be found.

  Tell CMake where to find the compiler by setting either the environment
  variable "ASM" or the CMake cache entry CMAKE_ASM_COMPILER to the full path
  to the compiler, or to the compiler name if it is in the PATH.
-- Warning: Did not find file Compiler/-ASM
-- Configuring incomplete, errors occurred!

How to reproduce locally: https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild
ninja: Entering directory `compiler_rt_build'
ninja: error: loading 'build.ninja': No such file or directory

How to reproduce locally: https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild

llvm-ci · 2024-09-30T15:51:37Z

LLVM Buildbot has detected a new failure on builder clang-ppc64le-rhel running on ppc64le-clang-rhel-test while building clang,llvm at step 5 "build-unified-tree".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/145/builds/2167

Here is the relevant piece of the build log for the reference

Step 5 (build-unified-tree) failure: build (failure)
...
424.444 [204/139/5919] Building CXX object tools/llvm-mca/CMakeFiles/llvm-mca.dir/Views/BottleneckAnalysis.cpp.o
424.489 [204/138/5920] Building CXX object tools/llvm-reduce/CMakeFiles/llvm-reduce.dir/deltas/ReduceGlobalVarInitializers.cpp.o
424.523 [204/137/5921] Building CXX object tools/llvm-ml/CMakeFiles/llvm-ml.dir/Disassembler.cpp.o
424.532 [204/136/5922] Building CXX object tools/lld/COFF/CMakeFiles/lldCOFF.dir/Symbols.cpp.o
424.545 [204/135/5923] Building CXX object tools/llvm-reduce/CMakeFiles/llvm-reduce.dir/deltas/SimplifyInstructions.cpp.o
424.596 [204/134/5924] Building CXX object tools/bugpoint/CMakeFiles/bugpoint.dir/ToolRunner.cpp.o
424.618 [204/133/5925] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUOpenCLEnqueuedBlockLowering.cpp.o
424.792 [204/132/5926] Building CXX object tools/lld/MachO/CMakeFiles/lldMachO.dir/Relocations.cpp.o
424.923 [204/131/5927] Building CXX object tools/lld/COFF/CMakeFiles/lldCOFF.dir/COFFLinkerContext.cpp.o
424.967 [204/130/5928] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o
FAILED: lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o 
ccache /home/docker/llvm-external-buildbots/clang.17.0.6/bin/clang++ --gcc-toolchain=/gcc-toolchain/usr -DGTEST_HAS_RTTI=0 -DLLVM_EXPORTS -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-rhel-test/clang-ppc64le-rhel/build/lib/Target/AMDGPU -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-rhel-test/clang-ppc64le-rhel/llvm-project/llvm/lib/Target/AMDGPU -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-rhel-test/clang-ppc64le-rhel/build/include -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-rhel-test/clang-ppc64le-rhel/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fPIC  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o -MF lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o.d -o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCResourceInfo.cpp.o -c /home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-rhel-test/clang-ppc64le-rhel/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp
/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-rhel-test/clang-ppc64le-rhel/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp:26:16: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
   26 |   auto GOCS = [this, FuncName, &OutContext](StringRef Suffix) {
      |                ^~~~~
/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-rhel-test/clang-ppc64le-rhel/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp:64:27: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
   64 |   auto assignMaxRegSym = [this, &OutContext](MCSymbol *Sym, int32_t RegCount) {
      |                           ^~~~~
2 errors generated.
425.060 [204/129/5929] Building CXX object tools/llvm-objdump/CMakeFiles/llvm-objdump.dir/OffloadDump.cpp.o
425.073 [204/128/5930] Building CXX object tools/llvm-objdump/CMakeFiles/llvm-objdump.dir/XCOFFDump.cpp.o
425.201 [204/127/5931] Building CXX object tools/bugpoint/CMakeFiles/bugpoint.dir/BugDriver.cpp.o
425.404 [204/126/5932] Building CXX object tools/clang/tools/libclang/CMakeFiles/libclang.dir/CIndexer.cpp.o
425.460 [204/125/5933] Building CXX object tools/clang/tools/clang-offload-packager/CMakeFiles/clang-offload-packager.dir/ClangOffloadPackager.cpp.o
425.471 [204/124/5934] Building CXX object tools/lld/MinGW/CMakeFiles/lldMinGW.dir/Driver.cpp.o
425.552 [204/123/5935] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPURewriteUndefForPHI.cpp.o
425.672 [204/122/5936] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUAliasAnalysis.cpp.o
425.835 [204/121/5937] Building CXX object tools/lld/MachO/CMakeFiles/lldMachO.dir/MarkLive.cpp.o
425.869 [204/120/5938] Building CXX object tools/llvm-mca/CMakeFiles/llvm-mca.dir/CodeRegionGenerator.cpp.o
425.916 [204/119/5939] Building CXX object tools/llvm-reduce/CMakeFiles/llvm-reduce.dir/deltas/ReduceGlobalVars.cpp.o
425.936 [204/118/5940] Building CXX object tools/bugpoint/CMakeFiles/bugpoint.dir/ExecutionDriver.cpp.o
426.050 [204/117/5941] Building CXX object tools/lld/MachO/CMakeFiles/lldMachO.dir/Arch/ARM64.cpp.o
426.153 [204/116/5942] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPULibFunc.cpp.o
426.200 [204/115/5943] Building CXX object tools/dsymutil/CMakeFiles/dsymutil.dir/BinaryHolder.cpp.o
426.241 [204/114/5944] Building CXX object tools/llvm-reduce/CMakeFiles/llvm-reduce.dir/deltas/ReduceInstructions.cpp.o
426.336 [204/113/5945] Building CXX object tools/lld/COFF/CMakeFiles/lldCOFF.dir/MinGW.cpp.o
426.352 [204/112/5946] Building CXX object tools/lld/COFF/CMakeFiles/lldCOFF.dir/ICF.cpp.o
426.434 [204/111/5947] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUCtorDtorLowering.cpp.o
426.446 [204/110/5948] Building CXX object tools/lld/COFF/CMakeFiles/lldCOFF.dir/CallGraphSort.cpp.o
426.681 [204/109/5949] Building CXX object tools/lld/COFF/CMakeFiles/lldCOFF.dir/LLDMapFile.cpp.o
426.782 [204/108/5950] Building CXX object tools/llvm-reduce/CMakeFiles/llvm-reduce.dir/deltas/ReduceInstructionFlags.cpp.o
426.845 [204/107/5951] Building CXX object lib/Target/AMDGPU/Utils/CMakeFiles/LLVMAMDGPUUtils.dir/AMDGPUPALMetadata.cpp.o
427.074 [204/106/5952] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUPromoteKernelArguments.cpp.o
427.281 [204/105/5953] Building CXX object tools/lld/COFF/CMakeFiles/lldCOFF.dir/MapFile.cpp.o
427.283 [204/104/5954] Building CXX object tools/llvm-reduce/CMakeFiles/llvm-reduce.dir/deltas/ReduceInvokes.cpp.o
427.368 [204/103/5955] Building CXX object tools/lld/MachO/CMakeFiles/lldMachO.dir/ICF.cpp.o
427.412 [204/102/5956] Building CXX object tools/llvm-reduce/CMakeFiles/llvm-reduce.dir/deltas/ReduceDistinctMetadata.cpp.o
427.610 [204/101/5957] Building CXX object tools/bugpoint/CMakeFiles/bugpoint.dir/OptimizerDriver.cpp.o
427.621 [204/100/5958] Building CXX object tools/llvm-objdump/CMakeFiles/llvm-objdump.dir/COFFDump.cpp.o

llvm-ci · 2024-09-30T17:07:24Z

LLVM Buildbot has detected a new failure on builder lld-x86_64-win running on as-worker-93 while building clang,llvm at step 7 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/146/builds/1275

Here is the relevant piece of the build log for the reference

Step 7 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM-Unit :: Support/./SupportTests.exe/35/87' FAILED ********************
Script(shard):
--
GTEST_OUTPUT=json:C:\a\lld-x86_64-win\build\unittests\Support\.\SupportTests.exe-LLVM-Unit-19660-35-87.json GTEST_SHUFFLE=0 GTEST_TOTAL_SHARDS=87 GTEST_SHARD_INDEX=35 C:\a\lld-x86_64-win\build\unittests\Support\.\SupportTests.exe
--

Script:
--
C:\a\lld-x86_64-win\build\unittests\Support\.\SupportTests.exe --gtest_filter=ProgramEnvTest.CreateProcessLongPath
--
C:\a\lld-x86_64-win\llvm-project\llvm\unittests\Support\ProgramTest.cpp(160): error: Expected equality of these values:
  0
  RC
    Which is: -2

C:\a\lld-x86_64-win\llvm-project\llvm\unittests\Support\ProgramTest.cpp(163): error: fs::remove(Twine(LongPath)): did not return errc::success.
error number: 13
error message: permission denied



C:\a\lld-x86_64-win\llvm-project\llvm\unittests\Support\ProgramTest.cpp:160
Expected equality of these values:
  0
  RC
    Which is: -2

C:\a\lld-x86_64-win\llvm-project\llvm\unittests\Support\ProgramTest.cpp:163
fs::remove(Twine(LongPath)): did not return errc::success.
error number: 13
error message: permission denied




********************

…ass (llvm#102913) Reland: Converts AMDGPUResourceUsageAnalysis pass from Module to MachineFunction pass. Moves function resource info propagation to to MC layer (through helpers in AMDGPUMCResourceInfo) by generating MCExprs for every function resource which the emitters have been prepped for. Fixes llvm#64863 Change-Id: I180c941a1535be646144960ff62e4cf24a5aa1da

…ass (llvm#102913) Converts AMDGPUResourceUsageAnalysis pass from Module to MachineFunction pass. Moves function resource info propagation to to MC layer (through helpers in AMDGPUMCResourceInfo) by generating MCExprs for every function resource which the emitters have been prepped for. Fixes llvm#64863 [AMDGPU] Fix stack size metadata for functions with direct and indirect calls (llvm#110828) When a function has an external call, it should still use the stack sizes of direct, known, calls to calculate its own stack size [AMDGPU] Fix resource usage information for unnamed functions (llvm#115320) Resource usage information would try to overwrite unnamed functions if there are multiple within the same compilation unit. This aims to either use the `MCSymbol` assigned to the unnamed function (i.e., `CurrentFnSym`), or, rematerialize the `MCSymbol` for the unnamed function. Reapply [AMDGPU] Avoid resource propagation for recursion through multiple functions (llvm#112251) I was wrong last patch. I viewed the `Visited` set purely as a possible recursion deterrent where functions calling a callee multiple times are handled elsewhere. This doesn't consider cases where a function is called multiple times by different callers still part of the same call graph. New test shows the aforementioned case. Reapplies llvm#111004, fixes llvm#115562. [AMDGPU] Newly added test modified for recent SGPR use change (llvm#116427) Mistimed rebase for llvm#112251 which added new tests which did not consider the changes introduced in llvm#112403 yet Change-Id: I4dfe6a1b679137e080a6d2b44016347ea704b014

JanekvO requested review from jayfoad, arsenm, Pierre-vh and slinder1 August 12, 2024 14:44

llvmbot added clang Clang issues not falling into any other category backend:AMDGPU mc Machine (object) code llvm:globalisel labels Aug 12, 2024

arsenm mentioned this pull request Aug 13, 2024

[AMDGPU] Backend code generation fails on internal function at O0 #64863

Closed

JanekvO force-pushed the resource-pass-convert branch from 422a81b to 5dff9e2 Compare August 15, 2024 13:21

arsenm reviewed Aug 15, 2024

View reviewed changes

slinder1 reviewed Aug 16, 2024

View reviewed changes

arsenm reviewed Aug 19, 2024

View reviewed changes

arsenm reviewed Aug 20, 2024

View reviewed changes

clang/test/Frontend/amdgcn-machine-analysis-remarks.cl Outdated Show resolved Hide resolved

arsenm reviewed Aug 20, 2024

View reviewed changes

arsenm reviewed Aug 21, 2024

View reviewed changes

JanekvO requested review from arsenm and slinder1 August 27, 2024 19:05

JanekvO commented Aug 29, 2024

View reviewed changes

arsenm reviewed Sep 15, 2024

View reviewed changes

arsenm reviewed Sep 18, 2024

View reviewed changes

llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp Outdated Show resolved Hide resolved

arsenm reviewed Sep 25, 2024

View reviewed changes

llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp Outdated Show resolved Hide resolved

arsenm approved these changes Sep 25, 2024

View reviewed changes

JanekvO force-pushed the resource-pass-convert branch from 65b98a8 to 9aa2721 Compare September 25, 2024 16:56

JanekvO and others added 13 commits September 26, 2024 16:44

Convert AMDGPUResourceUsageAnalysis pass from Module to MachineFuncti…

56f6606

…on pass and move metadata propagation logic to MC layer

Formatting

e655e35

Feedback

4f3b5c0

Feedback, capitalizations, move ResourceUsage initialization prior to…

eabe70d

… amdhsa kernel descriptor emit, remove duplicate validation for stack size

Remove AMDGPUMCResourceInfo MCContext class member

bd248dd

Restore clang test, add newline

f962fa6

Restore gfx11-user-sgpr-init16-bug test

50c2de4

Rename comments, remarks, and emitted symbol for sgprs to be a bit mo…

7e42417

…re verbose in what's emitted

Feedback

fbfc2b7

Feedback: Documentation and rematerialization of MFI through MachineF…

7acda3e

…unction

Feedback, rename max_num_Xgpr symbol, rephrasing doc

d5fb4f4

Feedback, rename amdgcn prefix to amdgpu

f4a340b

Comment rename for new test

8cfbc1b

JanekvO force-pushed the resource-pass-convert branch from 9aa2721 to 8cfbc1b Compare September 26, 2024 15:56

JanekvO merged commit c897c13 into llvm:main Sep 30, 2024
9 checks passed

jhuber6 mentioned this pull request Oct 2, 2024

[AMDGPU] Infinite recursion in evaluateAsRelocatableImpl when evaluating a cyclical SCC #110863

Closed

		@@ -40,12 +41,20 @@ class AMDGPUAsmPrinter final : public AsmPrinter {

		AMDGPUResourceUsageAnalysis *ResourceUsage;

		std::unique_ptr<MCResourceInfo> RI;

		@@ -3,6 +3,18 @@

		declare i32 @llvm.amdgcn.workitem.id.x()

		define <2 x i64> @f1() #0 {

		@@ -0,0 +1,533 @@
		; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -enable-ipra=0 -verify-machineinstrs < %s \| FileCheck -check-prefix=GCN %s

		; SGPR use may not seem equal to the sgpr use provided in comments as the latter includes extra sgprs (e.g., for vcc use).

[AMDGPU] Convert AMDGPUResourceUsageAnalysis pass from Module to MF pass #102913

[AMDGPU] Convert AMDGPUResourceUsageAnalysis pass from Module to MF pass #102913

Uh oh!

Conversation

JanekvO commented Aug 12, 2024 • edited by jhuber6 Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Aug 12, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Aug 12, 2024

Uh oh!

llvmbot commented Aug 12, 2024

Uh oh!

github-actions bot commented Aug 12, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

JanekvO commented Aug 12, 2024

Uh oh!

slinder1 commented Aug 14, 2024

Uh oh!

JanekvO commented Aug 15, 2024

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jhuber6 commented Aug 16, 2024

Uh oh!

slinder1 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

JanekvO Aug 19, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

JanekvO commented Aug 12, 2024 •

edited by jhuber6

Loading

llvmbot commented Aug 12, 2024 •

edited

Loading

github-actions bot commented Aug 12, 2024 •

edited

Loading

JanekvO Aug 19, 2024 •

edited

Loading