[AMDGPU] Convert AMDGPUResourceUsageAnalysis pass from Module to MF pass #102913
@llvm/pr-subscribers-backend-amdgpu @llvm/pr-subscribers-clang
Author: Janek van Oirschot (JanekvO)
Changes
!!! Stacked PR on top of #95951 commit, please only review the latest commit 51f72f115b340a092c2c9f8569911b944a4efb6d !!!!
Converts the AMDGPUResourceUsageAnalysis pass from a Module pass to a MachineFunction pass. Moves function resource info propagation to the MC layer (through helpers in AMDGPUMCResourceInfo) by generating MCExprs for every function resource, which the emitters have already been prepared to handle.
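As a rough illustration of how the per-function resource symbols are named, the sketch below mirrors the suffixes used by `MCResourceInfo::getSymbol` in the patch (for example `foo.num_sgpr`, which also shows up in the updated remark test). The `ResourceKind` enum and `resourceSymbolName` helper are hypothetical stand-ins written in plain C++ for illustration; they are not part of the LLVM API.

```cpp
#include <string>

// Hypothetical stand-in for MCResourceInfo::ResourceInfoKind.
enum class ResourceKind {
  NumVGPR,
  NumAGPR,
  NumSGPR,
  PrivateSegSize,
  UsesVCC,
  UsesFlatScratch,
  HasDynSizedStack,
  HasRecursion,
  HasIndirectCall
};

// Builds "<function>.<suffix>", using the same suffixes as the patch.
inline std::string resourceSymbolName(const std::string &FuncName,
                                      ResourceKind Kind) {
  switch (Kind) {
  case ResourceKind::NumVGPR:
    return FuncName + ".num_vgpr";
  case ResourceKind::NumAGPR:
    return FuncName + ".num_agpr";
  case ResourceKind::NumSGPR:
    return FuncName + ".num_sgpr";
  case ResourceKind::PrivateSegSize:
    return FuncName + ".private_seg_size";
  case ResourceKind::UsesVCC:
    return FuncName + ".uses_vcc";
  case ResourceKind::UsesFlatScratch:
    return FuncName + ".uses_flat_scratch";
  case ResourceKind::HasDynSizedStack:
    return FuncName + ".has_dyn_sized_stack";
  case ResourceKind::HasRecursion:
    return FuncName + ".has_recursion";
  case ResourceKind::HasIndirectCall:
    return FuncName + ".has_indirect_call";
  }
  return FuncName; // Not reached for valid enumerators.
}
```

For a function named `foo`, `resourceSymbolName("foo", ResourceKind::NumSGPR)` yields `foo.num_sgpr`.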
Patch is 369.85 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/102913.diff 66 Files Affected:
diff --git a/clang/test/Frontend/amdgcn-machine-analysis-remarks.cl b/clang/test/Frontend/amdgcn-machine-analysis-remarks.cl
index a05e21b37b9127..a2dd59a871904c 100644
--- a/clang/test/Frontend/amdgcn-machine-analysis-remarks.cl
+++ b/clang/test/Frontend/amdgcn-machine-analysis-remarks.cl
@@ -2,12 +2,12 @@
// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -target-cpu gfx908 -Rpass-analysis=kernel-resource-usage -S -O0 -verify %s -o /dev/null
// expected-remark@+10 {{Function Name: foo}}
-// expected-remark@+9 {{ SGPRs: 13}}
-// expected-remark@+8 {{ VGPRs: 10}}
-// expected-remark@+7 {{ AGPRs: 12}}
-// expected-remark@+6 {{ ScratchSize [bytes/lane]: 0}}
+// expected-remark@+9 {{ SGPRs: foo.num_sgpr+(extrasgprs(foo.uses_vcc, foo.uses_flat_scratch, 1))}}
+// expected-remark@+8 {{ VGPRs: foo.num_vgpr}}
+// expected-remark@+7 {{ AGPRs: foo.num_agpr}}
+// expected-remark@+6 {{ ScratchSize [bytes/lane]: foo.private_seg_size}}
// expected-remark@+5 {{ Dynamic Stack: False}}
-// expected-remark@+4 {{ Occupancy [waves/SIMD]: 10}}
+// expected-remark@+4 {{ Occupancy [waves/SIMD]: occupancy(10, 4, 256, 8, 10, max(foo.num_sgpr+(extrasgprs(foo.uses_vcc, foo.uses_flat_scratch, 1)), 1, 0), max(totalnumvgprs(foo.num_agpr, foo.num_vgpr), 1, 0))}}
// expected-remark@+3 {{ SGPRs Spill: 0}}
// expected-remark@+2 {{ VGPRs Spill: 0}}
// expected-remark@+1 {{ LDS Size [bytes/block]: 0}}
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
index e64e28e01d3d18..97a5cb29d51023 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
@@ -18,6 +18,7 @@
#include "AMDGPUAsmPrinter.h"
#include "AMDGPU.h"
#include "AMDGPUHSAMetadataStreamer.h"
+#include "AMDGPUMCResourceInfo.h"
#include "AMDGPUResourceUsageAnalysis.h"
#include "GCNSubtarget.h"
#include "MCTargetDesc/AMDGPUInstPrinter.h"
@@ -92,6 +93,9 @@ AMDGPUAsmPrinter::AMDGPUAsmPrinter(TargetMachine &TM,
std::unique_ptr<MCStreamer> Streamer)
: AsmPrinter(TM, std::move(Streamer)) {
assert(OutStreamer && "AsmPrinter constructed without streamer");
+ RI = std::make_unique<MCResourceInfo>(OutContext);
+ OccupancyValidateMap =
+ std::make_unique<DenseMap<const Function *, const MCExpr *>>();
}
StringRef AMDGPUAsmPrinter::getPassName() const {
@@ -359,6 +363,102 @@ bool AMDGPUAsmPrinter::doInitialization(Module &M) {
return AsmPrinter::doInitialization(M);
}
+void AMDGPUAsmPrinter::ValidateMCResourceInfo(Function &F) {
+ if (F.isDeclaration() || !AMDGPU::isModuleEntryFunctionCC(F.getCallingConv()))
+ return;
+
+ using RIK = MCResourceInfo::ResourceInfoKind;
+ const GCNSubtarget &STM = TM.getSubtarget<GCNSubtarget>(F);
+
+ auto TryGetMCExprValue = [](const MCExpr *Value, uint64_t &Res) -> bool {
+ int64_t Val;
+ if (Value->evaluateAsAbsolute(Val)) {
+ Res = Val;
+ return true;
+ }
+ return false;
+ };
+
+ const uint64_t MaxScratchPerWorkitem =
+ STM.getMaxWaveScratchSize() / STM.getWavefrontSize();
+ MCSymbol *ScratchSizeSymbol =
+ RI->getSymbol(F.getName(), RIK::RIK_PrivateSegSize);
+ uint64_t ScratchSize;
+ if (ScratchSizeSymbol->isVariable() &&
+ TryGetMCExprValue(ScratchSizeSymbol->getVariableValue(), ScratchSize) &&
+ ScratchSize > MaxScratchPerWorkitem) {
+ DiagnosticInfoStackSize DiagStackSize(F, ScratchSize, MaxScratchPerWorkitem,
+ DS_Error);
+ F.getContext().diagnose(DiagStackSize);
+ }
+
+ // Validate addressable scalar registers (i.e., prior to added implicit
+ // SGPRs).
+ MCSymbol *NumSGPRSymbol = RI->getSymbol(F.getName(), RIK::RIK_NumSGPR);
+ if (STM.getGeneration() >= AMDGPUSubtarget::VOLCANIC_ISLANDS &&
+ !STM.hasSGPRInitBug()) {
+ unsigned MaxAddressableNumSGPRs = STM.getAddressableNumSGPRs();
+ uint64_t NumSgpr;
+ if (NumSGPRSymbol->isVariable() &&
+ TryGetMCExprValue(NumSGPRSymbol->getVariableValue(), NumSgpr) &&
+ NumSgpr > MaxAddressableNumSGPRs) {
+ DiagnosticInfoResourceLimit Diag(F, "addressable scalar registers",
+ NumSgpr, MaxAddressableNumSGPRs,
+ DS_Error, DK_ResourceLimit);
+ F.getContext().diagnose(Diag);
+ return;
+ }
+ }
+
+ MCSymbol *VCCUsedSymbol = RI->getSymbol(F.getName(), RIK::RIK_UsesVCC);
+ MCSymbol *FlatUsedSymbol =
+ RI->getSymbol(F.getName(), RIK::RIK_UsesFlatScratch);
+ uint64_t VCCUsed, FlatUsed, NumSgpr;
+
+ if (NumSGPRSymbol->isVariable() && VCCUsedSymbol->isVariable() &&
+ FlatUsedSymbol->isVariable() &&
+ TryGetMCExprValue(NumSGPRSymbol->getVariableValue(), NumSgpr) &&
+ TryGetMCExprValue(VCCUsedSymbol->getVariableValue(), VCCUsed) &&
+ TryGetMCExprValue(FlatUsedSymbol->getVariableValue(), FlatUsed)) {
+
+ // Recomputes NumSgprs + implicit SGPRs but all symbols should now be
+ // resolvable.
+ NumSgpr += IsaInfo::getNumExtraSGPRs(
+ &STM, VCCUsed, FlatUsed,
+ getTargetStreamer()->getTargetID()->isXnackOnOrAny());
+ if (STM.getGeneration() <= AMDGPUSubtarget::SEA_ISLANDS ||
+ STM.hasSGPRInitBug()) {
+ unsigned MaxAddressableNumSGPRs = STM.getAddressableNumSGPRs();
+ if (NumSgpr > MaxAddressableNumSGPRs) {
+ DiagnosticInfoResourceLimit Diag(F, "scalar registers", NumSgpr,
+ MaxAddressableNumSGPRs, DS_Error,
+ DK_ResourceLimit);
+ F.getContext().diagnose(Diag);
+ return;
+ }
+ }
+
+ auto I = OccupancyValidateMap->find(&F);
+ if (I != OccupancyValidateMap->end()) {
+ const auto [MinWEU, MaxWEU] = AMDGPU::getIntegerPairAttribute(
+ F, "amdgpu-waves-per-eu", {0, 0}, true);
+ uint64_t Occupancy;
+ const MCExpr *OccupancyExpr = I->getSecond();
+
+ if (TryGetMCExprValue(OccupancyExpr, Occupancy) && Occupancy < MinWEU) {
+ DiagnosticInfoOptimizationFailure Diag(
+ F, F.getSubprogram(),
+ "failed to meet occupancy target given by 'amdgpu-waves-per-eu' in "
+ "'" +
+ F.getName() + "': desired occupancy was " + Twine(MinWEU) +
+ ", final occupancy is " + Twine(Occupancy));
+ F.getContext().diagnose(Diag);
+ return;
+ }
+ }
+ }
+}
+
bool AMDGPUAsmPrinter::doFinalization(Module &M) {
// Pad with s_code_end to help tools and guard against instruction prefetch
// causing stale data in caches. Arguably this should be done by the linker,
@@ -371,39 +471,29 @@ bool AMDGPUAsmPrinter::doFinalization(Module &M) {
getTargetStreamer()->EmitCodeEnd(STI);
}
- return AsmPrinter::doFinalization(M);
-}
+ // Assign expressions which can only be resolved when all other functions are
+ // known.
+ RI->Finalize();
+ getTargetStreamer()->EmitMCResourceMaximums(
+ RI->getMaxVGPRSymbol(), RI->getMaxAGPRSymbol(), RI->getMaxSGPRSymbol());
-// Print comments that apply to both callable functions and entry points.
-void AMDGPUAsmPrinter::emitCommonFunctionComments(
- uint32_t NumVGPR, std::optional<uint32_t> NumAGPR, uint32_t TotalNumVGPR,
- uint32_t NumSGPR, uint64_t ScratchSize, uint64_t CodeSize,
- const AMDGPUMachineFunction *MFI) {
- OutStreamer->emitRawComment(" codeLenInByte = " + Twine(CodeSize), false);
- OutStreamer->emitRawComment(" NumSgprs: " + Twine(NumSGPR), false);
- OutStreamer->emitRawComment(" NumVgprs: " + Twine(NumVGPR), false);
- if (NumAGPR) {
- OutStreamer->emitRawComment(" NumAgprs: " + Twine(*NumAGPR), false);
- OutStreamer->emitRawComment(" TotalNumVgprs: " + Twine(TotalNumVGPR),
- false);
+ for (Function &F : M.functions()) {
+ ValidateMCResourceInfo(F);
}
- OutStreamer->emitRawComment(" ScratchSize: " + Twine(ScratchSize), false);
- OutStreamer->emitRawComment(" MemoryBound: " + Twine(MFI->isMemoryBound()),
- false);
+ return AsmPrinter::doFinalization(M);
}
SmallString<128> AMDGPUAsmPrinter::getMCExprStr(const MCExpr *Value) {
SmallString<128> Str;
raw_svector_ostream OSS(Str);
- int64_t IVal;
- if (Value->evaluateAsAbsolute(IVal)) {
- OSS << static_cast<uint64_t>(IVal);
- } else {
- Value->print(OSS, MAI);
- }
+ auto &Streamer = getTargetStreamer()->getStreamer();
+ auto &Context = Streamer.getContext();
+ const MCExpr *New = llvm::TryFold(Value, Context);
+ AMDGPUMCExprPrint(New, OSS, MAI);
return Str;
}
+// Print comments that apply to both callable functions and entry points.
void AMDGPUAsmPrinter::emitCommonFunctionComments(
const MCExpr *NumVGPR, const MCExpr *NumAGPR, const MCExpr *TotalNumVGPR,
const MCExpr *NumSGPR, const MCExpr *ScratchSize, uint64_t CodeSize,
@@ -573,21 +663,45 @@ bool AMDGPUAsmPrinter::runOnMachineFunction(MachineFunction &MF) {
emitResourceUsageRemarks(MF, CurrentProgramInfo, MFI->isModuleEntryFunction(),
STM.hasMAIInsts());
+ {
+ const AMDGPUResourceUsageAnalysis::SIFunctionResourceInfo &Info =
+ ResourceUsage->getResourceInfo();
+ RI->gatherResourceInfo(MF, Info);
+ using RIK = MCResourceInfo::ResourceInfoKind;
+ getTargetStreamer()->EmitMCResourceInfo(
+ RI->getSymbol(MF.getName(), RIK::RIK_NumVGPR),
+ RI->getSymbol(MF.getName(), RIK::RIK_NumAGPR),
+ RI->getSymbol(MF.getName(), RIK::RIK_NumSGPR),
+ RI->getSymbol(MF.getName(), RIK::RIK_PrivateSegSize),
+ RI->getSymbol(MF.getName(), RIK::RIK_UsesVCC),
+ RI->getSymbol(MF.getName(), RIK::RIK_UsesFlatScratch),
+ RI->getSymbol(MF.getName(), RIK::RIK_HasDynSizedStack),
+ RI->getSymbol(MF.getName(), RIK::RIK_HasRecursion),
+ RI->getSymbol(MF.getName(), RIK::RIK_HasIndirectCall));
+ }
+
if (isVerbose()) {
MCSectionELF *CommentSection =
Context.getELFSection(".AMDGPU.csdata", ELF::SHT_PROGBITS, 0);
OutStreamer->switchSection(CommentSection);
if (!MFI->isEntryFunction()) {
+ using RIK = MCResourceInfo::ResourceInfoKind;
OutStreamer->emitRawComment(" Function info:", false);
- const AMDGPUResourceUsageAnalysis::SIFunctionResourceInfo &Info =
- ResourceUsage->getResourceInfo(&MF.getFunction());
+
emitCommonFunctionComments(
- Info.NumVGPR,
- STM.hasMAIInsts() ? Info.NumAGPR : std::optional<uint32_t>(),
- Info.getTotalNumVGPRs(STM),
- Info.getTotalNumSGPRs(MF.getSubtarget<GCNSubtarget>()),
- Info.PrivateSegmentSize, getFunctionCodeSize(MF), MFI);
+ RI->getSymbol(MF.getName(), RIK::RIK_NumVGPR)->getVariableValue(),
+ STM.hasMAIInsts() ? RI->getSymbol(MF.getName(), RIK::RIK_NumAGPR)
+ ->getVariableValue()
+ : nullptr,
+ RI->createTotalNumVGPRs(MF, Ctx),
+ RI->createTotalNumSGPRs(
+ MF,
+ MF.getSubtarget<GCNSubtarget>().getTargetID().isXnackOnOrAny(),
+ Ctx),
+ RI->getSymbol(MF.getName(), RIK::RIK_PrivateSegSize)
+ ->getVariableValue(),
+ getFunctionCodeSize(MF), MFI);
return false;
}
@@ -755,8 +869,6 @@ uint64_t AMDGPUAsmPrinter::getFunctionCodeSize(const MachineFunction &MF) const
void AMDGPUAsmPrinter::getSIProgramInfo(SIProgramInfo &ProgInfo,
const MachineFunction &MF) {
- const AMDGPUResourceUsageAnalysis::SIFunctionResourceInfo &Info =
- ResourceUsage->getResourceInfo(&MF.getFunction());
const GCNSubtarget &STM = MF.getSubtarget<GCNSubtarget>();
MCContext &Ctx = MF.getContext();
@@ -773,18 +885,38 @@ void AMDGPUAsmPrinter::getSIProgramInfo(SIProgramInfo &ProgInfo,
return false;
};
- ProgInfo.NumArchVGPR = CreateExpr(Info.NumVGPR);
- ProgInfo.NumAccVGPR = CreateExpr(Info.NumAGPR);
- ProgInfo.NumVGPR = CreateExpr(Info.getTotalNumVGPRs(STM));
- ProgInfo.AccumOffset =
- CreateExpr(alignTo(std::max(1, Info.NumVGPR), 4) / 4 - 1);
+ auto GetSymRefExpr =
+ [&](MCResourceInfo::ResourceInfoKind RIK) -> const MCExpr * {
+ MCSymbol *Sym = RI->getSymbol(MF.getName(), RIK);
+ return MCSymbolRefExpr::create(Sym, Ctx);
+ };
+
+ const MCExpr *ConstFour = MCConstantExpr::create(4, Ctx);
+ const MCExpr *ConstOne = MCConstantExpr::create(1, Ctx);
+
+ using RIK = MCResourceInfo::ResourceInfoKind;
+ ProgInfo.NumArchVGPR = GetSymRefExpr(RIK::RIK_NumVGPR);
+ ProgInfo.NumAccVGPR = GetSymRefExpr(RIK::RIK_NumAGPR);
+ ProgInfo.NumVGPR = AMDGPUMCExpr::createTotalNumVGPR(
+ ProgInfo.NumAccVGPR, ProgInfo.NumArchVGPR, Ctx);
+
+ // AccumOffset computed for the MCExpr equivalent of:
+ // alignTo(std::max(1, Info.NumVGPR), 4) / 4 - 1;
+ ProgInfo.AccumOffset = MCBinaryExpr::createSub(
+ MCBinaryExpr::createDiv(
+ AMDGPUMCExpr::createAlignTo(
+ AMDGPUMCExpr::createMax({ConstOne, ProgInfo.NumArchVGPR}, Ctx),
+ ConstFour, Ctx),
+ ConstFour, Ctx),
+ ConstOne, Ctx);
ProgInfo.TgSplit = STM.isTgSplitEnabled();
- ProgInfo.NumSGPR = CreateExpr(Info.NumExplicitSGPR);
- ProgInfo.ScratchSize = CreateExpr(Info.PrivateSegmentSize);
- ProgInfo.VCCUsed = CreateExpr(Info.UsesVCC);
- ProgInfo.FlatUsed = CreateExpr(Info.UsesFlatScratch);
+ ProgInfo.NumSGPR = GetSymRefExpr(RIK::RIK_NumSGPR);
+ ProgInfo.ScratchSize = GetSymRefExpr(RIK::RIK_PrivateSegSize);
+ ProgInfo.VCCUsed = GetSymRefExpr(RIK::RIK_UsesVCC);
+ ProgInfo.FlatUsed = GetSymRefExpr(RIK::RIK_UsesFlatScratch);
ProgInfo.DynamicCallStack =
- CreateExpr(Info.HasDynamicallySizedStack || Info.HasRecursion);
+ MCBinaryExpr::createOr(GetSymRefExpr(RIK::RIK_HasDynSizedStack),
+ GetSymRefExpr(RIK::RIK_HasRecursion), Ctx);
const uint64_t MaxScratchPerWorkitem =
STM.getMaxWaveScratchSize() / STM.getWavefrontSize();
@@ -1084,6 +1216,8 @@ void AMDGPUAsmPrinter::getSIProgramInfo(SIProgramInfo &ProgInfo,
STM.computeOccupancy(F, ProgInfo.LDSSize), ProgInfo.NumSGPRsForWavesPerEU,
ProgInfo.NumVGPRsForWavesPerEU, STM, Ctx);
+ OccupancyValidateMap->insert({&MF.getFunction(), ProgInfo.Occupancy});
+
const auto [MinWEU, MaxWEU] =
AMDGPU::getIntegerPairAttribute(F, "amdgpu-waves-per-eu", {0, 0}, true);
uint64_t Occupancy;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h
index f66bbde42ce278..676a4687ee2af7 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h
@@ -24,6 +24,7 @@ struct AMDGPUResourceUsageAnalysis;
class AMDGPUTargetStreamer;
class MCCodeEmitter;
class MCOperand;
+class MCResourceInfo;
namespace AMDGPU {
struct MCKernelDescriptor;
@@ -40,12 +41,20 @@ class AMDGPUAsmPrinter final : public AsmPrinter {
AMDGPUResourceUsageAnalysis *ResourceUsage;
+ std::unique_ptr<MCResourceInfo> RI;
+
SIProgramInfo CurrentProgramInfo;
std::unique_ptr<AMDGPU::HSAMD::MetadataStreamer> HSAMetadataStream;
MCCodeEmitter *DumpCodeInstEmitter = nullptr;
+ // ValidateMCResourceInfo cannot recompute parts of the occupancy as it does
+ // for other metadata to validate (e.g., NumSGPRs) so a map is necessary if we
+ // really want to track and validate the occupancy.
+ std::unique_ptr<DenseMap<const Function *, const MCExpr *>>
+ OccupancyValidateMap;
+
uint64_t getFunctionCodeSize(const MachineFunction &MF) const;
void getSIProgramInfo(SIProgramInfo &Out, const MachineFunction &MF);
@@ -60,11 +69,6 @@ class AMDGPUAsmPrinter final : public AsmPrinter {
void EmitPALMetadata(const MachineFunction &MF,
const SIProgramInfo &KernelInfo);
void emitPALFunctionMetadata(const MachineFunction &MF);
- void emitCommonFunctionComments(uint32_t NumVGPR,
- std::optional<uint32_t> NumAGPR,
- uint32_t TotalNumVGPR, uint32_t NumSGPR,
- uint64_t ScratchSize, uint64_t CodeSize,
- const AMDGPUMachineFunction *MFI);
void emitCommonFunctionComments(const MCExpr *NumVGPR, const MCExpr *NumAGPR,
const MCExpr *TotalNumVGPR,
const MCExpr *NumSGPR,
@@ -84,6 +88,11 @@ class AMDGPUAsmPrinter final : public AsmPrinter {
SmallString<128> getMCExprStr(const MCExpr *Value);
+ /// Attempts to replace the validation that is missed in getSIProgramInfo due
+ /// to MCExpr being unknown. Invoked during doFinalization such that the
+ /// MCResourceInfo symbols are known.
+ void ValidateMCResourceInfo(Function &F);
+
public:
explicit AMDGPUAsmPrinter(TargetMachine &TM,
std::unique_ptr<MCStreamer> Streamer);
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp
new file mode 100644
index 00000000000000..58383475b312c9
--- /dev/null
+++ b/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp
@@ -0,0 +1,220 @@
+//===- AMDGPUMCResourceInfo.cpp --- MC Resource Info ----------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+/// \file
+/// \brief MC infrastructure to propagate the function level resource usage
+/// info.
+///
+//===----------------------------------------------------------------------===//
+
+#include "AMDGPUMCResourceInfo.h"
+#include "Utils/AMDGPUBaseInfo.h"
+#include "llvm/ADT/SmallSet.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/MC/MCContext.h"
+#include "llvm/MC/MCSymbol.h"
+
+using namespace llvm;
+
+MCSymbol *MCResourceInfo::getSymbol(StringRef FuncName, ResourceInfoKind RIK) {
+ switch (RIK) {
+ case RIK_NumVGPR:
+ return OutContext.getOrCreateSymbol(FuncName + Twine(".num_vgpr"));
+ case RIK_NumAGPR:
+ return OutContext.getOrCreateSymbol(FuncName + Twine(".num_agpr"));
+ case RIK_NumSGPR:
+ return OutContext.getOrCreateSymbol(FuncName + Twine(".num_sgpr"));
+ case RIK_PrivateSegSize:
+ return OutContext.getOrCreateSymbol(FuncName + Twine(".private_seg_size"));
+ case RIK_UsesVCC:
+ return OutContext.getOrCreateSymbol(FuncName + Twine(".uses_vcc"));
+ case RIK_UsesFlatScratch:
+ return OutContext.getOrCreateSymbol(FuncName + Twine(".uses_flat_scratch"));
+ case RIK_HasDynSizedStack:
+ return OutContext.getOrCreateSymbol(FuncName +
+ Twine(".has_dyn_sized_stack"));
+ case RIK_HasRecursion:
+ return OutContext.getOrCreateSymbol(FuncName + Twine(".has_recursion"));
+ case RIK_HasIndirectCall:
+ return OutContext.getOrCreateSymbol(FuncName + Twine(".has_indirect_call"));
+ }
+ llvm_unreachable("Unexpected ResourceInfoKind.");
+}
+
+const MCExpr *MCResourceInfo::getSymRefExpr(StringRef FuncName,
+ ResourceInfoKind RIK,
+ MCContext &Ctx) {
+ return MCSymbolRefExpr::create(getSymbol(FuncName, RIK), Ctx);
+}
+
+void MCResourceInfo::assignMaxRegs() {
+ // Assign expression to get the max register use to the max_num_Xgpr symbol.
+ MCSymbol *MaxVGPRSym = getMaxVGPRSymbol();
+ MCSymbol *MaxAGPRSym = getMaxAGPRSymbol();
+ MCSymbol *MaxSGPRSym = getMaxSGPRSymbol();
+
+ auto assignMaxRegSym = [this](MCSymbol *Sym, int32_t RegCount) {
+ const MCExpr *MaxExpr = MCConstantExpr::create(RegCount, OutContext);
+ Sym->setVariableValue(MaxExpr);
+ };
+
+ assignMaxRegSym(MaxVGPRSym, MaxVGPR);
+ assignMaxRegSym(MaxAGPRSym, MaxAGPR);
+ assignMaxRegSym(MaxSGPRSym, MaxSGPR);
+}
+
+void MCResourceInfo::Finalize() {
+ assert(!finalized && "Cannot finalize ResourceInfo again.");
+ finalized = true;
+ assignMaxRegs();
+}
+
+MCSymbol *MCResourceInfo::getMaxVGPRSymbol() {
+ return OutContext.getOrCreateSymbol("max_num_vgpr");
+}
+
+MCSymbol *MCResourceInfo::getMaxAGPRSymbol() {
+ return OutContext.getOrCreateSymbol("max_num_agpr");
+}
+
+MCSymbol *MCResourceInfo::getMaxSGPRSymbol() {
+ return OutContext.getOrCreateSymbol("max_num_sgpr");
+}
+
+void MCResourceInfo::assignResourceInfoExpr(
+ ...
[truncated]
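The `AccumOffset` expression in `getSIProgramInfo` is built as the MCExpr equivalent of `alignTo(std::max(1, Info.NumVGPR), 4) / 4 - 1`. The sketch below works through that arithmetic on plain integers. It is only a model of the formula quoted in the patch comment; `alignToMultiple` and `accumOffset` are hypothetical helper names and do not use `AMDGPUMCExpr` or any LLVM code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Rounds Value up to the next multiple of Align.
inline uint64_t alignToMultiple(uint64_t Value, uint64_t Align) {
  assert(Align != 0 && "alignment must be non-zero");
  return (Value + Align - 1) / Align * Align;
}

// Integer model of: alignTo(max(1, NumArchVGPR), 4) / 4 - 1
inline uint64_t accumOffset(uint64_t NumArchVGPR) {
  const uint64_t AtLeastOne = std::max<uint64_t>(1, NumArchVGPR);
  return alignToMultiple(AtLeastOne, 4) / 4 - 1;
}
```

With these definitions, 0 to 4 architectural VGPRs give an offset of 0, 5 to 8 give 1, and 10 gives 2 (10 rounds up to 12, and 12 / 4 - 1 = 2). The `max` with 1 keeps the subtraction from underflowing when no VGPRs are used.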
@llvm/pr-subscribers-mc
Author: Janek van Oirschot (JanekvO)
Changes
!!! Stacked PR on top of #95951 commit, please only review the latest commit 51f72f115b340a092c2c9f8569911b944a4efb6d !!!!
Converts the AMDGPUResourceUsageAnalysis pass from a Module pass to a MachineFunction pass. Moves function resource info propagation to the MC layer (through helpers in AMDGPUMCResourceInfo) by generating MCExprs for every function resource, which the emitters have already been prepared to handle.
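To make the idea of symbolic, late-resolved resource values more concrete, here is a toy model in plain C++. It is not the LLVM MC API: `ToySymbolTable`, `constant`, and `maxOf` are hypothetical names, and the rule that a caller's register count is the maximum of its own count and its callee's is an assumption made for the example, not something taken from the visible part of the patch. The point it shows is that an expression referring to a not-yet-defined symbol cannot be evaluated until that symbol is assigned, which is why validation has to wait until every function has been seen.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Toy symbol table: each symbol maps to an expression evaluated on demand.
// Cyclic definitions are not detected in this sketch and would recurse forever.
class ToySymbolTable {
public:
  using Expr = std::function<std::optional<int64_t>(const ToySymbolTable &)>;

  void define(const std::string &Name, Expr E) { Symbols[Name] = std::move(E); }

  // Returns the value if every referenced symbol is defined, else nullopt.
  std::optional<int64_t> evaluate(const std::string &Name) const {
    auto It = Symbols.find(Name);
    if (It == Symbols.end())
      return std::nullopt;
    return It->second(*this);
  }

private:
  std::map<std::string, Expr> Symbols;
};

// Expression that always evaluates to the constant V.
inline ToySymbolTable::Expr constant(int64_t V) {
  return [V](const ToySymbolTable &) { return std::optional<int64_t>(V); };
}

// Expression that evaluates to max(A, B) once both symbols are resolvable.
inline ToySymbolTable::Expr maxOf(std::string A, std::string B) {
  return [A = std::move(A),
          B = std::move(B)](const ToySymbolTable &T) -> std::optional<int64_t> {
    std::optional<int64_t> L = T.evaluate(A);
    std::optional<int64_t> R = T.evaluate(B);
    if (!L || !R)
      return std::nullopt;
    return std::max(*L, *R);
  };
}
```

In this model, defining `caller.num_vgpr` as `maxOf("caller.local", "callee.num_vgpr")` before `callee.num_vgpr` exists leaves it unresolved; once the callee's symbol is defined, the same expression evaluates to a concrete number.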
+ // for other metadata to validate (e.g., NumSGPRs) so a map is necessary if we
+ // really want to track and validate the occupancy.
+ std::unique_ptr<DenseMap<const Function *, const MCExpr *>>
+ OccupancyValidateMap;
+
uint64_t getFunctionCodeSize(const MachineFunction &MF) const;
void getSIProgramInfo(SIProgramInfo &Out, const MachineFunction &MF);
@@ -60,11 +69,6 @@ class AMDGPUAsmPrinter final : public AsmPrinter {
void EmitPALMetadata(const MachineFunction &MF,
const SIProgramInfo &KernelInfo);
void emitPALFunctionMetadata(const MachineFunction &MF);
- void emitCommonFunctionComments(uint32_t NumVGPR,
- std::optional<uint32_t> NumAGPR,
- uint32_t TotalNumVGPR, uint32_t NumSGPR,
- uint64_t ScratchSize, uint64_t CodeSize,
- const AMDGPUMachineFunction *MFI);
void emitCommonFunctionComments(const MCExpr *NumVGPR, const MCExpr *NumAGPR,
const MCExpr *TotalNumVGPR,
const MCExpr *NumSGPR,
@@ -84,6 +88,11 @@ class AMDGPUAsmPrinter final : public AsmPrinter {
SmallString<128> getMCExprStr(const MCExpr *Value);
+ /// Attempts to replace the validation that is missed in getSIProgramInfo due
+ /// to MCExpr being unknown. Invoked during doFinalization such that the
+ /// MCResourceInfo symbols are known.
+ void ValidateMCResourceInfo(Function &F);
+
public:
explicit AMDGPUAsmPrinter(TargetMachine &TM,
std::unique_ptr<MCStreamer> Streamer);
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp
new file mode 100644
index 00000000000000..58383475b312c9
--- /dev/null
+++ b/llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp
@@ -0,0 +1,220 @@
+//===- AMDGPUMCResourceInfo.cpp --- MC Resource Info ----------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+/// \file
+/// \brief MC infrastructure to propagate the function level resource usage
+/// info.
+///
+//===----------------------------------------------------------------------===//
+
+#include "AMDGPUMCResourceInfo.h"
+#include "Utils/AMDGPUBaseInfo.h"
+#include "llvm/ADT/SmallSet.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/MC/MCContext.h"
+#include "llvm/MC/MCSymbol.h"
+
+using namespace llvm;
+
+MCSymbol *MCResourceInfo::getSymbol(StringRef FuncName, ResourceInfoKind RIK) {
+ switch (RIK) {
+ case RIK_NumVGPR:
+ return OutContext.getOrCreateSymbol(FuncName + Twine(".num_vgpr"));
+ case RIK_NumAGPR:
+ return OutContext.getOrCreateSymbol(FuncName + Twine(".num_agpr"));
+ case RIK_NumSGPR:
+ return OutContext.getOrCreateSymbol(FuncName + Twine(".num_sgpr"));
+ case RIK_PrivateSegSize:
+ return OutContext.getOrCreateSymbol(FuncName + Twine(".private_seg_size"));
+ case RIK_UsesVCC:
+ return OutContext.getOrCreateSymbol(FuncName + Twine(".uses_vcc"));
+ case RIK_UsesFlatScratch:
+ return OutContext.getOrCreateSymbol(FuncName + Twine(".uses_flat_scratch"));
+ case RIK_HasDynSizedStack:
+ return OutContext.getOrCreateSymbol(FuncName +
+ Twine(".has_dyn_sized_stack"));
+ case RIK_HasRecursion:
+ return OutContext.getOrCreateSymbol(FuncName + Twine(".has_recursion"));
+ case RIK_HasIndirectCall:
+ return OutContext.getOrCreateSymbol(FuncName + Twine(".has_indirect_call"));
+ }
+ llvm_unreachable("Unexpected ResourceInfoKind.");
+}
+
+const MCExpr *MCResourceInfo::getSymRefExpr(StringRef FuncName,
+ ResourceInfoKind RIK,
+ MCContext &Ctx) {
+ return MCSymbolRefExpr::create(getSymbol(FuncName, RIK), Ctx);
+}
+
+void MCResourceInfo::assignMaxRegs() {
+ // Assign expression to get the max register use to the max_num_Xgpr symbol.
+ MCSymbol *MaxVGPRSym = getMaxVGPRSymbol();
+ MCSymbol *MaxAGPRSym = getMaxAGPRSymbol();
+ MCSymbol *MaxSGPRSym = getMaxSGPRSymbol();
+
+ auto assignMaxRegSym = [this](MCSymbol *Sym, int32_t RegCount) {
+ const MCExpr *MaxExpr = MCConstantExpr::create(RegCount, OutContext);
+ Sym->setVariableValue(MaxExpr);
+ };
+
+ assignMaxRegSym(MaxVGPRSym, MaxVGPR);
+ assignMaxRegSym(MaxAGPRSym, MaxAGPR);
+ assignMaxRegSym(MaxSGPRSym, MaxSGPR);
+}
+
+void MCResourceInfo::Finalize() {
+ assert(!finalized && "Cannot finalize ResourceInfo again.");
+ finalized = true;
+ assignMaxRegs();
+}
+
+MCSymbol *MCResourceInfo::getMaxVGPRSymbol() {
+ return OutContext.getOrCreateSymbol("max_num_vgpr");
+}
+
+MCSymbol *MCResourceInfo::getMaxAGPRSymbol() {
+ return OutContext.getOrCreateSymbol("max_num_agpr");
+}
+
+MCSymbol *MCResourceInfo::getMaxSGPRSymbol() {
+ return OutContext.getOrCreateSymbol("max_num_sgpr");
+}
+
+void MCResourceInfo::assignResourceInfoExpr(
+ ...
[truncated]
✅ With the latest revision this PR passed the C/C++ code formatter.
Upstream's formatting does not seem to agree with my local
Is it best to leave comments on the commit from the PR comment, i.e. currently 51f72f1? I can't seem to start a review, so it will be individual comments (I think). Is there another way to handle this?
422a81b to 5dff9e2
Apologies, I think some of the stacked PR tooling might have been better to use than this manual stacked PR, but I just haven't looked into any of said tooling yet (Graphite?). The PR that this one depended on has been pushed and I've rebased, so it should be good to go. Thanks!
@@ -40,12 +41,20 @@ class AMDGPUAsmPrinter final : public AsmPrinter {
AMDGPUResourceUsageAnalysis *ResourceUsage;
std::unique_ptr<MCResourceInfo> RI;
Why does this need unique_ptr instead of just a plain member?
It requires a class inherited member for its constructor (in this case, OutContext).
But OutContext is a reference in the parent already, so you can use it?
The only way I was able to do so was by explicitly adding the MCContext as an argument for every public method in AMDGPUResourceInfo. Let me know if there's a way I missed without being so verbose with MCContext arguments in AMDGPUResourceInfo.
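The reviewer's point about using the inherited reference can be illustrated with a minimal, self-contained sketch. All class names below (Context, PrinterBase, ResourceInfo, Printer) are hypothetical stand-ins and not the LLVM classes; the sketch only shows the general C++ rule that base-class subobjects are initialized before data members, so a plain member can be constructed from a reference held by the base. Whether this fits the actual patch depends on details not shown in this thread.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a context object; not llvm::MCContext.
struct Context {
  std::string Name;
};

// Plays the role of the parent printer: holds a reference that derived
// classes inherit (analogous to OutContext in the discussion).
class PrinterBase {
protected:
  Context &OutContext;

public:
  explicit PrinterBase(Context &Ctx) : OutContext(Ctx) {}
};

// Plays the role of the resource-info helper: needs a context when constructed.
class ResourceInfo {
  Context &Ctx;

public:
  explicit ResourceInfo(Context &C) : Ctx(C) {}
  const std::string &contextName() const { return Ctx.Name; }
};

// Base-class subobjects are initialized before non-static data members, so
// OutContext is already bound when RI is constructed in the initializer list.
class Printer : public PrinterBase {
  ResourceInfo RI; // plain member, no std::unique_ptr needed in this sketch

public:
  explicit Printer(Context &Ctx) : PrinterBase(Ctx), RI(OutContext) {}
  const std::string &resourceContextName() const { return RI.contextName(); }
};
```

Constructing a Printer from a Context and calling resourceContextName() returns the name stored in that same Context, which confirms the member was bound to the inherited reference.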
@@ -3025,8 +3025,8 @@ define amdgpu_kernel void @dyn_extract_v5f64_s_s(ptr addrspace(1) %out, i32 %sel
; GPRIDX-NEXT: amd_machine_version_stepping = 0
; GPRIDX-NEXT: kernel_code_entry_byte_offset = 256
; GPRIDX-NEXT: kernel_code_prefetch_byte_size = 0
; GPRIDX-NEXT: granulated_workitem_vgpr_count = 0
; GPRIDX-NEXT: granulated_wavefront_sgpr_count = 1
; GPRIDX-NEXT: granulated_workitem_vgpr_count = (11468800|(((((alignto(max(max(totalnumvgprs(dyn_extract_v5f64_s_s.num_agpr, dyn_extract_v5f64_s_s.num_vgpr), 1, 0), 1), 4))/4)-1)&63)|(((((alignto(max(max(dyn_extract_v5f64_s_s.num_sgpr+(extrasgprs(dyn_extract_v5f64_s_s.uses_vcc, dyn_extract_v5f64_s_s.uses_flat_scratch, 1)), 1, 0), 1), 8))/8)-1)&15)<<6)))&63
This turned into a mess for what should be a simple function
If any of the sub-expressions of an MCExpr is unknown/unresolvable at this point in time (i.e., asm printing for a particular MachineFunction), it will print out the MCExpr in its most verbose way possible. It doesn't help that both granulated_workitem_vgpr_count and granulated_wavefront_sgpr_count are basically the same MCExpr, but masked for only the relevant bits (i.e., compute_pgm_resource1_registers masked for whatever we want to retrieve).
I was thinking of explicitly splitting all of the components that compose any of the compute_pgm_resourceX registers into their own MCExpr and leaving computation of the compute_pgm_resourceX register for when it is used/necessary. However, this wouldn't help resolve the unknowns/unresolvables at the time of printing the amd_kernel_code_t metadata. Do let me know if splitting the composed registers is still desired, though.
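To make the "same expression, different mask" point concrete, here is a plain-integer sketch that follows the shape of the printed expression: both granulated counts are packed into one word and each field is then read back with a mask. The function names are hypothetical, the register counts used in any call are illustrative, and the shift used to read back the SGPR field is an assumption inferred from the `&15)<<6` part of the expression rather than something shown in the thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Round Value up to the next multiple of Align (Align must be non-zero).
constexpr uint64_t alignTo(uint64_t Value, uint64_t Align) {
  return (Value + Align - 1) / Align * Align;
}

// Granulated block counts, mirroring alignto(max(n, 1), G) / G - 1 from the
// printed expression (G = 4 for VGPRs, G = 8 for SGPRs).
constexpr uint64_t vgprBlocks(uint64_t NumVGPR) {
  return alignTo(std::max<uint64_t>(NumVGPR, 1), 4) / 4 - 1;
}
constexpr uint64_t sgprBlocks(uint64_t NumSGPR) {
  return alignTo(std::max<uint64_t>(NumSGPR, 1), 8) / 8 - 1;
}

// Pack both fields into one word: VGPR blocks in the low 6 bits, SGPR blocks
// in the next 4 bits. OtherBits stands for the remaining, unrelated fields.
constexpr uint64_t packRsrc1(uint64_t OtherBits, uint64_t NumVGPR,
                             uint64_t NumSGPR) {
  return OtherBits | (vgprBlocks(NumVGPR) & 63) |
         ((sgprBlocks(NumSGPR) & 15) << 6);
}

// Each printed field is the whole packed word masked down to its own bits.
constexpr uint64_t granulatedVGPRCount(uint64_t Rsrc1) { return Rsrc1 & 63; }
// Assumed read-back for the SGPR field (shift, then mask).
constexpr uint64_t granulatedSGPRCount(uint64_t Rsrc1) {
  return (Rsrc1 >> 6) & 15;
}
```

Because every field is derived from the whole packed word, an unresolved symbol anywhere in that word forces the full expression to be printed for each field, which is why the test output above grows so long.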
But why wasn't this resolvable? This function has no calls or anything complicated
Correct, I had to move the AMDGPUResourceInfo gathering function ahead of the amdhsa kernel descriptor asm printing so that the values can be deduced.
; DEFAULTSIZE: ; ScratchSize: 16
; ASSUME1024: .amdhsa_private_segment_fixed_size 1040
; ASSUME1024: .amdhsa_private_segment_fixed_size kernel_non_entry_block_static_alloca_uniformly_reached_align4.private_seg_size
Why doesn't this print as a simple size anymore?
As a function's private_segment_size depends on itself and on all of its callees' propagated private_segment_size, it may be the case that a callee's private_segment_size has yet to be analysed through AMDGPUResourceUsageAnalysis.
For example:
void foo() {
  ...
  call bar
  ...
}
void bar() {
  ...
  ...
}
For foo, we know and can construct the calculation for its private_segment_size as
[foo's own private segment required size] + max(bar.private_segment_size, [any other of foo's called function's private_segment_size])
But because we are currently printing foo's MachineFunction in AMDGPUAsmPrinter, we haven't analysed bar's private_segment_size yet. It will eventually be known, but as we need to print the metadata for foo now, it is printed with a placeholder for bar's private_segment_size.
TL;DR: cannot compute constant value yet, need to print symbolic representation.
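The foo/bar explanation above can be sketched as a tiny symbol table in which a size is "own size + max(callee sizes)" and evaluation fails until every callee is defined. This is a simplified model, not the MCExpr machinery: SegSizeExpr, tryEvaluate and printValue are hypothetical names, the sizes are made up, and the sketch assumes there is no recursion between functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical model of one size symbol: own size + max(callee symbols).
struct SegSizeExpr {
  uint64_t OwnSize = 0;
  std::vector<std::string> Callees; // names of the callees' size symbols
};

using SymbolTable = std::map<std::string, SegSizeExpr>;

// Returns the constant value if the symbol and all of its callee symbols are
// already defined, otherwise std::nullopt. Assumes an acyclic call graph.
std::optional<uint64_t> tryEvaluate(const SymbolTable &Syms,
                                    const std::string &Name) {
  auto It = Syms.find(Name);
  if (It == Syms.end())
    return std::nullopt; // not analysed yet
  uint64_t MaxCallee = 0;
  for (const std::string &Callee : It->second.Callees) {
    std::optional<uint64_t> V = tryEvaluate(Syms, Callee);
    if (!V)
      return std::nullopt; // a callee is still unknown
    MaxCallee = std::max(MaxCallee, *V);
  }
  return It->second.OwnSize + MaxCallee;
}

// What a printer could emit: the number when it is known, else the symbol.
std::string printValue(const SymbolTable &Syms, const std::string &Name) {
  if (std::optional<uint64_t> V = tryEvaluate(Syms, Name))
    return std::to_string(*V);
  return Name;
}
```

With only foo's entry present (own size plus a reference to bar), printValue returns the symbol name; once bar's entry is added, the same call returns the folded constant. That matches the ordering issue described above: the printed form depends on what has been analysed at the moment of printing.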
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My point was more that this specific test case does not have calls and should be trivially resolved.
I applied this locally and it resolved #64863 so I'm looking forward to this landing.
LGTM, with some nits
The expressions look pretty reasonable to me, although I didn't really go through validating that they are correct
MCSymbol *MCResourceInfo::getSymbol(StringRef FuncName, ResourceInfoKind RIK) {
  switch (RIK) {
  case RIK_NumVGPR:
    return OutContext.getOrCreateSymbol(FuncName + Twine(".num_vgpr"));
Are these names guaranteed to be unique? Do we reserve any symbol with a '.' in it?
Are these names guaranteed to be unique?
From what I could tell, function names have to be unique at this point (e.g., name mangling already happened)
Do we reserve any symbol with a '.' in it?
I'll have a look at this, I didn't see any reserved symbols while I was working on this but I haven't explicitly searched for it either.
To follow up on the '.' symbols: there are a few cases of symbols including dots, like local symbols getting prefixed with .L or some possibly emitted symbols like .note.GNU-stack, but the dots don't seem to be exclusive. I can, however, use another delimiter (or remove the delimiter altogether) if the dot might be something we'd like to keep for other cases.
The names seem fine to me, but we should write down which symbols are "reserved" or "well known" or however we want to define it relative to how they are intended to be used. Can you document them in AMDGPUUsage maybe?
My original question was more around how a user could technically contrive a conflict with an __asm__("foo.num_vgpr") definition, which is outlandish and easy enough for us to say "don't do that". I just want to make sure we write down the spirit of "don't do that", in a similar way to how C/C++ (I believe?) define symbols starting with "__" as reserved for implementation use.
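To make the "don't do that" rule concrete, a check along the following lines could flag user symbols that look like generated resource symbols. This is only a sketch: `looksLikeResourceSymbol` is a hypothetical helper, not part of LLVM, and the suffix list is assembled from the names that appear in this PR's code and tests, so it may be incomplete.

```cpp
#include <cassert>
#include <string_view>

// Suffixes taken from symbols shown in this PR (e.g. foo.num_vgpr,
// foo.private_seg_size); illustrative only, not an authoritative list.
constexpr std::string_view kResourceSuffixes[] = {
    ".num_vgpr",         ".num_agpr", ".numbered_sgpr",
    ".private_seg_size", ".uses_vcc", ".uses_flat_scratch"};

bool endsWith(std::string_view s, std::string_view suffix) {
  return s.size() >= suffix.size() &&
         s.substr(s.size() - suffix.size()) == suffix;
}

// Hypothetical helper: true if `name` has the shape <function>.<suffix>,
// i.e. it could collide with a compiler-generated resource symbol.
bool looksLikeResourceSymbol(std::string_view name) {
  for (std::string_view suffix : kResourceSuffixes)
    if (name.size() > suffix.size() && endsWith(name, suffix))
      return true;
  return false;
}
```

A name such as `foo.num_vgpr` is flagged, while `foo` or a bare `.num_vgpr` (no function part) is not.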
                                      MCContext &OutContext) {
  const MCConstantExpr *LocalConstExpr =
      MCConstantExpr::create(LocalValue, OutContext);
  const MCExpr *SymVal = LocalConstExpr;
I assume we don't take advantage of indirect calls with a known set of callees
As in, finding the module level maximum?
No, we can know when an indirect call has a known possible range of callees. Whether from !callees metadata or from seeing the possible values of the call target
Had to look into this a bit as I wasn't aware of the !callees metadata: but yes, we currently don't take advantage of this. I was able to create an example that emits the !callees metadata for an indirect call but it will always fall back on the worst case (i.e., module level worst case values).
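The difference between the current fallback and what a known callee set would allow can be sketched as follows. This is a hypothetical illustration of the idea raised in the review, not something the PR implements: `ResourceTable`, `moduleMax`, and `indirectCallBound` are invented names, and the table stands in for per-function VGPR counts.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical table: function name -> VGPR count.
using ResourceTable = std::map<std::string, int>;

// Module-level worst case, which is what an indirect call falls back to today.
int moduleMax(const ResourceTable &table) {
  int result = 0;
  for (const auto &entry : table)
    result = std::max(result, entry.second);
  return result;
}

// If the possible callees are known (e.g. from !callees metadata), the bound
// could be the maximum over just those callees; otherwise stay conservative.
int indirectCallBound(
    const ResourceTable &table,
    const std::optional<std::vector<std::string>> &knownCallees) {
  if (!knownCallees)
    return moduleMax(table);
  int bound = 0;
  for (const std::string &name : *knownCallees) {
    auto it = table.find(name);
    if (it == table.end())
      return moduleMax(table); // callee without data: use the worst case
    bound = std::max(bound, it->second);
  }
  return bound;
}
```

With a table of `{a: 10, b: 40, c: 20}`, an indirect call with no callee information is bounded by 40, whereas one known to target only `a` or `c` would be bounded by 20.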
@@ -3,6 +3,18 @@

declare i32 @llvm.amdgcn.workitem.id.x()

define <2 x i64> @f1() #0 {
Unrelated function appeared?
There were some function order changes in some tests. This function is the same one as the function that is removed a couple of lines below.
@@ -0,0 +1,533 @@
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -enable-ipra=0 -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s

; SGPR use may not seem equal to the sgpr use provided in comments as the latter includes extra sgprs (e.g., for vcc use).
We should either fix the naming of these fields to say it's only the numbered SGPRs, or just always include the + vcc part?
I applied the following renames:
the function level number of sgpr symbol went from <function name>.num_sgpr to <function name>.numbered_sgpr
remark went from SGPRs to TotalSGPRs
comment went from NumSgprs to TotalNumSgprs
}

; GCN-LABEL: {{^}}indirect_use_vcc:
; GCN: .set indirect_use_vcc.num_vgpr, max(41, use_vcc.num_vgpr)
If you reassemble the output, do the resources always resolve to constants?
If we were to go from function-resource-usage.ll to function-resource-usage.s and try to assemble function-resource-usage.s through llvm-mc, it would again resolve whatever it can at the time of emitting. In the specific case you've highlighted it would resolve into a constant, as use_vcc.num_vgpr is defined before its use.
However, cases which use or depend on the module level maximum (e.g., max_num_vgpr) wouldn't be able to resolve to a constant, as these module level maximums are defined at the bottom of the file.
TL;DR: it depends on the order in which symbols are defined and used.
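The order dependence can be illustrated with a single in-order pass over `.set`-style definitions. This is a simplified model rather than how llvm-mc is implemented: `SetDirective` and `resolveInOrder` are hypothetical, each definition is restricted to the form `max(constant, dep)`, and the value 8 for `use_vcc.num_vgpr` and the `uses_max` function are made up for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical directive modelling ".set name, max(constant, dep)".
// An empty `dep` means the definition is just the constant.
struct SetDirective {
  std::string name;
  int constant;
  std::string dep;
};

// Walks the directives in file order. Each entry of the result is the constant
// the definition resolves to at the point it is emitted, or nullopt if its
// dependency has not been defined yet at that point.
std::vector<std::optional<int>>
resolveInOrder(const std::vector<SetDirective> &directives) {
  std::map<std::string, int> defined;
  std::vector<std::optional<int>> resolved;
  for (const SetDirective &d : directives) {
    std::optional<int> value = d.constant;
    if (!d.dep.empty()) {
      auto it = defined.find(d.dep);
      if (it == defined.end())
        value = std::nullopt; // dependency appears later in the file
      else
        value = std::max(d.constant, it->second);
    }
    if (value)
      defined[d.name] = *value;
    resolved.push_back(value);
  }
  return resolved;
}
```

Feeding it `use_vcc.num_vgpr` first and `indirect_use_vcc.num_vgpr` second resolves the latter to 41, while a definition that depends on `max_num_vgpr` stays unresolved if `max_num_vgpr` is only defined at the end of the list.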
MCSymbol *MCResourceInfo::getMaxVGPRSymbol(MCContext &OutContext) {
  return OutContext.getOrCreateSymbol("max_num_vgpr");
}

MCSymbol *MCResourceInfo::getMaxAGPRSymbol(MCContext &OutContext) {
  return OutContext.getOrCreateSymbol("max_num_agpr");
}

MCSymbol *MCResourceInfo::getMaxSGPRSymbol(MCContext &OutContext) {
  return OutContext.getOrCreateSymbol("max_num_sgpr");
}
I haven't convinced myself of the naming/approach I've used here. These are module scope maxima but every module is going to redefine these for its own scope.
I wonder if these should be placed in a custom section. In any case, we will eventually need custom linker logic to deal with this
I've put these in their own section (.AMDGPU.gpr_maximums) for now, albeit without any module-unique identifiers.
Ping
// validateMCResourceInfo cannot recompute parts of the occupancy as it does
// for other metadata to validate (e.g., NumSGPRs) so a map is necessary if we
// really want to track and validate the occupancy.
I don't see why this is the case
Yeah, I thought I needed access to the MachineFunction for SIMachineFunctionInfo, but I can reconstruct it using just the Function and GCNSubtarget. It now recomputes the occupancy instead of caching it.
You do need the MachineFunction to get its SIMachineFunctionInfo. Constructing a new one won't give the same result?
MCSymbol *MCResourceInfo::getMaxVGPRSymbol(MCContext &OutContext) {
  return OutContext.getOrCreateSymbol("max_num_vgpr");
}

MCSymbol *MCResourceInfo::getMaxAGPRSymbol(MCContext &OutContext) {
  return OutContext.getOrCreateSymbol("max_num_agpr");
}

MCSymbol *MCResourceInfo::getMaxSGPRSymbol(MCContext &OutContext) {
  return OutContext.getOrCreateSymbol("max_num_sgpr");
}
I wonder if these should be placed in a custom section. In any case, we will eventually need custom linker logic to deal with this
I think we need more thought about how the ABI for this will work, but we need to start somewhere
65b98a8 to 9aa2721
Rebase
…on pass and move metadata propagation logic to MC layer
… amdhsa kernel descriptor emit, remove duplicate validation for stack size
…re verbose in what's emitted
9aa2721 to 8cfbc1b
Rebase, hopefully pull in fixes for unrelated test failures
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/151/builds/2543
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/186/builds/2796
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/168/builds/3916
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/66/builds/4341
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/145/builds/2167
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/146/builds/1275
…ass (llvm#102913)
Reland: Converts AMDGPUResourceUsageAnalysis pass from Module to MachineFunction pass. Moves function resource info propagation to the MC layer (through helpers in AMDGPUMCResourceInfo) by generating MCExprs for every function resource which the emitters have been prepped for.
Fixes llvm#64863
Change-Id: I180c941a1535be646144960ff62e4cf24a5aa1da
…ass (llvm#102913)
Converts AMDGPUResourceUsageAnalysis pass from Module to MachineFunction pass. Moves function resource info propagation to the MC layer (through helpers in AMDGPUMCResourceInfo) by generating MCExprs for every function resource which the emitters have been prepped for.
Fixes llvm#64863
[AMDGPU] Fix stack size metadata for functions with direct and indirect calls (llvm#110828)
When a function has an external call, it should still use the stack sizes of direct, known calls to calculate its own stack size.
[AMDGPU] Fix resource usage information for unnamed functions (llvm#115320)
Resource usage information would try to overwrite unnamed functions if there are multiple within the same compilation unit. This aims to either use the `MCSymbol` assigned to the unnamed function (i.e., `CurrentFnSym`), or rematerialize the `MCSymbol` for the unnamed function.
Reapply [AMDGPU] Avoid resource propagation for recursion through multiple functions (llvm#112251)
I was wrong last patch. I viewed the `Visited` set purely as a possible recursion deterrent where functions calling a callee multiple times are handled elsewhere. This doesn't consider cases where a function is called multiple times by different callers still part of the same call graph. New test shows the aforementioned case. Reapplies llvm#111004, fixes llvm#115562.
[AMDGPU] Newly added test modified for recent SGPR use change (llvm#116427)
Mistimed rebase for llvm#112251, which added new tests that did not yet consider the changes introduced in llvm#112403.
Change-Id: I4dfe6a1b679137e080a6d2b44016347ea704b014
Converts AMDGPUResourceUsageAnalysis pass from Module to MachineFunction pass. Moves function resource info propagation to the MC layer (through helpers in AMDGPUMCResourceInfo) by generating MCExprs for every function resource which the emitters have been prepped for.
ValidateMCResourceInfo. Previously (and currently, still) done in getSIProgramInfo. However, MCExprs may not be resolvable during getSIProgramInfo due to propagation of yet-to-be-defined function resource info symbols. Does require some caching of the generated occupancy MCExpr computation in getSIProgramInfo to validate against.
max_num_Xgprs symbols to be emitted. Cannot generate a separate MCExpr for finding the max without getting recursive symbol defines.
Fixes #64863