[MemProf] Optionally save context size info on largest cold allocations #142507


Merged: 4 commits into llvm:main on Jun 3, 2025

Conversation

teresajohnson (Contributor)

To allow selective reporting of context hinting during the LTO link, and in
the future selectively more aggressive cloning, add an option that specifies
a minimum percent of the max cold size recorded in the profile summary.
Contexts meeting that threshold get context size info metadata (and ThinLTO
summary information) on the associated allocations.
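
For orientation, a minimal sketch of the new qualification test (this mirrors
the integer check added to createMIBNode in the diff below; the standalone
function and its name are our illustration, not the patch's structure):

#include <cstdint>

// A cold context qualifies when its size is at least MinPercentMaxColdSize
// percent of the profile summary's max cold size. All-integer math avoids
// floating point: TotalSize / MaxColdSize >= MinPercentMaxColdSize / 100.
bool isLargeColdContext(uint64_t TotalSize, uint64_t MaxColdSize,
                        unsigned MinPercentMaxColdSize) {
  return MaxColdSize > 0 && MinPercentMaxColdSize < 100 &&
         TotalSize * 100 >= MaxColdSize * MinPercentMaxColdSize;
}
// E.g. with MaxColdSize = 1000 and a 90% threshold, cold contexts of 900+
// bytes receive context size metadata.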

Specifying -memprof-report-hinted-sizes during the pre-LTO compile step
will continue to cause all contexts to receive this metadata. Specifying it
only during the LTO link instead reports just the contexts that met the new
threshold and therefore carry the metadata.
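
As a concrete illustration, command lines modeled on the new ThinLTO test in
this patch (file names are placeholders; the test's symbol resolution flags
are omitted):

# Pre-LTO compile: annotate size info only for cold contexts at or above
# 50% of the profile's max cold size.
opt -thinlto-bc -memprof-min-percent-max-cold-size=50 input.ll -o input.o

# LTO link: report hinted sizes; only contexts carrying the metadata from
# the previous step get reported.
llvm-lto2 run input.o -enable-memprof-context-disambiguation \
    -supports-hot-cold-new -memprof-report-hinted-sizes -o output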

To support this, 0s are inserted for contexts without this metadata, because
the alloc info summary and associated bitcode require the context size
information to be in the same order as the other context information. The
bitcode writer uses a more compact format for the context ids to allow
better compression of the 0s.
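
For example, with the context size values from the new ThinLTO test (the
layout shown is illustrative, not the exact record format): an allocation
with three MIBs where only the second has recorded sizes carries, in MIB
order:

  MIB 0 -> {{0, 0}}                    ; sentinel: full stack id 0, size 0
  MIB 1 -> {{456, 200}, {789, 300}}    ; real (full stack id, total size) pairs
  MIB 2 -> {{0, 0}}                    ; sentinel

The reader skips entries whose context id is 0, and the VBR abbrev keeps the
runs of zeros cheap to encode.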

As part of this change, several helper methods are added to query whether
metadata contains context size info on any or all contexts.
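
The three predicates nest from strongest to weakest; a simplified,
self-contained restatement (the real helpers read cl::opt flags, which this
sketch takes as parameters instead):

struct MemProfOpts {
  bool ReportHintedSizes = false;
  unsigned MinClonedColdBytePercent = 100;
  unsigned MinPercentMaxColdSize = 100;
  unsigned MinCallsiteColdBytePercent = 100;
};

// All MIBs will carry context size info.
bool includesAllContextSizeInfo(const MemProfOpts &O) {
  return O.ReportHintedSizes || O.MinClonedColdBytePercent < 100;
}
// Some MIBs (but possibly not all) may carry context size info.
bool mayIncludeContextSizeInfo(const MemProfOpts &O) {
  return includesAllContextSizeInfo(O) || O.MinPercentMaxColdSize < 100;
}
// Size info must be recorded in the alloc trie during profile matching.
bool recordContextSizeInfoForAnalysis(const MemProfOpts &O) {
  return mayIncludeContextSizeInfo(O) || O.MinCallsiteColdBytePercent < 100;
}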

llvmbot added the PGO (Profile Guided Optimizations), LTO (Link time optimization: regular/full LTO or ThinLTO), llvm:analysis, and llvm:transforms labels on Jun 3, 2025

llvmbot (Member) commented Jun 3, 2025

@llvm/pr-subscribers-lto
@llvm/pr-subscribers-pgo

@llvm/pr-subscribers-llvm-analysis

Author: Teresa Johnson (teresajohnson)


Patch is 25.74 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/142507.diff

9 Files Affected:

  • (modified) llvm/include/llvm/Analysis/MemoryProfileInfo.h (+18-1)
  • (modified) llvm/lib/Analysis/MemoryProfileInfo.cpp (+39-9)
  • (modified) llvm/lib/Analysis/ModuleSummaryAnalysis.cpp (+16-1)
  • (modified) llvm/lib/Bitcode/Reader/BitcodeReader.cpp (+8)
  • (modified) llvm/lib/Bitcode/Writer/BitcodeWriter.cpp (+18-8)
  • (modified) llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp (+2-3)
  • (modified) llvm/lib/Transforms/Instrumentation/MemProfiler.cpp (+9-16)
  • (added) llvm/test/ThinLTO/X86/memprof-report-hinted-partial.ll (+73)
  • (added) llvm/test/Transforms/PGOProfile/memprof_max_cold_threshold.test (+131)
diff --git a/llvm/include/llvm/Analysis/MemoryProfileInfo.h b/llvm/include/llvm/Analysis/MemoryProfileInfo.h
index 8cbb8673b69f5..b042a717e4e49 100644
--- a/llvm/include/llvm/Analysis/MemoryProfileInfo.h
+++ b/llvm/include/llvm/Analysis/MemoryProfileInfo.h
@@ -24,6 +24,18 @@ class OptimizationRemarkEmitter;
 
 namespace memprof {
 
+/// Whether the alloc memprof metadata will include context size info for all
+/// MIBs.
+LLVM_ABI bool metadataIncludesAllContextSizeInfo();
+
+/// Whether the alloc memprof metadata may include context size info for some
+/// MIBs (but possibly not all).
+LLVM_ABI bool metadataMayIncludeContextSizeInfo();
+
+/// Whether we need to record the context size info in the alloc trie used to
+/// build metadata.
+LLVM_ABI bool recordContextSizeInfoForAnalysis();
+
 /// Build callstack metadata from the provided list of call stack ids. Returns
 /// the resulting metadata node.
 LLVM_ABI MDNode *buildCallstackMetadata(ArrayRef<uint64_t> CallStack,
@@ -87,6 +99,9 @@ class CallStackTrie {
   // allocations for which we apply non-context sensitive allocation hints.
   OptimizationRemarkEmitter *ORE;
 
+  // The maximum size of a cold allocation context, from the profile summary.
+  uint64_t MaxColdSize;
+
   void deleteTrieNode(CallStackTrieNode *Node) {
     if (!Node)
       return;
@@ -113,7 +128,9 @@ class CallStackTrie {
                      uint64_t &ColdBytes);
 
 public:
-  CallStackTrie(OptimizationRemarkEmitter *ORE = nullptr) : ORE(ORE) {}
+  CallStackTrie(OptimizationRemarkEmitter *ORE = nullptr,
+                uint64_t MaxColdSize = 0)
+      : ORE(ORE), MaxColdSize(MaxColdSize) {}
   ~CallStackTrie() { deleteTrieNode(Alloc); }
 
   bool empty() const { return Alloc == nullptr; }
diff --git a/llvm/lib/Analysis/MemoryProfileInfo.cpp b/llvm/lib/Analysis/MemoryProfileInfo.cpp
index 347377522101a..2ea765be5dbe6 100644
--- a/llvm/lib/Analysis/MemoryProfileInfo.cpp
+++ b/llvm/lib/Analysis/MemoryProfileInfo.cpp
@@ -46,6 +46,25 @@ cl::opt<unsigned> MinCallsiteColdBytePercent(
     cl::desc("Min percent of cold bytes at a callsite to discard non-cold "
              "contexts"));
 
+// Enable saving context size information for largest cold contexts, which can
+// be used to flag contexts for more aggressive cloning and reporting.
+cl::opt<unsigned> MinPercentMaxColdSize(
+    "memprof-min-percent-max-cold-size", cl::init(100), cl::Hidden,
+    cl::desc("Min percent of max cold bytes for critical cold context"));
+
+bool llvm::memprof::metadataIncludesAllContextSizeInfo() {
+  return MemProfReportHintedSizes || MinClonedColdBytePercent < 100;
+}
+
+bool llvm::memprof::metadataMayIncludeContextSizeInfo() {
+  return metadataIncludesAllContextSizeInfo() || MinPercentMaxColdSize < 100;
+}
+
+bool llvm::memprof::recordContextSizeInfoForAnalysis() {
+  return metadataMayIncludeContextSizeInfo() ||
+         MinCallsiteColdBytePercent < 100;
+}
+
 MDNode *llvm::memprof::buildCallstackMetadata(ArrayRef<uint64_t> CallStack,
                                               LLVMContext &Ctx) {
   SmallVector<Metadata *, 8> StackVals;
@@ -168,7 +187,8 @@ void CallStackTrie::addCallStack(MDNode *MIB) {
 static MDNode *createMIBNode(LLVMContext &Ctx, ArrayRef<uint64_t> MIBCallStack,
                              AllocationType AllocType,
                              ArrayRef<ContextTotalSize> ContextSizeInfo,
-                             uint64_t &TotalBytes, uint64_t &ColdBytes) {
+                             uint64_t &TotalBytes, uint64_t &ColdBytes,
+                             uint64_t MaxColdSize) {
   SmallVector<Metadata *> MIBPayload(
       {buildCallstackMetadata(MIBCallStack, Ctx)});
   MIBPayload.push_back(
@@ -184,12 +204,21 @@ static MDNode *createMIBNode(LLVMContext &Ctx, ArrayRef<uint64_t> MIBCallStack,
 
   for (const auto &[FullStackId, TotalSize] : ContextSizeInfo) {
     TotalBytes += TotalSize;
-    if (AllocType == AllocationType::Cold)
+    bool LargeColdContext = false;
+    if (AllocType == AllocationType::Cold) {
       ColdBytes += TotalSize;
+      // If we have the max cold context size from summary information and have
+      // requested identification of contexts above a percentage of the max, see
+      // if this context qualifies.
+      if (MaxColdSize > 0 && MinPercentMaxColdSize < 100 &&
+          TotalSize * 100 >= MaxColdSize * MinPercentMaxColdSize)
+        LargeColdContext = true;
+    }
     // Only add the context size info as metadata if we need it in the thin
-    // link (currently if reporting of hinted sizes is enabled or we have
-    // specified a threshold for marking allocations cold after cloning).
-    if (MemProfReportHintedSizes || MinClonedColdBytePercent < 100) {
+    // link (currently if reporting of hinted sizes is enabled, we have
+    // specified a threshold for marking allocations cold after cloning, or we
+    // have identified this as a large cold context of interest above).
+    if (metadataIncludesAllContextSizeInfo() || LargeColdContext) {
       auto *FullStackIdMD = ValueAsMetadata::get(
           ConstantInt::get(Type::getInt64Ty(Ctx), FullStackId));
       auto *TotalSizeMD = ValueAsMetadata::get(
@@ -357,9 +386,9 @@ bool CallStackTrie::buildMIBNodes(CallStackTrieNode *Node, LLVMContext &Ctx,
   if (hasSingleAllocType(Node->AllocTypes)) {
     std::vector<ContextTotalSize> ContextSizeInfo;
     collectContextSizeInfo(Node, ContextSizeInfo);
-    MIBNodes.push_back(createMIBNode(Ctx, MIBCallStack,
-                                     (AllocationType)Node->AllocTypes,
-                                     ContextSizeInfo, TotalBytes, ColdBytes));
+    MIBNodes.push_back(
+        createMIBNode(Ctx, MIBCallStack, (AllocationType)Node->AllocTypes,
+                      ContextSizeInfo, TotalBytes, ColdBytes, MaxColdSize));
     return true;
   }
 
@@ -413,7 +442,8 @@ bool CallStackTrie::buildMIBNodes(CallStackTrieNode *Node, LLVMContext &Ctx,
   std::vector<ContextTotalSize> ContextSizeInfo;
   collectContextSizeInfo(Node, ContextSizeInfo);
   MIBNodes.push_back(createMIBNode(Ctx, MIBCallStack, AllocationType::NotCold,
-                                   ContextSizeInfo, TotalBytes, ColdBytes));
+                                   ContextSizeInfo, TotalBytes, ColdBytes,
+                                   MaxColdSize));
   return true;
 }
 
diff --git a/llvm/lib/Analysis/ModuleSummaryAnalysis.cpp b/llvm/lib/Analysis/ModuleSummaryAnalysis.cpp
index 59fa1a4b03c37..332122700c886 100644
--- a/llvm/lib/Analysis/ModuleSummaryAnalysis.cpp
+++ b/llvm/lib/Analysis/ModuleSummaryAnalysis.cpp
@@ -544,7 +544,8 @@ static void computeFunctionSummary(
           }
           // If we have context size information, collect it for inclusion in
           // the summary.
-          assert(MIBMD->getNumOperands() > 2 || !MemProfReportHintedSizes);
+          assert(MIBMD->getNumOperands() > 2 ||
+                 !metadataIncludesAllContextSizeInfo());
           if (MIBMD->getNumOperands() > 2) {
             std::vector<ContextTotalSize> ContextSizes;
             for (unsigned I = 2; I < MIBMD->getNumOperands(); I++) {
@@ -558,7 +559,21 @@ static void computeFunctionSummary(
                                 ->getZExtValue();
               ContextSizes.push_back({FullStackId, TS});
             }
+            // The ContextSizeInfos must be in the same relative position as the
+            // associated MIB. In some cases we only include a ContextSizeInfo
+            // for a subset of MIBs in an allocation. In those cases we insert
+            // 0s for the other MIBs. Handle the case where the first
+            // ContextSizeInfo being inserted is not for the first MIB, insert
+            // a pair of 0s for each of the prior MIBs.
+            if (ContextSizeInfos.empty() && !MIBs.empty())
+              ContextSizeInfos.insert(ContextSizeInfos.begin(), MIBs.size(),
+                                      {{0, 0}});
             ContextSizeInfos.push_back(std::move(ContextSizes));
+          } else if (!ContextSizeInfos.empty()) {
+            // See earlier comment about handling case of ContextSizeInfos only
+            // for a subset of MIBs. Insert a pair of 0s for this MIB as it does
+            // not have a ContextSizeInfo but other MIBs did.
+            ContextSizeInfos.push_back({{0, 0}});
           }
           MIBs.push_back(
               MIBInfo(getMIBAllocType(MIBMD), std::move(StackIdIndices)));
diff --git a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp
index ce7b1ef65eed7..04840b50fa272 100644
--- a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp
+++ b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp
@@ -8164,6 +8164,14 @@ Error ModuleSummaryIndexBitcodeReader::parseEntireSummary(unsigned ID) {
           ContextSizes.reserve(NumContextSizeInfoEntries);
           for (unsigned J = 0; J < NumContextSizeInfoEntries; J++) {
             assert(ContextIdIndex < PendingContextIds.size());
+            // Skip any 0 entries for MIBs without the context size info.
+            if (PendingContextIds[ContextIdIndex] == 0) {
+              // The size should also be 0 if the context was 0.
+              assert(!Record[I]);
+              ContextIdIndex++;
+              I++;
+              continue;
+            }
             // PendingContextIds read from the preceding FS_ALLOC_CONTEXT_IDS
             // should be in the same order as the total sizes.
             ContextSizes.push_back(
diff --git a/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp b/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
index f8748babb1625..ec4dd3d148d99 100644
--- a/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
+++ b/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
@@ -23,6 +23,7 @@
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/StringMap.h"
 #include "llvm/ADT/StringRef.h"
+#include "llvm/Analysis/MemoryProfileInfo.h"
 #include "llvm/BinaryFormat/Dwarf.h"
 #include "llvm/Bitcode/BitcodeCommon.h"
 #include "llvm/Bitcode/BitcodeReader.h"
@@ -4584,14 +4585,23 @@ void ModuleBitcodeWriterBase::writePerModuleGlobalValueSummary() {
     Stream.EmitRecord(bitc::FS_STACK_IDS, Vals, StackIdAbbvId);
   }
 
-  // n x context id
-  auto ContextIdAbbv = std::make_shared<BitCodeAbbrev>();
-  ContextIdAbbv->Add(BitCodeAbbrevOp(bitc::FS_ALLOC_CONTEXT_IDS));
-  ContextIdAbbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Array));
-  // The context ids are hashes that are close to 64 bits in size, so emitting
-  // as a pair of 32-bit fixed-width values is more efficient than a VBR.
-  ContextIdAbbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 32));
-  unsigned ContextIdAbbvId = Stream.EmitAbbrev(std::move(ContextIdAbbv));
+  unsigned ContextIdAbbvId = 0;
+  if (metadataMayIncludeContextSizeInfo()) {
+    // n x context id
+    auto ContextIdAbbv = std::make_shared<BitCodeAbbrev>();
+    ContextIdAbbv->Add(BitCodeAbbrevOp(bitc::FS_ALLOC_CONTEXT_IDS));
+    ContextIdAbbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Array));
+    // The context ids are hashes that are close to 64 bits in size, so emitting
+    // as a pair of 32-bit fixed-width values is more efficient than a VBR if we
+    // are emitting them for all MIBs. Otherwise we use VBR to better compress 0
+    // values that are expected to more frequently occur in an alloc's memprof
+    // summary.
+    if (metadataIncludesAllContextSizeInfo())
+      ContextIdAbbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 32));
+    else
+      ContextIdAbbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8));
+    ContextIdAbbvId = Stream.EmitAbbrev(std::move(ContextIdAbbv));
+  }
 
   // Abbrev for FS_PERMODULE_PROFILE.
   Abbv = std::make_shared<BitCodeAbbrev>();
diff --git a/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp b/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
index 5b4350845b726..cff38a8e68c6a 100644
--- a/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
+++ b/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
@@ -2232,9 +2232,8 @@ IndexCallsiteContextGraph::IndexCallsiteContextGraph(
           CallStack<MIBInfo, SmallVector<unsigned>::const_iterator>
               EmptyContext;
           unsigned I = 0;
-          assert(
-              (!MemProfReportHintedSizes && MinClonedColdBytePercent >= 100) ||
-              AN.ContextSizeInfos.size() == AN.MIBs.size());
+          assert(!metadataMayIncludeContextSizeInfo() ||
+                 AN.ContextSizeInfos.size() == AN.MIBs.size());
           // Now add all of the MIBs and their stack nodes.
           for (auto &MIB : AN.MIBs) {
             CallStack<MIBInfo, SmallVector<unsigned>::const_iterator>
diff --git a/llvm/lib/Transforms/Instrumentation/MemProfiler.cpp b/llvm/lib/Transforms/Instrumentation/MemProfiler.cpp
index c03aa5accc011..76f935b377525 100644
--- a/llvm/lib/Transforms/Instrumentation/MemProfiler.cpp
+++ b/llvm/lib/Transforms/Instrumentation/MemProfiler.cpp
@@ -178,10 +178,6 @@ static cl::opt<bool>
                         cl::desc("Salvage stale MemProf profile"),
                         cl::init(false), cl::Hidden);
 
-extern cl::opt<bool> MemProfReportHintedSizes;
-extern cl::opt<unsigned> MinClonedColdBytePercent;
-extern cl::opt<unsigned> MinCallsiteColdBytePercent;
-
 static cl::opt<unsigned> MinMatchedColdBytePercent(
     "memprof-matching-cold-threshold", cl::init(100), cl::Hidden,
     cl::desc("Min percent of cold bytes matched to hint allocation cold"));
@@ -293,13 +289,6 @@ class ModuleMemProfiler {
   Function *MemProfCtorFunction = nullptr;
 };
 
-// Options under which we need to record the context size info in the alloc trie
-// used to build metadata.
-bool recordContextSizeInfo() {
-  return MemProfReportHintedSizes || MinClonedColdBytePercent < 100 ||
-         MinCallsiteColdBytePercent < 100;
-}
-
 } // end anonymous namespace
 
 MemProfilerPass::MemProfilerPass() = default;
@@ -752,7 +741,7 @@ static AllocationType addCallStack(CallStackTrie &AllocTrie,
                                 AllocInfo->Info.getAllocCount(),
                                 AllocInfo->Info.getTotalLifetime());
   std::vector<ContextTotalSize> ContextSizeInfo;
-  if (recordContextSizeInfo()) {
+  if (recordContextSizeInfoForAnalysis()) {
     auto TotalSize = AllocInfo->Info.getTotalSize();
     assert(TotalSize);
     assert(FullStackId != 0);
@@ -958,7 +947,7 @@ readMemprof(Module &M, Function &F, IndexedInstrProfReader *MemProfReader,
             std::map<uint64_t, AllocMatchInfo> &FullStackIdToAllocMatchInfo,
             std::set<std::vector<uint64_t>> &MatchedCallSites,
             DenseMap<uint64_t, LocToLocMap> &UndriftMaps,
-            OptimizationRemarkEmitter &ORE) {
+            OptimizationRemarkEmitter &ORE, uint64_t MaxColdSize) {
   auto &Ctx = M.getContext();
   // Previously we used getIRPGOFuncName() here. If F is local linkage,
   // getIRPGOFuncName() returns FuncName with prefix 'FileName;'. But
@@ -1125,7 +1114,7 @@ readMemprof(Module &M, Function &F, IndexedInstrProfReader *MemProfReader,
         // We may match this instruction's location list to multiple MIB
         // contexts. Add them to a Trie specialized for trimming the contexts to
         // the minimal needed to disambiguate contexts with unique behavior.
-        CallStackTrie AllocTrie(&ORE);
+        CallStackTrie AllocTrie(&ORE, MaxColdSize);
         uint64_t TotalSize = 0;
         uint64_t TotalColdSize = 0;
         for (auto *AllocInfo : AllocInfoIter->second) {
@@ -1136,7 +1125,7 @@ readMemprof(Module &M, Function &F, IndexedInstrProfReader *MemProfReader,
                                                  InlinedCallStack)) {
             NumOfMemProfMatchedAllocContexts++;
             uint64_t FullStackId = 0;
-            if (ClPrintMemProfMatchInfo || recordContextSizeInfo())
+            if (ClPrintMemProfMatchInfo || recordContextSizeInfoForAnalysis())
               FullStackId = computeFullStackId(AllocInfo->CallStack);
             auto AllocType = addCallStack(AllocTrie, AllocInfo, FullStackId);
             TotalSize += AllocInfo->Info.getTotalSize();
@@ -1267,6 +1256,10 @@ PreservedAnalyses MemProfUsePass::run(Module &M, ModuleAnalysisManager &AM) {
   // call stack.
   std::set<std::vector<uint64_t>> MatchedCallSites;
 
+  uint64_t MaxColdSize = 0;
+  if (auto *MemProfSum = MemProfReader->getMemProfSummary())
+    MaxColdSize = MemProfSum->getMaxColdTotalSize();
+
   for (auto &F : M) {
     if (F.isDeclaration())
       continue;
@@ -1274,7 +1267,7 @@ PreservedAnalyses MemProfUsePass::run(Module &M, ModuleAnalysisManager &AM) {
     const TargetLibraryInfo &TLI = FAM.getResult<TargetLibraryAnalysis>(F);
     auto &ORE = FAM.getResult<OptimizationRemarkEmitterAnalysis>(F);
     readMemprof(M, F, MemProfReader.get(), TLI, FullStackIdToAllocMatchInfo,
-                MatchedCallSites, UndriftMaps, ORE);
+                MatchedCallSites, UndriftMaps, ORE, MaxColdSize);
   }
 
   if (ClPrintMemProfMatchInfo) {
diff --git a/llvm/test/ThinLTO/X86/memprof-report-hinted-partial.ll b/llvm/test/ThinLTO/X86/memprof-report-hinted-partial.ll
new file mode 100644
index 0000000000000..d4a3f9bca2cab
--- /dev/null
+++ b/llvm/test/ThinLTO/X86/memprof-report-hinted-partial.ll
@@ -0,0 +1,73 @@
+;; Test that we get hinted size reporting for just the subset of MIBs that
+;; contain context size info in the metadata.
+
+;; Generate the bitcode including ThinLTO summary. Specify
+;; -memprof-min-percent-max-cold-size (value doesn't matter) to indicate to
+;; the bitcode writer that it should expect and optimize for partial context
+;; size info.
+; RUN: opt -thinlto-bc -memprof-min-percent-max-cold-size=50 %s >%t.o
+
+; RUN: llvm-lto2 run %t.o -enable-memprof-context-disambiguation \
+; RUN:	-supports-hot-cold-new \
+; RUN:	-r=%t.o,main,plx \
+; RUN:	-r=%t.o,_Znam, \
+; RUN:	-memprof-report-hinted-sizes \
+; RUN:	-o %t.out 2>&1 | FileCheck %s --check-prefix=SIZES
+
+;; We should only get these two messages from -memprof-report-hinted-sizes
+;; as they are the only MIBs with recorded context size info.
+; SIZES-NOT: full allocation context
+; SIZES: Cold full allocation context 456 with total size 200 is Cold after cloning (context id 2)
+; SIZES: Cold full allocation context 789 with total size 300 is Cold after cloning (context id 2)
+; SIZES-NOT: full allocation context
+
+source_filename = "memprof-report-hinted-partial.ll"
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+define i32 @main() #0 {
+entry:
+  %call = call ptr @_Z3foov(), !callsite !0
+  %call1 = call ptr @_Z3foov(), !callsite !1
+  ret i32 0
+}
+
+define internal ptr @_Z3barv() #0 {
+entry:
+  %call = call ptr @_Znam(i64 0), !memprof !2, !callsite !7
+  ret ptr null
+}
+
+declare ptr @_Znam(i64)
+
+define internal ptr @_Z3bazv() #0 {
+entry:
+  %call = call ptr @_Z3barv(), !callsite !8
+  ret ptr null
+}
+
+define internal ptr @_Z3foov() #0 {
+entry:
+  %call = call ptr @_Z3bazv(), !callsite !9
+  ret ptr null
+}
+
+; uselistorder directives
+uselistorder ptr @_Z3foov, { 1, 0 }
+
+attributes #0 = { noinline optnone }
+
+!0 = !{i64 8632435727821051414}
+!1 = !{i64 -3421689549917153178}
+!2 = !{!3, !5, !13}
+!3 = !{!4, !"notcold"}
+!4 = !{i64 9086428284934609951, i64 -5964873800580613432, i64 2732490490862098848, i64 8632435727821051414}
+!5 = !{!6, !"cold", !11, !12}
+!6 = !{i64 9086428284934609951, i64 -5964873800580613432, i64 2732490490862098848, i64 -3421689549917153178}
+!7 = !{i64 9086428284934609951}
+!8 = !{i64 -5964873800580613432}
+!9 = !{i64 2732490490862098848}
+!11 = !{i64 456, i64 200}
+!12 = !{i64 789, i64 300}
+!13 = !{!14, !"cold"}
+!14 = !{i64 9086428284934609951, i64 12345}
diff --git a/llvm/test/Transforms/PGOProfile/memprof_max_cold_threshold.test b/llvm/test/Transforms/PGOProfile/memprof_max_cold_threshold.test
new file mode 100644
index 0000000000000..f96b3f0618102
--- /dev/null
+++ b/llvm/test/Transforms/PGOProfile/memprof_max_cold_threshold.test
@@ -0,0 +1,131 @@
+;; Test the -memprof-min-percent-max-cold-size flag for annotating only the
+;; largest contexts with size info.
+
+; REQUIRES: x86_64-linux
+
+; RUN: split-file %s %t
+
+;; Specify version 4 so that a summary is generated.
+; RUN: llvm-profdata merge --memprof-version=4 %t/memprof_max_cold_threshold.yaml -o %t/memprof_max_cold_threshold.memprofdata
+
+; RUN: llvm-profdata show --memory %t/memprof_max_cold_threshold.memprofdata | FileCheck %s...
[truncated]

@@ -168,7 +187,8 @@ void CallStackTrie::addCallStack(MDNode *MIB) {
 static MDNode *createMIBNode(LLVMContext &Ctx, ArrayRef<uint64_t> MIBCallStack,
                              AllocationType AllocType,
                              ArrayRef<ContextTotalSize> ContextSizeInfo,
-                             uint64_t &TotalBytes, uint64_t &ColdBytes) {
+                             uint64_t &TotalBytes, uint64_t &ColdBytes,
+                             uint64_t MaxColdSize) {

Contributor:
It's a bit strange to have the output params (TotalBytes and ColdBytes) appear before MaxColdSize, which is an input. Maybe change the order to put the inputs first? Also, const for MaxColdSize.

Author (teresajohnson):
will do
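
One possible reordering along those lines (a sketch, not necessarily the
committed signature):

// Inputs first, outputs last; MaxColdSize taken as const.
static MDNode *createMIBNode(LLVMContext &Ctx, ArrayRef<uint64_t> MIBCallStack,
                             AllocationType AllocType,
                             ArrayRef<ContextTotalSize> ContextSizeInfo,
                             const uint64_t MaxColdSize, uint64_t &TotalBytes,
                             uint64_t &ColdBytes);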

Contributor:
Also noticed another place in the existing code where this happens -- CallStackTrie::buildMIBNodes (in case you'd like to clean up after).

// 0s for the other MIBs. Handle the case where the first
// ContextSizeInfo being inserted is not for the first MIB, insert
// a pair of 0s for each of the prior MIBs.
if (ContextSizeInfos.empty() && !MIBs.empty())

Contributor:
I tried to think of how to make it simpler, but I guess the key constraint is that we don't want to pay for the memory overhead of ContextSizeInfos unless there is at least one MIB which has this information?

If the overhead is low enough I think we can simplify this by moving L550 (ContextSizes def) out of the if-condition. Then based on whether it was filled in or not insert the value or zeros in step with MIBs on L577. Wdyt?

Author (teresajohnson):
It's more than the memory overhead - it's also the summary bitcode overhead we want to avoid if the alloc doesn't have any MIBs with non-empty context size infos.

With your suggestion it sounds like we would still have an if then else here (push_back either the ContextSizes array or {{0,0}}). It would just be a little different in that we always insert the {{0,0}}. I suppose we could set a flag to true if ever appending a non-empty ContextSizes array, and use that check below to guard moving the ContextSizeInfos into the AllocInfo. But in that case I would probably not move the ContextSizes declaration out of this if body, I would just change the code to have an unconditional else at line 572 and remove the insertion of MIBs.size() 0s here. That sounds simpler, I'll go ahead and do that.

Contributor:
sgtm.
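
A toy model of the simplification agreed on above (assumed names, not the
committed code): each MIB contributes either its real (full stack id, size)
pairs or a {0, 0} sentinel, keeping positions aligned with the MIB list.

#include <cstdint>
#include <utility>
#include <vector>

using ContextTotalSize = std::pair<uint64_t, uint64_t>;

std::vector<std::vector<ContextTotalSize>>
alignSizeInfos(const std::vector<std::vector<ContextTotalSize>> &PerMIBSizes,
               bool &AnySizeInfo) {
  std::vector<std::vector<ContextTotalSize>> ContextSizeInfos;
  AnySizeInfo = false;
  for (const auto &Sizes : PerMIBSizes) {
    if (!Sizes.empty()) {
      AnySizeInfo = true;
      ContextSizeInfos.push_back(Sizes);
    } else {
      // Sentinel keeps this MIB's slot aligned with the others.
      ContextSizeInfos.push_back({{0, 0}});
    }
  }
  // Caller attaches ContextSizeInfos to the AllocInfo only if AnySizeInfo is
  // set, avoiding summary/bitcode overhead for allocs with no recorded sizes.
  return ContextSizeInfos;
}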

;; Test the -memprof-min-percent-max-cold-size flag for annotating only the
;; largest contexts with size info.

; REQUIRES: x86_64-linux

Contributor:
I don't think we need this since we don't deal with any raw profiles in this test?

Author (teresajohnson):
This just got copied over from the test I cloned this from initially. Agree, I will remove it.

snehasish (Contributor), Jun 3, 2025:
There are a few of these I came across when running memprof tests on ARM for development. I'll clean up the rest.

Comment on lines 107 to 110
; ALL: ![[M5]] = !{![[C5:[0-9]+]], !"cold"
; EXCL75-NOT: !
; EXCL75-SAME: }
; INCL75-SAME: , ![[S5:[0-9]+]]}

Contributor:
This is pretty hard to read and update in the future. I wonder if we can simplify this by having separate CHECK prefixes on the RUN lines above?

Author (teresajohnson):
I was trying to be compact but agree it is harder to read. Will separate them all out.
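
A sketch of what the separated checks might look like (illustrative only;
the actual updated test may differ):

; EXCL75: ![[M5:[0-9]+]] = !{![[C5:[0-9]+]], !"cold"}
; INCL75: ![[M5:[0-9]+]] = !{![[C5:[0-9]+]], !"cold", ![[S5:[0-9]+]]}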

teresajohnson (Author):
I addressed the comments, PTAL

snehasish (Contributor) left a comment:
lgtm


github-actions bot commented Jun 3, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@teresajohnson teresajohnson merged commit f2adae5 into llvm:main Jun 3, 2025
7 of 10 checks passed
teresajohnson added a commit that referenced this pull request Jun 3, 2025
…llocations" (#142688)

Reverts #142507 due to buildbot failures that I will
look into tomorrow.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Jun 3, 2025
…gest cold allocations" (#142688)

Reverts llvm/llvm-project#142507 due to buildbot failures that I will
look into tomorrow.