Skip to content

Commit

Permalink
[MemProf] Allow promotion if target is a declaration (#115555)
Browse files Browse the repository at this point in the history
Fixes an oversight in the MemProf ICP handling, that was blocking
promotion/cloning of indirect calls when the profiled target is a
declaration (i.e wasn't imported). There is no issue promoting in
that case, and in fact the comment mentions we should attempt to at
least import as declarations to enable more promotion.

Note that normal ICP currently requires that the target be a definition,
which is how this check ended up here. The comment there says that it
must be a definition because ThinLTO could remove declarations for
symbols found to be globally dead in the binary. However, here we are
always performing MemProf ICP in the ThinLTO backends, which is after
the globally dead symbols are removed (via dropDeadSymbols before
starting the optimization pipeline) [1].

For now, guard this with an option (flag is off which means the new
promotion is enabled by default) to simplify debugging or disabling it
if
this proves problematic.

[1] In fact we could also be more aggressive in regular ICP when invoked
in the ThinLTO backend
  • Loading branch information
teresajohnson authored Nov 9, 2024
1 parent 1d41543 commit 3654183
Show file tree
Hide file tree
Showing 2 changed files with 122 additions and 25 deletions.
13 changes: 12 additions & 1 deletion llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -132,6 +132,11 @@ cl::opt<bool> EnableMemProfContextDisambiguation(
cl::opt<bool> SupportsHotColdNew(
"supports-hot-cold-new", cl::init(false), cl::Hidden,
cl::desc("Linking with hot/cold operator new interfaces"));

cl::opt<bool> MemProfRequireDefinitionForPromotion(
"memprof-require-definition-for-promotion", cl::init(false), cl::Hidden,
cl::desc(
"Require target function definition when promoting indirect calls"));
} // namespace llvm

extern cl::opt<bool> MemProfReportHintedSizes;
Expand Down Expand Up @@ -4602,7 +4607,13 @@ void MemProfContextDisambiguation::performICP(
// target (or version of the code), and we need to be conservative
// (similar to what is done in the ICP pass).
Function *TargetFunction = Symtab->getFunction(Candidate.Value);
if (TargetFunction == nullptr || TargetFunction->isDeclaration()) {
if (TargetFunction == nullptr ||
// Any ThinLTO global dead symbol removal should have already
// occurred, so it should be safe to promote when the target is a
// declaration.
// TODO: Remove internal option once more fully tested.
(MemProfRequireDefinitionForPromotion &&
TargetFunction->isDeclaration())) {
ORE.emit([&]() {
return OptimizationRemarkMissed(DEBUG_TYPE, "UnableToFindTarget", CB)
<< "Memprof cannot promote indirect call: target with md5sum "
Expand Down
134 changes: 110 additions & 24 deletions llvm/test/ThinLTO/X86/memprof-icp.ll
Original file line number Diff line number Diff line change
Expand Up @@ -93,6 +93,8 @@
; RUN: -r=%t/foo.o,_Z3fooR2B0j,plx \
; RUN: -r=%t/foo.o,_ZN2B03barEj.abc,plx \
; RUN: -r=%t/foo.o,_Z3xyzR2B0j, \
; RUN: -r=%t/foo.o,_ZN2B03barEj, \
; RUN: -r=%t/foo.o,_ZN1B3barEj, \
; RUN: -r=%t/main.o,_Z3fooR2B0j, \
; RUN: -r=%t/main.o,_Znwm, \
; RUN: -r=%t/main.o,_ZdlPvm, \
Expand All @@ -113,9 +115,9 @@
; RUN: -pass-remarks=. -save-temps \
; RUN: -o %t.out 2>&1 | FileCheck %s --check-prefix=STATS \
; RUN: --check-prefix=STATS-BE --check-prefix=REMARKS-MAIN \
; RUN: --check-prefix=REMARKS-FOO
; RUN: --check-prefix=REMARKS-FOO --check-prefix=REMARKS-FOO-IMPORT

; RUN: llvm-dis %t.out.2.4.opt.bc -o - | FileCheck %s --check-prefix=IR
; RUN: llvm-dis %t.out.2.4.opt.bc -o - | FileCheck %s --check-prefix=IR --check-prefix=IR-IMPORT

;; Try again but with distributed ThinLTO
; RUN: llvm-lto2 run %t/main.o %t/foo.o -enable-memprof-context-disambiguation \
Expand All @@ -124,6 +126,8 @@
; RUN: -r=%t/foo.o,_Z3fooR2B0j,plx \
; RUN: -r=%t/foo.o,_ZN2B03barEj.abc,plx \
; RUN: -r=%t/foo.o,_Z3xyzR2B0j, \
; RUN: -r=%t/foo.o,_ZN2B03barEj, \
; RUN: -r=%t/foo.o,_ZN1B3barEj, \
; RUN: -r=%t/main.o,_Z3fooR2B0j, \
; RUN: -r=%t/main.o,_Znwm, \
; RUN: -r=%t/main.o,_ZdlPvm, \
Expand All @@ -147,8 +151,9 @@
; RUN: -enable-memprof-indirect-call-support=true \
; RUN: -summary-file=%t/foo.o.thinlto.bc -memprof-import-summary=%t/foo.o.thinlto.bc \
; RUN: -enable-import-metadata -stats -pass-remarks=. \
; RUN: %t/foo.o -S 2>&1 | FileCheck %s --check-prefix=IR \
; RUN: --check-prefix=STATS-BE-DISTRIB --check-prefix=REMARKS-FOO
; RUN: %t/foo.o -S 2>&1 | FileCheck %s --check-prefix=IR --check-prefix=IR-IMPORT \
; RUN: --check-prefix=STATS-BE-DISTRIB --check-prefix=REMARKS-FOO \
; RUN: --check-prefix=REMARKS-FOO-IMPORT

;; Retry with the ICP-disabled object file, and make sure we disable it again
;; so we don't look for the synthesized callsite records when applying imports.
Expand All @@ -159,6 +164,8 @@
; RUN: -r=%t/foo.noicp.o,_Z3fooR2B0j,plx \
; RUN: -r=%t/foo.noicp.o,_ZN2B03barEj.abc,plx \
; RUN: -r=%t/foo.noicp.o,_Z3xyzR2B0j, \
; RUN: -r=%t/foo.noicp.o,_ZN2B03barEj, \
; RUN: -r=%t/foo.noicp.o,_ZN1B3barEj, \
; RUN: -r=%t/main.o,_Z3fooR2B0j, \
; RUN: -r=%t/main.o,_Znwm, \
; RUN: -r=%t/main.o,_ZdlPvm, \
Expand All @@ -184,6 +191,74 @@
;; metadata.
; RUN: llvm-dis %t.noicp.out.2.4.opt.bc -o - | FileCheck %s --implicit-check-not "_Z3fooR2B0j.memprof" --implicit-check-not "!callsite"

;; Run in-process ThinLTO again, but with importing disabled by setting the
;; instruction limit to 0. Ensure that the existing declarations of B::bar
;; and B0::bar are sufficient to allow for the promotion and cloning.
; RUN: llvm-lto2 run %t/main.o %t/foo.o -enable-memprof-context-disambiguation \
; RUN: -import-instr-limit=0 \
; RUN: -enable-memprof-indirect-call-support=true \
; RUN: -supports-hot-cold-new \
; RUN: -r=%t/foo.o,_Z3fooR2B0j,plx \
; RUN: -r=%t/foo.o,_ZN2B03barEj.abc,plx \
; RUN: -r=%t/foo.o,_Z3xyzR2B0j, \
; RUN: -r=%t/foo.o,_ZN2B03barEj, \
; RUN: -r=%t/foo.o,_ZN1B3barEj, \
; RUN: -r=%t/main.o,_Z3fooR2B0j, \
; RUN: -r=%t/main.o,_Znwm, \
; RUN: -r=%t/main.o,_ZdlPvm, \
; RUN: -r=%t/main.o,_Z8externalPi, \
; RUN: -r=%t/main.o,main,plx \
; RUN: -r=%t/main.o,_ZN2B03barEj,plx \
; RUN: -r=%t/main.o,_ZN1B3barEj,plx \
; RUN: -r=%t/main.o,_ZTV1B,plx \
; RUN: -r=%t/main.o,_ZTVN10__cxxabiv120__si_class_type_infoE,plx \
; RUN: -r=%t/main.o,_ZTS1B,plx \
; RUN: -r=%t/main.o,_ZTVN10__cxxabiv117__class_type_infoE,plx \
; RUN: -r=%t/main.o,_ZTS2B0,plx \
; RUN: -r=%t/main.o,_ZTI2B0,plx \
; RUN: -r=%t/main.o,_ZTI1B,plx \
; RUN: -r=%t/main.o,_ZTV2B0,plx \
; RUN: -thinlto-threads=1 \
; RUN: -memprof-verify-ccg -memprof-verify-nodes -stats \
; RUN: -pass-remarks=. -save-temps \
; RUN: -o %t.out 2>&1 | FileCheck %s --check-prefix=STATS \
; RUN: --check-prefix=STATS-BE-NOIMPORT --check-prefix=REMARKS-MAIN \
; RUN: --check-prefix=REMARKS-FOO

; RUN: llvm-dis %t.out.2.4.opt.bc -o - | FileCheck %s --check-prefix=IR --check-prefix=IR-NOIMPORT

;; Run it gain but with -memprof-require-definition-for-promotion, and confirm
;; that no promotions occur.
; RUN: llvm-lto2 run %t/main.o %t/foo.o -enable-memprof-context-disambiguation \
; RUN: -import-instr-limit=0 \
; RUN: -memprof-require-definition-for-promotion \
; RUN: -enable-memprof-indirect-call-support=true \
; RUN: -supports-hot-cold-new \
; RUN: -r=%t/foo.o,_Z3fooR2B0j,plx \
; RUN: -r=%t/foo.o,_ZN2B03barEj.abc,plx \
; RUN: -r=%t/foo.o,_Z3xyzR2B0j, \
; RUN: -r=%t/foo.o,_ZN2B03barEj, \
; RUN: -r=%t/foo.o,_ZN1B3barEj, \
; RUN: -r=%t/main.o,_Z3fooR2B0j, \
; RUN: -r=%t/main.o,_Znwm, \
; RUN: -r=%t/main.o,_ZdlPvm, \
; RUN: -r=%t/main.o,_Z8externalPi, \
; RUN: -r=%t/main.o,main,plx \
; RUN: -r=%t/main.o,_ZN2B03barEj,plx \
; RUN: -r=%t/main.o,_ZN1B3barEj,plx \
; RUN: -r=%t/main.o,_ZTV1B,plx \
; RUN: -r=%t/main.o,_ZTVN10__cxxabiv120__si_class_type_infoE,plx \
; RUN: -r=%t/main.o,_ZTS1B,plx \
; RUN: -r=%t/main.o,_ZTVN10__cxxabiv117__class_type_infoE,plx \
; RUN: -r=%t/main.o,_ZTS2B0,plx \
; RUN: -r=%t/main.o,_ZTI2B0,plx \
; RUN: -r=%t/main.o,_ZTI1B,plx \
; RUN: -r=%t/main.o,_ZTV2B0,plx \
; RUN: -thinlto-threads=1 \
; RUN: -memprof-verify-ccg -memprof-verify-nodes \
; RUN: -pass-remarks=. \
; RUN: -o %t.out 2>&1 | FileCheck %s --implicit-check-not Promote

; REMARKS-MAIN: call in clone main assigned to call function clone _Z3fooR2B0j.memprof.1
; REMARKS-MAIN: call in clone main assigned to call function clone _Z3fooR2B0j.memprof.1
; REMARKS-MAIN: created clone _ZN2B03barEj.memprof.1
Expand All @@ -208,51 +283,59 @@
; REMARKS-FOO: call in clone _Z3fooR2B0j promoted and assigned to call function clone _ZN2B03barEj
; REMARKS-FOO: Promote indirect call to _ZN2B03barEj with count 2 out of 2
; REMARKS-FOO: call in clone _Z3fooR2B0j.memprof.1 promoted and assigned to call function clone _ZN2B03barEj.memprof.1
; REMARKS-FOO: created clone _ZN2B03barEj.memprof.1
; REMARKS-FOO: call in clone _ZN2B03barEj marked with memprof allocation attribute notcold
; REMARKS-FOO: call in clone _ZN2B03barEj.memprof.1 marked with memprof allocation attribute cold
; REMARKS-FOO: created clone _ZN1B3barEj.memprof.1
; REMARKS-FOO: call in clone _ZN1B3barEj marked with memprof allocation attribute notcold
; REMARKS-FOO: call in clone _ZN1B3barEj.memprof.1 marked with memprof allocation attribute cold
; REMARKS-FOO-IMPORT: created clone _ZN2B03barEj.memprof.1
; REMARKS-FOO-IMPORT: call in clone _ZN2B03barEj marked with memprof allocation attribute notcold
; REMARKS-FOO-IMPORT: call in clone _ZN2B03barEj.memprof.1 marked with memprof allocation attribute cold
; REMARKS-FOO-IMPORT: created clone _ZN1B3barEj.memprof.1
; REMARKS-FOO-IMPORT: call in clone _ZN1B3barEj marked with memprof allocation attribute notcold
; REMARKS-FOO-IMPORT: call in clone _ZN1B3barEj.memprof.1 marked with memprof allocation attribute cold

; STATS: 4 memprof-context-disambiguation - Number of cold static allocations (possibly cloned) during whole program analysis
; STATS-BE: 8 memprof-context-disambiguation - Number of cold static allocations (possibly cloned) during ThinLTO backend
; STATS-BE-NOIMPORT: 4 memprof-context-disambiguation - Number of cold static allocations (possibly cloned) during ThinLTO backend
; STATS: 4 memprof-context-disambiguation - Number of not cold static allocations (possibly cloned) during whole program analysis
; STATS-BE: 8 memprof-context-disambiguation - Number of not cold static allocations (possibly cloned) during ThinLTO backend
; STATS-BE-NOIMPORT: 4 memprof-context-disambiguation - Number of not cold static allocations (possibly cloned) during ThinLTO backend
; STATS: 3 memprof-context-disambiguation - Number of function clones created during whole program analysis
; STATS-BE: 5 memprof-context-disambiguation - Number of function clones created during ThinLTO backend
; STATS-BE-NOIMPORT: 3 memprof-context-disambiguation - Number of function clones created during ThinLTO backend

; IR-NOIMPORT: foo
; IR: define {{.*}} @_Z3fooR2B0j(
; IR: %1 = icmp eq ptr %0, @_ZN1B3barEj
; IR: br i1 %1, label %if.true.direct_targ, label %if.false.orig_indirect
; IR: %[[R1:[0-9]+]] = icmp eq ptr %0, @_ZN1B3barEj
; IR: br i1 %[[R1]], label %if.true.direct_targ, label %if.false.orig_indirect
; IR: if.true.direct_targ:
; IR: call {{.*}} @_Znwm(i64 noundef 4) #[[NOTCOLD:[0-9]+]]
; IR-IMPORT: call {{.*}} @_Znwm(i64 noundef 4) #[[NOTCOLD:[0-9]+]]
; IR-NOIMPORT: call {{.*}} @_ZN1B3barEj(
; IR: if.false.orig_indirect:
; IR: %2 = icmp eq ptr %0, @_ZN2B03barEj
; IR: br i1 %2, label %if.true.direct_targ1, label %if.false.orig_indirect2
; IR: %[[R2:[0-9]+]] = icmp eq ptr %0, @_ZN2B03barEj
; IR: br i1 %[[R2]], label %if.true.direct_targ1, label %if.false.orig_indirect2
; IR: if.true.direct_targ1:
; IR: call {{.*}} @_Znwm(i64 noundef 4) #[[NOTCOLD]]
; IR-IMPORT: call {{.*}} @_Znwm(i64 noundef 4) #[[NOTCOLD]]
; IR-NOIMPORT: call {{.*}} @_ZN2B03barEj(
; IR: if.false.orig_indirect2:
; IR: call {{.*}} %0

; IR: define {{.*}} @_Z3fooR2B0j.memprof.1(
;; We should still compare against the original versions of bar since that is
;; what is in the vtable. However, we should have called the cloned versions
;; that perform cold allocations, which were subsequently inlined.
; IR: %1 = icmp eq ptr %0, @_ZN1B3barEj
; IR: br i1 %1, label %if.true.direct_targ, label %if.false.orig_indirect
; IR: %[[R3:[0-9]+]] = icmp eq ptr %0, @_ZN1B3barEj
; IR: br i1 %[[R3]], label %if.true.direct_targ, label %if.false.orig_indirect
; IR: if.true.direct_targ:
; IR: call {{.*}} @_Znwm(i64 noundef 4) #[[COLD:[0-9]+]]
; IR-IMPORT: call {{.*}} @_Znwm(i64 noundef 4) #[[COLD:[0-9]+]]
; IR-NOIMPORT: call {{.*}} @_ZN1B3barEj.memprof.1(
; IR: if.false.orig_indirect:
; IR: %2 = icmp eq ptr %0, @_ZN2B03barEj
; IR: br i1 %2, label %if.true.direct_targ1, label %if.false.orig_indirect2
; IR: %[[R4:[0-9]+]] = icmp eq ptr %0, @_ZN2B03barEj
; IR: br i1 %[[R4]], label %if.true.direct_targ1, label %if.false.orig_indirect2
; IR: if.true.direct_targ1:
; IR: call {{.*}} @_Znwm(i64 noundef 4) #[[COLD]]
; IR-IMPORT: call {{.*}} @_Znwm(i64 noundef 4) #[[COLD]]
; IR-NOIMPORT: call {{.*}} @_ZN2B03barEj.memprof.1(
; IR: if.false.orig_indirect2:
; IR: call {{.*}} %0

; IR: attributes #[[NOTCOLD]] = {{.*}} "memprof"="notcold"
; IR: attributes #[[COLD]] = {{.*}} "memprof"="cold"
; IR-IMPORT: attributes #[[NOTCOLD]] = {{.*}} "memprof"="notcold"
; IR-IMPORT: attributes #[[COLD]] = {{.*}} "memprof"="cold"

; STATS-BE-DISTRIB: 4 memprof-context-disambiguation - Number of cold static allocations (possibly cloned) during ThinLTO backend
; STATS-BE-DISTRIB: 4 memprof-context-disambiguation - Number of not cold static allocations (possibly cloned) during ThinLTO backend
Expand All @@ -272,6 +355,9 @@ define i32 @_ZN2B03barEj.abc(ptr %this, i32 %s) {
ret i32 0
}

declare i32 @_ZN2B03barEj(ptr %this, i32 %s)
declare i32 @_ZN1B3barEj(ptr %this, i32 %s)

define i32 @_Z3fooR2B0j(ptr %b) {
entry:
%0 = load ptr, ptr %b, align 8
Expand Down

0 comments on commit 3654183

Please sign in to comment.