-
Notifications
You must be signed in to change notification settings - Fork 13.3k
[ctxprof] don't inline weak symbols after instrumentation #128811
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ctxprof] don't inline weak symbols after instrumentation #128811
Conversation
✅ With the latest revision this PR passed the C/C++ code formatter. |
20b7123
to
eda0f33
Compare
@llvm/pr-subscribers-llvm-ir @llvm/pr-subscribers-llvm-transforms Author: Mircea Trofin (mtrofin) ChangesContextual profiling identifies functions by GUID. Functions that may get overridden by the linker with a prevailing copy may have, during instrumentation, different variants in different modules. If these variants get inlined before linking (here I assume thinlto), they will identify themselves to the ctxprof runtime as their GUID, leading to issues - they may have different counter counts, for instance. If we block their inlining in the pre-thinlink compilation, only the prevailing copy will survive post-thinlink and the confusion is avoided. The change introduces a small pass just for this purpose, which marks any symbols that could be affected by the above as We could later (different patch) choose to mark them back to their original attribute (none or Full diff: https://github.com/llvm/llvm-project/pull/128811.diff 6 Files Affected:
diff --git a/llvm/include/llvm/IR/GlobalValue.h b/llvm/include/llvm/IR/GlobalValue.h
index 2176e2c2cfbfc..c1d3cee636981 100644
--- a/llvm/include/llvm/IR/GlobalValue.h
+++ b/llvm/include/llvm/IR/GlobalValue.h
@@ -550,6 +550,13 @@ class GlobalValue : public Constant {
return isDiscardableIfUnused(getLinkage());
}
+ // the symbol in this module may be replaced by a prevailing copy.
+ bool mayBeReplacedByPrevailingCopy() const {
+ return getLinkage() != GlobalValue::ExternalLinkage &&
+ getLinkage() != GlobalValue::InternalLinkage &&
+ getLinkage() != GlobalValue::PrivateLinkage;
+ }
+
bool isWeakForLinker() const { return isWeakForLinker(getLinkage()); }
protected:
diff --git a/llvm/include/llvm/Transforms/Instrumentation/PGOCtxProfLowering.h b/llvm/include/llvm/Transforms/Instrumentation/PGOCtxProfLowering.h
index f127d16b8de12..8010c1c091e40 100644
--- a/llvm/include/llvm/Transforms/Instrumentation/PGOCtxProfLowering.h
+++ b/llvm/include/llvm/Transforms/Instrumentation/PGOCtxProfLowering.h
@@ -24,5 +24,20 @@ class PGOCtxProfLoweringPass : public PassInfoMixin<PGOCtxProfLoweringPass> {
PreservedAnalyses run(Module &M, ModuleAnalysisManager &MAM);
};
+
+// Utility pass blocking inlining for any function that may be overridden during
+// linking by a prevailing copy.
+// This avoids confusingly collecting profiles for the same GUID corresponding
+// to different variants of the function. We could do like PGO and identify
+// functions by a (GUID, Hash) tuple, but since the ctxprof "use" waits for
+// thinlto to happen before performing any further optimizations, it's
+// unnecessary to collect profiles for non-prevailing copies.
+class NoinlineNonPrevailing : public PassInfoMixin<NoinlineNonPrevailing> {
+public:
+ explicit NoinlineNonPrevailing() = default;
+
+ PreservedAnalyses run(Module &M, ModuleAnalysisManager &MAM);
+};
+
} // namespace llvm
#endif
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index 6ca2f90aa0668..f0541dbfe3597 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -1249,6 +1249,13 @@ PassBuilder::buildModuleSimplificationPipeline(OptimizationLevel Level,
MPM.addPass(AssignGUIDPass());
if (IsCtxProfUse)
return MPM;
+ // Block further inlining in the instrumented ctxprof case. This avoids
+ // confusingly collecting profiles for the same GUID corresponding to
+ // different variants of the function. We could do like PGO and identify
+ // functions by a (GUID, Hash) tuple, but since the ctxprof "use" waits for
+ // thinlto to happen before performing any further optimizations, it's
+ // unnecessary to collect profiles for non-prevailing copies.
+ MPM.addPass(NoinlineNonPrevailing());
addPostPGOLoopRotation(MPM, Level);
MPM.addPass(PGOCtxProfLoweringPass());
} else if (IsColdFuncOnlyInstrGen) {
diff --git a/llvm/lib/Passes/PassRegistry.def b/llvm/lib/Passes/PassRegistry.def
index a664d6fd7085f..bfd952df25e98 100644
--- a/llvm/lib/Passes/PassRegistry.def
+++ b/llvm/lib/Passes/PassRegistry.def
@@ -62,6 +62,7 @@ MODULE_PASS("cross-dso-cfi", CrossDSOCFIPass())
MODULE_PASS("ctx-instr-gen",
PGOInstrumentationGen(PGOInstrumentationType::CTXPROF))
MODULE_PASS("ctx-prof-flatten", PGOCtxProfFlatteningPass())
+MODULE_PASS("noinline-nonprevailing", NoinlineNonPrevailing())
MODULE_PASS("deadargelim", DeadArgumentEliminationPass())
MODULE_PASS("debugify", NewPMDebugifyPass())
MODULE_PASS("dfsan", DataFlowSanitizerPass())
diff --git a/llvm/lib/Transforms/Instrumentation/PGOCtxProfLowering.cpp b/llvm/lib/Transforms/Instrumentation/PGOCtxProfLowering.cpp
index e7b7c26c493e5..19d899b08150e 100644
--- a/llvm/lib/Transforms/Instrumentation/PGOCtxProfLowering.cpp
+++ b/llvm/lib/Transforms/Instrumentation/PGOCtxProfLowering.cpp
@@ -351,3 +351,23 @@ bool CtxInstrumentationLowerer::lowerFunction(Function &F) {
F.getName());
return true;
}
+
+PreservedAnalyses NoinlineNonPrevailing::run(Module &M,
+ ModuleAnalysisManager &MAM) {
+ bool Changed = false;
+ for (auto &F : M) {
+ if (F.isDeclaration())
+ continue;
+ if (F.hasFnAttribute(Attribute::NoInline))
+ continue;
+ if (F.mayBeReplacedByPrevailingCopy()) {
+ F.addFnAttr(Attribute::NoInline);
+ if (F.hasFnAttribute(Attribute::AlwaysInline))
+ F.removeFnAttr(Attribute::AlwaysInline);
+ Changed = true;
+ }
+ }
+ if (Changed)
+ return PreservedAnalyses::none();
+ return PreservedAnalyses::all();
+}
diff --git a/llvm/test/Transforms/PGOProfile/ctx-instrumentation-block-inline.ll b/llvm/test/Transforms/PGOProfile/ctx-instrumentation-block-inline.ll
new file mode 100644
index 0000000000000..b455f327bf5a6
--- /dev/null
+++ b/llvm/test/Transforms/PGOProfile/ctx-instrumentation-block-inline.ll
@@ -0,0 +1,25 @@
+; RUN: opt -passes=noinline-nonprevailing -S < %s 2>&1 | FileCheck %s
+
+define void @a() {
+ ret void
+}
+
+define void @b() #0 {
+ ret void
+}
+
+define weak_odr void @c() {
+ ret void
+}
+
+define weak_odr void @d() #0{
+ ret void
+}
+
+attributes #0 = { alwaysinline }
+
+; CHECK: void @a() {
+; CHECK: void @b() #0
+; CHECK: void @c() #1
+; CHECK: void @d() #1
+; CHECK: attributes #1 = { noinline }
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the performance hit for instrumentation as this seems like it could block a lot of linkonce_odr inlining of common functions in headers?
It doesn't appear to take a hit. Not completely surprising: we already do some inlining before instrumentation. If we ever want to avoid taking any hit, we can subsequently add some metadata to each such function capturing the original (if any) inline attribute; go through thinlto; then restore the original attribute from metadata after function import and let the backend inliner do its job (which, at this point, would be only working with prevailing copies) |
The last part isn't quite true - non-prevailing linkonce_odr are converted to available_externally and kept around long enough to allow them to be inlined. |
eda0f33
to
053d62c
Compare
(from offline chat) indeed, we may need to do something more nuanced if we want to re-open inline-ability in the post-thinlink step for the instrumented build - but TBD if/when. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm with a minor suggestion below
053d62c
to
19d5ade
Compare
…ak_symbols_after_instrumentation
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/32/builds/13630 Here is the relevant piece of the build log for the reference
|
Contextual profiling identifies functions by GUID. Functions that may get overridden by the linker with a prevailing copy may have, during instrumentation, different variants in different modules. If these variants get inlined before linking (here I assume thinlto), they will identify themselves to the ctxprof runtime as their GUID, leading to issues - they may have different counter counts, for instance.
If we block their inlining in the pre-thinlink compilation, only the prevailing copy will survive post-thinlink and the confusion is avoided.
The change introduces a small pass just for this purpose, which marks any symbols that could be affected by the above as
noinline
(even if they werealwaysinline
). We already carried out some inlining (via the preinliner), before instrumenting, so technically thealwaysinline
directives were honored.We could later (different patch) choose to mark them back to their original attribute (none or
alwaysinline
) post-thinlink, if we want to - but experimentally that doesn't really change much of the performance of the instrumented binary.