Skip to content

[InstrPGO] Support cold function coverage instrumentation #109837

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Oct 28, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions clang/include/clang/Driver/Options.td
Original file line number Diff line number Diff line change
Expand Up @@ -1787,6 +1787,12 @@ defm debug_info_for_profiling : BoolFOption<"debug-info-for-profiling",
PosFlag<SetTrue, [], [ClangOption, CC1Option],
"Emit extra debug info to make sample profile more accurate">,
NegFlag<SetFalse>>;
def fprofile_generate_cold_function_coverage : Flag<["-"], "fprofile-generate-cold-function-coverage">,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary to expose this through the clang frontend?

Copy link
Contributor Author

@wlei-llvm wlei-llvm Oct 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry if my PR description is not clear. Note that there is no use for -fprofile-generate and -fprofile-instr-generate here, so a driver flag here is to configure the instr file path and make linker to link the compiler_rt.profile object files (just similar to -fprofile-[instr]-generate=).

The reason for not using -fprofile-[instr]-generate is because those flags conflict with -fprofile-sample-use, see PGOOptions, ProfileFile is a shared file path which means the two flags are actually mutually exclusive.

Another option is to make -fprofile-generate compatible with -fprofile-samle-use(like refactoring PGOOptions or adding another flag to configure the file path things), this seems to me they are more complicated than the current one. But I’m open to any suggestions on this.

Copy link
Member

@mtrofin mtrofin Oct 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant, why not just use clang ... -mllvm -instrument-cold-function-coverage? Is this a clang - only feature (i.e. rust can't use it?) Is it just for symmetry with the current PGO flags?

(This is not a pushback, mainly curious. Also the patch would be quite smaller if you didn't need to pass through the flags from clang to llvm)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant, why not just use clang ... -mllvm -instrument-cold-function-coverage? Is this a clang - only feature (i.e. rust can't use it?) Is it just for symmetry with the current PGO flags?

(This is not a pushback, mainly curious. Also the patch would be quite smaller if you didn't need to pass through the flags from clang to llvm)

I see, thanks for the suggestion! We also need to link runtime lib(compiler_rt.profile.a)
but yeah, I agree it's possible to pass directly to llvm and linker without through clang. Then the full command line would be like

clang ... -mllvm -instrument-cold-function-coverage -mllvm -instrument-sample-cold-function-path=<file-path> -mllvm --pgo-function-entry-coverage 
ld.lld ... --u__llvm_runtime_variable  .../lib/x86_64-unknown-linux-gnu/libclang_rt.profile.a

So yes, adding the clang driver flag is kind of to mirror current IRPGO flags, for easy maintenance purpose(given that -fprofile-generate doesn't work with -fprofile-sample-use) and also to centralize the configuration for the convenience. IMO, the compiler_rt is probably the main blocker here, I didn't find an easy way to bundle it with a llvm flag.

Appreciate any further suggestions!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would -Wl,-lclang_rt.profile work?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also to centralize the configuration for the convenience

+1 I think a frontend flag is useful. It allows us to identify incompatibilities early and provide clear error messages instead of more obscure failures down the line.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would incompatibilities and ability to provide useful error messages be dependent on the flag being in the frontend? Is it that the backend can't actually reliably point the finger at the specific flag causing the conflict, so a message would be diluted to "sampling won't work with this"?

(probably a subject for a different forum tho)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would -Wl,-lclang_rt.profile work?

Got it, I think it should work, except it requires another linker flag: the --u__llvm_runtime_variable, we can configure it to linker too, but basically those instr PGO flags control addProfileRTLibs (which seems not equivalent to -Wl,-lclang_rt.profile), I just wanted to mirror those flags so that we don't need to maintain if anything changes to addProfileRTLibs in the future. (Though I feel this code should be very stable, should not be a big problem, so mainly for the convenience for compiler user to use one flag instead of using/maintaining multiple flags )
Overall, I agree that it's feasible to do it without clang flag. I'm not very familiar with the convention for adding driver flags, so if you think this doesn't justify it, I will drop it from this patch. Thanks for the discussion!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it that the backend can't actually reliably point the finger at the specific flag causing the conflict

Yes, if we know that this instrumentation mode should not be mixed with e.g. sanitizers or something else we can enforce these checks early. I don't see a particular downside to adding a frontend flag. The convenience of bundling the 2 lld flags and 3 mllvm flags into a single frontend flag seems like a good enough motivation to do so.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on single driver flag for convenience.

Group<f_Group>, Visibility<[ClangOption, CLOption]>,
HelpText<"Generate instrumented code to collect coverage info for cold functions into default.profraw file (overridden by '=' form of option or LLVM_PROFILE_FILE env var)">;
def fprofile_generate_cold_function_coverage_EQ : Joined<["-"], "fprofile-generate-cold-function-coverage=">,
Group<f_Group>, Visibility<[ClangOption, CLOption]>, MetaVarName<"<directory>">,
HelpText<"Generate instrumented code to collect coverage info for cold functions into <directory>/default.profraw (overridden by LLVM_PROFILE_FILE env var)">;
def fprofile_instr_generate : Flag<["-"], "fprofile-instr-generate">,
Group<f_Group>, Visibility<[ClangOption, CLOption]>,
HelpText<"Generate instrumented code to collect execution counts into default.profraw file (overridden by '=' form of option or LLVM_PROFILE_FILE env var)">;
Expand Down
4 changes: 3 additions & 1 deletion clang/lib/Driver/ToolChain.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -897,7 +897,9 @@ bool ToolChain::needsProfileRT(const ArgList &Args) {
Args.hasArg(options::OPT_fprofile_instr_generate) ||
Args.hasArg(options::OPT_fprofile_instr_generate_EQ) ||
Args.hasArg(options::OPT_fcreate_profile) ||
Args.hasArg(options::OPT_forder_file_instrumentation);
Args.hasArg(options::OPT_forder_file_instrumentation) ||
Args.hasArg(options::OPT_fprofile_generate_cold_function_coverage) ||
Args.hasArg(options::OPT_fprofile_generate_cold_function_coverage_EQ);
}

bool ToolChain::needsGCovInstrumentation(const llvm::opt::ArgList &Args) {
Expand Down
20 changes: 20 additions & 0 deletions clang/lib/Driver/ToolChains/Clang.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -632,6 +632,26 @@ static void addPGOAndCoverageFlags(const ToolChain &TC, Compilation &C,
}
}

if (auto *ColdFuncCoverageArg = Args.getLastArg(
options::OPT_fprofile_generate_cold_function_coverage,
options::OPT_fprofile_generate_cold_function_coverage_EQ)) {
SmallString<128> Path(
ColdFuncCoverageArg->getOption().matches(
options::OPT_fprofile_generate_cold_function_coverage_EQ)
? ColdFuncCoverageArg->getValue()
: "");
llvm::sys::path::append(Path, "default_%m.profraw");
// FIXME: Idealy the file path should be passed through
// `-fprofile-instrument-path=`(InstrProfileOutput), however, this field is
// shared with other profile use path(see PGOOptions), we need to refactor
// PGOOptions to make it work.
CmdArgs.push_back("-mllvm");
CmdArgs.push_back(Args.MakeArgString(
Twine("--instrument-cold-function-only-path=") + Path));
CmdArgs.push_back("-mllvm");
CmdArgs.push_back("--pgo-function-entry-coverage");
}

Arg *PGOGenArg = nullptr;
if (PGOGenerateArg) {
assert(!CSPGOGenerateArg);
Expand Down
19 changes: 19 additions & 0 deletions clang/test/CodeGen/pgo-cold-function-coverage.c
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
// Test -fprofile-generate-cold-function-coverage

// RUN: rm -rf %t && split-file %s %t
// RUN: %clang -O2 -fprofile-generate-cold-function-coverage=/xxx/yyy/ -fprofile-sample-accurate -fprofile-sample-use=%t/pgo-cold-func.prof -S -emit-llvm -o - %t/pgo-cold-func.c | FileCheck %s

// CHECK: @__llvm_profile_filename = {{.*}} c"/xxx/yyy/default_%m.profraw\00"

// CHECK: store i8 0, ptr @__profc_bar, align 1
// CHECK-NOT: @__profc_foo

//--- pgo-cold-func.prof
foo:1:1
1: 1

//--- pgo-cold-func.c
int bar(int x) { return x;}
int foo(int x) {
return x;
}
8 changes: 8 additions & 0 deletions clang/test/Driver/fprofile-generate-cold-function-coverage.c
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
// RUN: %clang -### -c -fprofile-generate-cold-function-coverage %s 2>&1 | FileCheck %s
// CHECK: "--instrument-cold-function-only-path=default_%m.profraw"
// CHECK: "--pgo-function-entry-coverage"
// CHECK-NOT: "-fprofile-instrument"
// CHECK-NOT: "-fprofile-instrument-path=

// RUN: %clang -### -c -fprofile-generate-cold-function-coverage=dir %s 2>&1 | FileCheck %s --check-prefix=CHECK-EQ
// CHECK-EQ: "--instrument-cold-function-only-path=dir{{/|\\\\}}default_%m.profraw"
17 changes: 16 additions & 1 deletion llvm/lib/Passes/PassBuilderPipelines.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -296,7 +296,12 @@ static cl::opt<bool> UseLoopVersioningLICM(
"enable-loop-versioning-licm", cl::init(false), cl::Hidden,
cl::desc("Enable the experimental Loop Versioning LICM pass"));

static cl::opt<std::string> InstrumentColdFuncOnlyPath(
"instrument-cold-function-only-path", cl::init(""),
cl::desc("File path for cold function only instrumentation"), cl::Hidden);

extern cl::opt<std::string> UseCtxProfile;
extern cl::opt<bool> PGOInstrumentColdFunctionOnly;

namespace llvm {
extern cl::opt<bool> EnableMemProfContextDisambiguation;
Expand Down Expand Up @@ -1182,8 +1187,13 @@ PassBuilder::buildModuleSimplificationPipeline(OptimizationLevel Level,
const bool IsCtxProfUse =
!UseCtxProfile.empty() && Phase == ThinOrFullLTOPhase::ThinLTOPreLink;

// Enable cold function coverage instrumentation if
// InstrumentColdFuncOnlyPath is provided.
const bool IsColdFuncOnlyInstrGen = PGOInstrumentColdFunctionOnly =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting this global variable here has introduced a data race detected by TSan when different threads set up different pass pipelines. Was this intentional? Did you mean to use == instead of =?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing out. It's intentional to set the global variable, I wasn't aware it will cause data race. I will try to change it to local one.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This causes TSAN tests failures all over our infrastructure in anything that uses LLVM jit, i.e. Tensorflow/XLA

One of the opens-source examples is https://github.com/google-deepmind/mujoco_menagerie/blob/main/test/model_test.py

Traces look like:

WARNING: ThreadSanitizer: data race (pid=4749)
  Write of size 1 at 0x7f83f3bd09f8 by thread T96 (mutexes: write M0):
    #0 setValue<bool> [third_party/llvm/llvm-project/llvm/include/llvm/Support/CommandLine.h:1401](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/llvm/include/llvm/Support/CommandLine.h?l=1401&ws=dmitryc/38224&snapshot=3):11 (libthird_Uparty_Sllvm_Sllvm-project_Sllvm_SlibPasses.so+0x2eb43b) (BuildId: a0d2dfe7395d03fede80b0393e3acc55)
    #1 operator=<bool> [third_party/llvm/llvm-project/llvm/include/llvm/Support/CommandLine.h:1492](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/llvm/include/llvm/Support/CommandLine.h?l=1492&ws=dmitryc/38224&snapshot=3):11 (libthird_Uparty_Sllvm_Sllvm-project_Sllvm_SlibPasses.so+0x2eb43b)
    #2 llvm::PassBuilder::buildModuleSimplificationPipeline(llvm::OptimizationLevel, llvm::ThinOrFullLTOPhase) [third_party/llvm/llvm-project/llvm/lib/Passes/PassBuilderPipelines.cpp:1192](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/llvm/lib/Passes/PassBuilderPipelines.cpp?l=1192&ws=dmitryc/38224&snapshot=3):69 (libthird_Uparty_Sllvm_Sllvm-project_Sllvm_SlibPasses.so+0x2eb43b)
    #3 llvm::PassBuilder::buildPerModuleDefaultPipeline(llvm::OptimizationLevel, bool) [third_party/llvm/llvm-project/llvm/lib/Passes/PassBuilderPipelines.cpp:1620](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/llvm/lib/Passes/PassBuilderPipelines.cpp?l=1620&ws=dmitryc/38224&snapshot=3):15 (libthird_Uparty_Sllvm_Sllvm-project_Sllvm_SlibPasses.so+0x2f1ef4) (BuildId: a0d2dfe7395d03fede80b0393e3acc55)
    #4 xla::cpu::CompilerFunctor::operator()(llvm::Module&) [third_party/tensorflow/compiler/xla/service/cpu/compiler_functor.cc:181](https://cs.corp.google.com/piper///depot/google3/third_party/tensorflow/compiler/xla/service/cpu/compiler_functor.cc?l=181&ws=dmitryc/38224&snapshot=3):19 (libthird_Uparty_Stensorflow_Scompiler_Sxla_Sservice_Scpu_Slibcompiler_Ufunctor.so+0xafde) (BuildId: 036848c7284f8ba35b41d1e7f52e58ff)
...

WARNING: ThreadSanitizer: data race (pid=4749)
  Write of size 1 at 0x7f83f3bd09f8 by thread T96 (mutexes: write M0):
    #0 setValue<bool> [third_party/llvm/llvm-project/llvm/include/llvm/Support/CommandLine.h:1401](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/llvm/include/llvm/Support/CommandLine.h?l=1401&ws=dmitryc/38224&snapshot=3):11 (libthird_Uparty_Sllvm_Sllvm-project_Sllvm_SlibPasses.so+0x2eb43b) (BuildId: a0d2dfe7395d03fede80b0393e3acc55)
    #1 operator=<bool> [third_party/llvm/llvm-project/llvm/include/llvm/Support/CommandLine.h:1492](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/llvm/include/llvm/Support/CommandLine.h?l=1492&ws=dmitryc/38224&snapshot=3):11 (libthird_Uparty_Sllvm_Sllvm-project_Sllvm_SlibPasses.so+0x2eb43b)
    #2 llvm::PassBuilder::buildModuleSimplificationPipeline(llvm::OptimizationLevel, llvm::ThinOrFullLTOPhase) [third_party/llvm/llvm-project/llvm/lib/Passes/PassBuilderPipelines.cpp:1192](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/llvm/lib/Passes/PassBuilderPipelines.cpp?l=1192&ws=dmitryc/38224&snapshot=3):69 (libthird_Uparty_Sllvm_Sllvm-project_Sllvm_SlibPasses.so+0x2eb43b)
    #3 llvm::PassBuilder::buildPerModuleDefaultPipeline(llvm::OptimizationLevel, bool) [third_party/llvm/llvm-project/llvm/lib/Passes/PassBuilderPipelines.cpp:1620](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/llvm/lib/Passes/PassBuilderPipelines.cpp?l=1620&ws=dmitryc/38224&snapshot=3):15 (libthird_Uparty_Sllvm_Sllvm-project_Sllvm_SlibPasses.so+0x2f1ef4) (BuildId: a0d2dfe7395d03fede80b0393e3acc55)
    #4 xla::cpu::CompilerFunctor::operator()(llvm::Module&) [third_party/tensorflow/compiler/xla/service/cpu/compiler_functor.cc:181](https://cs.corp.google.com/piper///depot/google3/third_party/tensorflow/compiler/xla/service/cpu/compiler_functor.cc?l=181&ws=dmitryc/38224&snapshot=3):19 (libthird_Uparty_Stensorflow_Scompiler_Sxla_Sservice_Scpu_Slibcompiler_Ufunctor.so+0xafde) (BuildId: 036848c7284f8ba35b41d1e7f52e58ff)
....

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry but had to revert it in 06e28ed and d924a9b

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be fixed by #114364

IsPGOPreLink && !InstrumentColdFuncOnlyPath.empty();

if (IsPGOInstrGen || IsPGOInstrUse || IsMemprofUse || IsCtxProfGen ||
IsCtxProfUse)
IsCtxProfUse || IsColdFuncOnlyInstrGen)
addPreInlinerPasses(MPM, Level, Phase);

// Add all the requested passes for instrumentation PGO, if requested.
Expand All @@ -1205,6 +1215,11 @@ PassBuilder::buildModuleSimplificationPipeline(OptimizationLevel Level,
return MPM;
addPostPGOLoopRotation(MPM, Level);
MPM.addPass(PGOCtxProfLoweringPass());
} else if (IsColdFuncOnlyInstrGen) {
addPGOInstrPasses(
MPM, Level, /* RunProfileGen */ true, /* IsCS */ false,
/* AtomicCounterUpdate */ false, InstrumentColdFuncOnlyPath,
/* ProfileRemappingFile */ "", IntrusiveRefCntPtr<vfs::FileSystem>());
}

if (IsPGOInstrGen || IsPGOInstrUse || IsCtxProfGen)
Expand Down
19 changes: 19 additions & 0 deletions llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -319,6 +319,20 @@ static cl::opt<unsigned> PGOFunctionCriticalEdgeThreshold(
cl::desc("Do not instrument functions with the number of critical edges "
" greater than this threshold."));

static cl::opt<uint64_t> PGOColdInstrumentEntryThreshold(
"pgo-cold-instrument-entry-threshold", cl::init(0), cl::Hidden,
cl::desc("For cold function instrumentation, skip instrumenting functions "
"whose entry count is above the given value."));

static cl::opt<bool> PGOTreatUnknownAsCold(
"pgo-treat-unknown-as-cold", cl::init(false), cl::Hidden,
cl::desc("For cold function instrumentation, treat count unknown(e.g. "
"unprofiled) functions as cold."));

cl::opt<bool> PGOInstrumentColdFunctionOnly(
"pgo-instrument-cold-function-only", cl::init(false), cl::Hidden,
cl::desc("Enable cold function only instrumentation."));

extern cl::opt<unsigned> MaxNumVTableAnnotations;

namespace llvm {
Expand Down Expand Up @@ -1897,6 +1911,11 @@ static bool skipPGOGen(const Function &F) {
return true;
if (F.getInstructionCount() < PGOFunctionSizeThreshold)
return true;
if (PGOInstrumentColdFunctionOnly) {
if (auto EntryCount = F.getEntryCount())
return EntryCount->getCount() > PGOColdInstrumentEntryThreshold;
return !PGOTreatUnknownAsCold;
}
return false;
}

Expand Down
35 changes: 35 additions & 0 deletions llvm/test/Transforms/PGOProfile/instr-gen-cold-function.ll
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
; RUN: opt < %s --passes=pgo-instr-gen -pgo-instrument-cold-function-only -pgo-function-entry-coverage -S | FileCheck --check-prefixes=COLD %s
; RUN: opt < %s --passes=pgo-instr-gen -pgo-instrument-cold-function-only -pgo-function-entry-coverage -pgo-cold-instrument-entry-threshold=1 -S | FileCheck --check-prefixes=ENTRY-COUNT %s
; RUN: opt < %s --passes=pgo-instr-gen -pgo-instrument-cold-function-only -pgo-function-entry-coverage -pgo-treat-unknown-as-cold -S | FileCheck --check-prefixes=UNKNOWN-FUNC %s

; COLD: call void @llvm.instrprof.cover(ptr @__profn_foo, i64 [[#]], i32 1, i32 0)
; COLD-NOT: __profn_main
; COLD-NOT: __profn_bar

; ENTRY-COUNT: call void @llvm.instrprof.cover(ptr @__profn_foo, i64 [[#]], i32 1, i32 0)
; ENTRY-COUNT: call void @llvm.instrprof.cover(ptr @__profn_main, i64 [[#]], i32 1, i32 0)

; UNKNOWN-FUNC: call void @llvm.instrprof.cover(ptr @__profn_bar, i64 [[#]], i32 1, i32 0)
; UNKNOWN-FUNC: call void @llvm.instrprof.cover(ptr @__profn_foo, i64 [[#]], i32 1, i32 0)


target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

define void @bar() {
entry:
ret void
}

define void @foo() !prof !0 {
entry:
ret void
}

define i32 @main() !prof !1 {
entry:
ret i32 0
}

!0 = !{!"function_entry_count", i64 0}
!1 = !{!"function_entry_count", i64 1}
Loading