Add initial support for SPE brstack format #129231
@llvm/pr-subscribers-bolt
Author: Ádám Kallai (kaadam)
Changes
Perf will be able to report SPE branch events in a similar way to LBR brstack.
Example of the SPE brstack input format:
perf script -i perf.data -F pid,brstack --itrace=bl
16984 0x72e342e5f4/0x72e36192d0/M/-/-/11/RET/-
The SPE brstack mispredicted flag may be two characters long: 'PN' or 'MN', where 'N' means the branch was marked as NOT-TAKEN. This flag applies only to conditional instructions (conditional branch or compare-and-branch) and indicates that the instruction failed its condition code check.
Perf with 'brstack' support for SPE is available here: https://github.com/Leo-Yan/linux/tree/perf_arm_spe_branch_flags_v2
Example of usage with SPE perf data:
perf2bolt -p perf.data -o perf.fdata --spe BINARY
Capture standard SPE branch events with perf:
perf record -e 'arm_spe_0/branch_filter=1/u' -- BINARY
A unit test is also added to check parsing of the SPE brstack format.
Full diff: https://github.com/llvm/llvm-project/pull/129231.diff
3 Files Affected:
diff --git a/bolt/lib/Profile/DataAggregator.cpp b/bolt/lib/Profile/DataAggregator.cpp
index cce9fdbef99bd..4af3a493b8be6 100644
--- a/bolt/lib/Profile/DataAggregator.cpp
+++ b/bolt/lib/Profile/DataAggregator.cpp
@@ -49,12 +49,10 @@ static cl::opt<bool>
cl::desc("aggregate basic samples (without LBR info)"),
cl::cat(AggregatorCategory));
-cl::opt<bool> ArmSPE(
- "spe",
- cl::desc(
- "Enable Arm SPE mode. Used in conjuction with no-lbr mode, ie `--spe "
- "--nl`"),
- cl::cat(AggregatorCategory));
+cl::opt<bool> ArmSPE("spe",
+ cl::desc("Enable Arm SPE mode. Can combine with `--nl` "
+ "to use in no-lbr mode"),
+ cl::cat(AggregatorCategory));
static cl::opt<std::string>
ITraceAggregation("itrace",
@@ -180,13 +178,16 @@ void DataAggregator::start() {
if (opts::ArmSPE) {
if (!opts::BasicAggregation) {
- errs() << "PERF2BOLT-ERROR: Arm SPE mode is combined only with "
- "BasicAggregation.\n";
- exit(1);
+ // pid from_ip to_ip predicted?
+ // 12345 0x123/0x456/P/-/-/8/RET/-
+ launchPerfProcess("SPE branch events", MainEventsPPI,
+ "script -F pid,brstack --itrace=bl",
+ /*Wait = */ false);
+ } else {
+ launchPerfProcess("SPE brstack events", MainEventsPPI,
+ "script -F pid,event,ip,addr --itrace=i1i",
+ /*Wait = */ false);
}
- launchPerfProcess("branch events with SPE", MainEventsPPI,
- "script -F pid,event,ip,addr --itrace=i1i",
- /*Wait = */ false);
} else if (opts::BasicAggregation) {
launchPerfProcess("events without LBR", MainEventsPPI,
"script -F pid,event,ip",
@@ -527,8 +528,7 @@ Error DataAggregator::preprocessProfile(BinaryContext &BC) {
}
exit(0);
}
-
- if (((!opts::BasicAggregation && !opts::ArmSPE) && parseBranchEvents()) ||
+ if ((!opts::BasicAggregation && parseBranchEvents()) ||
(opts::BasicAggregation && opts::ArmSPE && parseSpeAsBasicEvents()) ||
(opts::BasicAggregation && parseBasicEvents()))
errs() << "PERF2BOLT: failed to parse samples\n";
@@ -1034,7 +1034,11 @@ ErrorOr<LBREntry> DataAggregator::parseLBREntry() {
if (std::error_code EC = MispredStrRes.getError())
return EC;
StringRef MispredStr = MispredStrRes.get();
- if (MispredStr.size() != 1 ||
+ // SPE brstack mispredicted flags might be two characters long: 'PN' or 'MN'.
+ bool ProperStrSize = (MispredStr.size() == 2 && opts::ArmSPE)
+ ? (MispredStr[1] == 'N')
+ : (MispredStr.size() == 1);
+ if (!ProperStrSize ||
(MispredStr[0] != 'P' && MispredStr[0] != 'M' && MispredStr[0] != '-')) {
reportError("expected single char for mispred bit");
Diag << "Found: " << MispredStr << "\n";
@@ -1565,9 +1569,11 @@ uint64_t DataAggregator::parseLBRSample(const PerfBranchSample &Sample,
}
std::error_code DataAggregator::parseBranchEvents() {
- outs() << "PERF2BOLT: parse branch events...\n";
- NamedRegionTimer T("parseBranch", "Parsing branch events", TimerGroupName,
- TimerGroupDesc, opts::TimeAggregator);
+ std::string BranchEventTypeStr =
+ opts::ArmSPE ? "SPE branch events in LBR-format" : "branch events";
+ outs() << "PERF2BOLT: " << BranchEventTypeStr << "...\n";
+ NamedRegionTimer T("parseBranch", "Parsing " + BranchEventTypeStr,
+ TimerGroupName, TimerGroupDesc, opts::TimeAggregator);
uint64_t NumTotalSamples = 0;
uint64_t NumEntries = 0;
@@ -1595,7 +1601,8 @@ std::error_code DataAggregator::parseBranchEvents() {
}
NumEntries += Sample.LBR.size();
- if (BAT && Sample.LBR.size() == 32 && !NeedsSkylakeFix) {
+ if (this->BC->isX86() && BAT && Sample.LBR.size() == 32 &&
+ !NeedsSkylakeFix) {
errs() << "PERF2BOLT-WARNING: using Intel Skylake bug workaround\n";
NeedsSkylakeFix = true;
}
@@ -1630,10 +1637,17 @@ std::error_code DataAggregator::parseBranchEvents() {
if (NumSamples && NumSamplesNoLBR == NumSamples) {
// Note: we don't know if perf2bolt is being used to parse memory samples
// at this point. In this case, it is OK to parse zero LBRs.
- errs() << "PERF2BOLT-WARNING: all recorded samples for this binary lack "
- "LBR. Record profile with perf record -j any or run perf2bolt "
- "in no-LBR mode with -nl (the performance improvement in -nl "
- "mode may be limited)\n";
+ if (!opts::ArmSPE)
+ errs()
+ << "PERF2BOLT-WARNING: all recorded samples for this binary lack "
+ "LBR. Record profile with perf record -j any or run perf2bolt "
+ "in no-LBR mode with -nl (the performance improvement in -nl "
+ "mode may be limited)\n";
+ else
+ errs()
+ << "PERF2BOLT-WARNING: all recorded samples for this binary lack "
+ "SPE brstack entries. Record profile with:"
+ "perf record arm_spe_0/branch_filter=1/";
} else {
const uint64_t IgnoredSamples = NumTotalSamples - NumSamples;
const float PercentIgnored = 100.0f * IgnoredSamples / NumTotalSamples;
diff --git a/bolt/test/perf2bolt/AArch64/perf2bolt-spe.test b/bolt/test/perf2bolt/AArch64/perf2bolt-spe.test
index d7cea7ff769b8..d34a2c7994f72 100644
--- a/bolt/test/perf2bolt/AArch64/perf2bolt-spe.test
+++ b/bolt/test/perf2bolt/AArch64/perf2bolt-spe.test
@@ -11,4 +11,4 @@ CHECK-SPE-NO-LBR: PERF2BOLT: Starting data aggregation job
RUN: perf record -e cycles -q -o %t.perf.data -- %t.exe
RUN: not perf2bolt -p %t.perf.data -o %t.perf.boltdata --spe %t.exe 2>&1 | FileCheck %s --check-prefix=CHECK-SPE-LBR
-CHECK-SPE-LBR: PERF2BOLT-ERROR: Arm SPE mode is combined only with BasicAggregation.
+CHECK-SPE-LBR: PERF2BOLT: spawning perf job to read SPE branch events
diff --git a/bolt/unittests/Profile/PerfSpeEvents.cpp b/bolt/unittests/Profile/PerfSpeEvents.cpp
index e52393b516fa3..448354b784f29 100644
--- a/bolt/unittests/Profile/PerfSpeEvents.cpp
+++ b/bolt/unittests/Profile/PerfSpeEvents.cpp
@@ -23,6 +23,7 @@ using namespace llvm::ELF;
namespace opts {
extern cl::opt<std::string> ReadPerfEvents;
+extern cl::opt<bool> ArmSPE;
} // namespace opts
namespace llvm {
@@ -88,6 +89,45 @@ struct PerfSpeEventsTestHelper : public testing::Test {
return SampleSize == DA.BasicSamples.size();
}
+
+ /// Compare LBREntries
+ bool checkLBREntry(const LBREntry &Lhs, const LBREntry &Rhs) {
+ return Lhs.From == Rhs.From && Lhs.To == Rhs.To &&
+ Lhs.Mispred == Rhs.Mispred;
+ }
+
+ /// Parse and check SPE brstack as LBR
+ void parseAndCheckBrstackEvents(
+ uint64_t PID,
+ const std::vector<SmallVector<LBREntry, 2>> &ExpectedSamples) {
+ int NumSamples = 0;
+
+ DataAggregator DA("<pseudo input>");
+ DA.ParsingBuf = opts::ReadPerfEvents;
+ DA.BC = BC.get();
+ DataAggregator::MMapInfo MMap;
+ DA.BinaryMMapInfo.insert(std::make_pair(PID, MMap));
+
+ // Process buffer.
+ while (DA.hasData()) {
+ ErrorOr<DataAggregator::PerfBranchSample> SampleRes =
+ DA.parseBranchSample();
+ if (std::error_code EC = SampleRes.getError())
+ EXPECT_NE(EC, std::errc::no_such_process);
+
+ DataAggregator::PerfBranchSample &Sample = SampleRes.get();
+ EXPECT_EQ(Sample.LBR.size(), ExpectedSamples[NumSamples].size());
+
+ // Check the parsed LBREntries.
+ const auto *ActualIter = Sample.LBR.begin();
+ const auto *ExpectIter = ExpectedSamples[NumSamples].begin();
+ while (ActualIter != Sample.LBR.end() &&
+ ExpectIter != ExpectedSamples[NumSamples].end())
+ EXPECT_TRUE(checkLBREntry(*ActualIter++, *ExpectIter++));
+
+ ++NumSamples;
+ }
+ }
};
} // namespace bolt
@@ -113,6 +153,37 @@ TEST_F(PerfSpeEventsTestHelper, SpeBranches) {
EXPECT_TRUE(checkEvents(1234, 10, {"branches-spe:"}));
}
+TEST_F(PerfSpeEventsTestHelper, SpeBranchesWithBrstack) {
+ // Check perf input with SPE branch events as brstack format.
+ // Example collection command:
+ // ```
+ // perf record -e 'arm_spe_0/branch_filter=1/u' -- BINARY
+ // ```
+ // How Bolt extracts the branch events:
+ // ```
+ // perf script -F pid,brstack --itrace=bl
+ // ```
+
+ opts::ArmSPE = true;
+ opts::ReadPerfEvents = " 1234 0xa001/0xa002/PN/-/-/10/COND/-\n"
+ " 1234 0xb001/0xb002/P/-/-/4/RET/-\n"
+ " 1234 0xc001/0xc002/P/-/-/13/-/-\n"
+ " 1234 0xd001/0xd002/M/-/-/7/RET/-\n"
+ " 1234 0xe001/0xe002/P/-/-/14/RET/-\n"
+ " 1234 0xf001/0xf002/MN/-/-/8/COND/-\n";
+
+ LBREntry Entry1 = {0xa001, 0xa002, false};
+ LBREntry Entry2 = {0xb001, 0xb002, false};
+ LBREntry Entry3 = {0xc001, 0xc002, false};
+ LBREntry Entry4 = {0xd001, 0xd002, true};
+ LBREntry Entry5 = {0xe001, 0xe002, false};
+ LBREntry Entry6 = {0xf001, 0xf002, true};
+ std::vector<SmallVector<LBREntry, 2>> ExpectedSamples = {
+ {{Entry1}}, {{Entry2}}, {{Entry3}}, {{Entry4}}, {{Entry5}}, {{Entry6}},
+ };
+ parseAndCheckBrstackEvents(1234, ExpectedSamples);
+}
+
TEST_F(PerfSpeEventsTestHelper, SpeBranchesAndCycles) {
// Check perf input with SPE branch events and cycles.
// Example collection command:
Thanks a lot for your work Adam!
I commented on some changes and nits.
Also noting that for now this PR is stacked on top of #120741.
✅ With the latest revision this PR passed the C/C++ code formatter.
BOLT gains the ability to process branch target information generated by Arm SPE data, using the `BasicAggregation` format.

Example usage is:

```bash
perf2bolt -p perf.data -o perf.boltdata --nl --spe BINARY
```

New branch data and compatibility:
---
SPE branch entries in perf data contain a branch pair (`IP` -> `ADDR`) for the source and destination branches. DataAggregator processes those by creating two basic samples. Any other event types will have the `ADDR` field set to `0x0`. For those, a single sample will be created. Such events can be either SPE or non-SPE, like `l1d-access` and `cycles` respectively.

The format of the input perf entries is:

```
PID   EVENT-TYPE   ADDR   IP
```

When in SPE mode and:
- host is not `AArch64`, BOLT will exit with a relevant message
- `ADDR` field is unavailable, BOLT will exit with a relevant message
- no branch pairs were recorded, BOLT will present a warning

Examples of generating profiling data for the SPE mode:
---
Profiles can be captured with perf on AArch64 machines with SPE enabled. They can be combined with other events, SPE or not.

Capture only SPE branch data events:

```bash
perf record -e 'arm_spe_0/branch_filter=1/u' -- BINARY
```

Capture any SPE events:

```bash
perf record -e 'arm_spe_0//u' -- BINARY
```

Capture any SPE events and cycles:

```bash
perf record -e 'arm_spe_0//u' -e cycles:u -- BINARY
```

Use more filters, jitter, and a sample count to control overhead and quality:

```bash
perf record -e 'arm_spe_0/branch_filter=1,load_filter=0,store_filter=0,jitter=1/u' -c 10007 -- BINARY
```
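The branch-pair handling described above can be sketched outside BOLT as follows. This is a minimal illustration and not DataAggregator's actual code: `BasicSample` and `expandSpeLine` are hypothetical names, addresses are assumed to be hexadecimal, and the order of the two samples (source first, then destination) is an assumption.

```cpp
#include <cstdint>
#include <exception>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a basic sample; not BOLT's own type.
struct BasicSample {
  std::string Event;
  uint64_t PC;
};

// Expand one "PID EVENT-TYPE ADDR IP" line into basic samples.
// A branch pair (non-zero ADDR) yields two samples, anything else yields one.
// Returns an empty vector if the line is malformed.
std::vector<BasicSample> expandSpeLine(const std::string &Line) {
  std::istringstream In(Line);
  std::string Pid, Event, AddrStr, IpStr;
  if (!(In >> Pid >> Event >> AddrStr >> IpStr))
    return {};

  uint64_t Addr = 0, Ip = 0;
  try {
    // Base 16 accepts the value with or without a leading "0x".
    Addr = std::stoull(AddrStr, nullptr, 16);
    Ip = std::stoull(IpStr, nullptr, 16);
  } catch (const std::exception &) {
    return {};
  }

  std::vector<BasicSample> Samples;
  Samples.push_back({Event, Ip}); // Source of the branch (or the only PC).
  if (Addr != 0)
    Samples.push_back({Event, Addr}); // Destination of the branch.
  return Samples;
}
```

Under these assumptions, a line such as `1234 branches-spe: a002 a001` would produce two samples, while an entry whose `ADDR` is `0` would produce one.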
Perf will be able to report SPE branch events in a similar way to LBR brstack. Therefore we can reuse the existing LBR parsing process for SPE as well.

Example of the SPE brstack input format:

```bash
perf script -i perf.data -F pid,brstack --itrace=bl
```

```
---  PID  FROM  TO  PREDICTED ---
16984 0x72e342e5f4/0x72e36192d0/M/-/-/11/RET/-
16984 0x72e7b8b3b4/0x72e7b8b3b8/PN/-/-/11/COND/-
16984 0x72e7b92b48/0x72e7b92b4c/PN/-/-/8/COND/-
16984 0x72eacc6b7c/0x760cc94b00/P/-/-/9/RET/-
16984 0x72e3f210fc/0x72e3f21068/P/-/-/4//-
16984 0x72e39b8c5c/0x72e3627b24/P/-/-/4//-
16984 0x72e7b89d20/0x72e7b92bbc/P/-/-/4/RET/-
```

The SPE brstack mispredicted flag may be two characters long: 'PN' or 'MN', where 'N' means the branch was marked as NOT-TAKEN. This flag applies only to conditional instructions (conditional branch or compare-and-branch) and indicates that the instruction failed its condition code check.

Perf with 'brstack' support for SPE is available here:

```
https://github.com/Leo-Yan/linux/tree/perf_arm_spe_branch_flags_v2
```

Example of usage with SPE perf data:

```bash
perf2bolt -p perf.data -o perf.fdata --spe BINARY
```

Capture standard SPE branch events with perf:

```bash
perf record -e 'arm_spe_0/branch_filter=1/u' -- BINARY
```

A unit test is also added to check parsing of the SPE brstack format.
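To make the field layout concrete, here is a small standalone sketch that splits one brstack entry on `/` and reads the first three fields. It does not use BOLT's `parseLBREntry`; `BranchEntry` and `parseBrstackEntry` are hypothetical names, and every field after the flags is simply ignored.

```cpp
#include <cstdint>
#include <exception>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for an LBR-style entry; not BOLT's LBREntry.
struct BranchEntry {
  uint64_t From;
  uint64_t To;
  bool Mispred;  // First flag character is 'M'.
  bool NotTaken; // Optional second flag character is 'N'.
};

// Parse "FROM/TO/FLAGS/..." and ignore the remaining fields.
std::optional<BranchEntry> parseBrstackEntry(const std::string &Text) {
  std::vector<std::string> Fields;
  std::istringstream In(Text);
  std::string Field;
  while (std::getline(In, Field, '/'))
    Fields.push_back(Field);
  if (Fields.size() < 3 || Fields[2].empty())
    return std::nullopt;

  BranchEntry Entry{};
  try {
    Entry.From = std::stoull(Fields[0], nullptr, 16);
    Entry.To = std::stoull(Fields[1], nullptr, 16);
  } catch (const std::exception &) {
    return std::nullopt; // An address was not valid hexadecimal.
  }

  const std::string &Flags = Fields[2];
  Entry.Mispred = Flags[0] == 'M';
  Entry.NotTaken = Flags.size() == 2 && Flags[1] == 'N';
  return Entry;
}
```

Entries with an empty branch-type field, such as `0x72e3f210fc/0x72e3f21068/P/-/-/4//-`, still parse here because only the first three fields are inspected.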
This commit aims to decouple the SPE BRStack and SPE BasicAggregation approaches, following the decision in issue llvm#115333. The BRStack change relies on the unit test logic introduced by Paschalis Mpeis (ARM) in llvm#120741. Because that logic is shared by both aggregation techniques, an essential part of it is retained. All tests specific to BasicAggregation are removed. Co-Authored-By: Paschalis Mpeis <Paschalis.Mpeis@arm.com>
Hey Adam,
Thanks for addressing the comments!
AArch64/perf2bolt-spe.test seems to be failing. See comment below.
I've also added a few more nits and cleanups.
cl::desc("Enable Arm SPE mode. Can combine with `--nl` "
"to use in no-lbr mode"),
- cl::desc("Enable Arm SPE mode. Can combine with `--nl` "
- "to use in no-lbr mode"),
+ cl::desc("Enable Arm SPE mode."),
launchPerfProcess("events without LBR",
MainEventsPPI,
if (opts::ArmSPE) {
// pid from_ip to_ip predicted/missed not-taken?
Can simplify to:
- // pid from_ip to_ip predicted/missed not-taken?
+ // pid from_ip to_ip flags
and then expand 'flags' below:
- P/M: whether the branch was Predicted or Mispredicted
- N: optionally appears when the branch was Not-Taken (i.e. fall-through)
if (MispredStr.size() != 1 ||
(MispredStr[0] != 'P' && MispredStr[0] != 'M' && MispredStr[0] != '-')) {
reportError("expected single char for mispred bit");
// SPE brstack mispredicted flags might be two characters long: 'PN' or 'MN'.
nit:
- // SPE brstack mispredicted flags might be two characters long: 'PN' or 'MN'.
+ // SPE brstack mispredicted flags might be up to two characters long: 'PN' or 'MN'.
Can add that 'N' is optional.
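The rule under discussion (one flag character, optionally followed by `N` in SPE mode) could be written as a small standalone predicate. This is only a sketch of the check in the diff: `isValidMispredFlag` and its `SpeMode` parameter are made-up names, with `SpeMode` standing in for `opts::ArmSPE`.

```cpp
#include <string_view>

// Accepts "P", "M" or "-" in any mode, and additionally a trailing 'N'
// (e.g. "PN", "MN") when SPE mode is enabled.
bool isValidMispredFlag(std::string_view Flag, bool SpeMode) {
  if (Flag.empty() || Flag.size() > 2)
    return false;
  const char First = Flag[0];
  if (First != 'P' && First != 'M' && First != '-')
    return false;
  if (Flag.size() == 2)
    return SpeMode && Flag[1] == 'N'; // 'N' is optional and SPE-only.
  return true;
}
```

Keeping the length check ahead of any indexing means an empty string is rejected before `Flag[0]` is read.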
TimerGroupDesc, opts::TimeAggregator);
std::string BranchEventTypeStr =
!opts::ArmSPE ? "branch events" : "SPE branch events in LBR-format";
outs() << "PERF2BOLT: " << BranchEventTypeStr << "...\n";
nit: adding back the verb from the original code:
- outs() << "PERF2BOLT: " << BranchEventTypeStr << "...\n";
+ outs() << "PERF2BOLT: parse " << BranchEventTypeStr << "...\n";
errs()
<< "PERF2BOLT-WARNING: all recorded samples for this binary lack "
"SPE brstack entries. Record profile with:"
"perf record arm_spe_0/branch_filter=1/";
- "perf record arm_spe_0/branch_filter=1/";
+ "perf record -e 'arm_spe_0/branch_filter=1/'";
## Check that Arm SPE mode is unavailable on X86.

REQUIRES: system-linux,x86_64-linux
This test is now failing. Probably this line was accidentally removed?
RUN: %clang %cflags %p/../../Inputs/asm_foo.s %p/../../Inputs/asm_main.c -o %t.exe
std::unique_ptr<ObjectFile> ObjFile;
std::unique_ptr<BinaryContext> BC;

/// Compare LBREntries
nit: Could add that: @return true if LBREntries are equal.
Also, could end comments with a dot.
while (ActualIter != Sample.LBR.end() &&
ExpectIter != ExpectedSamples[NumSamples].end())
nit/tip: may be able to use zip_equal with structured binding here.
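Without LLVM's ADT helpers at hand, a related standard-library alternative is the four-iterator overload of `std::equal`, which also fails when the two ranges differ in length. This is a different technique from `zip_equal`, shown only as a sketch; `Entry`, `sameEntry`, and `sameEntries` are hypothetical stand-ins for `LBREntry` and the test helper.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for LBREntry.
struct Entry {
  uint64_t From;
  uint64_t To;
  bool Mispred;
};

// Field-by-field comparison of two entries.
bool sameEntry(const Entry &Lhs, const Entry &Rhs) {
  return Lhs.From == Rhs.From && Lhs.To == Rhs.To &&
         Lhs.Mispred == Rhs.Mispred;
}

// True only if both sequences have the same length and equal elements.
bool sameEntries(const std::vector<Entry> &Actual,
                 const std::vector<Entry> &Expected) {
  return std::equal(Actual.begin(), Actual.end(), Expected.begin(),
                    Expected.end(), sameEntry);
}
```

Compared with a manual two-iterator loop, this form cannot silently stop at the shorter of the two ranges.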
"mode may be limited)\n";
if (!opts::ArmSPE)
errs()
<< "PERF2BOLT-WARNING: all recorded samples for this binary lack "
When a user is on an older Linux/perf version, they may get zero samples.
Could you amend this message to note that too? Please also indicate which Linux version is required for support [1].
In those cases, running `perf script -F pid,brstack --itrace=bl` only returns:
PID
1234
1234
1234
In other words, the brstack field is missing, in a silent failure.
[1] @Leo-Yan confirms that his brstack work landed on Linux Kernel v6.14. Thanks Leo!
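One way to make this failure visible is to check whether every sample line carries a PID but no brstack entry. A rough standalone sketch, assuming one sample per line and whitespace-separated fields; `allSamplesLackBrstack` is a hypothetical helper and not part of perf2bolt:

```cpp
#include <sstream>
#include <string>

// Returns true when the output has at least one sample line and none of them
// carries a second field (the brstack entry) after the PID.
bool allSamplesLackBrstack(const std::string &PerfScriptOutput) {
  std::istringstream Lines(PerfScriptOutput);
  std::string Line;
  bool SawSample = false;
  while (std::getline(Lines, Line)) {
    std::istringstream Fields(Line);
    std::string Pid, Entry;
    if (!(Fields >> Pid))
      continue; // Skip blank lines.
    SawSample = true;
    if (Fields >> Entry)
      return false; // This sample has a brstack entry.
  }
  return SawSample;
}
```

A check along these lines would let the warning distinguish "perf produced no brstack field at all" from "some samples had no branches".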