[AArch64] BOLT does not support SPE branch data #115333

@paschalis-mpeis

(1) Purpose

In the Linux 6.13 kernel, perf gained the ability to process additional branch entries of Arm's Statistical Profiling Extension (SPE). At the moment, those changes are part of perf-tools-next.

The purpose of this issue is to make the community aware that BOLT will need some changes to take advantage of this additional information. Below, I briefly explain what the extra information is, how to check that one is on the right version, and give some pointers on how it may be used in the future.

(2) What is the extra branch information

As of the Linux 6.13 kernel, perf can process all sampled branches, not just the branch misses. Each branch sample points to the correct target (ip → to_ip).

In short, SPE provides a statistical sample of the branches, where each event has a source and a destination (the ip and addr fields in perf script output).

(3) Check the correct perf version is used

This is now part of Linux 6.13 [kernel diff].
Currently, one can get the updated perf by compiling perf-tools-next. The relevant patches have been merged: #patch1, #patch2, #patch3, #patch4.

A couple of checks need to be done:

  1. Verify that the machine has SPE available. If not, enable SPE (see guide).
  2. Confirm perf is on the correct version (>= 6.13; this can be checked with perf --version) and can process all branches. It should report 'branch' events, not only 'branch-miss' events.

Both checks can be performed using:

perf record -e arm_spe/branch_filter=1/u 2>/dev/null -- ls . > /dev/null \
	&& (perf script -F event,ip,addr 2> /dev/null | grep -v branch-miss -q \
	&& echo "SUCCESS: perf can process all SPE branch samples" \
	|| echo "FAILURE: perf cannot process all SPE branch samples") \
	|| echo "FAILURE: SPE is not available"
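
The two checks can also be run separately, which makes it easier to see which one fails. The following is a minimal sketch, not a definitive test. The helper names `perf_version_at_least` and `spe_pmu_present` are hypothetical, the version string is assumed to look like "perf version 6.13.0", and the SPE PMU is assumed to appear as `arm_spe_<N>` under `/sys/bus/event_source/devices`.

```shell
#!/bin/sh
# Succeeds if the version string reports at least MAJOR.MINOR.
# $1: output of `perf --version` (assumed shape: "perf version 6.13.0")
# $2: required major version, $3: required minor version
perf_version_at_least() {
    # Extract the first "MAJOR.MINOR" pair found in the string.
    ver=$(printf '%s\n' "$1" |
        sed -n 's/^[^0-9]*\([0-9][0-9]*\)\.\([0-9][0-9]*\).*/\1 \2/p')
    [ -n "$ver" ] || return 1
    major=${ver% *}
    minor=${ver#* }
    [ "$major" -gt "$2" ] || { [ "$major" -eq "$2" ] && [ "$minor" -ge "$3" ]; }
}

# Succeeds if an SPE PMU device is listed in the given sysfs directory.
# Assumption: the device is named arm_spe_<N>; $1 overrides the directory.
spe_pmu_present() {
    for d in "${1:-/sys/bus/event_source/devices}"/arm_spe_*; do
        [ -e "$d" ] && return 0
    done
    return 1
}

# Example use on a target machine:
#   spe_pmu_present || echo "SPE is not available"
#   perf_version_at_least "$(perf --version)" 6 13 || echo "perf is too old"
```

A version check alone is not sufficient when perf is built from perf-tools-next, so the 'branch' event check from the one-liner above is still the more reliable signal.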

(4) How the additional information may be used

There are two possible approaches to supporting SPE in BOLT. Some relevant points on how this information could map onto existing BOLT infrastructure are discussed briefly below.

(4a) Using the BasicAggregation format:

  • [BOLT][AArch64] Introduce SPE mode in BasicAggregation #120741
    For this format, one could parse the source and destination of each branch and create two simple events. BOLT can then process those as normal under the -nl flag, using an accurate sample of branch locations with hotness counts for the source and the destination blocks.
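
To illustrate the idea of turning one branch sample into two simple events, here is a rough sketch that counts hits per address. It is not how BOLT or the linked PR implements it. The function name `spe_to_basic_samples` is hypothetical, and the input is assumed to be whitespace-separated "event ip addr" triples, a simplified normalization of `perf script -F event,ip,addr` output.

```shell
#!/bin/sh
# Reads "event ip addr" triples on stdin (assumed, simplified input format)
# and prints one "address count" line per sampled address. Each branch
# contributes one hit to its source (ip) and one to its destination (addr).
spe_to_basic_samples() {
    awk 'NF >= 3 { hits[$2]++; hits[$3]++ }
         END { for (a in hits) print a, hits[a] }' | sort
}

# Example use:
#   printf '%s\n' 'branch: 400100 400200' 'branch: 400100 400300' |
#       spe_to_basic_samples
```

Counting both ends is what gives hotness for the source and the destination blocks, at the cost of discarding the edge between them.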

(4b) Using the LBR format:

  • Add initial support for SPE brstack format #129231
    For this format, one could convert the source and destination branches into the LBR format. The trace would have a shallow depth of 2, namely { 'destination', 'source' }, matching the expected branch history.
    The flags field in perf script could be enabled to distinguish between branch misses and hits. There seems to be no point in using the branch fall-through information, due to how SPE operates: in contrast with LBR, SPE is non-contiguous statistical sampling. In other words, consecutive samples are unrelated, so ignoring fall-throughs would not drop any information that BOLT could otherwise infer.
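
As a rough illustration of the conversion, the sketch below emits one LBR-style entry per SPE branch sample. It is an assumption-laden example, not the format the linked PR produces: the function name `spe_to_brstack` is hypothetical, the input is assumed to be "event ip addr" triples, the `from/to/flag/-/-/0` layout only mirrors the general shape of perf script's brstack field, and the event name stands in for the flags field when deciding between a predicted (P) and mispredicted (M) branch.

```shell
#!/bin/sh
# Reads "event ip addr" triples on stdin (assumed, simplified input format)
# and prints one brstack-like entry per sample. Samples whose event name
# contains "miss" are marked M (mispredicted), all others P (predicted).
# The trailing "-/-/0" fields are placeholders, not measured values.
spe_to_brstack() {
    awk 'NF >= 3 {
        pred = ($1 ~ /miss/) ? "M" : "P"
        printf "0x%s/0x%s/%s/-/-/0\n", $2, $3, pred
    }'
}

# Example use:
#   printf '%s\n' 'branch: 400100 400200' | spe_to_brstack
```

Each output line is independent of the next, which reflects the point above that no fall-through can be derived between consecutive SPE samples.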
