tech review of BOLT Learning Path in progress #2039

Merged
merged 1 commit into from
Jun 16, 2025
@@ -1,29 +1,28 @@
---
title: Optimizing Arm binaries and libraries with LLVM-BOLT and profile merging
title: Optimize Arm applications and shared libraries with BOLT

draft: true
cascade:
draft: true

minutes_to_complete: 30

who_is_this_for: Performance engineers, software developers working on Arm platforms who want to optimize both application binaries and shared libraries using LLVM-BOLT.
who_is_this_for: Performance engineers and software developers working on Arm platforms who want to optimize both application binaries and shared libraries using BOLT.

learning_objectives:
- Instrument and optimize binaries for individual workload features using LLVM-BOLT.
- Instrument and optimize application binaries for individual workload features using BOLT.
- Collect separate BOLT profiles and merge them for comprehensive code coverage.
- Optimize shared libraries independently.
- Integrate optimized shared libraries into applications.
- Evaluate and compare application and library performance across baseline, isolated, and merged optimization scenarios.

prerequisites:
- An Arm based system running Linux with BOLT and Linux Perf installed. The Linux kernel should be version 5.15 or later.
- (Optional) A second, more powerful Linux system to build the software executable and run BOLT.
- An Arm based system running Linux with [BOLT](/install-guides/bolt/) and [Linux Perf](/install-guides/perf/) installed.

author: Gayathri Narayana Yegna Narayanan

### Tags
skilllevels: Introductory
skilllevels: Advanced
subjects: Performance and Architecture
armips:
- Neoverse
@@ -1,5 +1,5 @@
---
title: Overview of BOLT Merge
title: BOLT overview
weight: 2

### FIXED, DO NOT MODIFY
@@ -8,20 +8,49 @@ layout: learningpathall

[BOLT](https://github.com/llvm/llvm-project/blob/main/bolt/README.md) is a post-link binary optimizer that uses Linux Perf data to reorder the executable code layout, which improves instruction cache locality and overall performance.

In this Learning Path, you'll learn how to:
- Collect and merge BOLT profiles from multiple workload features (e.g., read-only and write-only)
- Independently optimize application binaries and external user-space libraries (e.g., `libssl.so`, `libcrypto.so`)
- Link the final optimized binary with the separately bolted libraries to deploy a fully optimized runtime stack

While MySQL and sysbench are used as examples, this method applies to **any feature-rich application** that:
- Exhibits multiple runtime paths
- Uses dynamic libraries
- Requires full-stack binary optimization for performance-critical deployment

The workflow includes:
1. Profiling each workload feature separately
2. Profiling external libraries independently
3. Merging profiles for broader code coverage
4. Applying BOLT to each binary and library
5. Linking bolted libraries with the merged-profile binary
Make sure you have [BOLT](/install-guides/bolt/) and [Linux Perf](/install-guides/perf/) installed.

You should use an Arm Linux system with at least 4 CPUs and 16 GB of RAM. Ubuntu 24.04 was used for testing, but other Linux distributions should also work.

## What will I do in this Learning Path?

In this Learning Path, you learn how to use BOLT to optimize applications and shared libraries. MySQL is used as the application, and two shared libraries used by MySQL are also optimized with BOLT.

1. Collect and merge BOLT profiles from multiple workloads, such as read-only and write-only

A read-only workload typically involves operations that only retrieve or query data, such as running SELECT statements in a database without modifying any records. In contrast, a write-only workload focuses on operations that modify data, such as INSERT, UPDATE, or DELETE statements. Profiling both types ensures that the optimized binary performs well under different usage patterns.

2. Independently optimize application binaries and external user-space libraries, such as `libssl.so` and `libcrypto.so`

This means you can apply BOLT optimizations not just to your main application, but also to shared libraries it depends on, resulting in a more comprehensive performance improvement across your entire stack.

3. Merge profile data for broader code coverage

By combining the profile data collected from different workloads and libraries, you create a single, comprehensive profile that represents a wide range of application behaviors. This merged profile allows BOLT to optimize code paths that are exercised under different scenarios, leading to better overall performance and coverage than optimizing for a single workload.

4. Run BOLT on each application binary and library

With the merged profile, you apply BOLT optimizations separately to each binary and shared library. This step ensures that both your main application and its dependencies are optimized based on real-world usage patterns, resulting in a more efficient and responsive software stack.

5. Link the final optimized binary with the separately bolted libraries to deploy a fully optimized runtime stack

After optimizing each component, you combine them to create a deployment where both the application and its libraries benefit from BOLT's enhancements.
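
The steps above can be sketched as a dry-run shell script that only prints the commands involved, so it is safe to run on a system without BOLT installed. Every path is a hypothetical placeholder, and the optimization flags are a common starting point from the BOLT README rather than the exact set used in this Learning Path. Treat both as assumptions.

```shell
#!/bin/sh
# Dry-run sketch: prints the BOLT commands for the workflow without running them.
# Every path below is a hypothetical placeholder.
BIN=/path/to/mysqld
LIB=/path/to/libssl.so
WORK=/path/to/bolt-work

# Optimization flags from the BOLT README; an assumed starting point only.
OPT_FLAGS="-reorder-blocks=ext-tsp -reorder-functions=hfsort -split-functions -dyno-stats"

# $1 = input binary or library, $2 = label for the profile this run produces
plan_instrument() {
    echo "llvm-bolt $1 -instrument -o $WORK/$(basename "$1").instr-$2 --instrumentation-file=$WORK/profile-$2.fdata"
}

# $1 = input binary or library, $2 = profile file to optimize with
plan_optimize() {
    echo "llvm-bolt $1 -o $WORK/$(basename "$1").bolt -data=$2 $OPT_FLAGS"
}

plan_workflow() {
    # Profile each workload and each library separately.
    plan_instrument "$BIN" readonly
    plan_instrument "$BIN" writeonly
    plan_instrument "$LIB" libssl
    # Merge the application profiles for broader code coverage.
    echo "merge-fdata $WORK/profile-readonly.fdata $WORK/profile-writeonly.fdata > $WORK/profile-merged.fdata"
    # Apply BOLT to the application and to the library.
    plan_optimize "$BIN" "$WORK/profile-merged.fdata"
    plan_optimize "$LIB" "$WORK/profile-libssl.fdata"
}

plan_workflow
```

The last step, making the application load the bolted libraries, is not shown because it depends on how the libraries are installed on your system.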


## What are good applications for BOLT?

MySQL and sysbench are used as example applications, but you can use this method for **any feature-rich application** that:

1. Exhibits multiple runtime paths

Applications often have different code paths depending on the workload or user actions. Optimizing for just one path can leave performance gains untapped in others. By profiling and merging data from various workloads, you ensure broader optimization coverage.

2. Uses dynamic libraries

Many modern applications rely on shared libraries for functionality. Optimizing these libraries alongside the main binary ensures consistent performance improvements throughout the application.

3. Requires full-stack binary optimization for performance-critical deployment

In scenarios where every bit of performance matters, such as high-throughput servers or latency-sensitive applications, optimizing the entire binary stack can yield significant benefits.


@@ -6,11 +6,11 @@ weight: 3
layout: learningpathall
---

In this step, you will instrument an application binary (such as `mysqld`) with BOLT to collect runtime profile data for a specific feature — for example, a **read-only workload**.
In this step, you will use BOLT to instrument the MySQL application binary and to collect profile data for specific workloads.

The collected profile will later be merged with others and used to optimize the application's code layout.
The collected profiles will be merged with others and used to optimize the application's code layout.

### Step 1: Build or obtain the uninstrumented binary
### Build the uninstrumented binary

Make sure your application binary is:

@@ -26,8 +26,6 @@ readelf -s /path/to/mysqld | grep main

If the symbols are missing, rebuild the binary with debug info and no stripping.
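
One way to script this check is to look for a `.symtab` section, which a stripped binary no longer has. This is a minimal sketch, assuming `readelf` from GNU binutils is available. The function names and the `/path/to/mysqld` placeholder are hypothetical.

```shell
#!/bin/sh
# Succeeds when the ELF file at $1 contains a .symtab section.
# Fails for stripped binaries, missing files, and non-ELF files.
has_symbol_table() {
    readelf -S "$1" 2>/dev/null | grep -q '\.symtab'
}

# Prints a one-line verdict for the binary at $1.
check_binary() {
    if has_symbol_table "$1"; then
        echo "ok: $1 has a symbol table"
    else
        echo "missing: $1 has no .symtab or is not a readable ELF file"
    fi
}

check_binary /path/to/mysqld
```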

---

### Step 2: Instrument the binary with BOLT

Use `llvm-bolt` to create an instrumented version of the binary:
@@ -84,6 +82,3 @@ ls -lh /path/to/profile-readonly.fdata

You should see a non-empty file. This file will later be merged with other profiles (e.g., for write-only traffic) to generate a complete merged profile.
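
Before merging, it can help to confirm that every profile is present and non-empty, because an empty profile adds nothing to the merged result. The sketch below guards a `merge-fdata` call with that check. The write-only and merged file names are hypothetical placeholders that follow the naming of the read-only profile.

```shell
#!/bin/sh
# Hypothetical placeholder paths; adjust to where your profiles are written.
RO=/path/to/profile-readonly.fdata
WO=/path/to/profile-writeonly.fdata
MERGED=/path/to/profile-merged.fdata

# Succeeds only if every listed profile exists and is non-empty.
profiles_ready() {
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "empty or missing profile: $f" >&2
            return 1
        fi
    done
    return 0
}

# Merge only when all inputs look usable; merge-fdata writes to stdout.
if profiles_ready "$RO" "$WO"; then
    merge-fdata "$RO" "$WO" > "$MERGED"
fi
```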

---