Description
What is this issue?
This is a major change proposal, which means a proposal to make a notable change to the compiler -- one that either alters the architecture of some component, affects a lot of people, or makes a small but noticeable public change (e.g., adding a compiler flag). You can read more about the MCP process on https://forge.rust-lang.org/.
This issue is not meant to be used for technical discussion. There is a Zulip stream for that. Use this issue to leave procedural comments, such as volunteering to review, indicating that you second the proposal (or third, etc), or raising a concern that you would like to be addressed.
MCP Checklist
- Fill out and file this issue. The @rust-lang/wg-prioritization group will add this to the triage meeting agenda so folks see it.
- Create a Zulip topic in the stream `#t-compiler/major changes` with the name `XXX compiler-team#NNN`, where `XXX` is the title of this issue and `NNN` is whatever issue number gets assigned. Discuss the MCP in there, and feel free to update the proposal as needed.
- Find a reviewer, and add their name to this comment (see the section below).
- Find a second, someone who is knowledgeable of the area and approves of the design, and have them leave a comment on the issue.
- Announce this proposal at the triage meeting (ping @rust-lang/wg-prioritization to have them add it to the agenda). All edits should be done before you do this.
- After one week, assuming no unresolved concerns, the MCP is accepted! (We sometimes skip this week period if it seems unnecessary.) Add the `mcp-accepted` label and close the issue; we can link to it for future reference.
TL;DR
Implement LLVM-compatible source-based code coverage for Rust.
Links and Details
Core Requirements
- Instrument Rust crates by injecting additional runtime code to:
  - Count each executed branch of code ("coverage region"); for example, function blocks, loop iterations, `if` and `else` blocks, and `match` arms.
  - Report the counter totals (typically at program exit) in LLVM "raw profile format", compatible with the `llvm-profdata` tool.
    - NOTE: The raw profile format is not standardized, and can vary depending on the LLVM compiler version. Therefore, the Rust implementation must use LLVM Intrinsics to count and report coverage. The `llvm-profdata` tool is used to interpret the LLVM raw profile data from a given LLVM version.
- Generate a coverage map in LLVM Code Coverage Mapping Format that uniquely identifies counted coverage regions (source code spans) corresponding to the injected runtime counters.
With these requirements satisfied, LLVM coverage analysis tools, and some LLVM-supported GCC coverage analysis tools (as supported by existing compatibility features in LLVM tooling), should support coverage analysis of Rust program source code.
IMPORTANT: The LLVM coverage tooling can potentially support Profile Guided Optimization (PGO). Rust already has an option for compiling with PGO (`rustc -Cprofile-generate=/tmp/pgo-data`). This MCP is focused on source-based code coverage. Every effort will be made to accommodate future support for additional PGO extensions, potentially enabled by coverage instrumentation, but this MCP emphasizes a design optimized for source-based code coverage only.
Approach and Notional Design
The following sections describe a high level design, and a stepwise approach to assess and implement LLVM code coverage for Rust. (Some of these steps are already in progress, including prototype implementation code.) This plan is open to change as a result of improved understanding of the rustc and LLVM architectures, and insights from reviewers.
Identify Rust Code Patterns for Code Regions and Counters
One of the primary use cases for source-based code coverage is to visually highlight the code regions executed by a program, separated by branch decision points. Regardless of how the code is instrumented, the final result must be verified against the source code, for all branch types supported by the Rust syntax, to ensure the coverage regions (start and end character positions in a source file) are consistent with the programmer's interpretation of the Rust code structure.
An analysis of the Rust Abstract Syntax Tree (AST) and its AST node types will help establish a baseline for validating the instrumentation. (This does not imply the instrumentation must be done in the AST.)
- Review all AST node types relevant to coverage analysis (such as `Item::Fn`, `Expr`, `Stmt`, and `Block`; defined in src/librustc_ast/ast.rs) and create sample Rust programs with contrived examples of Rust language patterns that create conditional branching. Here is a snippet from one such analysis. The comments on the left roughly sketch a graphical representation of the branching. The colors were added only to this snippet, to further illustrate the separate coverage regions.
- Confirm coverage regions and instrumentation points using an experimental counter function (such as `__incr_cov()` below) to validate an approach for each syntax test case. Note that counts for some coverage regions can be computed using LLVM's Counter Expressions.
/// Experimental only. Actual generated code will inject the `llvm.instrprof.increment()`
/// intrinsic directly, without a separate wrapper function.
pub fn __incr_cov<T>(/*counter args,...*/ result: T) -> T {
    __internal_llvm_intrinsic_increment_placeholder(/*counter args,...*/);
    result
}
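To make the wrapper idea concrete, here is a minimal hand-instrumented sketch. It is an illustration only: the `COUNTERS` table, the `incr_cov` helper, the `classify` function, and the counter numbering are all hypothetical, and an atomic add stands in for the LLVM intrinsic that real instrumentation would emit.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical counter table: one slot per coverage region in `classify`.
static COUNTERS: [AtomicU64; 3] = [AtomicU64::new(0), AtomicU64::new(0), AtomicU64::new(0)];

/// Stand-in for the experimental wrapper: bump a counter, then pass the
/// wrapped value through unchanged so the expression keeps its type.
fn incr_cov<T>(counter: usize, result: T) -> T {
    COUNTERS[counter].fetch_add(1, Ordering::Relaxed);
    result
}

/// Hand-instrumented function. The compiler would inject these calls
/// automatically; they are written out here only to show the placement.
fn classify(n: i32) -> &'static str {
    incr_cov(0, ()); // function body region
    if n < 0 {
        incr_cov(1, "negative") // `if` block region
    } else {
        incr_cov(2, "non-negative") // `else` block region
    }
}

/// Snapshot of the current counter values.
fn counts() -> [u64; 3] {
    [
        COUNTERS[0].load(Ordering::Relaxed),
        COUNTERS[1].load(Ordering::Relaxed),
        COUNTERS[2].load(Ordering::Relaxed),
    ]
}
```

Calling `classify(-1)` and `classify(5)` once each leaves the counters at `[2, 1, 1]`: the function region ran twice and each branch ran once.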
Identify Coverage Regions, and Inject Placeholder Counters
- Increment coverage counters by injecting calls to the `llvm.instrprof.increment` intrinsic. There are differing opinions as to where coverage regions should be identified, and where to inject the counters (as discussed in the original Issue, the initial Pull Request, and the Zulip thread associated with this MCP).
  - An experienced compiler team member has recommended performing coverage region identification and instrumentation directly on the MIR, in a "MIR->MIR" pass. Some benefits of this approach include: (a) coverage support should not be affected by changes to the Rust language syntax; (b) instrumentation might be simpler if coverage regions can be accurately identified based on MIR data alone, which has far fewer variants compared to AST constructs; and (c) compiler performance should be better than with other approaches, since the coverage code is injected at or near the final compiler pass. Given the strong recommendation, and the benefits, the MIR->MIR approach is the current plan, but must still be proven.
  - If the MIR->MIR approach is not viable, fallback options include: (a) implementing a `rustc_ast::mut_visit::MutVisitor` after AST expansion (before `resolver.resolve_crate()`) to walk the AST and inject the counter statements (this has already been implemented in a functional, but limited prototype); (b) injecting additional instrumentation nodes while lowering the AST to HIR (some prototyping has been done here, and the existing pattern with `expr_drop_temps()` for injecting a placeholder, to be converted to generated IR code at a later stage, is worth considering in this case); or (c) injecting coverage during "HIR->MIR" conversion (discussed but not yet attempted).
- Save a map from each injected counter to the start and end character positions of its coverage region in the source code (represented by the existing `rustc` type, `Span`).
- Implement an experimental `rustc` command line option to enable code coverage by crate.
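The counter-to-span map described above could be as simple as a vector indexed by counter ID. The sketch below assumes that shape; `SourceSpan` is a hypothetical stand-in for rustc's `Span`, and `CounterSpans` and its methods are invented names, not an existing compiler API.

```rust
/// Hypothetical stand-in for rustc's `Span`: byte offsets into one source file.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct SourceSpan {
    lo: u32,
    hi: u32,
}

/// Records the coverage region of each injected counter.
/// The vector index doubles as the counter ID.
#[derive(Default)]
struct CounterSpans {
    spans: Vec<SourceSpan>,
}

impl CounterSpans {
    /// Reserve the next counter ID for `span` and return that ID.
    fn inject(&mut self, span: SourceSpan) -> usize {
        self.spans.push(span);
        self.spans.len() - 1
    }

    /// Look up the region a counter covers, if the ID is known.
    fn span_of(&self, counter: usize) -> Option<SourceSpan> {
        self.spans.get(counter).copied()
    }

    /// Total number of counters injected so far.
    fn num_counters(&self) -> usize {
        self.spans.len()
    }
}
```

Allocating IDs densely from zero keeps the later per-function counter total trivially available as the length of the vector.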
Add `llvm.instrprof.increment` Support to Existing Rust Compiler Runtime
- Implement the changes required to introduce the `llvm.instrprof.increment` call for coverage counters. (It may be possible to inject the intrinsic in the MIR without explicitly defining it as an `extern` function.) If required:
  - Add `pub fn instrprof_increment` to the `extern "rust-intrinsic"` section in libcore.
  - Add support for `instrprof_increment` to codegen_intrinsic_call() in librustc_codegen_llvm/intrinsic.rs.
- Update any other required build configuration dependencies, flags, documentation, and tests, as demonstrated by similar examples.
Update `llvm.instrprof.increment` Temporary Arguments
- Locate the best stage for replacing all temporary arguments to `llvm.instrprof.increment` with the correct values.
- Identify and/or insert each instrumented function's mangled function name by global/static pointer. Use the `librustc_symbol_mangling` library to compute the mangled function name, as needed. This process must take place after monomorphization of generic types, so the mangled function name can be generated for each monomorphized version of the function. (Note that, by taking this proposed approach, coverage will be counted separately for each reified permutation of type parameters, even though the end result may be the sum of these counts. This seems to align well with the existing “v0” implementation of function name mangling, and instrumenting each type variant separately may have other benefits for source code coverage analysis and profiling.)
- Update the `num_counters` argument with the total number of injected counters per function.
- Generate the `hash` argument (from the MIR, HIR, or AST, to be determined) that identifies whether a function has changed enough to invalidate a past coverage analysis. Also called the "function's structural hash", this is a hash value that can be used by the consumer of the profile data to detect changes to the instrumented source. (Comments and whitespace should be excluded, of course, but other criteria can be included or excluded. Clang, for instance, bases its hash on the overall control flow.)
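As a rough illustration of the structural hash, the sketch below hashes only a summary of a function's control flow, so edits to comments, whitespace, or straight-line code leave the value unchanged. Everything here is an assumption made for the example: the `FlowNode` summary and the `structural_hash` function are hypothetical, and this is not how Clang or rustc computes the value.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical summary of a function's control flow, listed in source order.
/// Comments, whitespace, and non-branching statements are simply not recorded.
#[derive(Hash, Clone, Copy, PartialEq, Eq, Debug)]
enum FlowNode {
    If,
    Else,
    Loop,
    MatchArm,
    Return,
}

/// Hash the control-flow shape of one function.
/// Note: `DefaultHasher`'s algorithm is not guaranteed to stay the same across
/// Rust releases, so a real implementation would need a fixed, documented hash.
fn structural_hash(flow: &[FlowNode]) -> u64 {
    let mut hasher = DefaultHasher::new();
    flow.hash(&mut hasher);
    hasher.finish()
}
```

Two versions of a function with the same sequence of branches produce the same value, while adding a loop or a `match` arm changes it, which is the signal a profile consumer needs to discard stale counts.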
Generate the Coverage Map
- Review the LLVM Code Coverage Mapping Format documentation and example implementations, such as from Clang (https://clang.llvm.org/doxygen/CoverageMappingGen_8cpp_source.html) and Swift.
- Using the mapping from injected counters to coverage spans, and the additional details for each counter (mangled function name, function hash, num_counters), embed the coverage mapping into the LLVM IR.
- Unless there are major objections, leverage the LLVM CoverageMappingWriter (C++) to generate and emit the coverage map.
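In the LLVM Coverage Mapping Format, a region's count can come from an instrumented counter or from an expression that adds or subtracts other counters, which is how a region's count can be derived without injecting a counter for it. The sketch below models that idea under stated assumptions: the `Counter` enum and `evaluate` function are hypothetical names for illustration, not the LLVM data structures.

```rust
/// Hypothetical model of how a coverage region gets its execution count.
enum Counter {
    /// Region that is never executed.
    Zero,
    /// Index of an instrumented counter in the raw profile data.
    Ref(usize),
    /// Sum of two counters.
    Add(Box<Counter>, Box<Counter>),
    /// Difference of two counters.
    Sub(Box<Counter>, Box<Counter>),
}

/// Resolve a counter against the raw counts read from a profile.
/// Panics if a `Ref` index is out of range for `raw`.
fn evaluate(counter: &Counter, raw: &[u64]) -> u64 {
    match counter {
        Counter::Zero => 0,
        Counter::Ref(index) => raw[*index],
        Counter::Add(lhs, rhs) => evaluate(lhs, raw) + evaluate(rhs, raw),
        // Saturate instead of underflowing if the expression is malformed.
        Counter::Sub(lhs, rhs) => evaluate(lhs, raw).saturating_sub(evaluate(rhs, raw)),
    }
}
```

For example, if counter 0 counts function entries and counter 1 counts the `if` block, the `else` block needs no counter of its own: its count is `Sub(Ref(0), Ref(1))`.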
Tests and Documentation
- Implement unit and end-to-end integration tests, following existing examples.
  - Tests will validate the emitted LLVM IR and coverage map.
  - Validate coverage assumptions, for example, to confirm that no code region can be counted more than once for the same execution.
  - Add benchmarks to rustc-perf to identify and address any hot spots.
- Update user and developer documentation as needed.
Optimization
Explore opportunities for optimizing the initial implementation of code coverage.
GitHub Artifacts
Relevant issue: #34701 - Implement support for LLVMs code coverage instrumentation
Experimental prototype PR and discussion: #70680 - WIP toward LLVM Code Coverage for Rust
LLVM Source-based Code Coverage
- User Tutorial (C++ example)
- LLVM Coverage Mapping Format
- `llvm.instrprof.increment` intrinsic - Increments code counters at runtime
- `llvm-profdata` command - Profile data tool
- `llvm-cov` command - Emit coverage information
Mentors or Reviewers
- Recommendations and reviews from any Rust compiler team member would be greatly appreciated. Tyler Mandry (tmandry) is a current compiler contributor, and a coworker of mine, and has offered to review and advise on this project.
- Bob Wilson (bob-wilson) has also offered to review and advise (and possibly lend a hand in the implementation). Bob brings experience as a developer of the LLVM source-based code coverage implementation for Clang.
- Additional LLVM and Rust expertise from within the Google Fuchsia team is also available for reviews and guidance.