Description
Proposal
I've been trying to make improvements to coverage instrumentation, but the roadblock I keep running into is that the current MIR representation of coverage information is awkward to work with. I would therefore like to propose a set of loosely-related changes that should make coverage improvements easier.
Important properties that won't change
- There will still be only one
StatementKind
for all the different kinds of coverage information - It will still use a boxed payload to avoid bloating other statements
- Backends/interpreters that don't care about coverage can still treat
StatementKind::Coverage
as a no-op - There should be no user-visible changes to coverage output
Summary of proposed changes
- Change
ExpressionOperandId
to a simple enum - Within
InstrumentCoverage
, store transient coverage info in separate side-tables instead of mutating the BCB graph - Decouple
InstrumentCoverage
's internal data structures from the shared types that get injected into MIR - Move almost all coverage-related code out of
rustc_codegen_ssa
and intorustc_codegen_llvm
- Change
Coverage
/CoverageKind
to completely separate counter-increment ops from mapping information
That's a lot to digest for anyone who isn't already familiar with the coverage implementation, so I'll explain each change in more detail.
Change ExpressionOperandId
to a simple enum
LLVM coverage has a concept of “mapping expressions” that allow a piece of code's execution count to be computed as a simple arithmetic expression over other counters/expressions, instead of requiring a dedicated physical counter for every control-flow branch.
These expressions have an operator (+
or -
) and two operands. Operands are currently represented as ExpressionOperandId
, which wraps a u32
with the following semantics:
- 0 represents a hard-coded zero counter
- Values ascending from 1 represent counter IDs
- Values descending from
u32::MAX
represent the IDs of other expressions
There's a lot of ad-hoc code spread throughout the MIR pass and the LLVM backend for manipulating these values (often as raw u32
s). Also, it's theoretically impossible to know whether an operand is a counter or an expression without first counting the total number of counters in its corresponding function.
It would be much simpler to represent operands as a simple enum
, with separate cases for Counter(CounterID)
, Expression(ExpressionId)
, and Zero
.
Within InstrumentCoverage
, store transient coverage info in separate side-tables instead of mutating the BCB graph
When deciding how to instrument the underlying MIR for coverage, the InstrumentCoverage
pass builds a simplified “Basic Counter Block” graph, and then allocates coverage counters/expressions to various nodes/edges in the BCB graph as necessary. Those counters/expressions are then injected into the function's MIR.
The awkward thing here is that the code for doing this needs &mut
access to the graph, in order to associate coverage info with individual nodes, even though it isn't making any structural changes to the graph itself. That makes it harder to understand and modify the instrumentation code.
In addition, the graph alone can't hold all the information that is needed. There ends up being an extra vector of “intermediate expressions” that needs to be passed around separately anyway.
It would be simpler to store this coverage information outside the graph itself, in separate side-tables indexed by BCB node ID as necessary. It could then be easily traversed by the code that actually injects coverage information into MIR.
Decouple InstrumentCoverage
's internal data structures from the shared types that get injected into MIR
enum CoverageKind
is an important part of StatementKind::Coverage
, but the InstrumentCoverage
pass also uses it heavily as an internal data structure. This means that any change to CoverageKind
also needs to update all of the internal parts of InstrumentCoverage
that manipulate it directly, making the MIR representation difficult to modify.
It would be better to have a separate data structure that is used internally within the instrumentation pass, and that is only converted into its corresponding MIR structure just as it is being injected into MIR at the very end.
Move almost all coverage-related code out of rustc_codegen_ssa
and into rustc_codegen_llvm
Currently the backend code for coverage instrumentation is split between the SSA and LLVM crates. In practice this is a bit of a lie, because LLVM is the only backend that actually supports coverage, and LLVM-specific details are spread heavily throughout all parts of the coverage-related code. So the main consequence of this split is that navigating and changing coverage code becomes more difficult.
I would like to move almost all of the coverage-related code out of SSA and into LLVM, leaving behind a single per-backend method that takes a &Coverage
reference and can easily be implemented as a stub or no-op for non-LLVM backends.
(Of course, this change can be revisited if any other SSA backend does eventually want to start supporting coverage.)
Change Coverage
/CoverageKind
to completely separate counter-increment ops from mapping information
One of the motivations behind the previously-mentioned changes is that they make it easier to change the MIR representation of coverage information.
An awkward thing about the current representation is that it mixes physical counter-increment instructions, counter-expression data, and mappings between source-code regions and counter/expression values. This results in some complexity in the instrumentation pass, and also complicates other passes that may need to move or alter coverage statements as part of their transformations.
I would like to introduce a new variant in CoverageKind
that explicitly associates a counter/expression/zero with a particular region of source code, and remove the ability for the other variants to also contain mapping information. This will create a cleaner distinction between counter-increment statements (which directly affect codegen), and mapping information (which simply needs to make its way to a later step that collects and encodes all the available mappings in the final compiler output).
Mentors or Reviewers
???
(I've already prototyped most of these changes locally, so I'm comfortable doing most of the implementation work without explicit mentoring, but advice is still welcome.)
Process
The main points of the Major Change Process are as follows:
- File an issue describing the proposal.
- A compiler team member or contributor who is knowledgeable in the area can second by writing
@rustbot second
.- Finding a "second" suffices for internal changes. If however, you are proposing a new public-facing feature, such as a
-C flag
, then full team check-off is required. - Compiler team members can initiate a check-off via
@rfcbot fcp merge
on either the MCP or the PR.
- Finding a "second" suffices for internal changes. If however, you are proposing a new public-facing feature, such as a
- Once an MCP is seconded, the Final Comment Period begins. If no objections are raised after 10 days, the MCP is considered approved.
You can read more about Major Change Proposals on forge.
Comments
This issue is not meant to be used for technical discussion. There is a Zulip stream for that. Use this issue to leave procedural comments, such as volunteering to review, indicating that you second the proposal (or third, etc), or raising a concern that you would like to be addressed.