Skip to content

Changes to StatementKind::Coverage #645

Closed
@Zalathar

Description

@Zalathar

Proposal

I've been trying to make improvements to coverage instrumentation, but the roadblock I keep running into is that the current MIR representation of coverage information is awkward to work with. I would therefore like to propose a set of loosely-related changes that should make coverage improvements easier.

Important properties that won't change

  • There will still be only one StatementKind for all the different kinds of coverage information
  • It will still use a boxed payload to avoid bloating other statements
  • Backends/interpreters that don't care about coverage can still treat StatementKind::Coverage as a no-op
  • There should be no user-visible changes to coverage output

Summary of proposed changes

That's a lot to digest for anyone who isn't already familiar with the coverage implementation, so I'll explain each change in more detail.

Change ExpressionOperandId to a simple enum

LLVM coverage has a concept of “mapping expressions” that allow a piece of code's execution count to be computed as a simple arithmetic expression over other counters/expressions, instead of requiring a dedicated physical counter for every control-flow branch.

These expressions have an operator (+ or -) and two operands. Operands are currently represented as ExpressionOperandId, which wraps a u32 with the following semantics:

  • 0 represents a hard-coded zero counter
  • Values ascending from 1 represent counter IDs
  • Values descending from u32::MAX represent the IDs of other expressions

There's a lot of ad-hoc code spread throughout the MIR pass and the LLVM backend for manipulating these values (often as raw u32s). Also, it's theoretically impossible to know whether an operand is a counter or an expression without first counting the total number of counters in its corresponding function.

It would be much simpler to represent operands as a simple enum, with separate cases for Counter(CounterID), Expression(ExpressionId), and Zero.

Within InstrumentCoverage, store transient coverage info in separate side-tables instead of mutating the BCB graph

When deciding how to instrument the underlying MIR for coverage, the InstrumentCoverage pass builds a simplified “Basic Counter Block” graph, and then allocates coverage counters/expressions to various nodes/edges in the BCB graph as necessary. Those counters/expressions are then injected into the function's MIR.

The awkward thing here is that the code for doing this needs &mut access to the graph, in order to associate coverage info with individual nodes, even though it isn't making any structural changes to the graph itself. That makes it harder to understand and modify the instrumentation code.

In addition, the graph alone can't hold all the information that is needed. There ends up being an extra vector of “intermediate expressions” that needs to be passed around separately anyway.

It would be simpler to store this coverage information outside the graph itself, in separate side-tables indexed by BCB node ID as necessary. It could then be easily traversed by the code that actually injects coverage information into MIR.

Decouple InstrumentCoverage's internal data structures from the shared types that get injected into MIR

enum CoverageKind is an important part of StatementKind::Coverage, but the InstrumentCoverage pass also uses it heavily as an internal data structure. This means that any change to CoverageKind also needs to update all of the internal parts of InstrumentCoverage that manipulate it directly, making the MIR representation difficult to modify.

It would be better to have a separate data structure that is used internally within the instrumentation pass, and that is only converted into its corresponding MIR structure just as it is being injected into MIR at the very end.

Move almost all coverage-related code out of rustc_codegen_ssa and into rustc_codegen_llvm

Currently the backend code for coverage instrumentation is split between the SSA and LLVM crates. In practice this is a bit of a lie, because LLVM is the only backend that actually supports coverage, and LLVM-specific details are spread heavily throughout all parts of the coverage-related code. So the main consequence of this split is that navigating and changing coverage code becomes more difficult.

I would like to move almost all of the coverage-related code out of SSA and into LLVM, leaving behind a single per-backend method that takes a &Coverage reference and can easily be implemented as a stub or no-op for non-LLVM backends.

(Of course, this change can be revisited if any other SSA backend does eventually want to start supporting coverage.)

Change Coverage/CoverageKind to completely separate counter-increment ops from mapping information

One of the motivations behind the previously-mentioned changes is that they make it easier to change the MIR representation of coverage information.

An awkward thing about the current representation is that it mixes physical counter-increment instructions, counter-expression data, and mappings between source-code regions and counter/expression values. This results in some complexity in the instrumentation pass, and also complicates other passes that may need to move or alter coverage statements as part of their transformations.

I would like to introduce a new variant in CoverageKind that explicitly associates a counter/expression/zero with a particular region of source code, and remove the ability for the other variants to also contain mapping information. This will create a cleaner distinction between counter-increment statements (which directly affect codegen), and mapping information (which simply needs to make its way to a later step that collects and encodes all the available mappings in the final compiler output).

Mentors or Reviewers

???

(I've already prototyped most of these changes locally, so I'm comfortable doing most of the implementation work without explicit mentoring, but advice is still welcome.)

Process

The main points of the Major Change Process are as follows:

  • File an issue describing the proposal.
  • A compiler team member or contributor who is knowledgeable in the area can second by writing @rustbot second.
    • Finding a "second" suffices for internal changes. If however, you are proposing a new public-facing feature, such as a -C flag, then full team check-off is required.
    • Compiler team members can initiate a check-off via @rfcbot fcp merge on either the MCP or the PR.
  • Once an MCP is seconded, the Final Comment Period begins. If no objections are raised after 10 days, the MCP is considered approved.

You can read more about Major Change Proposals on forge.

Comments

This issue is not meant to be used for technical discussion. There is a Zulip stream for that. Use this issue to leave procedural comments, such as volunteering to review, indicating that you second the proposal (or third, etc), or raising a concern that you would like to be addressed.

Metadata

Metadata

Assignees

No one assigned

    Labels

    T-compilerAdd this label so rfcbot knows to poll the compiler teammajor-changeA proposal to make a major change to rustcmajor-change-acceptedA major change proposal that was accepted

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions