Changes to StatementKind::Coverage

# Proposal

I've been trying to make improvements to coverage instrumentation, but the roadblock I keep running into is that the current MIR representation of coverage information is awkward to work with. I would therefore like to propose a set of loosely-related changes that should make coverage improvements easier.

## Important properties that won't change
- There will still be only one `StatementKind` for all the different kinds of coverage information
- It will still use a boxed payload to avoid bloating other statements
- Backends/interpreters that don't care about coverage can still treat `StatementKind::Coverage` as a no-op
- There should be no user-visible changes to coverage output

## Summary of proposed changes
- Change `ExpressionOperandId` to a simple enum
  - https://github.com/rust-lang/rust/pull/113428
- Within `InstrumentCoverage`, store transient coverage info in separate side-tables instead of mutating the BCB graph
  - https://github.com/rust-lang/rust/pull/114354
- Decouple `InstrumentCoverage`'s internal data structures from the shared types that get injected into MIR
  - https://github.com/rust-lang/rust/pull/114791
- Move almost all coverage-related code out of `rustc_codegen_ssa` and into `rustc_codegen_llvm`
  - https://github.com/rust-lang/rust/pull/113355
- Change `Coverage`/`CoverageKind` to completely separate counter-increment ops from mapping information
  - Incorporated into https://github.com/rust-lang/rust/pull/116046

*That's a lot to digest for anyone who isn't already familiar with the coverage implementation, so I'll explain each change in more detail.*

## Change `ExpressionOperandId` to a simple enum

LLVM coverage has a concept of “mapping expressions” that allow a piece of code's execution count to be computed as a simple arithmetic expression over other counters/expressions, instead of requiring a dedicated physical counter for every control-flow branch.

These expressions have an operator (`+` or `-`) and two operands. Operands are currently represented as `ExpressionOperandId`, which wraps a `u32` with the following semantics:

- 0 represents a hard-coded zero counter
- Values ascending from 1 represent counter IDs
- Values descending from `u32::MAX` represent the IDs of other expressions

There's a lot of ad-hoc code spread throughout the MIR pass and the LLVM backend for manipulating these values (often as raw `u32`s). Also, it's theoretically *impossible* to know whether an operand is a counter or an expression without first counting the total number of counters in its corresponding function.

It would be much simpler to represent operands as a simple `enum`, with separate cases for `Counter(CounterID)`, `Expression(ExpressionId)`, and `Zero`.

## Within `InstrumentCoverage`, store transient coverage info in separate side-tables instead of mutating the BCB graph

When deciding how to instrument the underlying MIR for coverage, the `InstrumentCoverage` pass builds a simplified “Basic Counter Block” graph, and then allocates coverage counters/expressions to various nodes/edges in the BCB graph as necessary. Those counters/expressions are then injected into the function's MIR.

The awkward thing here is that the code for doing this needs `&mut` access to the graph, in order to associate coverage info with individual nodes, even though it isn't making any structural changes to the graph itself. That makes it harder to understand and modify the instrumentation code.

In addition, the graph alone can't hold all the information that is needed. There ends up being an extra vector of “intermediate expressions” that needs to be passed around separately anyway.

It would be simpler to store this coverage information outside the graph itself, in separate side-tables indexed by BCB node ID as necessary. It could then be easily traversed by the code that actually injects coverage information into MIR.

## Decouple `InstrumentCoverage`'s internal data structures from the shared types that get injected into MIR

`enum CoverageKind` is an important part of `StatementKind::Coverage`, but the `InstrumentCoverage` pass also uses it heavily as an internal data structure. This means that any change to `CoverageKind` also needs to update all of the internal parts of `InstrumentCoverage` that manipulate it directly, making the MIR representation difficult to modify.

It would be better to have a separate data structure that is used internally within the instrumentation pass, and that is only converted into its corresponding MIR structure just as it is being injected into MIR at the very end.

## Move almost all coverage-related code out of `rustc_codegen_ssa` and into `rustc_codegen_llvm`

Currently the backend code for coverage instrumentation is split between the SSA and LLVM crates. In practice this is a bit of a lie, because LLVM is the only backend that actually supports coverage, and LLVM-specific details are spread heavily throughout all parts of the coverage-related code. So the main consequence of this split is that navigating and changing coverage code becomes more difficult.

I would like to move almost all of the coverage-related code out of SSA and into LLVM, leaving behind a single per-backend method that takes a `&Coverage` reference and can easily be implemented as a stub or no-op for non-LLVM backends.

(Of course, this change can be revisited if any other SSA backend does eventually want to start supporting coverage.)

## Change `Coverage`/`CoverageKind` to completely separate counter-increment ops from mapping information

One of the motivations behind the previously-mentioned changes is that they make it easier to change the MIR representation of coverage information.

An awkward thing about the current representation is that it mixes physical counter-increment instructions, counter-expression data, and mappings between source-code regions and counter/expression values. This results in some complexity in the instrumentation pass, and also complicates other passes that may need to move or alter coverage statements as part of their transformations.

I would like to introduce a new variant in `CoverageKind` that explicitly associates a counter/expression/zero with a particular region of source code, and remove the ability for the other variants to also contain mapping information. This will create a cleaner distinction between counter-increment statements (which directly affect codegen), and mapping information (which simply needs to make its way to a later step that collects and encodes all the available mappings in the final compiler output).

# Mentors or Reviewers

???

(I've already prototyped most of these changes locally, so I'm comfortable doing most of the implementation work without explicit mentoring, but advice is still welcome.)

# Process

The main points of the [Major Change Process][MCP] are as follows:

* [x] File an issue describing the proposal.
* [ ] A compiler team member or contributor who is knowledgeable in the area can **second** by writing `@rustbot second`.
    * Finding a "second" suffices for internal changes. If however, you are proposing a new public-facing feature, such as a `-C flag`, then full team check-off is required.
    * Compiler team members can initiate a check-off via `@rfcbot fcp merge` on either the MCP or the PR.
* [ ] Once an MCP is seconded, the Final Comment Period begins. If no objections are raised after 10 days, the MCP is considered **approved**.

You can read [more about Major Change Proposals on forge][MCP].

[MCP]: https://forge.rust-lang.org/compiler/mcp.html

# Comments

**This issue is not meant to be used for technical discussion. There is a Zulip stream for that. Use this issue to leave procedural comments, such as volunteering to review, indicating that you second the proposal (or third, etc), or raising a concern that you would like to be addressed.**


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Changes to StatementKind::Coverage #645

Proposal

Important properties that won't change

Summary of proposed changes

Change `ExpressionOperandId` to a simple enum

Within `InstrumentCoverage`, store transient coverage info in separate side-tables instead of mutating the BCB graph

Decouple `InstrumentCoverage`'s internal data structures from the shared types that get injected into MIR

Move almost all coverage-related code out of `rustc_codegen_ssa` and into `rustc_codegen_llvm`

Change `Coverage`/`CoverageKind` to completely separate counter-increment ops from mapping information

Mentors or Reviewers

Process

Comments

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Changes to StatementKind::Coverage #645

Description

Proposal

Important properties that won't change

Summary of proposed changes

Change ExpressionOperandId to a simple enum

Within InstrumentCoverage, store transient coverage info in separate side-tables instead of mutating the BCB graph

Decouple InstrumentCoverage's internal data structures from the shared types that get injected into MIR

Move almost all coverage-related code out of rustc_codegen_ssa and into rustc_codegen_llvm

Change Coverage/CoverageKind to completely separate counter-increment ops from mapping information

Mentors or Reviewers

Process

Comments

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Change `ExpressionOperandId` to a simple enum

Within `InstrumentCoverage`, store transient coverage info in separate side-tables instead of mutating the BCB graph

Decouple `InstrumentCoverage`'s internal data structures from the shared types that get injected into MIR

Move almost all coverage-related code out of `rustc_codegen_ssa` and into `rustc_codegen_llvm`

Change `Coverage`/`CoverageKind` to completely separate counter-increment ops from mapping information