# Dataflow Analysis

If you work on the MIR, you will frequently come across various flavors of
[dataflow analysis][wiki]. For example, `rustc` uses dataflow to find
uninitialized variables, determine which variables are live across a generator
`yield` statement, and compute which `Place`s are borrowed at a given point in
the control-flow graph. Dataflow analysis is a fundamental concept in modern
compilers, and knowledge of the subject will be helpful to prospective
contributors.

However, this documentation is not a general introduction to dataflow analysis.
It is merely a description of the framework used to define these analyses in
`rustc`. It assumes that the reader is familiar with some basic terminology,
such as "transfer function", "fixpoint" and "lattice". If you're unfamiliar
with these terms, or if you want a quick refresher, [*Static Program Analysis*]
by Anders Møller and Michael I. Schwartzbach is an excellent, freely available
textbook. For those who prefer audiovisual learning, Goethe University
Frankfurt has published a series of short [YouTube lectures][goethe] in English
that are very approachable.

## Defining a Dataflow Analysis

The interface for dataflow analyses is split into three traits. The first is
[`AnalysisDomain`], which must be implemented by *all* analyses. In addition to
the type of the dataflow state, this trait defines the initial value of that
state at entry to each block, as well as the direction of the analysis, either
forward or backward. The domain of your dataflow analysis must be a [lattice][]
(strictly speaking a join-semilattice) with a well-behaved `join` operator. See
the documentation for the [`lattice`] module, as well as the
[`JoinSemiLattice`] trait, for more information.
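
To make "well-behaved" concrete, the join must be commutative, associative and idempotent, and a fixpoint engine also needs to know whether a join actually changed the state. The toy sketch below captures those requirements. It is only loosely modelled on the `JoinSemiLattice` trait: `BitSet64` is a hypothetical stand-in for a real bitset domain, and nothing here should be read as the actual `rustc` API.

```rust
/// Toy join-semilattice interface: `join` replaces `self` with the least
/// upper bound of `self` and `other`, and returns `true` if `self` changed.
pub trait JoinSemiLattice {
    fn join(&mut self, other: &Self) -> bool;
}

/// Hypothetical domain: a set of at most 64 indices packed into a `u64`.
/// The empty set is the bottom element and set union is the join.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct BitSet64(pub u64);

impl JoinSemiLattice for BitSet64 {
    fn join(&mut self, other: &Self) -> bool {
        let before = self.0;
        // Union is commutative, associative and idempotent.
        self.0 |= other.0;
        // Report whether anything was added, so callers know when to stop.
        self.0 != before
    }
}
```

Set union over a fixed universe also has finite height, which is what guarantees that repeated joins eventually stop reporting changes.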

You must then provide *either* a direct implementation of the [`Analysis`] trait
*or* an implementation of the proxy trait [`GenKillAnalysis`]. The latter is for
so-called ["gen-kill" problems], which have a simple class of transfer function
that can be applied very efficiently. Analyses whose domain is not a `BitSet`
of some index type, or whose transfer functions cannot be expressed through
"gen" and "kill" operations, must implement `Analysis` directly, and will run
slower as a result. All implementers of `GenKillAnalysis` also implement
`Analysis` automatically via a default `impl`.
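
Gen-kill transfer functions are cheap because every one has the form `out = (in - kill) ∪ gen`, and a sequence of them composes into a single function of the same form. The sketch below records the gens and kills of several statements into one summary for a whole block. It uses plain `u64` bit masks and hypothetical names (`GenKillSet`, `gen_elem`, `kill_elem`) rather than `rustc`'s `BitSet`.

```rust
/// Toy gen/kill transfer function over sets of at most 64 elements,
/// stored as bit masks. Applying it computes `out = (in - kill) | gen`.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct GenKillSet {
    pub gen_set: u64,
    pub kill_set: u64,
}

impl GenKillSet {
    /// Record a "gen" of `elem`, overriding any earlier kill of it.
    pub fn gen_elem(&mut self, elem: u32) {
        self.gen_set |= 1u64 << elem;
        self.kill_set &= !(1u64 << elem);
    }

    /// Record a "kill" of `elem`, overriding any earlier gen of it.
    pub fn kill_elem(&mut self, elem: u32) {
        self.kill_set |= 1u64 << elem;
        self.gen_set &= !(1u64 << elem);
    }

    /// Apply the summarized transfer function to an incoming state.
    pub fn apply(&self, state: u64) -> u64 {
        (state & !self.kill_set) | self.gen_set
    }
}
```

Once a block is summarized this way, each visit to it during fixpoint iteration costs a couple of bitwise operations, however many statements the block contains.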

```text
 AnalysisDomain
       ^
       |          | = has as a supertrait
       |          . = provides a default impl for
       |
   Analysis
     ^   ^
     |   .
     |   .
     |   .
 GenKillAnalysis
```

### Transfer Functions and Effects

The dataflow framework in `rustc` allows each statement inside a basic block, as
well as the terminator, to define its own transfer function. For brevity, these
individual transfer functions are known as "effects". Each effect is applied
successively in dataflow order (first to last for a forward analysis, last to
first for a backward one), and together they define the transfer function for
the entire basic block. It's also possible to define an effect for particular
outgoing edges of some terminators (e.g. [`apply_call_return_effect`] for the
`success` edge of a `Call` terminator). Collectively, these are known as
per-edge effects.
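
Here is a rough illustration of "dataflow order" using liveness, a classic backward problem. The `Stmt` type, the `Direction` enum and `apply_block` are hypothetical simplifications made up for this sketch. Locals are small integers, the state is a `u64` bitset, and the terminator is treated as just the last statement of the block.

```rust
/// Hypothetical statement: it may write one local (`def`) and read others (`uses`).
/// Locals are indices below 64 so that a `u64` can hold the dataflow state.
pub struct Stmt {
    pub def: Option<u32>,
    pub uses: Vec<u32>,
}

#[derive(Clone, Copy)]
pub enum Direction {
    Forward,
    Backward,
}

/// Per-statement effect for liveness: the written local stops being live,
/// then every local the statement reads becomes live.
pub fn liveness_effect(stmt: &Stmt, live: &mut u64) {
    if let Some(d) = stmt.def {
        *live &= !(1u64 << d);
    }
    for &u in &stmt.uses {
        *live |= 1u64 << u;
    }
}

/// The block transfer function: apply each statement's effect in dataflow order.
pub fn apply_block(stmts: &[Stmt], dir: Direction, state: &mut u64, effect: impl Fn(&Stmt, &mut u64)) {
    match dir {
        Direction::Forward => {
            for s in stmts {
                effect(s, state);
            }
        }
        Direction::Backward => {
            for s in stmts.iter().rev() {
                effect(s, state);
            }
        }
    }
}
```

Applying the same per-statement effects in the wrong order gives a different, incorrect entry state. That is why the order is tied to the direction of the analysis.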

The only meaningful difference (besides the "apply" prefix) between the methods
of the `GenKillAnalysis` trait and the `Analysis` trait is that an `Analysis`
has direct, mutable access to the dataflow state, whereas a `GenKillAnalysis`
only sees an implementer of the `GenKill` trait, which only allows the `gen`
and `kill` operations for mutation.
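
The sketch below illustrates that restriction with a hypothetical `GenKill`-like trait. Its method names are invented for this example and do not match `rustc`'s. A storage-liveness effect written against the trait can only add or remove single elements. An `Analysis`-style effect with `&mut` access to the whole state can go further: for example, it can make its result depend on what is already in the state, which no fixed pair of gen and kill sets can express.

```rust
use std::collections::BTreeSet;

/// Toy gen/kill-only interface: the only mutations allowed are adding
/// (`gen_elem`) and removing (`kill_elem`) a single element.
pub trait GenKill<T> {
    fn gen_elem(&mut self, elem: T);
    fn kill_elem(&mut self, elem: T);
}

impl<T: Ord> GenKill<T> for BTreeSet<T> {
    fn gen_elem(&mut self, elem: T) {
        self.insert(elem);
    }
    fn kill_elem(&mut self, elem: T) {
        self.remove(&elem);
    }
}

/// Hypothetical statements for tracking which locals have live storage.
pub enum Stmt {
    StorageLive(u32),
    StorageDead(u32),
    Nop,
}

/// Gen-kill style effect: it only sees `impl GenKill`, never the raw state.
pub fn statement_effect(trans: &mut impl GenKill<u32>, stmt: &Stmt) {
    match *stmt {
        Stmt::StorageLive(local) => trans.gen_elem(local),
        Stmt::StorageDead(local) => trans.kill_elem(local),
        Stmt::Nop => {}
    }
}

/// `Analysis`-style effect with full `&mut` access: for `_dst = copy _src`,
/// `_dst` is in the output only if `_src` is in the input, so the outcome
/// depends on the incoming state and cannot be a fixed gen/kill pair.
pub fn copy_effect(state: &mut BTreeSet<u32>, dst: u32, src: u32) {
    if state.contains(&src) {
        state.insert(dst);
    } else {
        state.remove(&dst);
    }
}
```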

Observant readers of the documentation for these traits may notice that there
are actually *two* possible effects for each statement and terminator: the
"before" effect and the unprefixed (or "primary") effect. The "before" effects
are applied immediately before the unprefixed effect **regardless of whether
the analysis is backward or forward**. The vast majority of analyses should use
only the unprefixed effects: having multiple effects for each statement makes
it difficult for consumers to know where they should be looking. However, the
"before" variants can be useful in some scenarios, such as when the effect of
the right-hand side of an assignment statement must be considered separately
from the left-hand side.
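
The ordering rule can be summarized by listing which effects run, and in what order, for a block with `n` statements. This is a toy model with hypothetical `Effect` and `Direction` types. Terminators and per-edge effects are left out.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Direction {
    Forward,
    Backward,
}

/// Which effect of which statement (by index within the block) is applied.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Effect {
    Before(usize),
    Primary(usize),
}

/// The order in which effects for statements `0..n` of one block are applied.
pub fn effect_order(n: usize, dir: Direction) -> Vec<Effect> {
    // Statements are visited in dataflow order...
    let indices: Vec<usize> = match dir {
        Direction::Forward => (0..n).collect(),
        Direction::Backward => (0..n).rev().collect(),
    };
    let mut order = Vec::new();
    for i in indices {
        // ...but within a single statement, "before" always precedes "primary".
        order.push(Effect::Before(i));
        order.push(Effect::Primary(i));
    }
    order
}
```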

### Convergence

TODO

## Inspecting the Results of a Dataflow Analysis

Once you have constructed an analysis, you must pass it to an [`Engine`], which
is responsible for finding the steady-state solution to your dataflow problem.
You should use the [`into_engine`] method defined on the `Analysis` trait for
this, since it will use the more efficient `Engine::new_gen_kill` constructor
when possible.
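
Conceptually, finding the steady-state solution is a worklist algorithm. It repeatedly applies each block's transfer function and joins the result into the block's successors until no entry state changes. The sketch below is a minimal, self-contained version of that idea for a forward gen-kill problem whose join is set union. The `Cfg` struct and this `iterate_to_fixpoint` function are hypothetical and far simpler than the real `Engine`. The sketch assumes one precomputed gen/kill summary per block, which is presumably what makes a dedicated gen-kill constructor worthwhile.

```rust
use std::collections::VecDeque;

/// Toy CFG for a forward "may" analysis over sets of at most 64 elements.
/// Block 0 is the entry block (assumed to exist), `succs[b]` lists the
/// successors of block `b`, and each block's transfer function is summarized
/// as a gen/kill pair of bit masks.
pub struct Cfg {
    pub succs: Vec<Vec<usize>>,
    pub gen_sets: Vec<u64>,
    pub kill_sets: Vec<u64>,
}

/// Worklist solver: returns the fixpoint state on entry to each block.
pub fn iterate_to_fixpoint(cfg: &Cfg, start_state: u64) -> Vec<u64> {
    let n = cfg.succs.len();
    // Every block starts at bottom (the empty set), except the entry block.
    let mut entry_sets = vec![0u64; n];
    entry_sets[0] = start_state;

    // Visit every block at least once; re-queue a block when its entry state grows.
    let mut worklist: VecDeque<usize> = (0..n).collect();
    let mut queued = vec![true; n];

    while let Some(bb) = worklist.pop_front() {
        queued[bb] = false;
        // Apply the block's summarized transfer function: out = (in - kill) | gen.
        let exit = (entry_sets[bb] & !cfg.kill_sets[bb]) | cfg.gen_sets[bb];
        for &succ in &cfg.succs[bb] {
            let before = entry_sets[succ];
            entry_sets[succ] |= exit; // join (set union) into the successor
            if entry_sets[succ] != before && !queued[succ] {
                queued[succ] = true;
                worklist.push_back(succ);
            }
        }
    }
    entry_sets
}
```

Entry states only ever grow, and there are finitely many subsets of 64 elements, so each block can be re-queued only finitely often and the loop terminates.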

Calling `iterate_to_fixpoint` on your `Engine` will return a `Results`, which
contains the dataflow state at fixpoint on entry to each block. Once you have
a `Results`, you can inspect the dataflow state at fixpoint at any point in
the CFG. If you only need the state at a few locations (e.g., each `Drop`
terminator), use a [`ResultsCursor`]. If you need the state at *every*
location, a [`ResultsVisitor`] will be more efficient.

```text
                         Analysis
                            |
                            | into_engine(…)
                            |
                          Engine
                            |
                            | iterate_to_fixpoint()
                            |
                         Results
                         /     \
 into_results_cursor(…) /       \  visit_with(…)
                       /         \
               ResultsCursor  ResultsVisitor
```

For example, the following code uses a [`ResultsVisitor`]:

```rust,ignore
// Assuming `MyVisitor` implements `ResultsVisitor<FlowState = MyAnalysis::Domain>`...
let mut my_visitor = MyVisitor::new();

// Inspect the fixpoint state for every location within every block in RPO.
// `reverse_postorder` yields `(BasicBlock, &BasicBlockData)` pairs, so keep only the index.
MyAnalysis()
    .into_engine(tcx, body, def_id)
    .iterate_to_fixpoint()
    .visit_with(
        body,
        traversal::reverse_postorder(body).map(|(bb, _)| bb),
        &mut my_visitor,
    );
```

By contrast, the following code uses a [`ResultsCursor`]:

```rust,ignore
let mut results = MyAnalysis()
    .into_engine(tcx, body, def_id)
    .iterate_to_fixpoint()
    .into_results_cursor(body);

// Inspect the fixpoint state immediately before each `Drop` terminator.
for (bb, block) in body.basic_blocks().iter_enumerated() {
    if let TerminatorKind::Drop { .. } = block.terminator().kind {
        results.seek_before_primary_effect(body.terminator_loc(bb));
        let state = results.get();
        println!("state before drop: {:#?}", state);
    }
}
```
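
The difference in cost between the two approaches comes from where each query starts. The toy model below uses a hypothetical `Results` struct that holds per-block entry states and per-statement gen/kill masks. It answers cursor-style queries by restarting from the block entry and replaying effects, while the visitor-style walk applies every effect exactly once. The real `ResultsCursor` avoids some of this work by continuing from its current position when it can. Even so, many scattered queries cost more than a single linear walk when you need every location.

```rust
/// Toy results for a forward analysis: the fixpoint state on entry to each
/// block, plus each statement's effect as a `(gen, kill)` pair of bit masks.
pub struct Results {
    pub entry_sets: Vec<u64>,
    pub blocks: Vec<Vec<(u64, u64)>>,
}

fn apply(state: u64, (gen_bits, kill_bits): (u64, u64)) -> u64 {
    (state & !kill_bits) | gen_bits
}

impl Results {
    /// Cursor-style query: the state just before statement `idx` of `block`.
    /// It restarts from the block's entry state, so it replays `idx` effects.
    pub fn state_before(&self, block: usize, idx: usize) -> u64 {
        self.blocks[block][..idx]
            .iter()
            .fold(self.entry_sets[block], |s, &eff| apply(s, eff))
    }

    /// Visitor-style walk: call `visit(block, idx, state)` with the state
    /// before every statement, applying each effect exactly once.
    pub fn visit_all(&self, mut visit: impl FnMut(usize, usize, u64)) {
        for (b, stmts) in self.blocks.iter().enumerate() {
            let mut state = self.entry_sets[b];
            for (i, &eff) in stmts.iter().enumerate() {
                visit(b, i, state);
                state = apply(state, eff);
            }
        }
    }
}
```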

["gen-kill" problems]: https://en.wikipedia.org/wiki/Data-flow_analysis#Bit_vector_problems
[*Static Program Analysis*]: https://cs.au.dk/~amoeller/spa/
[`AnalysisDomain`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/trait.AnalysisDomain.html
[`Analysis`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/trait.Analysis.html
[`Engine`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/struct.Engine.html
[`GenKillAnalysis`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/trait.GenKillAnalysis.html
[`JoinSemiLattice`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/lattice/trait.JoinSemiLattice.html
[`ResultsCursor`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/struct.ResultsCursor.html
[`ResultsVisitor`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/trait.ResultsVisitor.html
[`apply_call_return_effect`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/trait.Analysis.html#tymethod.apply_call_return_effect
[`into_engine`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/trait.Analysis.html#method.into_engine
[`lattice`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/lattice/index.html
[goethe]: https://www.youtube.com/watch?v=NVBQSR_HdL0&list=PL_sGR8T76Y58l3Gck3ZwIIHLWEmXrOLV_&index=2
[lattice]: https://en.wikipedia.org/wiki/Lattice_(order)
[wiki]: https://en.wikipedia.org/wiki/Data-flow_analysis#Basic_principles