Commit 65263b6

committed
breakup the MIR section and add an incremental compilation section
1 parent 40daff3 commit 65263b6

3 files changed

+145
-4
lines changed

src/SUMMARY.md

Lines changed: 5 additions & 4 deletions
@@ -6,7 +6,7 @@
66
- [Walkthrough: a typical contribution](./walkthrough.md)
77
- [High-level overview of the compiler source](./high-level-overview.md)
88
- [Queries: demand-driven compilation](./query.md)
9-
- [Incremental compilation](./incremental-compilation.md)
9+
- [Incremental compilation](./incremental-compilation.md)
1010
- [The parser](./the-parser.md)
1111
- [Macro expansion](./macro-expansion.md)
1212
- [Name resolution](./name-resolution.md)
@@ -15,8 +15,9 @@
1515
- [Type inference](./type-inference.md)
1616
- [Trait resolution](./trait-resolution.md)
1717
- [Type checking](./type-checking.md)
18-
- [MIR construction](./mir-construction.md)
19-
- [MIR borrowck](./mir-borrowck.md)
20-
- [MIR optimizations](./mir-optimizations.md)
18+
- [The MIR (Mid-level IR)](./mir.md)
19+
- [MIR construction](./mir-construction.md)
20+
- [MIR borrowck](./mir-borrowck.md)
21+
- [MIR optimizations](./mir-optimizations.md)
2122
- [trans: generating LLVM IR](./trans.md)
2223
- [Glossary](./glossary.md)

src/incremental-compilation.md

Lines changed: 139 additions & 0 deletions
@@ -0,0 +1,139 @@
1+
# Incremental compilation
2+
3+
The incremental compilation scheme is, in essence, a surprisingly
4+
simple extension to the overall query system. We'll start by describing
5+
a slightly simplified variant of the real thing, the "basic algorithm", and then describe
6+
some possible improvements.
7+
8+
## The basic algorithm
9+
10+
The basic algorithm is
11+
called the **red-green** algorithm[^salsa]. The high-level idea is
12+
that, after each run of the compiler, we will save the results of all
13+
the queries that we do, as well as the **query DAG**. The
14+
**query DAG** is a [DAG] that indicates which queries executed which
15+
other queries. So for example there would be an edge from a query Q1
16+
to another query Q2 if computing Q1 required computing Q2 (note that
17+
because queries cannot depend on themselves, this results in a DAG and
18+
not a general graph).
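
As a rough illustration of the structure described above, the query DAG can be pictured as a map from each query to the queries it executed. This is a minimal sketch, not the compiler's actual representation: the names `QueryId`, `QueryDag`, `record_read` and `reads` are hypothetical, and real query keys are richer than strings.

```rust
use std::collections::HashMap;

/// Hypothetical query identifier; the real compiler uses richer keys.
type QueryId = &'static str;

/// A toy query DAG: for each query, the queries it executed, in order.
#[derive(Default)]
struct QueryDag {
    reads: HashMap<QueryId, Vec<QueryId>>,
}

impl QueryDag {
    /// Record an edge from `from` to `to`: computing `from` required `to`.
    fn record_read(&mut self, from: QueryId, to: QueryId) {
        let reads = self.reads.entry(from).or_default();
        // Keep only the first occurrence so each edge appears once.
        if !reads.contains(&to) {
            reads.push(to);
        }
    }

    /// The queries that `q` executed, in the order they were first read.
    fn reads(&self, q: QueryId) -> &[QueryId] {
        self.reads.get(q).map(Vec::as_slice).unwrap_or(&[])
    }
}
```

Storing the reads in a `Vec` rather than a set is a deliberate choice in this sketch, because it preserves the order in which the subqueries were executed.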
19+
20+
[DAG]: https://en.wikipedia.org/wiki/Directed_acyclic_graph
21+
22+
On the next run of the compiler, then, we can sometimes reuse these
23+
query results to avoid re-executing a query. We do this by assigning
24+
every query a **color**:
25+
26+
- If a query is colored **red**, that means that its result during
27+
this compilation has **changed** from the previous compilation.
28+
- If a query is colored **green**, that means that its result is
29+
the **same** as the previous compilation.
30+
31+
There are two key insights here:
32+
33+
- First, if all the inputs to query Q are colored green, then the
34+
query Q **must** result in the same value as last time and hence
35+
need not be re-executed (or else the compiler is not deterministic).
36+
- Second, even if some inputs to a query change, it may be that it
37+
**still** produces the same result as the previous compilation. In
38+
particular, the query may only use part of its input.
39+
- Therefore, after executing a query, we always check whether it
40+
produced the same result as the previous time. **If it did,** we
41+
can still mark the query as green, and hence avoid re-executing
42+
dependent queries.
43+
44+
### The try-mark-green algorithm
45+
46+
The core of incremental compilation is an algorithm called
47+
"try-mark-green". It has the job of determining the color of a given
48+
query Q (which must not yet have been executed). In cases where Q has
49+
red inputs, determining Q's color may involve re-executing Q so that
50+
we can compare its output; but if all of Q's inputs are green, then we
51+
can determine that Q must be green without re-executing it or inspecting
52+
its value whatsoever. In the compiler, this allows us to avoid
53+
deserializing the result from disk when we don't need it, and -- in
54+
fact -- enables us to sometimes skip *serializing* the result as well
55+
(see the refinements section below).
56+
57+
Try-mark-green works as follows:
58+
59+
- First check whether the query Q was executed during the previous
60+
compilation.
61+
- If not, we can just re-execute the query as normal, and assign it the
62+
color of red.
63+
- If yes, then load the 'dependent queries' that Q executed, as follows:
64+
- If there is a saved result, then we load the `reads(Q)` vector from the
65+
query DAG. The "reads" is the set of queries that Q executed during
66+
its execution.
67+
- For each query R in `reads(Q)`, we recursively demand the color
68+
of R using try-mark-green.
69+
- Note: it is important that we visit each node in `reads(Q)` in the same order
70+
as they occurred in the original compilation. See [the section on the query DAG below](#dag).
71+
- If **any** of the nodes in `reads(Q)` wind up colored **red**, then Q is dirty.
72+
- We re-execute Q and compare the hash of its result to the hash of the result
73+
from the previous compilation.
74+
- If the hash has not changed, we can mark Q as **green** and return.
75+
- Otherwise, **all** of the nodes in `reads(Q)` must be **green**. In that case,
76+
we can color Q as **green** and return.
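
The steps above can be sketched in a few lines of Rust. This is a simplified model under stated assumptions, not the compiler's implementation: `Previous`, `Session`, `run` and the `execute` closure (which stands in for re-executing a query and returns the hash of its new result) are all hypothetical, and the colors of input nodes are assumed to be pre-seeded by the caller.

```rust
use std::collections::HashMap;

type QueryId = &'static str;

#[derive(Clone, Copy, Debug, PartialEq)]
enum Color {
    Red,
    Green,
}

/// State saved by the previous compilation (hypothetical layout).
struct Previous {
    /// Ordered `reads(Q)` for each query that ran last time.
    reads: HashMap<QueryId, Vec<QueryId>>,
    /// Hash of each query's result from last time.
    hashes: HashMap<QueryId, u64>,
}

struct Session<F: Fn(QueryId) -> u64> {
    prev: Previous,
    /// Colors decided so far; the caller pre-seeds the colors of inputs.
    colors: HashMap<QueryId, Color>,
    /// Stand-in for re-executing a query: returns the hash of its new result.
    execute: F,
    /// Log of re-executed queries, for illustration only.
    executed: Vec<QueryId>,
}

impl<F: Fn(QueryId) -> u64> Session<F> {
    fn try_mark_green(&mut self, q: QueryId) -> Color {
        if let Some(&c) = self.colors.get(q) {
            return c; // color already known
        }
        let color = match self.prev.reads.get(q).cloned() {
            // Q did not run last time: execute it and color it red.
            None => {
                self.run(q);
                Color::Red
            }
            Some(reads) => {
                // Visit reads(Q) in recorded order; `any` stops at the first red.
                let any_red = reads
                    .iter()
                    .any(|&r| self.try_mark_green(r) == Color::Red);
                if any_red {
                    // Q is dirty: re-execute it and compare result hashes.
                    let new_hash = self.run(q);
                    if self.prev.hashes.get(q) == Some(&new_hash) {
                        Color::Green
                    } else {
                        Color::Red
                    }
                } else {
                    // All reads are green, so Q is green without re-execution.
                    Color::Green
                }
            }
        };
        self.colors.insert(q, color);
        color
    }

    fn run(&mut self, q: QueryId) -> u64 {
        self.executed.push(q);
        (self.execute)(q)
    }
}
```

In this sketch, a query whose reads are all green never touches `execute`, which mirrors the point that its value need not be loaded or recomputed just to determine its color.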
77+
78+
<a name="dag"></a>
79+
80+
### The query DAG
81+
82+
The query DAG code is stored in
83+
[`src/librustc/dep_graph`][dep_graph]. Construction of the DAG is done
84+
by instrumenting the query execution.
85+
86+
One key point is that the query DAG also tracks ordering; that is, for
87+
each query Q, we not only track the queries that Q reads, we also track the
88+
**order** in which they were read. This allows try-mark-green to walk
89+
those queries back in the same order. This is important because once a subquery comes back as red,
90+
we can no longer be sure that Q will continue along the same path as before.
91+
That is, imagine a query like this:
92+
93+
```rust,ignore
94+
fn main_query(tcx) {
95+
if tcx.subquery1() {
96+
tcx.subquery2()
97+
} else {
98+
tcx.subquery3()
99+
}
100+
}
101+
```
102+
103+
Now imagine that in the first compilation, `main_query` starts by
104+
executing `subquery1`, and this returns true. In that case, the next
105+
query `main_query` executes will be `subquery2`, and `subquery3` will
106+
not be executed at all.
107+
108+
But now imagine that in the **next** compilation, the input has
109+
changed such that `subquery1` returns **false**. In this case, `subquery2` would never
110+
execute. If try-mark-green were to visit `reads(main_query)` out of order,
111+
however, it might have visited `subquery2` before `subquery1`, and hence executed it.
112+
This can lead to ICEs and other problems in the compiler.
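
To make the ordering argument concrete, here is a small sketch of an in-order walk that stops at the first red subquery. The function `visit_in_order` and the `color_of` callback are hypothetical names used only for illustration; they are not part of the compiler.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Color {
    Red,
    Green,
}

/// Walk `reads` in their recorded order, stopping at the first red entry.
/// Returns the queries whose color was actually demanded.
fn visit_in_order<'a>(
    reads: &[&'a str],
    color_of: impl Fn(&str) -> Color,
) -> Vec<&'a str> {
    let mut visited = Vec::new();
    for &r in reads {
        visited.push(r);
        if color_of(r) == Color::Red {
            // The parent must now be re-executed; later reads may no
            // longer be on its execution path, so we do not touch them.
            break;
        }
    }
    visited
}
```

With `reads(main_query)` recorded as `["subquery1", "subquery2"]` and `subquery1` coming back red, the walk demands only `subquery1`, so `subquery2` is never visited on the path that no longer reaches it.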
113+
114+
[dep_graph]: https://github.com/rust-lang/rust/tree/master/src/librustc/dep_graph
115+
116+
## Improvements to the basic algorithm
117+
118+
In the description of the basic algorithm, we said that at the end of
119+
compilation we would save the results of all the queries that were
120+
performed. In practice, this can be quite wasteful -- many of those
121+
results are very cheap to recompute, and serializing + deserializing
122+
them is not a particular win. In practice, what we would do is to save
123+
**the hashes** of all the subqueries that we performed. Then, in select cases,
124+
we **also** save the results.
125+
126+
This is why the incremental algorithm separates computing the
127+
**color** of a node, which often does not require its value, from
128+
computing the **result** of a node. Computing the result is done via a simple algorithm
129+
like so:
130+
131+
- Check if a saved result for Q is available. If so, compute the color of Q.
132+
If Q is green, deserialize and return the saved result.
133+
- Otherwise, execute Q.
134+
- We can then compare the hash of the result and color Q as green if
135+
it did not change.
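
The result-computation steps above might look roughly like the following. This is a sketch under assumptions: `Saved`, `get_result`, `hash_of`, and the `color_of` and `execute` callbacks are hypothetical, results are modeled as plain strings, and the sketch ignores the fact that determining a color may itself re-execute the query.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

#[derive(Clone, Copy, Debug, PartialEq)]
enum Color {
    Red,
    Green,
}

/// What the previous compilation saved for one query (hypothetical layout).
struct Saved {
    /// The hash of the result is always saved.
    hash: u64,
    /// The result itself is saved only in select cases.
    result: Option<String>,
}

/// Hash a result value; stands in for the compiler's stable hashing.
fn hash_of(value: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Returns the result of `q` together with the color it ends up with.
fn get_result(
    q: &str,
    saved: &HashMap<&str, Saved>,
    color_of: impl Fn(&str) -> Color,
    execute: impl Fn(&str) -> String,
) -> (String, Color) {
    // A saved result is available: compute the color, and reuse it if green.
    if let Some(Saved { result: Some(r), .. }) = saved.get(q) {
        if color_of(q) == Color::Green {
            return (r.clone(), Color::Green);
        }
    }
    // Otherwise execute Q, then compare hashes to decide its color.
    let result = execute(q);
    let color = match saved.get(q) {
        Some(s) if s.hash == hash_of(&result) => Color::Green,
        _ => Color::Red,
    };
    (result, color)
}
```

Keeping `hash` mandatory and `result` optional reflects the trade-off described above: hashes are cheap to store for every query, while full results are worth serializing only when recomputing them would be expensive.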
136+
137+
# Footnotes
138+
139+
[^salsa]: I have long wanted to rename it to the Salsa algorithm, but it never caught on. -@nikomatsakis

src/mir.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
# The MIR (Mid-level IR)
