Commit 40daff3

move over the query chapter from src/librustc/ty/maps
1 parent 458685b commit 40daff3

2 files changed: +316 -0 lines changed

src/SUMMARY.md

Lines changed: 2 additions & 0 deletions
@@ -5,6 +5,8 @@
 - [Using the compiler testing framework](./running-tests.md)
 - [Walkthrough: a typical contribution](./walkthrough.md)
 - [High-level overview of the compiler source](./high-level-overview.md)
+- [Queries: demand-driven compilation](./query.md)
+- [Incremental compilation](./incremental-compilation.md)
 - [The parser](./the-parser.md)
 - [Macro expansion](./macro-expansion.md)
 - [Name resolution](./name-resolution.md)

src/query.md

Lines changed: 314 additions & 0 deletions
@@ -0,0 +1,314 @@
# Queries: demand-driven compilation

As described in [the high-level overview of the compiler][hl], the
Rust compiler is currently transitioning from a traditional "pass-based"
setup to a "demand-driven" system. **The Compiler Query System is the
key to our new demand-driven organization.** The idea is pretty
simple. You have various queries that compute things about the input
-- for example, there is a query called `type_of(def_id)` that, given
the def-id of some item, will compute the type of that item and return
it to you.

[hl]: high-level-overview.html
Query execution is **memoized** -- so the first time you invoke a
query, it will go do the computation, but the next time, the result is
returned from a hashtable. Moreover, query execution fits nicely into
**incremental computation**; the idea is roughly that, when you do a
query, the result **may** be returned to you by loading stored data
from disk (but that's a separate topic we won't discuss further here).

The overall vision is that, eventually, the entire compiler
control-flow will be query driven. There will effectively be one
top-level query ("compile") that will run compilation on a crate; this
will in turn demand information about that crate, starting from the
*end*. For example:

- This "compile" query might demand to get a list of codegen-units
  (i.e., modules that need to be compiled by LLVM).
- But computing the list of codegen-units would invoke some subquery
  that returns the list of all modules defined in the Rust source.
- That query in turn would invoke something asking for the HIR.
- This keeps going further and further back until we wind up doing the
  actual parsing.

However, that vision is not fully realized. Still, big chunks of the
compiler (for example, generating MIR) work exactly like this.
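
To make the memoization and the "start from the end" control flow concrete, here is a minimal, self-contained sketch. It is not how rustc implements queries: `Ctxt`, `query`, the string results, and the `log` field are all hypothetical names invented for illustration. Each "query" runs its provider at most once, and asking for the final result pulls in everything before it on demand.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Toy stand-in for the compiler context. Every name here is hypothetical.
#[derive(Default)]
struct Ctxt {
    /// Memoized results, keyed by query name.
    cache: RefCell<HashMap<&'static str, String>>,
    /// Order in which providers actually finished running.
    log: RefCell<Vec<&'static str>>,
}

impl Ctxt {
    /// Return the cached result if present; otherwise run `provider` once
    /// and remember what it returned.
    fn query(&self, name: &'static str, provider: fn(&Ctxt) -> String) -> String {
        let cached = self.cache.borrow().get(name).cloned();
        if let Some(hit) = cached {
            return hit;
        }
        let value = provider(self);
        self.log.borrow_mut().push(name);
        self.cache.borrow_mut().insert(name, value.clone());
        value
    }

    // Each query demands the one "before" it, so work flows backwards
    // from the end of the pipeline.
    fn parse(&self) -> String {
        self.query("parse", |_| "ast".to_string())
    }
    fn hir(&self) -> String {
        self.query("hir", |cx| format!("hir({})", cx.parse()))
    }
    fn modules(&self) -> String {
        self.query("modules", |cx| format!("modules({})", cx.hir()))
    }
    fn codegen_units(&self) -> String {
        self.query("codegen_units", |cx| format!("codegen_units({})", cx.modules()))
    }
    fn compile(&self) -> String {
        self.query("compile", |cx| format!("compile({})", cx.codegen_units()))
    }
}

fn main() {
    let cx = Ctxt::default();
    println!("{}", cx.compile());
    // Providers ran in dependency order even though we only asked for "compile".
    println!("{:?}", cx.log.borrow());
    // A second call is answered from the cache; no provider runs again.
    cx.compile();
    println!("providers run so far: {}", cx.log.borrow().len());
}
```

In this toy, calling `compile()` twice still leaves exactly five entries in `log`, one per query, which is the memoization property described above.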
### Invoking queries

Invoking a query is simple. The tcx ("type context") offers a method
for each defined query. So, for example, to invoke the `type_of`
query, you would just do this:

```rust
let ty = tcx.type_of(some_def_id);
```
### Cycles between queries

Currently, cycles during query execution should always result in a
compilation error. Typically, they arise because of illegal programs
that contain cyclic references they shouldn't (though sometimes they
arise because of compiler bugs, in which case we need to factor our
queries in a more fine-grained fashion to avoid them).

However, it is nonetheless often useful to *recover* from a cycle
(after reporting an error, say) and try to soldier on, so as to give a
better user experience. In order to recover from a cycle, you don't
get to use the nice method-call-style syntax. Instead, you invoke
using the `try_get` method, which looks roughly like this:

```rust
use ty::maps::queries;
...
match queries::type_of::try_get(tcx, DUMMY_SP, self.did) {
    Ok(result) => {
        // no cycle occurred! You can use `result`
    }
    Err(err) => {
        // A cycle occurred! The error value `err` is a `DiagnosticBuilder`,
        // meaning essentially an "in-progress", not-yet-reported error message.
        // See below for more details on what to do here.
    }
}
```
So, if you get back an `Err` from `try_get`, then a cycle *did* occur. This means that
you must ensure that a compiler error message is reported. You can do that in two ways:

The simplest is to invoke `err.emit()`. This will emit the cycle error to the user.

However, often cycles happen because of an illegal program, and you
know at that point that an error either already has been reported or
will be reported due to this cycle by some other bit of code. In that
case, you can invoke `err.cancel()` to not emit any error. It is
traditional to then invoke:

```
tcx.sess.delay_span_bug(some_span, "some message")
```

`delay_span_bug()` is a helper that says: we expect a compilation
error to have happened or to happen in the future; so, if compilation
ultimately succeeds, make an ICE with the message `"some
message"`. This is basically just a precaution in case you are wrong.
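
The general shape of "detect a cycle, hand the caller an error, let the caller recover" can be sketched outside the compiler. The code below is an illustration only and does not reflect rustc's actual cycle detection: `Ctxt`, `CycleError`, `try_depth`, and `depth_or_zero` are hypothetical names, and a plain `Vec<String>` stands in for real diagnostics. It assumes cycles can be found by keeping a stack of the queries currently being computed.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Hypothetical error value standing in for an in-progress diagnostic.
#[derive(Debug, PartialEq)]
struct CycleError {
    /// The chain of keys that led back to an already-active key.
    stack: Vec<u32>,
}

struct Ctxt {
    /// Toy "program": each item optionally depends on one other item.
    deps: HashMap<u32, Option<u32>>,
    /// Keys whose computation is currently in progress.
    active: RefCell<Vec<u32>>,
    cache: RefCell<HashMap<u32, u32>>,
}

impl Ctxt {
    fn new(deps: HashMap<u32, Option<u32>>) -> Self {
        Ctxt { deps, active: RefCell::new(Vec::new()), cache: RefCell::new(HashMap::new()) }
    }

    /// Length of the dependency chain below `id`, or an error if computing
    /// it would re-enter a key that is already being computed.
    fn try_depth(&self, id: u32) -> Result<u32, CycleError> {
        let cached = self.cache.borrow().get(&id).copied();
        if let Some(hit) = cached {
            return Ok(hit);
        }
        if self.active.borrow().contains(&id) {
            let mut stack = self.active.borrow().clone();
            stack.push(id);
            return Err(CycleError { stack });
        }
        self.active.borrow_mut().push(id);
        let result = match self.deps.get(&id).copied().flatten() {
            Some(dep) => self.try_depth(dep).map(|d| d + 1),
            None => Ok(0),
        };
        self.active.borrow_mut().pop();
        // Only successful results are memoized.
        if let Ok(depth) = &result {
            self.cache.borrow_mut().insert(id, *depth);
        }
        result
    }

    /// A caller that recovers: record an error and continue with a fallback.
    fn depth_or_zero(&self, id: u32, errors: &mut Vec<String>) -> u32 {
        match self.try_depth(id) {
            Ok(depth) => depth,
            Err(err) => {
                errors.push(format!("cycle detected: {:?}", err.stack));
                0
            }
        }
    }
}

fn main() {
    // Items 3 and 4 depend on each other; items 1 and 2 are fine.
    let cx = Ctxt::new(HashMap::from([(1, Some(2)), (2, None), (3, Some(4)), (4, Some(3))]));
    let mut errors = Vec::new();
    println!("{}", cx.depth_or_zero(1, &mut errors));
    println!("{}", cx.depth_or_zero(3, &mut errors));
    println!("{:?}", errors);
}
```

The point of returning `Result` rather than aborting is that the caller decides what to do with the error, which mirrors the choice between emitting and cancelling described above.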
### How the compiler executes a query

So you may be wondering what happens when you invoke a query
method. The answer is that, for each query, the compiler maintains a
cache -- if your query has already been executed, then the answer is
simple: we clone the return value out of the cache and return it
(therefore, you should try to ensure that the return types of queries
are cheaply cloneable; insert an `Rc` if necessary).
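
As a small illustration of why `Rc` helps here, the sketch below caches a large value behind an `Rc`, so that every cache hit clones a pointer rather than the data. `Cache` and `get_or_compute` are hypothetical names for this example, not compiler APIs.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical cache whose values are expensive to copy.
struct Cache {
    map: RefCell<HashMap<u32, Rc<Vec<String>>>>,
}

impl Cache {
    fn new() -> Self {
        Cache { map: RefCell::new(HashMap::new()) }
    }

    /// Cloning the result out of the cache only bumps a reference count;
    /// the 1000-element vector itself is never copied.
    fn get_or_compute(&self, key: u32) -> Rc<Vec<String>> {
        let cached = self.map.borrow().get(&key).cloned();
        if let Some(hit) = cached {
            return hit;
        }
        let value = Rc::new(vec![format!("item{}", key); 1000]);
        self.map.borrow_mut().insert(key, Rc::clone(&value));
        value
    }
}

fn main() {
    let cache = Cache::new();
    let first = cache.get_or_compute(1);
    let second = cache.get_or_compute(1);
    // Both handles point at the same allocation.
    println!("same allocation: {}", Rc::ptr_eq(&first, &second));
}
```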
#### Providers

If, however, the query is *not* in the cache, then the compiler will
try to find a suitable **provider**. A provider is a function that has
been defined and linked into the compiler somewhere that contains the
code to compute the result of the query.

**Providers are defined per-crate.** The compiler maintains,
internally, a table of providers for every crate, at least
conceptually. Right now, there are really two sets: the providers for
queries about the **local crate** (that is, the one being compiled)
and providers for queries about **external crates** (that is,
dependencies of the local crate). Note that what determines the crate
that a query is targeting is not the *kind* of query, but the *key*.
For example, when you invoke `tcx.type_of(def_id)`, that could be a
local query or an external query, depending on what crate the `def_id`
is referring to (see the `self::keys::Key` trait for more information
on how that works).

Providers always have the same signature:

```rust
fn provider<'cx, 'tcx>(tcx: TyCtxt<'cx, 'tcx, 'tcx>,
                       key: QUERY_KEY)
                       -> QUERY_RESULT
{
    ...
}
```
Providers take two arguments: the `tcx` and the query key. Note also
that they take the *global* tcx (i.e., they use the `'tcx` lifetime
twice), rather than taking a tcx with some active inference context.
They return the result of the query.
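
The rule that the *key*, not the kind of query, selects the provider set can be sketched as follows. This is a simplified model under stated assumptions: `Tcx`, `DefId`, `LOCAL_CRATE`, and the two provider functions are stand-ins written for this example, and the real `Key` trait is reduced to a comparison on a crate number.

```rust
/// Assumption for this sketch: crate number 0 is the crate being compiled.
const LOCAL_CRATE: u32 = 0;

#[derive(Clone, Copy, Debug, PartialEq)]
struct DefId {
    krate: u32,
    index: u32,
}

/// A table of function pointers, one per query (only one query here).
struct Providers {
    type_of: fn(&Tcx, DefId) -> String,
}

struct Tcx {
    local_providers: Providers,
    extern_providers: Providers,
}

impl Tcx {
    /// The same query method serves both cases; the crate of the key
    /// decides which table is consulted.
    fn type_of(&self, def_id: DefId) -> String {
        let providers = if def_id.krate == LOCAL_CRATE {
            &self.local_providers
        } else {
            &self.extern_providers
        };
        (providers.type_of)(self, def_id)
    }
}

// Both providers share one signature: the context plus the query key.
fn local_type_of(_tcx: &Tcx, id: DefId) -> String {
    format!("computed from source: item {}", id.index)
}

fn extern_type_of(_tcx: &Tcx, id: DefId) -> String {
    format!("loaded from metadata of crate {}: item {}", id.krate, id.index)
}

fn main() {
    let tcx = Tcx {
        local_providers: Providers { type_of: local_type_of },
        extern_providers: Providers { type_of: extern_type_of },
    };
    println!("{}", tcx.type_of(DefId { krate: LOCAL_CRATE, index: 4 }));
    println!("{}", tcx.type_of(DefId { krate: 7, index: 4 }));
}
```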
#### How providers are set up

When the tcx is created, it is given the providers by its creator using
the `Providers` struct. This struct is generated by the macros in
`src/librustc/ty/maps`, but it is basically a big list of function pointers:

```rust
struct Providers {
    type_of: for<'cx, 'tcx> fn(TyCtxt<'cx, 'tcx, 'tcx>, DefId) -> Ty<'tcx>,
    ...
}
```
At present, we have one copy of the struct for local crates, and one
for external crates, though the plan is that we may eventually have
one per crate.

These `Providers` structs are ultimately created and populated by
`librustc_driver`, but it does this by distributing the work
throughout the other `rustc_*` crates. This is done by invoking
various `provide` functions. These functions tend to look something
like this:

```rust
pub fn provide(providers: &mut Providers) {
    *providers = Providers {
        type_of,
        ..*providers
    };
}
```
That is, they take an `&mut Providers` and mutate it in place. Usually
we use the formulation above just because it looks nice, but you could
as well do `providers.type_of = type_of`, which would be equivalent.
(Here, `type_of` would be a top-level function, defined as we saw
before.) So, if we want to add a provider for some other query,
let's call it `fubar`, into the crate above, we might modify the `provide()`
function like so:

```rust
pub fn provide(providers: &mut Providers) {
    *providers = Providers {
        type_of,
        fubar,
        ..*providers
    };
}

fn fubar<'cx, 'tcx>(tcx: TyCtxt<'cx, 'tcx, 'tcx>, key: DefId) -> Fubar<'tcx> { .. }
```
NB. Most of the `rustc_*` crates only provide **local
providers**. Almost all **extern providers** wind up going through the
[`rustc_metadata` crate][rustc_metadata], which loads the information from the crate
metadata. But in some cases there are crates that provide queries for
*both* local and external crates, in which case they define both a
`provide` and a `provide_extern` function that `rustc_driver` can
invoke.

[rustc_metadata]: https://github.com/rust-lang/rust/tree/master/src/librustc_metadata
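
The way several crates each fill in part of one `Providers` table can be tried out in a standalone program. The sketch below is an approximation with simplified types: the modules `typeck` and `other`, the `u32` keys, the panicking defaults, and `build_providers` are all assumptions made for this example. It shows both the struct-update form and the plain field assignment, which have the same effect.

```rust
/// Simplified provider table: plain function pointers, no lifetimes.
struct Providers {
    type_of: fn(u32) -> String,
    fubar: fn(u32) -> u32,
}

// Assumed defaults for this sketch: an unregistered query panics when called.
fn missing_type_of(key: u32) -> String {
    panic!("no provider registered for type_of({})", key)
}

fn missing_fubar(key: u32) -> u32 {
    panic!("no provider registered for fubar({})", key)
}

impl Default for Providers {
    fn default() -> Self {
        Providers { type_of: missing_type_of, fubar: missing_fubar }
    }
}

/// Stand-in for one crate that knows how to compute `type_of`.
mod typeck {
    use super::Providers;

    pub fn provide(providers: &mut Providers) {
        // Struct-update form: set one field, keep the rest as they were.
        *providers = Providers { type_of, ..*providers };
    }

    fn type_of(key: u32) -> String {
        format!("Ty#{}", key)
    }
}

/// Stand-in for another crate that knows how to compute `fubar`.
mod other {
    use super::Providers;

    pub fn provide(providers: &mut Providers) {
        // Plain assignment form; equivalent to the struct-update form.
        providers.fubar = fubar;
    }

    fn fubar(key: u32) -> u32 {
        key * 2
    }
}

/// Plays the role of the driver: hand the table to each crate in turn.
fn build_providers() -> Providers {
    let mut providers = Providers::default();
    typeck::provide(&mut providers);
    other::provide(&mut providers);
    providers
}

fn main() {
    let providers = build_providers();
    println!("{}", (providers.type_of)(3));
    println!("{}", (providers.fubar)(21));
}
```

The struct-update form works through a `&mut` reference here because every remaining field is a function pointer, and function pointers are `Copy`.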
### Adding a new kind of query

Suppose you want to add a new kind of query. How do you do so?
Defining a query takes place in two steps:

1. first, you have to specify the query name and arguments; and then,
2. you have to supply query providers where needed.

To specify the query name and arguments, you simply add an entry to
the big macro invocation in
[`src/librustc/ty/maps/mod.rs`][maps-mod]. This will probably have
changed by the time you read this chapter, but at present it looks
something like:

[maps-mod]: https://github.com/rust-lang/rust/blob/master/src/librustc/ty/maps/mod.rs

```
define_maps! { <'tcx>
    /// Records the type of every item.
    [] fn type_of: TypeOfItem(DefId) -> Ty<'tcx>,

    ...
}
```
Each line of the macro defines one query. The name is broken up like this:

```
[] fn type_of: TypeOfItem(DefId) -> Ty<'tcx>,
^^    ^^^^^^^  ^^^^^^^^^^ ^^^^^     ^^^^^^^^
|     |        |          |         |
|     |        |          |         result type of query
|     |        |          query key type
|     |        dep-node constructor
|     name of query
query flags
```
Let's go over them one by one:

- **Query flags:** these are largely unused right now, but the intention
  is that we'll be able to customize various aspects of how the query is
  processed.
- **Name of query:** the name of the query method
  (`tcx.type_of(..)`). Also used as the name of a struct
  (`ty::maps::queries::type_of`) that will be generated to represent
  this query.
- **Dep-node constructor:** indicates the constructor function that
  connects this query to incremental compilation. Typically, this is a
  `DepNode` variant, which can be added by modifying the
  `define_dep_nodes!` macro invocation in
  [`librustc/dep_graph/dep_node.rs`][dep-node].
  - However, sometimes we use a custom function, in which case the
    name will be in snake case and the function will be defined at the
    bottom of the file. This is typically used when the query key is
    not a def-id, or just not the type that the dep-node expects.
- **Query key type:** the type of the argument to this query.
  This type must implement the `ty::maps::keys::Key` trait, which
  defines (for example) how to map it to a crate, and so forth.
- **Result type of query:** the type produced by this query. This type
  should (a) not use `RefCell` or other interior mutability and (b) be
  cheaply cloneable. Interning or using `Rc` or `Arc` is recommended for
  non-trivial data types.
  - The one exception to those rules is the `ty::steal::Steal` type,
    which is used to cheaply modify MIR in place. See the definition
    of `Steal` for more details. New uses of `Steal` should **not** be
    added without alerting `@rust-lang/compiler`.

[dep-node]: https://github.com/rust-lang/rust/blob/master/src/librustc/dep_graph/dep_node.rs

So, to add a query:

- Add an entry to `define_maps!` using the format above.
- Possibly add a corresponding entry to the dep-node macro.
- Link the provider by modifying the appropriate `provide` method;
  or add a new one if needed and ensure that `rustc_driver` is invoking it.
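
To get a feel for how one macro entry can expand into a query method plus a provider slot, here is a toy `macro_rules!` macro that accepts the same entry shape. It is emphatically not the real `define_maps!`: the macro name `define_toy_queries!`, the generated `Tcx`, `Providers`, and `DepNode` types, the query `arity`, and the use of `u32` keys are all invented for this sketch, and caching, flags, and incremental compilation are left out.

```rust
/// Toy macro: one entry per query, in the `[] fn name: Node(Key) -> Value,` shape.
macro_rules! define_toy_queries {
    ($([] fn $name:ident: $node:ident($key:ty) -> $value:ty,)*) => {
        /// One variant per query, loosely mirroring the dep-node constructor.
        #[derive(Debug, PartialEq)]
        pub enum DepNode {
            $($node($key),)*
        }

        /// One function pointer per query.
        pub struct Providers {
            $(pub $name: fn(&Tcx, $key) -> $value,)*
        }

        pub struct Tcx {
            pub providers: Providers,
        }

        impl Tcx {
            $(
                /// Generated query method: forwards to the registered provider.
                pub fn $name(&self, key: $key) -> $value {
                    (self.providers.$name)(self, key)
                }
            )*
        }
    };
}

define_toy_queries! {
    [] fn type_of: TypeOfItem(u32) -> String,
    [] fn arity: ArityOfItem(u32) -> usize,
}

// Providers are ordinary functions; one query may invoke another.
fn type_of_provider(_tcx: &Tcx, key: u32) -> String {
    format!("Ty#{}", key)
}

fn arity_provider(tcx: &Tcx, key: u32) -> usize {
    tcx.type_of(key).len()
}

fn main() {
    let tcx = Tcx {
        providers: Providers { type_of: type_of_provider, arity: arity_provider },
    };
    println!("{}", tcx.type_of(7));
    println!("{}", tcx.arity(12));
    println!("{:?}", DepNode::TypeOfItem(7));
}
```

Adding a query to this toy follows the same steps as the list above: add an entry to the macro invocation, then supply a provider function and register it in the `Providers` value.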
#### Query structs and descriptions

For each kind of query, the `define_maps!` macro will generate a "query struct"
named after the query. This struct is a kind of a place-holder
describing the query. Each such struct implements the
`self::config::QueryConfig` trait, which has associated types for the
key/value of that particular query. Basically the code generated looks something
like this:

```rust
// Dummy struct representing a particular kind of query:
pub struct type_of<'tcx> { phantom: PhantomData<&'tcx ()> }

impl<'tcx> QueryConfig for type_of<'tcx> {
    type Key = DefId;
    type Value = Ty<'tcx>;
}
```
There is an additional trait that you may wish to implement called
`self::config::QueryDescription`. This trait is used during cycle
errors to give a "human readable" name for the query, so that we can
summarize what was happening when the cycle occurred. Implementing
this trait is optional if the query key is `DefId`, but if you *don't*
implement it, you get a pretty generic error ("processing `foo`...").
You can put new impls into the `config` module. They look something like this:

```rust
impl<'tcx> QueryDescription for queries::type_of<'tcx> {
    fn describe(tcx: TyCtxt, key: DefId) -> String {
        format!("computing the type of `{}`", tcx.item_path_str(key))
    }
}
```
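
The relationship between the placeholder struct and the two traits can be exercised in isolation. The following sketch defines simplified stand-ins for `QueryConfig` and `QueryDescription`. The trait definitions, the `u32` key, the `&str` value, and the `cycle_note` helper are assumptions for illustration, and `describe` takes no `tcx` here, unlike the version above.

```rust
use std::marker::PhantomData;

/// Simplified stand-in: ties a query to its key and value types.
trait QueryConfig {
    type Key;
    type Value;
}

/// Simplified stand-in: produces a human-readable description of a query.
trait QueryDescription: QueryConfig {
    fn describe(key: Self::Key) -> String;
}

/// Placeholder struct named after the query; it carries no data.
#[allow(non_camel_case_types, dead_code)]
struct type_of<'tcx> {
    phantom: PhantomData<&'tcx ()>,
}

impl<'tcx> QueryConfig for type_of<'tcx> {
    type Key = u32;
    type Value = &'tcx str;
}

impl<'tcx> QueryDescription for type_of<'tcx> {
    fn describe(key: u32) -> String {
        format!("computing the type of `item{}`", key)
    }
}

/// Generic code (for example, a cycle reporter) can describe any query
/// without knowing which one it is.
fn cycle_note<Q: QueryDescription>(key: Q::Key) -> String {
    format!("...which requires {}", Q::describe(key))
}

fn main() {
    println!("{}", cycle_note::<type_of<'static>>(7));
}
```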