Description
Correctness of incremental compilation relies on the assumption that a query always yields the same result for the same set of inputs, i.e. that queries are deterministic functions. We recently discovered that checking fingerprint stability of query results can detect violations of this assumption and that breaking this assumption can lead to miscompilations (see #84970).
This issue argues that a similar set of circumstances can arise in the code path where we have not yet enabled the fingerprint stability check by default.
First some background on how miscompilations can happen: When the query system is asked to execute a query Q
in incremental mode it will first walk the dependency graph of the query, trying to mark Q
's dep-node as green. If all of Q
's inputs are green, Q
itself is green too and the system assumes that its result is the same as in the previous compilation session. While walking the graph for marking, all the nodes encountered are also marked as green -- however, their corresponding queries are not necessarily executed. Only later if another query actually needs the result of one such intermediate query Q_i
, it will be executed. If at that point the already green query yields a different result than what we previously relied on, we end up with an inconsistent (and therefore dangerous) state: Some dependents of Q_i
have loaded their result from the cache under the assumption that Q_i
's result was the same as in the previous session; but other dependents of Q_i
have computed their own result with a different value for Q_i
as input:
+-------- Q1 (assumes Q_i == x)
|
v
(...) <-------- Q_i
^
|
+-------- Q2 (computes its result with Q_i == x' as input)
Assuming for example that Q1
is a vtable layout with methods in the order m1, m2, m3
and Q2
is an indirect method call to m2
-- assuming that m2
is at index 0
because Q_i
yielded the vtable order m2, m3, m1
when re-executed -- then we can see how this can lead to a miscompilation where the method call that Q2
represents calls the wrong method.
Now, #83007 turned on fingerprint stability checks for all cases where a query is re-executed by the query system. This will catch violations of the query determinism assumption because we compare the fingerprint on disk (corresponding to Q_i == x
) to the fingerprint of the newly computed value (corresponding to Q_i == x'
). If the fingerprints differ we know there is a dangerous mismatch and abort the compilation session.
Note, however, that the same kind of mismatch can occur when the value for Q_i
is loaded from disk: if Q_i
is marked as green and Q1
thus loads its result from disk under the assumption that Q_i == x
but then we later load the result of Q_i
from disk and -- because of a bug in query result serialization -- we get Q_i == x'
, then we can end up with the same kind of dangerous inconsistency as in the other case. The diagram above would apply to this case too, no modification needed.
In summary: both unstable query provider implementations and buggy query result (de-)serialization can lead to the same kind of dangerous inconsistencies during incremental compilation, but we currently only have a safety check for the first of the two cases.
I personally believe that the risk of buggy serialization code is smaller than the risk of unstable query provider implementations but I still think it is prudent to check both cases.
@Aaron1011, do you agree with my analysis of the situation?
cc @rust-lang/compiler