Description
Proposal
This MCP has the goal of removing unsound HashStable
impls that currently exist for various widely used collection types, in particular collections derived from std::collections::HashMap
and std::collections::HashSet
(e.g. FxHashMap
, DefIdSet
, etc), and to restrict the HashStable
impls for Ord
-based collections (like BTreeMap
and SortedMap
) to instances with keys that have a stable sort order across compilation session boundaries.
HashMap / HashSet
The iteration order of HashMap
and HashSet
depends on the hash value of keys. This hash value, however, very often is not stable across compilation session boundaries because it can be computed from memory addresses (e.g. Interned<T>
) or unstable IDs (like DefIndex
or CrateNum
). The current HashStable
implementation "solves" this problem by computing an iteration-order-independent fingerprint for these collections -- but this is not sound since the iteration order can influence the outcome of query computation and thus the fingerprint must take it into account. Worst case, this fingerprint inaccuracy can lead to silent miscompilations.
In addition to the problem of hash values, the iteration order of these collections is not defined in any way, i.e. the documentation gives no guarantees. In particular, it is not guaranteed that hash_set.iter()
has the same order as decode(encode(hash_set)).iter()
, so even for keys with stable hash values (e.g. String
), we cannot rely on iteration order staying the same across compilation session boundaries.
What to use instead of HashMap / HashSet
Various forms of HashMap
and HashSet
are widely used in the compiler. Removing their HashStable
implementation means that they cannot be used in query keys and query results anymore. Luckily, there are a couple of good alternatives:
-
We already use
FxIndexMap
andFxIndexSet
in many places. These are high-performance hashing-based collections with the additional guarantee of having a stable iteration order. TheirHashStable
implementation is valid and they can be used as a drop-in replacement ofHashMap
andHashSet
in most circumstances. -
The MCP proposes to add two new collection types
UnorderedMap
andUnorderedSet
torustc_data_structures
that provide the same interface as other map and set collection types, but that do a better job of hiding iteration order. These types can be a good choice in cases where a collection does not have any ordering at the conceptual level. One advantage they have, is that they "erase" the iteration order and thus can safely be built from non-deterministic sources -- such as a regularFxHashMap
or via multi-threaded code.
Interaction with potential_query_instability
lint
MCP 465 introduced the potential_query_instability
lint, which tries to solve much of the same problem as the changes proposed here. However, as discussed on Zulip, neither the lint, nor the changes here will catch all problematic cases. E.g. with UnorderedMap
we can allow safe instances of collection iteration (like map.iter().any(predicate)
) in a single place at the API-level, where with the lint each usage instance would have to be audited and #[allow()]
ed individually. At the same time, the lint can catch cases where we cannot enforce a HashStable
bound, like when using an FxHashMap
locally inside a query implementation.
So this MCP does not propose to change anything about the lint (other than applying it more widely :)
)
Ord
-based collections
In a way, Ord
-based collections like BTreeMap
and SortedMap
are in a better position than HashMap
because they have a well-defined iteration order. We just have to make sure that the keys being sorted by do not introduce instability. To this end, the MCP proposes to add a StableOrd
trait:
/// Trait for marking a type as having a sort order that is
/// stable across compilation session boundaries. More formally:
///
/// Ord::cmp(a1, b1) == Ord:cmp(a2, b2)
/// where a2 = decode(encode(a1, context1), context2)
/// b2 = decode(encode(b1, context1), context2)
///
/// i.e. the result of `Ord::cmp` is not influenced by encoding
/// the values in one session and then decoding them in another
/// session.
///
/// This is trivially true for types where encoding and decoding
/// don't change the bytes of the values that are used during
/// comparison (e.g. u32, String, Path, ...).
///
/// But it is not true for:
/// - `*const T` and `*mut T` because the values of these pointers
/// will change between sessions.
/// - `DefIndex`, `CrateNum`, `LocalDefId`, because their concrete
/// values depend on state that might be different between
/// compilation sessions.
trait StableOrd : Ord {}
With this trait available, we can implement HashStable
only for collection instances where the key has such a stable sort order.
The MCP proposes to implement the trait sparingly, since there is no good, generic way to writing regression tests that make sure a given type has a stable sort order (if you know of one, tell us!).
Mentors or Reviewers
Someone from @rust-lang/wg-incr-comp?
Process
The main points of the Major Change Process are as follows:
- File an issue describing the proposal.
- A compiler team member or contributor who is knowledgeable in the area can second by writing
@rustbot second
.- Finding a "second" suffices for internal changes. If however, you are proposing a new public-facing feature, such as a
-C flag
, then full team check-off is required. - Compiler team members can initiate a check-off via
@rfcbot fcp merge
on either the MCP or the PR.
- Finding a "second" suffices for internal changes. If however, you are proposing a new public-facing feature, such as a
- Once an MCP is seconded, the Final Comment Period begins. If no objections are raised after 10 days, the MCP is considered approved.
You can read more about Major Change Proposals on forge.
Comments
This issue is not meant to be used for technical discussion. There is a Zulip stream for that. Use this issue to leave procedural comments, such as volunteering to review, indicating that you second the proposal (or third, etc), or raising a concern that you would like to be addressed.