Description
Proposal
The compiler uses interning a lot, especially within rustc_middle::ty
.
Pros of interning
- It saves lots of memory by avoiding duplicated values.
- Speedups by using pointer-based
Eq
andHash
, i.e. compare/hash the addresses of the interned value (which is guaranteed to be unique) instead of the contents. (I recently tried changing some of the highly used types likeTy
andList
to use contents-basedEq
andHash
, and got widespread instruction count increases of up to 10%, which is a lot.)
Cons of interning
- Interning values involves hashing their contents, which isn't free, but hopefully is balanced out by the pros above.
- Pointer-based hashing increases non-determinism.
An obvious way to deal with interned values is to use Rust's type system to encapsulate things. Surprisingly, this isn't currently done.
- There is
rustc_data_structures::ptr_key::PtrKey<T>
, which is a smart pointer that does pointer-basedEq
andHash
. But it's barely used, barely documented, has a non-descriptive name, and has no protection against misuse. - Several frequently used types are interned and use pointer-based
Eq
andHash
, e.g.List<T>
,Ty
,Predicate
. - Other frequently used types are interned but don't use pointer-based
Eq
andHash
, e.g.Region
,Const
. - It's not obvious that these types are interned from their type declaration; you have to look around nearby code to realize this. There's also not much protection against accidental misuse, e.g. comparing an interned value against a non-interned value.
I propose renaming ptr_key::PtrKey<T>
as intern::Interned
, improving its documentation, and providing some protection against misuse:
- Have a zero-sized private type as a field so you can pattern match an
Interned<T>
but not construct anInterned<T>
. - Have a method
Interned::new_unchecked
for constructing values, the name of which makes it obvious that some care is required. - Eventually, moving some of the interning structures (e.g.
InternedSet
,ShardedHashMap::intern
) into this module so that most places don't even need to useInterned::new_unchecked
, but instead just call anintern
method directly.
I then propose using Interned<T>
for most/all of the interned types in rustc_middle::ty
: Ty
, Predicate
, Region
, Const
, List
, etc. For some of these it will involve using a newtype instead of a typedef. E.g. this
pub type Ty<'tcx> = &'tcx TyS<'tcx>;
will become this:
pub struct Ty<'tcx>(Interned<'tcx, TyS<'tcx>>);
Pros of this change
- Clarity: it'll be obvious which types are interned and thus use pointer-based
Eq
andHash
. Also, the use of newtypes allows methods to be impl'd directly on the frequently used types likeTy
, rather than on the infrequently used inner types likeTyS
. - Correctness: because of the improve clarity, it'll be much harder to mess things up, e.g. comparing an interned value with a non-interned value.
- Consistency: all the interned types will look and act the same.
- Performance: more pointer-based
Eq
andHash
will give some small wins.
Cons of this change
- It'll touch a lot of lines of code. Most of the changes are tedious; lots of sigil fiddling, pattern matching fiddling, stuff like that.
- More layering of types means pattern matching is a bit uglier in places, where you have to unwrap an additional layer or two of newtypes.
- Non-determinism: slightly increased due to more pointer hashing, but we're already doing a bunch of that anyway.
I think this is a clear improvement. TBH, I'm a bit surprised that the code isn't already like this. It's no fun to have to schlep through the code to work out which types are interned and whether they're using pointer-based Eq
and Hash
. Because of this, it took me some time to reach my current level of understanding about interning. E.g. see rust-lang/rust#91617 where I added documentation to List<T>
; prior to that PR list.rs
had no mention of interning or uniqueness of values, and I had to ask a bunch of questions on Zulip to understand how the type worked.
I also think this it's borderline whether this change needs an MCP. It will touch a lot of code but not in a deep way; it's mostly just using types to more clearly encode the existing behaviour. There are no user-visible changes, existing major types are not renamed, and it's not really an architectural change. But an MCP was requested so here it is 😄
I have already written rust-lang/rust#93148 that introduces Interned
and changes Ty
, Predicate
, Region
, and Const
. I pretty much had to write the code to make sure that the proposal was reasonable.
Mentors or Reviewers
I'm planning to do the work myself, and I've already done a big chunk of it. @fee1-dead or @michaelwoerister are likely reviewers.
Process
The main points of the Major Change Process are as follows:
- File an issue describing the proposal.
- A compiler team member or contributor who is knowledgeable in the area can second by writing
@rustbot second
.- Finding a "second" suffices for internal changes. If however, you are proposing a new public-facing feature, such as a
-C flag
, then full team check-off is required. - Compiler team members can initiate a check-off via
@rfcbot fcp merge
on either the MCP or the PR.
- Finding a "second" suffices for internal changes. If however, you are proposing a new public-facing feature, such as a
- Once an MCP is seconded, the Final Comment Period begins. If no objections are raised after 10 days, the MCP is considered approved.
You can read more about Major Change Proposals on forge.
Comments
This issue is not meant to be used for technical discussion. There is a Zulip stream for that. Use this issue to leave procedural comments, such as volunteering to review, indicating that you second the proposal (or third, etc), or raising a concern that you would like to be addressed.