Streamline Symbol
, InternedString
, and LocalInternedString
. #60869
Description
We currently have three closely-related symbol types.
Symbol
is the fundamental type. A Symbol
is an index. All operations work on that index. StableHash
is not implemented for it, but there's no reason why it couldn't be. A Symbol
can be a gensym, which gets special treatment -- it's a guaranteed unique index, even if its chars have been seen before.
InternedString
is a thin wrapper around Symbol
. You can convert a Symbol
to an InternedString
. It has two differences with Symbol
.
- Its
PartialOrd
/Ord
/Hash
impls use the chars, rather than the index. - Gensym-ness is ignored/irrelevant.
LocalInternedString
is an alternative that contains a &str
. You can convert both Symbol
and InternedString
to LocalInternedString
. Its PartialOrd
/Ord
/Hash
impls (plus PartialEq
/Eq
) naturally work on chars. Its main use is to provide a way to look some or all of the individual chars within a Symbol
or InternedString
, which is sometimes necessary.
I have always found the differences between these types confusing and hard to remember. Furthermore, the distinction between Symbol
and InternedString
is subtle and has caused
bugs.
Also, gensyms in general make things a lot more complicated, and it would be great to eliminate them.
Here's what I would like as a final state.
Symbol
exists.InternedString
does not exist.LocalInternedString
perhaps exists, but is only used temporarily when code needs access to the chars within aSymbol
. Alternatively,Symbol
could provide awith()
method (likeInternedString
currently has) that provides access to the chars, and thenLocalInternedString
wouldn't be needed.Symbol
's impl ofHash
uses the index, and its impl ofStableHash
uses the chars.- Not sure about
Symbol
's impl ofPartialOrd
/Ord
. If a stable ordering is really needed (perhaps for error messages?) we could introduce aStableOrd
trait and use that in the relevant places, or do a custom sort, or something. - Gensyms don't really exist. They are simulated: when you call
gensym()
, it just appends a unique suffix. It's worth noting that gensyms are always identifiers, and so the unique suffix can use a non-identifier char. AndInterner
could keep a counter. So "foo" would gensym to something lke "foo$1", "foo$2", etc. Once the suffix is added, they would just be treated as normal symbols (in terms of hashing, etc.) I would hope that identifier gensyms would never be compared with non-identifier symbols, so a false positive equality match should be impossible. (Different types for identifier symbols and non-identifier symbols would protect against that, but might cause other difficulties.) Alternatively, syntax_pos::symbol::Symbol::gensym() is incompatible with stable hashing. #49300 talks about other ways of dealing with gensyms. - All this should also help performance, because we'd end up with more operations on indexes, and only the necessary ones on chars (which require TLS lookups).
I haven't even touched on the way lifetimes work in the interner, which are subtle and error-prone. But having fewer types would only make improvements on that front simpler.
Thoughts?
CC @petrochenkov @Zoxc @eddyb @Mark-Simulacrum @michaelwoerister