Streamline `Symbol`, `InternedString`, and `LocalInternedString`.

We currently have three closely-related symbol types.

`Symbol` is the fundamental type. A `Symbol` is an index. All operations work on that index. `StableHash` is not implemented for it, but there's no reason why it couldn't be. A `Symbol` can be a gensym, which gets special treatment -- it's a guaranteed unique index, even if its chars have been seen before.

`InternedString` is a thin wrapper around `Symbol`. You can convert a `Symbol` to an `InternedString`. It has two differences with `Symbol`.
- Its `PartialOrd`/`Ord`/`Hash` impls use the chars, rather than the index.
- Gensym-ness is ignored/irrelevant.

`LocalInternedString` is an alternative that contains a `&str`. You can convert both `Symbol` and `InternedString` to `LocalInternedString`. Its `PartialOrd`/`Ord`/`Hash` impls (plus `PartialEq`/`Eq`) naturally work on chars. Its main use is to provide a way to look some or all of the individual chars within a `Symbol` or `InternedString`, which is sometimes necessary.

I have always found the differences between these types confusing and hard to remember. Furthermore, the distinction between `Symbol` and `InternedString` is subtle and has caused
[bugs](https://github.com/rust-lang/rust/pull/60815#issuecomment-492850013).

Also, gensyms in general make things a lot more complicated, and it would be great to eliminate them.

Here's what I would like as a final state.

- `Symbol` exists.
- `InternedString` does not exist.
- `LocalInternedString` perhaps exists, but is only used temporarily when code needs access to the chars within a `Symbol`. Alternatively, `Symbol` could provide a `with()` method (like `InternedString` currently has) that provides access to the chars, and then `LocalInternedString` wouldn't be needed.
- `Symbol`'s impl of `Hash` uses the index, and its impl of `StableHash` uses the chars.
- Not sure about `Symbol`'s impl of `PartialOrd`/`Ord`. If a stable ordering is really needed (perhaps for error messages?) we could introduce a `StableOrd` trait and use that in the relevant places, or do a custom sort, or something.
- Gensyms don't really exist. They are simulated: when you call `gensym()`, it just appends a unique suffix. It's worth noting that gensyms are always identifiers, and so the unique suffix can use a non-identifier char. And `Interner` could keep a counter. So "foo" would gensym to something lke "foo$1", "foo$2", etc. Once the suffix is added, they would just be treated as normal symbols (in terms of hashing, etc.) I would hope that identifier gensyms would never be compared with non-identifier symbols, so a false positive equality match should be impossible. (Different types for identifier symbols and non-identifier symbols would protect against that, but might cause other difficulties.) Alternatively, #49300 talks about other ways of dealing with gensyms.
- All this should also help performance, because we'd end up with more operations on indexes, and only the necessary ones on chars (which require TLS lookups).

I haven't even touched on the way lifetimes work in the interner, which are subtle and error-prone. But having fewer types would only make improvements on that front simpler.

Thoughts?

CC @petrochenkov @Zoxc @eddyb @Mark-Simulacrum @michaelwoerister


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Streamline `Symbol`, `InternedString`, and `LocalInternedString`. #60869

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Streamline Symbol, InternedString, and LocalInternedString. #60869

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Streamline `Symbol`, `InternedString`, and `LocalInternedString`. #60869