Skip to content

unique: new package with unique.Handle #62483

Closed
@mknyszek

Description

@mknyszek

Proposal: unique package

Updated: 10 April 2024

Oct 4th EDIT: Changed package name from "intern" to "unique" and renamed "Symbol" to "Handle" based on feedback.
Apr 10th EDIT: Updated the proposal to describe the actual implementation. Notably, map entries can be dropped after just a single GC cycle instead of two, and there's no more object resurrection, which has several well-known problems.

CC @golang/runtime

Motivation

The lack of runtime support for interning in Go represents a gap with other languages. Demand in the Go community for a weak map and/or string interning has shown up in a few GitHub issues over the years. Although the section of the community requesting these features is quite small, we see the consequences of not having built-in support for this functionality in the form of the go4.org/intern package.

go4.org/intern is a package that implements a global intern pool. Its goals are two-fold: intern data to deduplicate copies, and provide fast comparisons for such interned data. Functionally, it returns a unique global identity for every value placed into it. That value is weakly held by the pool's internals, to avoid letting the pool accumulate in an unbounded manner. This implementation follows in the footsteps of Lisp: interning a value produces a "symbol" that may be used for a fast direct comparison and provides a method through which the interned value may be recovered.

Although it has only 4 direct importers, it is transitively used quite widely (estimated 0.1% of modules) thanks to use by inet.af/netaddr (reverse module deps), which uses it to deduplicate strings and reduce the cost of their comparisons. This has caused friction within the ecosystem because go4.org/intern makes assumptions about the implementation of Go. In particular, it imports the package go4.org/unsafe/assume-no-moving-gc to validate those assumptions. That package, by design, needs to be updated with every release of Go.

inet.af/netaddr's functionality has also recently been merged into the standard library as the net/netip package, bringing along with it a copy of go4.org/intern as the internal/intern package. Furthermore, go4.org/unsafe/assume-no-moving-gc now references an internal Go runtime variable to avoid having to update every release.

Although the immediate issue has been mitigated, the gap in functionality remains, and we now have a copy of go4.org/intern in the standard library anyway.

Goals

The goal of this proposal is to tidy up internal/intern's interface with generics, integrate it more tightly into the runtime (for efficiency and future-proofing), and expose it as a standard library package.

This proposal will also motivate why we should move forward with an API that's similar go4.org/intern and not some other direct or indirect ways of enabling efficient value interning.

Design

We more-or-less propose to move the go4.org/intern package API into the standard library, but with a few tweaks. The biggest difference is that go4.org/intern uses an interface to represent the "key" part of the mapping, with a special case for strings (GetByString) to avoid an additional allocation for the string header. Using generics, we can avoid these additional allocations while also improving type-safety and ergonomics.

Here's a sketch of the revised API, which will live in a new package called "unique":

package unique

// Make returns a globally unique handle for a value of type T. Handles
// are equal if and only if the values used to produce them are equal.
func Make[T comparable](value T) Handle[T] {
    ...
}

// Handle is a globally unique identity for some value of type T.
//
// It is trivially and efficiently comparable with other Handle[T] for the same T.
//
// If the value used to create the symbol is the same, the symbols are guaranteed
// to be equal.
type Handle[T comparable] struct {
    value *T
}

// Value returns a shallow copy of the T value that produced the Handle.
func (h Handle[T]) Value() T {
    return *h.value
}

The unique package must maintain an internal mapping of values to globally unique symbols, which risks growing without bound. However, once no copies of symbols produced by Make are reachable, the program loses access to that globally unique identity. The intern package is then free to produce a new one without the program noticing. If it can produce a new one, it can just as well delete the old one during that time period in which no copy of a symbol exists for a given value.

In practice, integration with the garbage collector is necessary to achieve this, introducing the risk of leaking details about the garbage collector's implementation.

A key property to consider with functionality that relies on specific behavior from the garbage collector is whether it preserves the illusion that the executing program has infinite memory (with a GC there's a malloc but no free). The utility of such an illusion is that it makes programs significantly simpler to reason about. When programs are able to observe the fact that the garbage collector reclaimed memory, it is possible for programs to rely on non-deterministic properties of that reclamation. Finalizers are a good example of a feature that breaks the illusion and are difficult to use correctly.

Luckily, it's not possible for programs to notice when New returns a different Handle[T] precisely because it can produce new symbols without the program noticing. Therefore, it's not possible for a program to observe when memory is reclaimed. (Someone could use the unsafe package to write down the Handle[T]'s internal value as a uintptr, but that specifically requires the use of unsafe. One also can't do much with that uintptr; casting that back to a pointer violates the unsafe.Pointer rules.)

Implementation

The core data structure is approximately a map[any]*T, where the *T is a weak reference. The runtime fully controls the *T created here, so it would attach a special that represents a handle to the value that can go nil once the object is reclaimed. Once the *T is no longer referenced, the GC will clear the handle. Later, a background goroutine will clean up map entries with nil handles.

Note that user code here is racing with the garbage collector. It's possible that in between the garbage collector noticing that there are no more references to the *T and the value being collected, the program may read the value out and produce a new strong pointer to the *T. To alleviate this, the reader must synchronize with the garbage collector. To avoid issues with object resurrection and to allow for immediate reclamation of memory, the reader can simply ensure the span containing the T is always swept before accessing the weak pointer handle. The only costs here are thus the highly-optimized span lookup and, once per GC, a single goroutine may need to sweep the span, which is generally quite fast. Note that if a value is being lookup frequently, then only the first lookup in each GC cycle will need to sweep; otherwise it'll return quickly.

This is quite different to the implementation of go4.org/intern, and is only possible by accessing runtime internals. It allows us to reclaim memory in just a single GC cycle instead of the 3 that are currently required by the intern package.

Next is the map implementation itself. Because the map will be global, the underlying data structure will need to be thread-safe. Unfortunately there are far too many choices for a concurrent map data structure, so we need to cut them down by setting some goals and requirements. Here's a list:

  • Reads are expected to be much more popular than writes.
    • Serializing writers (though allowing them to be concurrent with readers) is likely acceptable.
  • Reads from the map should be allowed to execute concurrently with one another.
  • Deletions from the map can happen relatively rarely (once per GC), in the background, and in bulk.
  • Iteration is not strictly necessary since a candidate list of removals may be maintained by the GC.
    • If it turns out to be desirable, there could only ever be a single iterator at a time for deletion.
  • To eliminate an indirection, being able to directly mutate map values would be nice.
  • The map needs to be able to grow and shrink efficiently.

A traditional bucketed hash map is not terribly difficult to make concurrently under these circumstances, but has the downside that incremental growth and shrinking of such a map is quite complicated. Trying to extend the existing map implementation with concurrency features is also likely to be complicated.

Another option is to use something like an adaptive radix tree over 8-byte hash values. The growth and shrinking of such a tree comes naturally, and making them concurrent within the bounds of our requirements isn't too complicated. The downside is poor locality because of the tree structure. I suspect that at least to begin with, something like this will provide good enough performance.

For the actual implementation, we pick a simplified form of the adaptive radix tree specialized for uintptr values (hashes), essentially forming a hash-trie. It's straightforward to make reads out of this data structure perfectly scalable. Insertions and deletions will be performed using fine-grained locking techniques, reducing contention.

As an additional note on performance, calls to Make should always avoid allocating in the case where the provided value is already in the map. Meanwhile they should explicitly clone the value when adding it to the map. (This means if T is a struct with pointers in it, the values those pointers point to are almost always going to be forced to escape to the heap, like with regular Go maps.)

Risks

API design impact

The fact that interned values are represented by a type Handle[T] means that details around interning may encourage two versions of APIs, one supporting interned values and the other not. Contrast this with an "opaque" alternative (see "Opaque string interning" in the alternatives section) that makes no distinction between interned and non-interned values in the type system.

I believe this is a legitimate concern, but suspect that it will mostly be mitigated in practice. Interning is a somewhat niche technique that helps performance dramatically in certain cases, but only in those certain cases that clearly benefit from data deduplication and/or fast comparisons of data. Elsewhere it's more clearly just cumbersome and simple microbenchmarks should reveal the slowdown.

Thus, I think "polluting" APIs in the cases where it's useful is likely worth the tradeoff, and it's sufficiently cumbersome to use that where it's not necessary it will simply be ignored. I believe the fact that go4.org/intern has relatively few direct importers supports this hypothesis.

Poor performance

One situation we absolutely want to avoid is performance issues like with Java's String.intern. My best understanding of the situation (as of when the linked blog post was written) is that:

  • The global intern map is not resizable, so once the map fills up, lookups become expensive.
  • The global intern map leaks memory (map entries are not bound to the lifetime of their values).
  • The global intern map is part of the GC root set, and it's not well accounted for by the GC (it's just scanned during a STW), leading to high worst-case pause times when the map has many entries.

I believe we avoid all three of these issues in the implementation described above. The third one is less obvious if you're not familiar with the existing Go GC implementation. Briefly, the global intern map would only be a GC root insofar as any global variable is a GC root (i.e. the entire map would not be part of the root set, just the reference to it). And even if it was fully part of the root set, the Go GC already shards and scans the root set concurrently with the mutator executing. Hence, no pause-time impact.

Disclaimer: I don't know if this is still an issue in the Java world; I didn't look into this too deeply. If it's not, then that's great to hear. Nevertheless, it's still worthwhile to learn from past attempts at interning.

Alternatives considered

Opaque string interning

One alternative is interning just for strings, which is a common case. For instance:

package strings

// Intern takes a string and returns a string with equivalent contents,
// though possibly one that shares memory with other strings. This can
// be used to save memory in applications with many duplicate string
// values. If a string is successfully interned, a copy is always made.
func Intern(s string) string

Although such an API is temptingly simple, it's not really useful for any other kind of data structure. Only strings are properly immutable in the language.

Furthermore, because the interning is opaque, we don't get the full benefit of cheap comparisons out-of-the-box. The string comparison fast path would be taken more frequently, but when there's no pointer equality the string data would still need to be compared. To get the full benefit, there would need to be some additional runtime and/or compiler support to identify when two interned strings are being compared, which may end up slowing down non-interned string operations as well.

This is still worth considering for the future, but doesn't properly address the use-cases this proposal intends to address.

Background string deduplication

Another alternative is to just have the runtime deduplicate strings in the background. For instance, the JVM G1 garbage collector has a flag for such a feature (off by default). The advantage of this approach is the programmer has to set one flag and they get to save on memory costs.

However, like the previous alternative, this only really applies to strings again, because only strings are properly immutable at the language level. The other problem is that this feature would need to be opt-in, requiring a new top-level runtime knob. (Presumably this feature isn't on by default because it's not always worth the CPU cost in the garbage collector.) It's also substantially more complex to implement this in the Go garbage collector, because it doesn't currently know the type of anything in the heap.

Weak references

An even more general alternative to the proposed API is to just add support for weak references to the standard library and/or language. After all, the proposed implementation conceptually just uses a weak reference in its implementation anyway.

The main issue with weak references is that they're very hard to use in a system with tracing garbage collection, since they can turn out to be nil at very surprising times (or possibly never). Fundamentally, they break the aforementioned infinite memory illusion, because they reveal when memory is reclaimed.

The bar is extremely high for adding anything this difficult to use, and I believe we should prefer easier-to-use abstractions as much as possible.

Metadata

Metadata

Assignees

Type

No type

Projects

Status

Done

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions