proposal: Go 2: Heapless Go #56236
Author background
Related proposals
Proposal
With improved escape analysis, the compiler would understand memory lifetimes.
Banning pointer passing between threads would guarantee simple variable lifetimes; re-allowing the provably safe cases would then let most modern code be reused without a heap.
With mid-stack growth, data can always be allocated on the stack, in the frame of its lowest variable.
Far more information below.
Costs
-- Single-thread: no language change
-- Multi-thread: Simpler. A prover would keep usage safe.
-- C API/Unsafe: more difficult.
-- Better compiler technology around variable lifetimes. Changed cases for performance. Some highly-optimized code may not get to participate initially as it's unprovable.
Who needs a heap?
A heap is unnecessary when:
A. we have sufficient stack space AND
B. there are no variables past the end of that stack space that we are moving AND
C. all consumers are in frames below it.
Solutions
A has a simple solution that works well already: stack growth.
B could be arranged with improved layout and only 1 grower per stack frame. We may need to insert fake frames for later growers.
C would need a strategy. We could move everything in the frames above the growing object, then fix the pointers (since we know all pointers in Go). This "heavy" operation is still a mutex-free memcpy followed by a consultation of stack maps. It's not cheap, but it's probably cheaper than garbage collection and can be done at known times (by consulting the capacity field).
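As a rough illustration of the move-then-fix-pointers idea, here is a minimal sketch in today's Go. It is a toy model, not runtime code: `toyStack`, `growAt`, and `deref` are hypothetical names, a "pointer" is just a slot index, and the `isPtr` slice stands in for a stack map.

```go
package main

import "fmt"

// toyStack is a hypothetical model of a goroutine stack: a flat run of
// slots plus a stack map saying which slots hold pointers. A "pointer"
// here is simply the index of another slot.
type toyStack struct {
	slots []int
	isPtr []bool // stack map: isPtr[i] reports whether slots[i] is a pointer
}

// growAt opens n zeroed slots at index at, moving every slot at or above
// that index up by n, then repairs pointers that referred to moved slots.
func (s *toyStack) growAt(at, n int) {
	// Step 1: plain copies move everything above the growing object.
	slots := make([]int, len(s.slots)+n)
	isPtr := make([]bool, len(s.isPtr)+n)
	copy(slots, s.slots[:at])
	copy(isPtr, s.isPtr[:at])
	copy(slots[at+n:], s.slots[at:])
	copy(isPtr[at+n:], s.isPtr[at:])

	// Step 2: consult the stack map and fix any pointer whose target moved.
	for i, p := range isPtr {
		if p && slots[i] >= at {
			slots[i] += n
		}
	}
	s.slots, s.isPtr = slots, isPtr
}

// deref follows the pointer stored in slot i.
func (s *toyStack) deref(i int) int { return s.slots[s.slots[i]] }

func main() {
	// Slots 0-1: a growable object. Slot 2: the value 42.
	// Slot 3: a pointer to slot 2.
	s := &toyStack{
		slots: []int{10, 11, 42, 2},
		isPtr: []bool{false, false, false, true},
	}
	s.growAt(2, 3)          // the object in slots 0-1 gains three slots
	fmt.Println(s.deref(6)) // the pointer moved to slot 6 and still finds 42
}
```

The cost in this model is one copy of the moved region plus one pass over the stack map, which is the trade-off weighed against garbage collection above.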
Globals
They could live on the main goroutine's stack.
Multi-threading
The simplest solution for the compiler writer is to ban all pointer passing between threads and hide this hideous language change behind a flag. That would result in deep copies everywhere, no pointers in globals or chan messages!
This would also obsolete sync & atomic, but would ensure any code written will not experience undefined memory errors.
Clearly that's excessive.
Though you could still write a global cache (with chan), most multi-thread libraries break.
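To make the "global cache with chan" remark concrete, here is a minimal sketch written in today's Go. The names `request`, `reply`, and `cacheServer` are hypothetical, and the messages deliberately contain only plain values so that nothing pointer-like crosses between goroutines. Today's runtime still heap-allocates the map and the channels; the sketch only shows the programming style the strict rule would allow.

```go
package main

import "fmt"

// request is a plain value: no pointers, slices, or maps, so sending it
// over a channel copies everything the receiver sees.
type request struct {
	set   bool // true stores value under key, false looks key up
	key   int
	value int
}

// reply is likewise a pointer-free value.
type reply struct {
	value int
	found bool
}

// cacheServer owns the map outright. No other goroutine ever holds a
// reference to it; they only exchange copied request and reply values.
func cacheServer(reqs <-chan request, reps chan<- reply) {
	store := make(map[int]int)
	for r := range reqs {
		if r.set {
			store[r.key] = r.value
			reps <- reply{value: r.value, found: true}
			continue
		}
		v, ok := store[r.key]
		reps <- reply{value: v, found: ok}
	}
	close(reps)
}

func main() {
	reqs := make(chan request)
	reps := make(chan reply)
	go cacheServer(reqs, reps)

	reqs <- request{set: true, key: 7, value: 49}
	<-reps // acknowledge the store
	reqs <- request{key: 7}
	fmt.Println(<-reps) // {49 true}
	close(reqs)
}
```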
Building on that, the compiler could recognize const globals so that they may hold pointers. Pointers could then be passed when their lifetimes are well understood, for example by placing all globals on the main thread's stack so that they should outlive other threads. It may still be best to allow only provably safe data exchanges in order to preserve that property, for example by requiring that every reader and writer of a shared object is mutex-protected.
Scatter-gather patterns (and others) could be detected where the parent outlives the children that interact with the objects, thereby knowing the lifetimes.
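The scatter-gather shape mentioned above looks roughly like the following in today's Go. This is only a sketch of the pattern a prover would have to recognize; `squareAll` is a hypothetical example function, and the proposal does not say how detection would work.

```go
package main

import (
	"fmt"
	"sync"
)

// squareAll scatters work over one goroutine per element and gathers the
// results before returning. results is created by the parent, and every
// child finishes before wg.Wait returns, so the parent outlives all users.
func squareAll(inputs []int) []int {
	results := make([]int, len(inputs))
	var wg sync.WaitGroup
	for i, v := range inputs {
		wg.Add(1)
		go func(i, v int) {
			defer wg.Done()
			results[i] = v * v // each child writes only its own element
		}(i, v)
	}
	wg.Wait() // no child can touch results after this point
	return results
}

func main() {
	fmt.Println(squareAll([]int{1, 2, 3, 4})) // [1 4 9 16]
}
```

Because the children are all joined before `squareAll` returns, the lifetime of `results` is bounded by the parent's frame, which is the property the paragraph relies on.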
Handoff cases could also be determined where a chan-write is the last access, so the object gets literally copied over (with no out-pointers) like in the case of HTTP listeners.
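A handoff of this kind might look like the sketch below, again in today's Go. The `job` type and the `produce` and `consume` functions are hypothetical; the point is only that the sender's final use of each value is the channel write, and the value has no outgoing pointers.

```go
package main

import "fmt"

// job is a hypothetical message with no outgoing pointers, so a channel
// send copies it in full.
type job struct {
	id      int
	payload [4]byte
}

// produce builds each job locally and never touches it after the send,
// which makes the send the last access on the producing side.
func produce(n int, out chan<- job) {
	for i := 0; i < n; i++ {
		j := job{id: i}
		j.payload[0] = byte(i)
		out <- j // last use of j in this goroutine
	}
	close(out)
}

// consume receives each copied job and sums the first payload byte.
func consume(in <-chan job) int {
	sum := 0
	for j := range in {
		sum += int(j.payload[0])
	}
	return sum
}

func main() {
	ch := make(chan job)
	go produce(4, ch)
	fmt.Println(consume(ch)) // 6
}
```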
A kind of cat-and-mouse of rich use-cases could grow here for years and result in nearly the same experience as today.
Stacks
Bigger stacks are inevitable here, as is growth. Having one large allocation makes good sense here. Optimizations could exist to provide enough capacity for later calls to grow lower stacks (PGO). The main stack will be exceptionally tall as it will hold globals. Stacks will clearly be allocated "in the heap" but it can be a simple mutexed allocator atop malloc/free, with the last pop being a stack free. Stack height management may need some care if stacks are re-used and have a huge top that remains unused.
Unsafe/CGO
These will need to have their pointers update-able for stack rearranges, so the API will change significantly to include a 'moved' callback.
Performance
It's not uncommon to see 10% of CPU usage in the allocator and 15% in the garbage collector. Add mutex time and you can get a significant load. The allocator is inline, and the garbage collector stops the world for a short while. Comparing those against the stack growth would be the real measure. But the allocator gets simpler: "Do I need to grow the target stack frame because the grower is at capacity?"
Maniacal stack-growth-in-a-loop can't really happen much because that lower stack would stay grown. Stack shrinkage may become a need, but most apps grow to a size and stay there when they're long-running (like a cache getting populated).