Description
Feature or enhancement
Proposal:
Approximately 80% of reference count operations occur in the interpreter. stats
The vast majority of these operations are needed only to maintain the correct reference counts for references in local variables and the evaluation stack.
We should not count these references, instead deferring them until we wish to perform incremental collection.
Doing so will give us a reasonable speedup on default builds, but the real value is for free-threading.
Free-threading requires that some references on the frame stack are deferred, but tagging those references is expensive. It is much more efficient to simply deferred all references on the frame stack.
This is not a new idea, in fact it is a very old one.
The implementation is conceptually fairly simple:
- We don't count references in local variables and the evaluation stack
- Any object that has a reference count of zero is, instead of being reclaimed, added to a "Zero count table"
- When we perform collection, we update the reference count of all objects that have references on the stack, collect any objects with a zero reference count, and then reset the reference counts.
Like many "simple" ideas, the devil is in the detail.
There are two main concerns:
- Reclamation of objects is not as prompt as before. We may use more resources, waiting for them to be reclaimed.
- The overhead of updating references during collections may be as great or greater than the saving by deferring the reference counting.
We can keep reclamation acceptably prompt, by tracking the size of objects in the Zero Count Table, the size of objects allocated.
Once this number gets large enough, we perform a collection at the next opportunity.
We can keep the overhead of updating the reference counts low, by only deferring the top of the stack. Parts of the stack that are not accessed between collections, can be counted eagerly, reducing the amount of scanning needed to a few frames.
Previous discussion
Linked PRs
- GH-120024: Add verification tool to check for unmarked escaping calls #121917
- GH-120024: Use pointer for stack pointer #121923
- GH-120024: Refactor code a bit so that escaping calls can be wrapped in spill code in code generator #122693
- GH-120024: Refactor code generators to uses classes for emitting code. #122730
- GH-120024: Move three more escaping calls out of conditional statements #122734
- GH-120024: Tidy up case generator code a bit. #122780
- GH-120024: Remove
CHECK_EVAL_BREAKER
macro. #122968 - GH-120024: Tidy up pycore_stackref.h, splitting into GIL and free-threading sections #125095
- GH-120024: Change
LOAD_FAST
instructions toLOAD_FAST_TEMP
when it is guaranteed the reference is consumed immediately. #125192
### Tasks
- [ ] https://github.com/python/cpython/issues/123391
- [x] Interpreter code generators need to be able to flush the stack around escaping calls.