Skip to content

Incremental cycle GC #613

Closed
Closed
@markshannon

Description

@markshannon

The cycle GC takes about 10% of the runtime on our benchmark suite.
This is already far too high, and with NoGIL it will get much worse.

Currently, we perform an entire heap scan every N collections (currently N == 100).
Scanning the entire heap is bad for throughput and even worse for latency.

So instead of scanning the entire heap at once, we should scan the old generation incrementally.

Here's how it could work (this untested and unproven)

  • Two generations: young and old.
  • Split the old generations into two spaces: pending and scanned.
  • Perform alternating young and old collections*
  • Survivors of the young collection are moved to the end of start of the scanned section
  • The old collection works as follows:
    objects_per_collection = 10_000 # This will need to adapt to rate at which objects survive the young collection
    if not pending:
        pending, scanned = scanned, pending
    if not pending:
        return
    work = [ pending.popleft() ]
    objects_per_collection -= 1
    while True:
        for obj in work:  # Do not check for mutation, as we want to mutate work.
            for child in obj.tp_visit:
                if child in scanned:
                    continue
                DECREF(child)
                if child not in work:
                    work.append(child)
                    objects_per_collection -= 1
        if not pending or objects_per_collection <= 0:
            break
        work.append(pending.popleft())
        objects_per_collection -= 1
    reachable = clear_cycles_and_restore_refcounts(work)
    scanned.extend(reachable)

clear_cycles_and_restore_refcounts(work) is already part of the cycle GC, so we aren't really adding much complexity.

The above algorithm has (I believe) two key properties:

  • It is guaranteed to collect all cycles enventually
  • It visits no more objects than the current algorithm: In the worse case of the entire object graph being reachable from the first object looked at, then all objects are visited, but the current old space collection does that anyway.

[*] Instead of doing an incremental collection every 2 collections, we could do it every N collections. If the incremental collections visit too many objects, i.e. much more than work_per_collection, we can increase N.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions