Top-of-stack caching in the JIT #135379

Open
@markshannon

Unlike #131498, which was a wash for performance, top-of-stack (TOS) caching in the JIT promises substantial performance improvements. This is because we can create several stencils for each uop, each tailored to a specific number of values cached in registers, and dynamically vary the number of values cached.
For example, in this code:

  LOAD_FAST_BORROW
  LOAD_FAST_BORROW
  BINARY_OP_ADD_INT
  STORE_FAST

we can select a variant of each uop tailored to the number of values cached in registers:

  LOAD_FAST_BORROW_0_1  ( 0 -> 1 registers)
  LOAD_FAST_BORROW_1_2  ( 1 -> 2 registers)
  BINARY_OP_ADD_INT_2_1 ( 2 -> 1 registers)
  STORE_FAST_1_0        ( 1 -> 0 registers)

thus avoiding any memory traffic to and from the stack at all.
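
As a rough illustration of the selection above, the following Python sketch tracks how many values are cached and names the matching variant for each uop. The `STACK_EFFECTS` table and the `select_variants` helper are hypothetical names made up for this example, not part of CPython, and the sketch assumes every input is already cached and the cache never overflows.

```python
# Hypothetical stack effects: uop name -> (values popped, values pushed).
STACK_EFFECTS = {
    "LOAD_FAST_BORROW": (0, 1),
    "BINARY_OP_ADD_INT": (2, 1),
    "STORE_FAST": (1, 0),
}


def select_variants(uops, cached=0):
    """Name a variant NAME_<in>_<out> for each uop, tracking cached values.

    Assumes all inputs are already cached and there is no upper bound
    on the cache (no spills or reloads are modelled here).
    """
    variants = []
    for name in uops:
        popped, pushed = STACK_EFFECTS[name]
        if popped > cached:
            raise ValueError(f"{name} needs {popped} cached values, have {cached}")
        after = cached - popped + pushed
        variants.append(f"{name}_{cached}_{after}")
        cached = after
    return variants


trace = ["LOAD_FAST_BORROW", "LOAD_FAST_BORROW", "BINARY_OP_ADD_INT", "STORE_FAST"]
for variant in select_variants(trace):
    print(variant)
```

Run on the four-uop sequence, this prints the same variant names as the listing above, ending with zero values cached.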

The exact number of variants per uop will need to be determined empirically.
Having more stencils allows more freedom when generating code, but an excessive number of stencils would cause bloat both at runtime and in any repository containing the stencils.
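
To get a feel for how the stencil count could grow, here is a small Python sketch that enumerates the possible variants of one uop under a simple model: all inputs must be cached, and the number of cached values after the uop may not exceed a fixed maximum. The model, the `enumerate_variants` helper, and the maximum of 3 are assumptions for illustration only; the real set of variants may be chosen differently.

```python
def enumerate_variants(name, popped, pushed, max_cached):
    """List NAME_<in>_<out> variants under an assumed, simplified model.

    A variant exists for each cached-value count `before` such that the
    uop's inputs are all cached and its outputs still fit in the cache.
    """
    variants = []
    for before in range(popped, max_cached + 1):
        after = before - popped + pushed
        if after <= max_cached:
            variants.append(f"{name}_{before}_{after}")
    return variants


# With an assumed maximum of 3 cached values:
for variant in enumerate_variants("LOAD_FAST_BORROW", 0, 1, max_cached=3):
    print(variant)
for variant in enumerate_variants("BINARY_OP_ADD_INT", 2, 1, max_cached=3):
    print(variant)
```

Under this model, each extra cache slot adds roughly one variant per uop, so the total stencil count scales with the number of uops times the cache size.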

Spilling and reloading

There will be an upper bound on the number of values cached, and some uops may need a minimum number of values in the cache.
To handle both cases, we will need to insert spill and reload uops. Spills reduce the number of cached values by saving them to the in-memory stack; reloads do the opposite, moving values from the in-memory stack to the cache.

E.g.

  LOAD_FAST_BORROW
  BINARY_OP_ADD_INT

BINARY_OP_ADD_INT expects two inputs, but we only have one cached (from the LOAD_FAST_BORROW), so we need to insert a RELOAD:

  LOAD_FAST_BORROW  ( 0 -> 1 registers)
  RELOAD_1_2        ( 1 -> 2 registers)
  BINARY_OP_ADD_INT ( 2 -> 1 registers)

SPILL and RELOAD are semantic no-ops, and will be generated automatically.
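
The automatic insertion could look something like the Python sketch below. It is only an illustration: the `insert_spills_and_reloads` helper, the `EFFECTS` table, the `SPILL_<in>_<out>` naming (mirroring `RELOAD_1_2`), and the policy of spilling or reloading the minimum needed are all assumptions, not the actual implementation.

```python
# Hypothetical stack effects: uop name -> (values popped, values pushed).
EFFECTS = {
    "LOAD_FAST_BORROW": (0, 1),
    "BINARY_OP_ADD_INT": (2, 1),
    "STORE_FAST": (1, 0),
}


def insert_spills_and_reloads(uops, max_cached, cached=0):
    """Insert RELOAD/SPILL uops so each uop sees a legal cache depth.

    Assumed policy: reload just enough values to satisfy the uop's
    inputs, and spill just enough for its outputs to fit in the cache.
    """
    out = []
    for name in uops:
        popped, pushed = EFFECTS[name]
        if popped > max_cached or pushed > max_cached:
            raise ValueError(f"{name} cannot fit in a cache of {max_cached}")
        if cached < popped:
            # Too few inputs cached: reload from the in-memory stack.
            out.append(f"RELOAD_{cached}_{popped}")
            cached = popped
        after = cached - popped + pushed
        if after > max_cached:
            # Result would overflow the cache: spill to the in-memory stack.
            target = max_cached - pushed + popped
            out.append(f"SPILL_{cached}_{target}")
            cached = target
            after = max_cached
        out.append(f"{name}_{cached}_{after}")
        cached = after
    return out


for uop in insert_spills_and_reloads(["LOAD_FAST_BORROW", "BINARY_OP_ADD_INT"], max_cached=3):
    print(uop)
```

For the two-uop example above, this yields a `RELOAD_1_2` between the load and the add. With `max_cached=2`, a third consecutive `LOAD_FAST_BORROW` would instead be preceded by a `SPILL_2_1`.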

Deferred references

For this to work, the code generator must spill any cached values to the in-memory stack whenever GC could occur. Fortunately, the code generator already does this (as part of the work for #131498).

See faster-cpython/ideas#711


Labels: interpreter-core, performance, topic-JIT, type-feature
