Skip to content

List of ideas #4

@gvanrossum

Description

@gvanrossum

Here's a large list of ideas that we could pursue. Eventually we should have a separate issue to discuss each individual idea -- if you feel the urge to comment on a particular idea, please open a new issue and link that issue back here. Editing this list in-place to reorganize it is okay though.

Ideas for the bytecode interpreter itself

(In bold the most viable ideas.)

  • Bytecode specialization for specific types, with a guard and fallback bytecode if a different type is received (A disassembler #8)
  • Focus on numeric operations (or even integer operations) and work on keeping intermediate values unboxed
  • Add more inline caching, and optimize existing inline caching
  • Introduce hidden classes (Hidden classes #6)
  • Instead of checking for signals and other async events on (nearly) every bytecode (see ‘eval_breaker’), insert specific "check events" bytecodes at back arcs in the code (bpo-29988: Only check evalbreaker after calls and on backwards egdes.  python/cpython#18334)
  • Improve method lookup (currently this is a series of dict lookups)
  • Inline simple function calls (useful primarily because it allows other optimizations to work across calls)
  • Tagged integers. The big problem here is that we can't store these in data structures like tuples, because we don't want extensions to stumble upon them. But we could store these on the stack and in locals, and e.g. box on STORE_ATTR, STORE_SUBSCR or STORE_GLOBAL (Pointer tagging aka tagged numbers #7)
  • Unboxed tuples. Especially when a tuple is returned from a function this could be useful, but it would require a change to the calling sequence
  • Try to eliminate INCREF/DECREF operations
  • Switch to a register-based (instead of stack-based) VM; this would be a big project (Register-based VM #22)
  • Generating code for the ceval loop/switch (Generating code for the ceval loop/switch #5)

Ideas for other parts of the interpreter

  • Improve GC. There is little low-hanging fruit here, but some folks in the core dev team (e.g. Victor Stinner) are working on warming the community up to a new version of the C API that is more GC friendly (e.g. HPY or perhaps just deprecating a few dozen functions that take or return borrowed references). I also recall Neil Schemenauer and a few others explored this in some depth at the most recent core dev sprint
  • Related: move GC to a separate thread (Move GC to a separate thread #58)
  • Also related: keep refcounts in a separate page (IIRC Larry Hastings did this for the Gilectomy, and it was non-trivial)
  • Improve object layout. Some aspects of this are hidden but affect performance, e.g. many objects are part of a linked list used by the GC, but this list could be moved to a separate data structure, making the objects themselves smaller, thus improving memory access locality. [Mark Shannon]
  • Move the __dict__ attribute to a fixed offset, perhaps before the rest of the object, with a flag in the object itself to indicate its presence. This would reduce pointer chasing (currently you have to call a function to ask the type object to calculate the offset, which may be even variable across instances of the same type, e.g. class C(tuple): …). [Mark Shannon]
  • Change the dict implementation so that the values may be slots in an instance while the keys and hash table are part of the class. (Details are a little vague on this one. I think it is similar to our current "shared keys dict" variant, but with more emphasis of using the offsets instead of the keys in the bytecode, assuming the guard succeeds.) [Curtis Man or Ed Maurer]
  • Improvements to call frame management. Most low-hanging fruit was picked long ago, but there are some intriguing ideas, such as putting these in an array instead of a linked list (for memory locality) -- probably we need a linked list of fixed-size arrays, but that would still help.
  • Make imports partial and lazy. (Neil Schemenauer)
  • Add tooling for full-program static analysis (especially to support optimization and even machine-code compilation).
  • Improve Python runtime startup time through more frozen modules.
  • Add truly read-only object graphs, allowing optimization of objects in the graph and of operations on them, as as memory-related benefits.
  • Pursue compile-time and run-time strategies for improving memory locality of related objects.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions