Description
In order to optimize tier 2 superblocks, we want to be able to use existing techniques and straightforward reasoning.
Unfortunately, a number of Python language and CPython VM features make such optimization very difficult, if not impossible, to do naively.
For example:

```python
def f(y):
    x = y
    ...
```
One might presume that from `...` onward, `x` and `y` would refer to the same object, but that might not be the case if `x` were modified in a debugger in another thread. This is extremely unlikely to occur, but it is possible.
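The aliasing hazard can be demonstrated with `sys.settrace`, which uses the same frame-locals machinery that a debugger such as `pdb` does. This is an illustrative sketch (the function names are ours): on CPython 3.13+ writes to `frame.f_locals` take effect directly per PEP 667, and on earlier versions the tracing machinery writes them back after the trace call.

```python
import sys

def f(y):
    x = y
    pass  # a point where a debugger could pause and rebind x
    return x is y

# Without interference, x and y alias as one would presume.
assert f(object()) is True

def rebind_x(frame, event, arg):
    # A debugger rebinds locals the same way: by writing to the
    # f_locals of a live frame.
    if frame.f_code.co_name == "f" and "x" in frame.f_locals:
        frame.f_locals["x"] = object()  # rebind x to a fresh object
    return rebind_x

sys.settrace(rebind_x)
result = f(object())  # x is rebound mid-function, so x is no longer y
sys.settrace(None)
assert result is False
```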
We need a scheme that allows us to perform useful optimizations and retain correctness when these unlikely events occur.
The general idea
We assume no "unlikely" event can occur when optimizing, and optimize freely.
If such an event does occur, we throw away our optimizations.
Local events
Most classes and modules are not mutated during their lifetime, but some are.
We would like to optimize as if they were immutable, but retain correctness if they are mutated.
Throwing away all our optimizations whenever a class or module is mutated would be very expensive, so we need a compromise.
What we can do is track which executors depend on a given class or module, and throw away only those.
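To make the compromise concrete, here is a hypothetical sketch (the names `DependencyTable` and `Executor` are illustrative, not CPython's actual data structures) of tracking which executors depend on a given class or module, so that a mutation invalidates only those:

```python
from collections import defaultdict

class Executor:
    """A compiled superblock; valid until a dependency is mutated."""
    def __init__(self, name):
        self.name = name
        self.valid = True

class DependencyTable:
    def __init__(self):
        # Maps a class/module (by id) to the set of executors that
        # were optimized assuming it is immutable.
        self._deps = defaultdict(set)

    def record(self, obj, executor):
        self._deps[id(obj)].add(executor)

    def notify_mutation(self, obj):
        # Invalidate only the executors that depended on obj,
        # leaving all other optimizations intact.
        for executor in self._deps.pop(id(obj), ()):
            executor.valid = False
```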
The events
Global events
- Setting an object's class, `obj.__class__ = ...`
- Setting the bases or mro of a class, `cls.__bases__ = ...`
- Setting the PEP 523 frame eval function, `_PyInterpreterState_SetEvalFrameFunc()`
- Changing a module's dictionary, `mod.__dict__ = ...`
- Function modification: setting the defaults, closure, etc., `func.__closure__ = (None, None)`
- Modification of locals in a debugger
- `staticmethod`, `classmethod` and `property` can be mutated by calling `__init__` again
Local events
- Setting a class attribute, `cls.attr = ...`
- Setting a global variable, `global x; x = ...`
- Instrumentation, `sys.monitoring.set_events()`
The above classification is not fixed; there are no hard rules about what is global or local.
Global events are cheaper to optimize for, but more expensive to de-optimize from, so they must be very rare.
Handling de-optimization and re-optimization
If we naively throw away all executors after a global event and then carry on as normal, we could easily find ourselves optimizing, then de-optimizing, repeatedly, with dire effects on performance.
So for each event we need a way to avoid that.
First of all, we keep a global (per-interpreter) set of counters recording how many times each event has occurred.
If an event has occurred `N` or more times, we change our optimization strategy to assume the event may happen, and we do not de-optimize when it does.
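The counter scheme can be sketched as follows (hypothetical names; in CPython the real state would live in the interpreter state, not a Python class):

```python
class EventCounter:
    """Per-event counter with a threshold N.

    Below N: optimize assuming the event never happens, and
    de-optimize if it does. At or above N: stop making the
    assumption, so the event no longer invalidates anything.
    """
    def __init__(self, threshold):
        # threshold is N; float("inf") models events where we
        # always keep the assumption and simply re-optimize.
        self.threshold = threshold
        self.count = 0

    def event_occurred(self):
        self.count += 1

    def may_assume_event_never_happens(self):
        return self.count < self.threshold
```

For example, with `N = 4` for builtin modification, the handful of modifications that happen during startup are tolerated, and only a program that keeps modifying builtins flips the strategy.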
Event | N | Note
---|---|---
`obj.__class__ = ...` | 1 |
`cls.__bases__ = ...` | 1 |
PEP 523 | 1 |
Modifying a builtin | 4 | (builtins can get modified during startup, so we need to tolerate that)
Function modification | 1 |
Setting a local in a debugger | infinite | (this is a slow event, so the cost of re-optimization is acceptable)
`cls.attr = ...` | 8 | per-class
Setting a global | 8 | per-module
Instrumentation | infinite | (instrumentation is set infrequently and prevents optimization locally, so we want to optimize around it)
`staticmethod`/`classmethod`/`property` mutation | 1 |
The values for `N` above are provisional and are likely to change.
Free threading
To support free threading, most of the events above will be stop-the-world events.
Once the modification threshold is reached, those events will no longer stop the world, as they no longer invalidate optimizations.