Break calls into two or three bytecodes for better specialization. #210
-
As a motivating example, consider calls to a normal Python class (by "normal" I mean one that inherits from object and does not override My initial attempt to do this is both complex and broken, see python/cpython#30415 By splitting the Likewise,
-
Attempting to implement the above scheme, it seems like breaking the
The main benefit of merging Internally the interpreter would have several variables (probably in a struct) to describe the "shape" of the call. To outline how this would work, take the example
Cost of adding
-
A bit more detail: There will be two For example
and
-
bpo issue: https://bugs.python.org/issue46329
-
We already do this partly with the `PRECALL_METHOD` instruction, which is inserted before the `CALL_[NO_]KW` instruction to set up the correct argument offsets.
I propose adding a `PRECALL_FUNCTION` instruction to be inserted for the non-method case. In the non-specialized case, this would set up the argument offsets for a non-method call.
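To see which call-related instructions a given interpreter emits for the method and non-method cases, the standard `dis` module can be used. This is only a small inspection sketch: the helper name `call_opnames` is made up for illustration, and the opcode names printed depend on the Python version, so they may not match the instruction names discussed in this post.

```python
import dis


def call_opnames(source):
    """Return the names of the CALL-related instructions compiled for `source`.

    `call_opnames` is a hypothetical helper for illustration only.
    The expression is compiled but never executed, so the names in it
    do not need to exist.
    """
    code = compile(source, "<example>", "eval")
    return [ins.opname for ins in dis.get_instructions(code) if "CALL" in ins.opname]


# Non-method call: the case the proposed PRECALL_FUNCTION would cover.
function_call = call_opnames("f(a, b)")
# Method call: the case PRECALL_METHOD already covers.
method_call = call_opnames("obj.m(a, b)")

print("function call:", function_call)
print("method call:  ", method_call)
```

Comparing the two lists on different interpreter versions shows how the call sequence has been split (or merged) over time.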
The motivation for this change is to split the specialization of calls into two parts.
Currently we have a sort of `N*M` problem with specializing calls. We want to specialize for both the type of callable (builtin-function, Python function, bound-method, builtin-class, Python class) and for the argument handling (simple arguments, `*args`, `**kwargs`, defaults, etc.).
By splitting the call sequence into two, we need N+M specializations instead. Since N is at least 5 and M is at least 3, this saves a lot of duplication. Instructions specialized for both type and shape are larger than specializations for just one, so we end up with `(N*M*2)` vs. `(N+M)`.
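The arithmetic can be checked with a short sketch. It uses the lower bounds from the post (N = 5 callable kinds, M = 3 argument shapes). The list contents and variable names are illustrative only, and treating the factor of 2 as a rough relative size for a combined instruction is an assumption about how to read `(N*M*2)`.

```python
from itertools import product

# The five callable kinds listed above (N = 5).
CALLABLE_KINDS = [
    "builtin-function",
    "Python function",
    "bound-method",
    "builtin-class",
    "Python class",
]

# Three argument shapes, matching the "M is at least 3" lower bound.
ARGUMENT_SHAPES = ["simple arguments", "*args", "**kwargs"]

# One combined instruction per (kind, shape) pair: N * M.
combined = list(product(CALLABLE_KINDS, ARGUMENT_SHAPES))
combined_count = len(combined)

# Split scheme: one specialization per kind plus one per shape: N + M.
split_count = len(CALLABLE_KINDS) + len(ARGUMENT_SHAPES)

# Assumption: a combined instruction is roughly twice the size of a split one.
weighted_combined = 2 * combined_count

print(f"combined: {combined_count}, weighted: {weighted_combined}, split: {split_count}")
```

With these lower bounds the combined scheme needs 15 specializations (30 in weighted size) against 8 for the split scheme, and the gap widens as either N or M grows.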
The obvious downside of this is the increased memory use for code objects. The extra 2 bytes for the `PRECALL_FUNCTION` instruction are not an issue, although the additional location information may be. The extra 8 or 16 bytes for the cache is significant, but should be tolerable.