Description
Once the work to allow any object as the "code" of a frame is done, we can take advantage of that to speed up creation of code objects from serialized data.
The idea is that the serialized data will consist of two parts:
- A sequence of immutable bytecode
- Supporting binary data.
Creation of the top-level (module) code object would be done as follows:
- Create a "module initializer" object, consisting of a pointer to the binary data and debug info like the name and filename.
- Create a frame, setting the "code" field to the module initializer and setting the instruction pointer to the start of the instruction sequence.
- Start executing in the interpreter.
What are the advantages of this?
- Marshal is slow
- There is no need for a secondary interpreter (marshal)
- It allows partial deep-freezing, meaning that the names and consts arrays can be deep frozen without requiring that the code object is deep frozen. The resulting constant can be loaded with LOAD_COMMON_CONST.
- It allows further improvements, e.g. we could skip creating a code object for the module, just creating them for functions.
- It decouples the pyc format from marshal, allowing them to be improved separately.
- Common objects can be shared very efficiently, by leaving them on the stack and using COPY instead of MAKE_...
Creating the instruction sequence
We can create the instruction sequence in much the same way as marshal serializes: by recursively emitting code for sub-objects until the entire object is complete.
To do this we will need some new instructions and a few new intrinsics.
New general purpose instructions:
- LOAD_COMMON_CONST Loads a constant from the global array containing None, True, etc. plus assorted common constants
- LOAD_COMMON_NAME Like LOAD_COMMON_CONST but from an array of strings.
- LOAD_INT Loads a small int
Instructions to create objects from binary data:
These instructions will create an object from the binary data, advancing the pointer.
- MAKE_FLOAT
- MAKE_STRING
- MAKE_LONG (we could build large ints from small ints, but that would be quadratic)
- MAKE_BYTES
- MAKE_CODE: Creates a code object from values on the stack (name, qualname, names, consts) and binary data
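The "create an object from the binary data, advancing the pointer" behaviour could be sketched in Python roughly as below. This is only an illustration: the `BinaryData` cursor, the little-endian layout, and the 4-byte length prefixes are assumptions made for the sketch, not part of the proposal.

```python
import struct


class BinaryData:
    """Hypothetical cursor over the supporting binary data (layout is assumed)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # the "pointer" that each MAKE_* instruction advances

    def take(self, fmt: str):
        # Decode one fixed-size value at the current position, then advance.
        (value,) = struct.unpack_from(fmt, self.data, self.pos)
        self.pos += struct.calcsize(fmt)
        return value

    def take_bytes(self) -> bytes:
        # Assumed encoding: 4-byte little-endian length, then the payload.
        n = self.take("<I")
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk


def make_float(bd: BinaryData) -> float:
    return bd.take("<d")


def make_bytes(bd: BinaryData) -> bytes:
    return bd.take_bytes()


def make_string(bd: BinaryData) -> str:
    return bd.take_bytes().decode("utf-8")


def make_long(bd: BinaryData) -> int:
    # Built directly from its bytes rather than by combining small ints.
    return int.from_bytes(bd.take_bytes(), "little", signed=True)


blob = (
    struct.pack("<d", 37.0)
    + struct.pack("<I", 3) + b"foo"
    + struct.pack("<I", 9) + (2**64).to_bytes(9, "little", signed=True)
)
bd = BinaryData(blob)
values = [make_float(bd), make_string(bd), make_long(bd)]
```

Each call consumes exactly the bytes it needs, so the instruction stream itself never has to carry offsets into the binary data.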
New intrinsic functions
- make_complex (2)
- make_frozenset (1)
We already have an instruction for making tuples.
The instruction sequence would finish with MAKE_CODE; RETURN_VALUE, returning the completed code object.
Or, we could add another instruction, START_CODE, at the end to execute the code object and return the completed module.
Examples
Creation of the tuple (1, "a", 37.0, (2, "foo"))
LOAD_INT 1
LOAD_COMMON_NAME "a"
MAKE_FLOAT 37.0
LOAD_INT 2
MAKE_STRING "foo"
BUILD_TUPLE 2
BUILD_TUPLE 4
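As a rough illustration of the recursive emission, the following Python sketch emits an instruction list for a nested tuple and then replays it on a toy stack machine. The `COMMON_NAMES` table, the `SMALL_INTS` range, and the choice to carry operands inline (rather than in separate binary data) are simplifying assumptions for the sketch only.

```python
COMMON_NAMES = ("a",)        # hypothetical table of common strings
SMALL_INTS = range(-5, 257)  # assumed range handled by LOAD_INT


def emit(obj, out):
    """Recursively emit instructions that rebuild obj, sub-objects first."""
    if type(obj) is tuple:
        for item in obj:
            emit(item, out)
        out.append(("BUILD_TUPLE", len(obj)))
    elif type(obj) is int and obj in SMALL_INTS:
        out.append(("LOAD_INT", obj))
    elif type(obj) is str:
        op = "LOAD_COMMON_NAME" if obj in COMMON_NAMES else "MAKE_STRING"
        out.append((op, obj))
    elif type(obj) is float:
        out.append(("MAKE_FLOAT", obj))
    else:
        raise TypeError(f"unsupported object in this sketch: {obj!r}")
    return out


def run(instructions):
    """Toy stack machine: every opcode pushes one value; BUILD_TUPLE pops n first."""
    stack = []
    for op, arg in instructions:
        if op == "BUILD_TUPLE":
            start = len(stack) - arg
            items = stack[start:]
            del stack[start:]
            stack.append(tuple(items))
        else:
            stack.append(arg)
    (result,) = stack  # exactly one object must remain
    return result


program = emit((1, "a", 37.0, (2, "foo")), [])
rebuilt = run(program)
```

Because sub-objects are emitted before their container, the operand of each `BUILD_TUPLE` is simply the length of the tuple being built.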
Creation of a code object would look something like this:
(Code to create names tuple)
(Code to create consts tuple)
MAKE_STRING name
MAKE_STRING qualname
COPY n (filename will be shared for all code objects in module)
MAKE_CODE
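To make the sharing via COPY concrete, here is a small Python model of the sequence above. It is a sketch under assumptions: `SimpleNamespace` stands in for a real code object, `PUSH` is a hypothetical placeholder for "(Code to create names/consts tuple)", and the operand order popped by `MAKE_CODE` simply mirrors the listing. `COPY n` follows the existing CPython meaning of pushing the n-th item from the top of the stack.

```python
from types import SimpleNamespace


def run(instructions):
    """Toy model of the stack while code objects are being built."""
    stack = []
    for op, arg in instructions:
        if op == "COPY":
            # Push the arg-th item from the top (1-based) without removing it.
            stack.append(stack[-arg])
        elif op == "MAKE_CODE":
            # Pop operands in reverse of the order they were pushed.
            filename = stack.pop()
            qualname = stack.pop()
            name = stack.pop()
            consts = stack.pop()
            names = stack.pop()
            stack.append(SimpleNamespace(
                name=name, qualname=qualname,
                names=names, consts=consts, filename=filename,
            ))
        else:
            # PUSH (placeholder) and MAKE_STRING both push their operand here.
            stack.append(arg)
    return stack


program = [
    ("MAKE_STRING", "example.py"),  # created once, left on the stack
    ("PUSH", ("print",)), ("PUSH", (None,)),
    ("MAKE_STRING", "f"), ("MAKE_STRING", "f"),
    ("COPY", 5), ("MAKE_CODE", None),
    ("PUSH", ("len",)), ("PUSH", (None, 1)),
    ("MAKE_STRING", "g"), ("MAKE_STRING", "C.g"),
    ("COPY", 6), ("MAKE_CODE", None),
]
filename, code_f, code_g = run(program)
```

Note that the operand of `COPY` grows as more code objects are left on the stack, so the emitter would need to track the stack depth of each shared object.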