Description
compile.c is one of the largest source files in Python (about 8000 lines). Most of the code consists of code generation functions (the AST traversals), but most of the complexity lies in managing the compiler's data structures.
In this work I will move the code generation functions into a separate file, so that they are independent of the compiler internals and access the compiler only through an opaque reference and a well-defined API. First, I will transform the codegen functions so that they no longer access the internals of `struct compiler` or `struct compiler_unit`. Then I will rename compile.c to codegen.c (to preserve the commit history of the codegen functions) and copy the compiler implementation code out to a new file, compile.c, so that the entry point stays where people are used to finding it.
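A minimal sketch of the "opaque reference plus well-defined API" idea, assuming a toy compiler that only records opcodes. All names here (`compiler_emit`, `codegen_return_const`, the `OP_*` values) are hypothetical and are not CPython's actual API. Because this is a single file, the codegen function is placed before the struct definition, so the C compiler itself rejects any attempt by codegen to touch the fields; after the real split, the definition would live only in compile.c.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical opcodes, for illustration only. */
enum { OP_LOAD_CONST = 1, OP_RETURN_VALUE = 2 };

/* ---- API visible to codegen (hypothetical names) ---- */
typedef struct compiler compiler;  /* opaque: no fields visible here */

int compiler_emit(compiler *c, int opcode, int oparg);

/* ---- codegen: only calls the API declared above ---- */
int codegen_return_const(compiler *c, int const_index)
{
    /* Writing c->n_instrs here would not compile: the type is incomplete. */
    if (compiler_emit(c, OP_LOAD_CONST, const_index) < 0) {
        return -1;
    }
    return compiler_emit(c, OP_RETURN_VALUE, 0);
}

/* ---- compiler implementation: the only code that sees the fields ---- */
#define MAX_INSTRS 64

struct compiler {
    int n_instrs;
    int opcodes[MAX_INSTRS];
    int opargs[MAX_INSTRS];
};

compiler *compiler_new(void) { return calloc(1, sizeof(compiler)); }
void compiler_free(compiler *c) { free(c); }

int compiler_emit(compiler *c, int opcode, int oparg)
{
    assert(c != NULL);
    if (c->n_instrs >= MAX_INSTRS) {
        return -1;  /* out of instruction slots */
    }
    c->opcodes[c->n_instrs] = opcode;
    c->opargs[c->n_instrs] = oparg;
    c->n_instrs++;
    return 0;
}

/* Read-only accessors, so callers outside this section need no field access. */
int compiler_instr_count(const compiler *c) { return c->n_instrs; }
int compiler_opcode_at(const compiler *c, int i) { return c->opcodes[i]; }
int compiler_oparg_at(const compiler *c, int i) { return c->opargs[i]; }
```

Usage: create a compiler with `compiler_new()`, call `codegen_return_const(c, 3)`, and the compiler side then holds two instructions (`OP_LOAD_CONST 3`, `OP_RETURN_VALUE 0`). The codegen function never learns how instructions are stored, which is what allows the storage to change without touching codegen.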
The benefits will be that (1) the codegen code is simpler, (2) the complexity of compile.c becomes more manageable, since it will probably span only 2-3K lines of code, and (3) it becomes possible to share parts of the compiler with alternative compiler implementations.
Linked PRs
- gh-121404: remove direct accesses to u_private from codegen functions #121500
- gh-121404: avoid accessing compiler internals in codegen functions #121538
- gh-121404: move calculation of module start location from compiler_body up to compiler_codegen #122127
- gh-121404: extract compiler_lookup_arg out of compiler_make_closure #122181
- gh-121404: split compiler's push/pop_inlined_comprehension_state into codegen and compiler parts #123021
- gh-121404: rename compiler_addop* to codegen_addop*, and replace direct usages by the macros #123043
- gh-121404: pass metadata to compiler_enter_scope (optionally) so that codegen functions don't need to set it on the code unit #123078
- gh-121404: rename functions to use codegen_* prefix. Use macros more consistently. #123139
- gh-121404: split fblock handling into compiler_* and codegen_* parts #123199
- gh-121404: remove redundant c_nestlevel. more compiler abstractions. more macro usage consistency #123225
- gh-121404: compiler_annassign --> codegen_annassign #123245
- gh-121404: more compiler_* -> codegen_*, class_body and comprehensions #123262
- gh-121404: compiler_visit_* --> codegen_visit_* #123382
- gh-121404: split compiler_nameop into a codegen part and a compiler part #123398
- gh-121404: rearrange code in compile.c so that codegen functions come first and compiler functions second #123510
- gh-121404: enforce that codegen doesn't access compiler, and compiler doesn't use codegen macros #123575
- gh-121404: split compile.c into compile.c and codegen.c #123651
- gh-121404: update CODEOWNERS #124109
- gh-121404: typo fix in compile.c: MATADATA -> METADATA #125101
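Several of the PRs above apply the same pattern: a function that mixed both concerns (for example compiler_nameop in #123398) is split into a compiler part that consults internal state and a codegen part that only chooses and emits instructions, with emission routed through macros (#123043, #123575). The sketch below illustrates that pattern under simplifying assumptions. The macro names echo the style used in compile.c, but these definitions, and every other name here (`compiler_name_scope`, `codegen_load_name`, the `OP_*` and `SCOPE_*` values), are hypothetical and not CPython's code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical opcodes and scope kinds, for illustration only. */
enum { OP_LOAD_FAST = 1, OP_LOAD_GLOBAL = 2 };
enum name_scope { SCOPE_LOCAL, SCOPE_GLOBAL };

/* Error propagation: return -1 from the caller if X fails (< 0). */
#define RETURN_IF_ERROR(X) do { if ((X) < 0) { return -1; } } while (0)

typedef struct compiler compiler;  /* opaque to the codegen part */

/* Compiler part: answers questions that need internal state. */
enum name_scope compiler_name_scope(const compiler *c, const char *name);
int compiler_emit(compiler *c, int opcode, int oparg);

/* Codegen-only macro: wraps emission and propagates errors. */
#define ADDOP_I(C, OP, ARG) RETURN_IF_ERROR(compiler_emit((C), (OP), (ARG)))

/* Codegen part: picks the instruction, never touches c's fields. */
int codegen_load_name(compiler *c, const char *name, int oparg)
{
    int op = compiler_name_scope(c, name) == SCOPE_LOCAL
                 ? OP_LOAD_FAST : OP_LOAD_GLOBAL;
    ADDOP_I(c, op, oparg);
    return 0;
}

/* ---- compiler internals below this line ---- */
#define MAX_LOCALS 8

struct compiler {
    const char *locals[MAX_LOCALS];
    int n_locals;
    int last_opcode;  /* most recently emitted instruction */
    int last_oparg;
    int capacity;     /* remaining instruction slots; emit fails at 0 */
};

enum name_scope compiler_name_scope(const compiler *c, const char *name)
{
    for (int i = 0; i < c->n_locals; i++) {
        if (strcmp(c->locals[i], name) == 0) {
            return SCOPE_LOCAL;
        }
    }
    return SCOPE_GLOBAL;
}

int compiler_emit(compiler *c, int opcode, int oparg)
{
    assert(c != NULL);
    if (c->capacity <= 0) {
        return -1;
    }
    c->last_opcode = opcode;
    c->last_oparg = oparg;
    c->capacity--;
    return 0;
}
```

The design point is that the scope lookup (compiler side) and the opcode choice (codegen side) can now evolve separately, and the codegen side only ever emits through the macro, which is the kind of boundary #123575 enforces.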