Description
This is technically not a bug/crash report or a feature request. It is more of an FYI for package maintainers and an invitation to discuss the potential issue.
To give you some context, I'm one of the maintainers of the mypy feedstock at conda-forge. During the py310 build migration for mypy, our CI build container kept getting killed/OOMed on linux-aarch64. On investigation, I found that the memory requirements to build the __native_<>.c module have increased (significantly?) for py310.
/usr/bin/time -v gcc -pthread -B $CONDA_PREFIX/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -fPIC -O2 -isystem $CONDA_PREFIX/include -fPIC -O2 -isystem $CONDA_PREFIX/include -fPIC -I/path/to/mypy/mypyc/lib-rt -Ibuild -I$CONDA_PREFIX/include/python3.10 -c build/__native_a64f60bb4641b67cb20c.c -o build/temp.linux-x86_64-3.10/build/__native_a64f60bb4641b67cb20c.o -O3 -Werror -Wno-unused-function -Wno-unused-label -Wno-unreachable-code -Wno-unused-variable -Wno-unused-command-line-argument -Wno-unknown-warning-option -Wno-unused-but-set-variable
build/__native_a64f60bb4641b67cb20c.c: In function ‘CPyDef_fixup___TypeFixer___visit_instance’:
build/__native_a64f60bb4641b67cb20c.c:570654: note: ‘-Wmisleading-indentation’ is disabled from this point onwards, since column-tracking was disabled due to the size of the code/headers
570654 | CPyL13: ;
|
Command being timed: "gcc -pthread -B $CONDA_PREFIX/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -fPIC -O2 -isystem $CONDA_PREFIX/include -fPIC -O2 -isystem $CONDA_PREFIX/include -fPIC -I/path/to/mypy/mypyc/lib-rt -Ibuild -I$CONDA_PREFIX/include/python3.10 -c build/__native_a64f60bb4641b67cb20c.c -o build/temp.linux-x86_64-3.10/build/__native_a64f60bb4641b67cb20c.o -O3 -Werror -Wno-unused-function -Wno-unused-label -Wno-unreachable-code -Wno-unused-variable -Wno-unused-command-line-argument -Wno-unknown-warning-option -Wno-unused-but-set-variable"
User time (seconds): 624.63
System time (seconds): 5.31
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 10:30.21
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): **5037436** <--- !!!!
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 20
Minor (reclaiming a frame) page faults: 2640454
Voluntary context switches: 191
Involuntary context switches: 1676
Swaps: 0
File system inputs: 2792
File system outputs: 1818944
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
For comparison, with py38 the requirements were:
/usr/bin/time -v gcc -pthread -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/path/to/mypy/mypyc/lib-rt -Ibuild -I$CONDA_PREFIX/include/python3.8 -c build/__native_a64f60bb4641b67cb20c.c -o build/temp.linux-x86_64-3.8/build/__native_a64f60bb4641b67cb20c.o -O3 -Werror -Wno-unused-function -Wno-unused-label -Wno-unreachable-code -Wno-unused-variable -Wno-unused-command-line-argument -Wno-unknown-warning-option -Wno-unused-but-set-variable
build/__native_a64f60bb4641b67cb20c.c: In function ‘CPyDef_fixup___TypeFixer___visit_none_type__TypeVisitor_glue’:
build/__native_a64f60bb4641b67cb20c.c:570413: note: ‘-Wmisleading-indentation’ is disabled from this point onwards, since column-tracking was disabled due to the size of the code/headers
570413 | CPyL1: ;
|
Command being timed: "gcc -pthread -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/path/to/mypy/mypyc/lib-rt -Ibuild -I$CONDA_PREFIX/include/python3.8 -c build/__native_a64f60bb4641b67cb20c.c -o build/temp.linux-x86_64-3.8/build/__native_a64f60bb4641b67cb20c.o -O3 -Werror -Wno-unused-function -Wno-unused-label -Wno-unreachable-code -Wno-unused-variable -Wno-unused-command-line-argument -Wno-unknown-warning-option -Wno-unused-but-set-variable"
User time (seconds): 595.81
System time (seconds): 5.07
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 10:01.20
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): **4817776** <--- !!!!
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 2593657
Voluntary context switches: 153
Involuntary context switches: 3692
Swaps: 0
File system inputs: 24
File system outputs: 1723584
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
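To put a number on the "(significantly?)" above, here is a minimal sketch that pulls the maximum RSS out of the two /usr/bin/time -v reports and computes the difference. The helper name max_rss_kb is made up for illustration, and the two input strings are the lines copied from the outputs above. Note that the two gcc invocations do not use identical flags, so this is a rough comparison rather than a controlled one.

```python
import re


def max_rss_kb(time_output: str) -> int:
    """Extract 'Maximum resident set size (kbytes)' from /usr/bin/time -v output.

    Tolerates the '**' emphasis markers used in the pasted output above.
    """
    match = re.search(r"Maximum resident set size \(kbytes\):\s*\**(\d+)", time_output)
    if match is None:
        raise ValueError("no 'Maximum resident set size' line found")
    return int(match.group(1))


# Lines copied from the two reports in this issue.
py310_kb = max_rss_kb("Maximum resident set size (kbytes): **5037436** <--- !!!!")
py38_kb = max_rss_kb("Maximum resident set size (kbytes): **4817776** <--- !!!!")

delta_kb = py310_kb - py38_kb
delta_mib = delta_kb / 1024
increase_pct = 100 * delta_kb / py38_kb

print(f"py310: {py310_kb} kB, py38: {py38_kb} kB")
print(f"difference: {delta_kb} kB (~{delta_mib:.0f} MiB, {increase_pct:.1f}%)")
```

By this measure py310 needs roughly 215 MiB (about 4.6%) more than py38 for the same translation unit, with both builds peaking near 5 GB.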
By decreasing the amount of debug info being emitted using -g1, I was able to reduce the maximum RSS by ~1GB for py310 and ~700MB for py38, and then CI was happy. Particularly for release builds, one may wish to simply use -g0 as well.
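For anyone who wants to repeat this kind of measurement from a script rather than with /usr/bin/time -v, the following is a rough sketch. Both helper names (with_debug_level, peak_rss_kb) are hypothetical. It assumes the build honours the CFLAGS environment variable and places it after the default -g, in which case gcc uses the last -g level it sees; whether that holds depends on how a given build is driven, so please verify against the actual compiler command line.

```python
import resource
import subprocess


def with_debug_level(env, level):
    """Return a copy of env with -g<level> appended to CFLAGS.

    Assumption: the build appends CFLAGS after its default flags, so the
    appended -g<level> overrides an earlier plain -g.
    """
    new_env = dict(env)
    new_env["CFLAGS"] = f"{env.get('CFLAGS', '')} -g{level}".strip()
    return new_env


def peak_rss_kb(cmd, env=None):
    """Run cmd to completion and return the peak RSS of waited-for children.

    On Linux, ru_maxrss is reported in kilobytes. The value is a high-water
    mark over all children this process has waited for, so measure one
    command per Python process to keep the numbers comparable.
    """
    subprocess.run(cmd, env=env, check=True)
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
```

A call such as peak_rss_kb(build_cmd, env=with_debug_level(os.environ, 1)), where build_cmd is whatever command drives the compilation, would then give a figure comparable to the "Maximum resident set size" line above.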
It might be interesting to see why we see these increases with CPython version bumps. I would ❤️ to get some insight from the mypy devs 🙂