[NativeAOT] Link-time-optimize unmanaged portions of the runtime on Linux #86083
Description
We currently compile the unmanaged portions of the runtime without LTO or PGO, because the runtime ships as an .a file that gets linked by whichever linker happens to exist on the user's machine. LTO requires a linker that knows how to interpret the bitcode in non-ELF object files, and we cannot assume the user's linker does.
We can, however, apply LTO to .a files ourselves. See David's prototype at https://gist.github.com/davidwrighton/385035ffd24b88c39c2e7d5cf0274907.
How to use from David:
First, compile FileA.c and FileB.c with command lines like:
clang -O3 -flto -c FileB.c
clang -O3 -flto -c FileA.c
Those commands produce FileA.o and FileB.o, which are NOT ELF object files but are instead in the Bitcode file format.
Then run a command line like:
LtoOptimize --plugin /usr/lib/llvm-14/lib/libLTO.so -o FileA.o -o FileB.o -O Optimized.o --symbol=CanOnlyAlwaysReturn2WithLTO
This produces an ELF file, Optimized.o, containing a function CanOnlyAlwaysReturn2WithLTO that was optimized in a way that requires LTO to produce optimal output.
I also added the ability to dump the set of symbol names in FileA and FileB, as well as the ability to compile ALL symbols from both FileA and FileB.
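For illustration, here is a minimal sketch of what an input pair like FileA.c and FileB.c could look like, plus a way to check whether a given .o is bitcode or ELF. The file contents, the `GetOne` helper, the `lto-demo` directory, and the `write_demo_sources` and `object_kind` functions are all assumptions made for this example; they are not taken from the gist. The check relies only on the file magic: LLVM bitcode starts with `BC` 0xC0 0xDE, and ELF starts with 0x7F `ELF`.

```shell
#!/bin/sh
set -eu

# Write two hypothetical translation units into the directory given as $1.
# CanOnlyAlwaysReturn2WithLTO can only fold to "return 2" if the optimizer
# sees the body of GetOne, which lives in the other file.
write_demo_sources() {
    dir=$1
    mkdir -p "$dir"
    cat > "$dir/FileB.c" <<'EOF'
int GetOne(void) { return 1; }
EOF
    cat > "$dir/FileA.c" <<'EOF'
int GetOne(void);
int CanOnlyAlwaysReturn2WithLTO(void) { return GetOne() + 1; }
EOF
}

# Classify an object file by its first four bytes.
object_kind() {
    magic=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
    case "$magic" in
        4243c0de) echo bitcode ;;  # 'B' 'C' 0xC0 0xDE
        7f454c46) echo elf ;;      # 0x7F 'E' 'L' 'F'
        *)        echo unknown ;;
    esac
}

write_demo_sources lto-demo

# Only compile when clang is actually available on this machine.
if command -v clang >/dev/null 2>&1; then
    (cd lto-demo && clang -O3 -flto -c FileB.c && clang -O3 -flto -c FileA.c)
    echo "FileA.o: $(object_kind lto-demo/FileA.o)"
    echo "FileB.o: $(object_kind lto-demo/FileB.o)"
fi
```

Running `object_kind` on the Optimized.o produced by the tool should, per the description above, report an ELF file rather than bitcode.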
So the theory is that we could:
- Build the unmanaged portion of the runtime with LTO enabled.
- Run a tool similar to the one from the gist to perform LTO on the library and produce an optimized .o file.
- Pack the .o back into an .a (now an .a like any other, with no bitcode).
- Profit
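The steps above could be strung together roughly as follows. This is only a sketch under stated assumptions: the LtoOptimize flags are copied from the example command line in this issue, while the `lto_pipeline` and `run` functions, the `DRY_RUN`, `SOURCES`, `LTO_PLUGIN`, and `SYMBOL_ARGS` variables, and the archive name `libRuntimeLto.a` are hypothetical. The real runtime build would drive this from CMake rather than a shell loop. By default the script only prints the commands it would run.

```shell
#!/bin/sh
set -eu

# DRY_RUN=1 (the default) prints each command instead of executing it.
DRY_RUN=${DRY_RUN:-1}
SOURCES=${SOURCES:-"FileA.c FileB.c"}
LTO_PLUGIN=${LTO_PLUGIN:-/usr/lib/llvm-14/lib/libLTO.so}
# The issue mentions a mode that compiles ALL symbols; its flag is not shown
# there, so this sketch falls back to the single --symbol form from the example.
SYMBOL_ARGS=${SYMBOL_ARGS:---symbol=CanOnlyAlwaysReturn2WithLTO}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

lto_pipeline() {
    objs=""
    # Step 1: build every source with LTO enabled (emits bitcode .o files).
    for src in $SOURCES; do
        obj="${src%.c}.o"
        run clang -O3 -flto -c "$src" -o "$obj"
        objs="$objs -o $obj"
    done
    # Step 2: run the LTO tool over all bitcode inputs to get one ELF object.
    # $objs and $SYMBOL_ARGS are intentionally unquoted so they split into words.
    run LtoOptimize --plugin "$LTO_PLUGIN" $objs -O Optimized.o $SYMBOL_ARGS
    # Step 3: pack the ELF object into an ordinary static archive.
    run ar rcs libRuntimeLto.a Optimized.o
}

lto_pipeline
```

Setting `DRY_RUN=0` executes the commands for real, which requires clang, the LtoOptimize tool from the gist, and ar on the PATH.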
We'd need to measure if this is indeed profitable and worth the engineering costs. Success not guaranteed. Might be better to first just enable LTO locally and use a linker that can handle it E2E (i.e. turn on LTO and compile with ILC as usual, expecting the linker step to do the LTO) and get some measurements. GC perf would be the most interesting to measure, so do something that stresses the GC and measure with/without LTO.