and `force` set to `true`, one would execute:
```
julia build_sysimg.jl /tmp/sys core2 ~/userimg.jl --force
```

## System image optimized for multiple microarchitectures

The system image can be compiled simultaneously for multiple CPU microarchitectures
under the same instruction set architecture (ISA). Multiple versions of the same function
may be created, with minimal dispatch points inserted into shared functions,
in order to take advantage of different ISA extensions or other microarchitecture features.
The version that offers the best performance will be selected automatically at runtime
based on the features available on the host.

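The idea of keeping several versions of one function behind a single dispatch point can be sketched in a few lines of Python. This is only a conceptual illustration, not how the system image does it: `DispatchSlot`, `dot_generic`, `dot_fma` and the feature names are hypothetical, and the real cloning happens at the LLVM level.

```python
from typing import Callable, FrozenSet, List, Optional, Sequence, Tuple

Impl = Callable[[Sequence[float], Sequence[float]], float]
Version = Tuple[FrozenSet[str], Impl]


def dot_generic(a: Sequence[float], b: Sequence[float]) -> float:
    """Baseline version, usable on every CPU of the ISA."""
    return sum(x * y for x, y in zip(a, b))


def dot_fma(a: Sequence[float], b: Sequence[float]) -> float:
    """Stand-in for a clone compiled with extra ISA extensions enabled."""
    total = 0.0
    for x, y in zip(a, b):
        total = x * y + total
    return total


class DispatchSlot:
    """A single indirection that is filled in once, when the image is loaded."""

    def __init__(self, versions: List[Version]) -> None:
        # Versions are ordered from most to least specialized (an assumption
        # of this sketch).
        self._versions = versions
        self.target: Optional[Impl] = None

    def resolve(self, host_features: FrozenSet[str]) -> None:
        # Pick the first version whose required features the host provides.
        for required, impl in self._versions:
            if required <= host_features:
                self.target = impl
                return
        raise RuntimeError("no compatible version for this host")

    def __call__(self, a: Sequence[float], b: Sequence[float]) -> float:
        if self.target is None:
            raise RuntimeError("dispatch slot used before resolve()")
        return self.target(a, b)


DOT = DispatchSlot([
    (frozenset({"avx2", "fma"}), dot_fma),
    (frozenset(), dot_generic),
])
```

Callers only ever go through `DOT`, so the choice of version costs one indirection per call and is made once per process.
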
### Specifying multiple system image targets

A multi-microarchitecture system image can be enabled by passing multiple targets
during system image compilation. This can be done either with the `JULIA_CPU_TARGET` make option
or with the `-C` command line option when running the compilation command manually.
Multiple targets are separated by `;` in the option.
The syntax for each target is a CPU name followed by multiple features separated by `,`.
All features supported by LLVM are supported, and a feature can be disabled with a `-` prefix.
(A `+` prefix is also allowed and ignored, to be consistent with LLVM syntax.)
Additionally, two special features are supported to control the function cloning behavior.

1. `clone_all`

    By default, only functions that are the most likely to benefit from
    the microarchitecture features will be cloned.
    When `clone_all` is specified for a target, however,
    **all** functions in the system image will be cloned for the target.
    The negative form `-clone_all` can be used to prevent the built-in
    heuristic from cloning all functions.

2. `base(<n>)`

    Where `<n>` is a placeholder for a non-negative number (e.g. `base(0)`, `base(1)`).
    By default, a partially cloned (i.e. not `clone_all`) target will use functions
    from the default target (first one specified) if a function is not cloned.
    This behavior can be changed by specifying a different base with the `base(<n>)` option.
    The `n`th target (0-based) will be used as the base target instead of the default (`0`th) one.
    The base target has to be either `0` or another `clone_all` target.
    Specifying a target that is neither the default nor a `clone_all` target as the base
    will cause an error.

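To make the target syntax concrete, here is a minimal Python sketch of a parser for such a string. It only mirrors the rules described in this section; `Target` and `parse_targets` are hypothetical names, the real parser lives in `src/processor*`, and the CPU and feature names in the usage note are purely illustrative.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Target:
    cpu: str
    enabled: List[str] = field(default_factory=list)
    disabled: List[str] = field(default_factory=list)
    clone_all: Optional[bool] = None  # None: leave the decision to the heuristic
    base: int = 0                     # index of the base target, 0 by default


_BASE = re.compile(r"base\((\d+)\)")


def parse_targets(spec: str) -> List[Target]:
    """Parse `cpu,feat,-feat;cpu2,...` into a list of Target records."""
    targets: List[Target] = []
    for chunk in spec.split(";"):
        cpu, *features = [part.strip() for part in chunk.split(",")]
        target = Target(cpu=cpu)
        for feature in features:
            if feature.startswith("+"):
                feature = feature[1:]  # a `+` prefix is accepted and ignored
            base = _BASE.fullmatch(feature)
            if base:
                target.base = int(base.group(1))
            elif feature == "clone_all":
                target.clone_all = True
            elif feature == "-clone_all":
                target.clone_all = False
            elif feature.startswith("-"):
                target.disabled.append(feature[1:])
            elif feature:
                target.enabled.append(feature)
        targets.append(target)

    # The base must be target 0 or a target that was marked `clone_all`.
    for index, target in enumerate(targets):
        if target.base >= len(targets):
            raise ValueError(f"target {index}: base({target.base}) does not exist")
        if target.base != 0 and targets[target.base].clone_all is not True:
            raise ValueError(f"target {index}: base({target.base}) is not clone_all")
    return targets
```

For example, `parse_targets("generic;sandybridge,-xsaveopt,clone_all;haswell,-rdrnd,base(1)")` yields three targets, the last of which falls back to the fully cloned second target rather than to `generic`.
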
### Implementation overview

This is a brief overview of the different parts involved in the implementation.
See the code comments for each component for more implementation details.

1. System image compilation

    Parsing of the target specification and the cloning decisions are done in `src/processor*`.
    We currently support cloning of functions based on the presence of loops, SIMD instructions,
    or other math operations (e.g. fastmath, fma, muladd).
    This information is passed on to `src/llvm-multiversioning.cpp`, which does the actual cloning.
    In addition to doing the cloning and inserting dispatch slots
    (see comments in `MultiVersioning::runOnModule` for how this is done),
    the pass also generates metadata so that the runtime can load and initialize the
    system image correctly.
    A detailed description of the metadata is available in `src/processor.h`.

2. System image loading

    The loading and initialization of the system image are done in `src/processor*` by
    parsing the metadata saved during system image generation.
    Host feature detection and the selection decision are done in `src/processor_*.cpp`,
    depending on the ISA. The target selection will prefer an exact CPU name match,
    a larger vector register size, and a larger number of features.
    An overview of this process is in `src/processor.cpp`.
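
The selection preference described above can be sketched as a ranking over the compatible targets. This is a simplified illustration under stated assumptions, not the code in `src/processor_*.cpp`: `ImageTarget` and `select_target` are hypothetical names, compatibility is reduced to a feature-subset test, and the three criteria are assumed to apply in strict priority order.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class ImageTarget:
    cpu: str
    features: FrozenSet[str]
    vector_bytes: int  # widest vector register the target assumes


def select_target(
    targets: List[ImageTarget],
    host_cpu: str,
    host_features: FrozenSet[str],
) -> Optional[int]:
    """Return the index of the preferred target, or None if none is usable."""
    best_index: Optional[int] = None
    best_key = None
    for index, target in enumerate(targets):
        # Assumption: a target is usable only if the host has all its features.
        if not target.features <= host_features:
            continue
        # Exact name match first, then vector width, then feature count.
        key = (target.cpu == host_cpu, target.vector_bytes, len(target.features))
        if best_key is None or key > best_key:
            best_index, best_key = index, key
    return best_index
```

With targets for `generic`, `sandybridge` and `haswell`, a host that reports a different CPU name but supports every `haswell` feature would pick `haswell`, because it ties with `sandybridge` on vector width and wins on feature count.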