Commit 8d0b26b

Finish up function multiversioning support
* Enable test function multiversioning on the CI

  We can't do too much cloning on the CI before hitting the timeout or memory limit...
  Also avoid turning on cloning on circle CI since we seem to be very close to the memory limit.
* Add devdoc
1 parent 087e7ea commit 8d0b26b

3 files changed: +67 -0 lines changed

.travis.yml

Lines changed: 1 addition & 0 deletions
@@ -101,6 +101,7 @@ before_install:
 export JULIA_CPU_CORES=2;
 export JULIA_TEST_MAXRSS_MB=600;
 TESTSTORUN="all --skip linalg/triangular subarray"; fi # TODO: re enable these if possible without timing out
+- echo "override JULIA_CPU_TARGET=generic;native" >> Make.user
 - git clone -q git://git.kitenet.net/moreutils
 script:
 - echo BUILDOPTS=$BUILDOPTS

contrib/windows/appveyor_build.sh

Lines changed: 1 addition & 0 deletions
@@ -53,6 +53,7 @@ else
 echo 'LIBBLAS = -L$(JULIAHOME)/usr/bin -lopenblas' >> Make.user
 echo 'LIBBLASNAME = libopenblas' >> Make.user
 fi
+echo "override JULIA_CPU_TARGET=generic;native" >> Make.user

 # Set XC_HOST if in Cygwin or Linux
 case $(uname) in

doc/src/devdocs/sysimg.md

Lines changed: 65 additions & 0 deletions
@@ -38,3 +38,68 @@ and `force` set to `true`, one would execute:
```
julia build_sysimg.jl /tmp/sys core2 ~/userimg.jl --force
```

## System image optimized for multiple microarchitectures

The system image can be compiled simultaneously for multiple CPU microarchitectures
under the same instruction set architecture (ISA). Multiple versions of the same function
may be created, with minimal dispatch points inserted into shared functions,
in order to take advantage of different ISA extensions or other microarchitecture features.
The version that offers the best performance is selected automatically at runtime
based on the features available on the host.
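
As a rough illustration of the idea described above, the sketch below keeps two versions of one function and fills a single dispatch slot once, based on the features the host reports. It is conceptual only: the real mechanism operates on compiled code inside the system image, and every name here (`dot_generic`, `dot_fma`, `VERSIONS`, `resolve`) as well as the feature sets are hypothetical.

```python
# Conceptual sketch only: all names are hypothetical and do not mirror
# the data structures used by the Julia runtime.

def dot_generic(xs, ys):
    """Baseline version that is assumed to run on any CPU of the ISA."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += x * y
    return total


def dot_fma(xs, ys):
    """Stand-in for a clone compiled with extra ISA extensions enabled."""
    return sum(x * y for x, y in zip(xs, ys))


# Each entry pairs the features a version requires with the version itself,
# ordered from least to most specialized.
VERSIONS = [
    (frozenset(), dot_generic),
    (frozenset({"avx2", "fma"}), dot_fma),
]


def resolve(host_features):
    """Return the most specialized version whose requirements the host meets."""
    chosen = VERSIONS[0][1]
    for required, fn in VERSIONS:
        if required <= host_features:
            chosen = fn
    return chosen


# The dispatch slot is filled once at load time; callers only use the slot.
dot = resolve(frozenset({"sse2", "avx2", "fma"}))
```

Callers go through `dot` and never name a specific version, which is what keeps the number of dispatch points small.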

### Specifying multiple system image targets

A multi-microarchitecture system image can be enabled by passing multiple targets
during system image compilation. This can be done either with the `JULIA_CPU_TARGET` make option
or with the `-C` command line option when running the compilation command manually.
Multiple targets are separated by `;` in the option.
The syntax for each target is a CPU name followed by any number of features separated by `,`.
All features supported by LLVM are supported, and a feature can be disabled with a `-` prefix
(a `+` prefix is also allowed and ignored, for consistency with LLVM syntax).
Additionally, two special features are supported to control the function cloning behavior.

1. `clone_all`

    By default, only the functions most likely to benefit from
    the microarchitecture features will be cloned.
    When `clone_all` is specified for a target, however,
    **all** functions in the system image will be cloned for the target.
    The negative form `-clone_all` can be used to prevent the built-in
    heuristic from cloning all functions.

2. `base(<n>)`

    Where `<n>` is a placeholder for a non-negative number (e.g. `base(0)`, `base(1)`).
    By default, a partially cloned (i.e. not `clone_all`) target will use functions
    from the default target (the first one specified) if a function is not cloned.
    This behavior can be changed by specifying a different base with the `base(<n>)` option.
    The `n`th target (0-based) will be used as the base target instead of the default (`0`th) one.
    The base target has to be either `0` or another `clone_all` target.
    Specifying a non-`clone_all` target as the base target will cause an error.
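
To make the option syntax concrete, here is a minimal sketch of a parser for the target string, including the rule that a base target must be `0` or a `clone_all` target. It is not the parser in `src/processor*`; the `Target` class, `parse_targets`, the out-of-range check, and the handling of `+clone_all` are assumptions made for illustration, and the example target string is illustrative as well.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical sketch of the syntax described above; the real parser may
# differ in details and in how it reports errors.
_BASE_RE = re.compile(r"base\((\d+)\)")


@dataclass
class Target:
    name: str
    enabled: List[str] = field(default_factory=list)
    disabled: List[str] = field(default_factory=list)
    clone_all: Optional[bool] = None  # None: leave the decision to the heuristic
    base: int = 0                     # default base is the first (0th) target


def parse_targets(option: str) -> List[Target]:
    targets = []
    for spec in option.split(";"):          # targets are separated by ';'
        name, *features = spec.split(",")   # CPU name, then features
        target = Target(name=name)
        for feat in features:
            if feat.startswith("+"):        # '+' is accepted and ignored
                feat = feat[1:]
            m = _BASE_RE.fullmatch(feat)
            if m:
                target.base = int(m.group(1))
            elif feat == "clone_all":
                target.clone_all = True
            elif feat == "-clone_all":
                target.clone_all = False
            elif feat.startswith("-"):
                target.disabled.append(feat[1:])
            else:
                target.enabled.append(feat)
        targets.append(target)

    # The base has to be target 0 or a `clone_all` target.
    for i, target in enumerate(targets):
        if target.base >= len(targets):     # assumed check, not from the doc
            raise ValueError(f"target {i}: base({target.base}) does not exist")
        if target.base != 0 and targets[target.base].clone_all is not True:
            raise ValueError(f"target {i}: base({target.base}) is not clone_all")
    return targets


example = parse_targets(
    "generic;sandybridge,-xsaveopt,clone_all;haswell,-rdrnd,base(1)"
)
```

In this example the third target falls back to the second (fully cloned) target for any function it does not clone, rather than to `generic`.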

### Implementation overview

This is a brief overview of the different parts involved in the implementation.
See the code comments for each component for more implementation details.

1. System image compilation

    The parsing and cloning decisions are done in `src/processor*`.
    We currently support cloning of functions based on the presence of loops, SIMD instructions,
    or other math operations (e.g. fastmath, fma, muladd).
    This information is passed on to `src/llvm-multiversioning.cpp`, which does the actual cloning.
    In addition to doing the cloning and inserting dispatch slots
    (see comments in `MultiVersioning::runOnModule` for how this is done),
    the pass also generates metadata so that the runtime can load and initialize the
    system image correctly.
    A detailed description of the metadata is available in `src/processor.h`.

2. System image loading

    The loading and initialization of the system image is done in `src/processor*` by
    parsing the metadata saved during system image generation.
    Host feature detection and selection decisions are done in `src/processor_*.cpp`
    depending on the ISA. The target selection will prefer an exact CPU name match,
    a larger vector register size, and a larger number of features.
    An overview of this process is in `src/processor.cpp`.
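
The selection preference described for system image loading can be sketched as a ranking over the targets stored in the image. This is only an approximation under stated assumptions: `ImageTarget`, `select_target`, the rule that a target is usable only when the host has all of its features, and the strict priority order of the three criteria are all hypothetical. The actual logic is in `src/processor_*.cpp`.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class ImageTarget:
    """Hypothetical record for one target compiled into the system image."""
    name: str
    features: FrozenSet[str]
    vector_bits: int


def select_target(targets: List[ImageTarget], host_name: str,
                  host_features: FrozenSet[str]) -> int:
    """Return the index of the preferred target for this host."""
    # Assumption: a target is usable only if the host has all its features.
    usable = [(i, t) for i, t in enumerate(targets)
              if t.features <= host_features]
    if not usable:
        raise RuntimeError("no target in the image matches this host")

    def rank(item):
        _, t = item
        # Assumed priority: exact CPU name match first, then larger vector
        # registers, then more features.
        return (t.name == host_name, t.vector_bits, len(t.features))

    # max() keeps the earliest target when ranks tie.
    return max(usable, key=rank)[0]


image = [
    ImageTarget("generic", frozenset(), 128),
    ImageTarget("sandybridge", frozenset({"avx"}), 256),
    ImageTarget("haswell", frozenset({"avx", "avx2", "fma"}), 256),
]
```

With this ranking, a host that reports the name `sandybridge` picks the `sandybridge` target even if it also supports every `haswell` feature, because the exact name match is compared first.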
