[SYCL][Doc] Update compiler options descriptions #3340

Merged: 6 commits, Mar 18, 2021
261 changes: 138 additions & 123 deletions sycl/doc/UsersManual.md
@@ -1,63 +1,125 @@
# Users Manual

The DPC++ Compiler contains many options to generate the desired binaries for
your application.
This is the list of SYCL specific options supported by the compiler, with
some examples.

## SYCL specific command line options
Options marked as [DEPRECATED] will be removed in a future update.
Options marked as [EXPERIMENTAL] are intended for limited use cases and are
not recommended for production environments.

## Generic options

**`-fsycl`**

General enabling option for SYCL compilation mode. This option enables
offload compilation for a given target using the `-fsycl-targets` option.
When the `-fsycl-targets` option is not provided, the default triple is
`spir64-unknown-unknown-sycldevice`.
General enabling option for SYCL compilation and linking mode. A list of
targets can be specified with `-fsycl-targets`. This is the fundamental
option for any SYCL compilation; all other SYCL specific options require it.

**`-fsycl-targets=<T1>[,...,<Tn>]`**

Enables ahead of time (AOT) compilation for the specified device targets.
Each T is a compiler target triple string representing a target device
architecture. You can specify more than one target, comma separated. The
default just in time (JIT) compilation target can be added to the list to
produce a combination of AOT and JIT code in the resulting fat binary.
The following triples are supported by default:
* spir64-unknown-unknown-sycldevice - this is the default generic SPIR-V
target;
* spir64_x86_64-unknown-unknown-sycldevice - generate code ahead of time
for x86_64 CPUs;
* spir64_fpga-unknown-unknown-sycldevice - generate code ahead of time for
Intel FPGA;
* spir64_gen-unknown-unknown-sycldevice - generate code ahead of time for
Intel Processor Graphics;
Available in special build configuration:
* nvptx64-nvidia-cuda-sycldevice - generate code ahead of time for CUDA
target;
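
As a rough sketch of how the target list composes, the snippet below builds a command line that requests the default JIT target together with AOT code for x86_64 CPUs. The file names `sycl-app.cpp` and `sycl-app` are placeholders, and the script only prints the command (a dry run), so it can be tried without a DPC++ toolchain installed.

```shell
# Dry run: compose the command and print it instead of invoking the compiler.
# sycl-app.cpp and sycl-app are placeholder names, not files from this project.
jit_target="spir64-unknown-unknown-sycldevice"
aot_cpu_target="spir64_x86_64-unknown-unknown-sycldevice"
cmd="clang++ -fsycl -fsycl-targets=${jit_target},${aot_cpu_target} sycl-app.cpp -o sycl-app"
echo "$cmd"
```

Running the printed command with a DPC++ build should, per the description above, produce a fat binary that carries both the generic SPIR-V image and the CPU-specific image.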

## Language options

**`-sycl-std=<value>`** [EXPERIMENTAL]

**`-fsycl-targets=<value>`**
SYCL language standard to compile for. Possible values:
* 121 - SYCL 1.2.1
* 2020 - SYCL 2020
This option does not guarantee compliance with the selected standard, but
some compiler features change behavior depending on it.
It is under development and is not recommended for production
environments.
Default value is 2020.

A comma separated list of triples to specify the device target(s) to
generate code for. This option is only valid when used with `-fsycl`.
**`-f[no-]sycl-unnamed-lambda`**

### Target toolchain options.
Enables/Disables unnamed SYCL lambda kernels support.
Disabled by default.

**`-Xsycl-target-backend=<triple> <arg>`**
## Optimization options

Pass <arg> to the SYCL based backend identified by <triple>.
**`-f[no-]sycl-early-optimizations`**

**`-Xsycl-target-backend <arg>`**
Enables (or disables) the intermediate representation optimization pipeline
before translation to SPIR-V. Has effect only if optimizations are turned
on by standard compiler options (-O1 or higher).
Enabled by default.

Pass <arg> to the SYCL based target backend.
**`-f[no-]sycl-dead-args-optimization`**

**`-Xsycl-target-frontend=<triple> <arg>`**
Enables (or disables) the LLVM IR dead argument elimination pass, which
removes unused kernel function arguments before translation to SPIR-V.
Currently has effect only on spir64\* targets.
Disabled by default.

Pass <arg> to the SYCL based target frontend identified by <triple>.
**`-f[no-]sycl-id-queries-fit-in-int`**

**`-Xsycl-target-frontend <arg>`**
Assume/Do not assume that SYCL ID queries fit within MAX_INT. It assumes
that these values fit within MAX_INT:
* id class get() member function and operator[]
* item class get_id() member function and operator[]
* nd_item class get_global_id()/get_global_linear_id() member functions
Enabled by default.

Pass <arg> to the SYCL based target frontend.
## Target toolchain options

**`-Xsycl-target-linker=<triple> <arg>`**
**`-Xsycl-target-backend=<T> "options"`**
**`-Xs "options"`**

Pass <arg> to the SYCL based target linker identified by <triple>.
Pass "options" to the backend of target device compiler, specified by
triple T. The backend of device compiler generates target machine code from
intermediate representation. This option can be used to tune code
generation for a specific target. The "options" are used during AOT
compilation. For JIT compilation "options" are saved in a fat binary and
used when code is JITed during runtime.
-Xs is a shortcut to pass "options" to all backends specified via the
'-fsycl-targets' option (or default one).
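
A minimal sketch of the long form is shown below for the Intel Processor Graphics target. The backend option string `-device skl` is an assumption used for illustration (it is not taken from this manual), and the file names are placeholders. The script prints the command rather than running it.

```shell
# Dry run: print the command that would pass options to one target's backend.
# "-device skl" is an assumed backend option; substitute whatever your
# target backend actually accepts. File names are placeholders.
gen_target="spir64_gen-unknown-unknown-sycldevice"
backend_opts="-device skl"
cmd="clang++ -fsycl -fsycl-targets=${gen_target} -Xsycl-target-backend=${gen_target} \"${backend_opts}\" sycl-app.cpp -o sycl-app"
echo "$cmd"
```

With a single target in `-fsycl-targets`, the `-Xs "options"` shortcut described above should be interchangeable with the long form.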

**`-Xsycl-target-linker <arg>`**
**`-Xsycl-target-frontend=<T> "options"`**

Pass <arg> to the SYCL based target linker.
Pass "options" to the frontend of the target device compiler, specified by
triple T. This option can be used to control intermediate representation
generation during offline or online compilation.

### Link options
**`-Xsycl-target-linker=<T> "options"`**

Pass "options" to the device code linker when linking multiple device
object modules. T is a specific target device triple.

## Link options

**`-fsycl-link`**

Generate partially linked device object to be used with the host link.
Link device object modules and wrap those into a host-compatible object
module that can be linked later by any standard host linker into the final
fat binary.

**`-fsycl-link-targets=<T1,...,Tn>`**
**`-fsycl-link-targets=<T1,...,Tn>`** [DEPRECATED]

Specify a comma-separated list of SYCL offloading target triples for which
to produce linked device images. Used in a link step to link device code
for the given targets and output multiple linked device code images, whose
names consist of the common prefix taken from the -o option and the triple
string. Does not produce a fat binary and must be used together with -fsycl.

**`-fsycl-add-targets=<T1:file1...Tn:filen>`**
**`-fsycl-add-targets=<T1:file1...Tn:filen>`** [DEPRECATED]

Add arbitrary device images to the fat binary being linked

@@ -68,7 +130,7 @@ your application.
image for the target triple it is paired with, and offload bundler is
invoked to do the actual bundling.

**`-foffload-static-lib=<lib>`**
**`-foffload-static-lib=<lib>`** [DEPRECATED]

Link with fat static library.

@@ -81,95 +143,81 @@ your application.
specified with `-foffload-static-lib` are treated as host libraries and are
only used during the final host link.

**`-foffload-whole-static-lib=<lib>`**
**`-foffload-whole-static-lib=<lib>`** [DEPRECATED]

Similar to `-foffload-static-lib` but uses the whole archive when
performing the device code extraction. This is helpful when creating
shared objects from fat static archives.

**`-fsycl-device-code-split=<value>`**
**`-fsycl-device-code-split=<mode>`**

Perform SYCL device code split. There are three possible values for this
option:
- per_kernel - a separate device code module is created for each SYCL
kernel. Each device code module will contain a kernel and all its
dependencies, i.e. called functions and used variables.
- per_source - a separate device code module is created for each source
(translation unit). Each device code module will contain a bunch of kernels
grouped on per-source basis and all their dependencies, i.e. all used
variables and called functions, including the `SYCL_EXTERNAL` macro-marked
functions from other translation units.
- off - no device code split.
NOTE: By default device code split is 'off' - all kernels go into a
single module.
Specifies SYCL device code module assembly. Mode is one of the following:
* per_kernel - creates a separate device code module for each SYCL kernel.
Each device code module will contain a kernel and all its dependencies,
such as called functions and used variables.
* per_source - creates a separate device code module for each source
(translation unit). Each device code module will contain a bunch of
kernels grouped on per-source basis and all their dependencies, such as
all used variables and called functions, including the `SYCL_EXTERNAL`
macro-marked functions from other translation units.
* off - creates a single module for all kernels.
* auto - the compiler will use a heuristic to select the best way of
splitting device code. This is the default mode.
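
The following sketch shows how one of the modes listed above might be selected on the command line. The file names are placeholders, and the command is printed rather than executed.

```shell
# Dry run: select a device code split mode and print the resulting command.
# Valid modes per the list above: per_kernel, per_source, off, auto.
mode="per_kernel"
cmd="clang++ -fsycl -fsycl-device-code-split=${mode} sycl-app.cpp -o sycl-app"
echo "$cmd"
```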

**`-fsycl-device-code-split`**
**`-f[no-]sycl-device-lib=<lib1>[,<lib2>,...]`**

Perform SYCL device code split in the per_source mode, i.e. create a
separate device code module for each source (translation unit).
Enables/disables linking of the device libraries. Supported libraries:
libm-fp32, libm-fp64, libc, all. Use of 'all' will enable/disable all of
the device libraries.

### Intel FPGA specific options
## Intel FPGA specific options

**`-fintelfpga`**

Perform ahead of time compilation for Intel FPGA, which relies on the
external tool `aoc` being available in the `PATH`.

This option is roughly equivalent to
`-fsycl-targets=spir64_fpga-unknown-unknown-sycldevice -g -MMD -lOpenCL`.

It is incompatible with `-fsycl-targets=...`; if ahead of time compilation
is needed for multiple backends (e.g. Intel FPGA, Intel GPU, etc.), the
alternative form based on
`-fsycl-targets=spir64_fpga-unknown-unknown-sycldevice` should be used
instead.
Perform ahead of time compilation for Intel FPGA. It sets the target to
FPGA and turns on the debug options that are needed to generate FPGA
reports. It is a functionally equivalent shortcut to
`-fsycl-targets=spir64_fpga-unknown-unknown-sycldevice -g -MMD` on Linux
and `-fsycl-targets=spir64_fpga-unknown-unknown-sycldevice -Zi -MMD` on
Windows.
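
To make the platform difference concrete, the hypothetical helper below prints the flags that `-fintelfpga` is described as standing for on each platform. The function name `expand_fintelfpga` and its `linux`/`windows` arguments are illustrative only and are not part of the compiler.

```shell
# Hypothetical helper: print the documented expansion of -fintelfpga.
# Only the debug flag differs between platforms (-g on Linux, -Zi on Windows).
fpga_target="spir64_fpga-unknown-unknown-sycldevice"
expand_fintelfpga() {
  case "$1" in
    linux)   echo "-fsycl-targets=${fpga_target} -g -MMD" ;;
    windows) echo "-fsycl-targets=${fpga_target} -Zi -MMD" ;;
    *)       return 1 ;;  # unknown platform name
  esac
}
expand_fintelfpga linux
```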

**`-fsycl-link=<value>`**
**`-fsycl-link=<output>`**

Generate partially linked device and host object to be used at various
stages of compilation. Takes the device binary(s) generated from a `-fsycl`
enabled compilation and wrap to create a host linkable object. This option
is enabled only in ahead of time compilation mode for FPGA (i.e. when
`-fintelfpga` is set).
Controls the FPGA target binary output format. Same as -fsycl-link, but the
optional output value can be one of the following:
* early - generate HTML reports and an intermediate object file that avoids
a full Quartus compile. Usually takes minutes to generate. The link can
later be resumed from this point using -fsycl-link=image.
* image - generate a bitstream which is ready to be linked and used on an
FPGA board. Usually takes hours to generate.

**`-reuse-exe=<exe>`**

Speed up FPGA aoc compile if the device code in <exe> is unchanged.
Speed up FPGA backend compilation if the device code in <exe> is
unchanged. If it is safe to do so, the compiler will re-use the device
binary embedded within <exe>. This can be used to minimize or avoid long
Quartus compile times for FPGA targets when the device code is unchanged.

### Other options
## Other options

**`-fsycl-device-only`**

Compile only SYCL device code.
Compile only the device part of the code and ignore the host part.

**`-fsycl-use-bitcode`**
**`-f[no-]sycl-use-bitcode`** [EXPERIMENTAL]

Emit SYCL device code in LLVM-IR bitcode format. When disabled, SPIR-V is
emitted. Default is true.

**`-fno-sycl-use-bitcode`**

Use SPIR-V instead of LLVM bitcode in fat objects.

**`-sycl-std=<value>`**

SYCL language standard to compile for.

**`-fsycl-help`**

Emit help information from all of the offline compilation tools.

**`-fsycl-help=<value>`**

Emit help information from the offline compilation tool associated with the
given architecture argument. Supported architectures: `x86_64`, `fpga` and
`gen`.
emitted.
Enabled by default.

**`-fsycl-unnamed-lambda`**
**`-fsycl-help[=backend]`**

Allow unnamed SYCL lambda kernels.
Emit help information from the device compiler backend. Backend can be one
of the following: "x86_64", "fpga", "gen", or "all". Specifying "all" is the
same as specifying -fsycl-help with no argument and emits help for all
backends.

## SYCL device code compilation
# Example: SYCL device code compilation

To invoke the SYCL device compiler, set the `-fsycl-device-only` flag.

@@ -184,36 +232,3 @@ By default the output format for SYCL device is LLVM bytecode.
```console
$ clang++ -fsycl-device-only -fno-sycl-use-bitcode sycl-app.cpp -o sycl-app.spv
```

## Static archives with SYCL device code

The DPC++ Compiler contains support to create and use static archives that
contain device enabled fat objects.

### Build your objects

```console
$ clang++ -fsycl sycl-app1.cpp sycl-app2.cpp -c
```

### Create the static archive

Build the static archive in the same manner as you would any other normal
static archive, using the objects that were created using the above step.

```console
$ ar cr libsyclapp.a sycl-app1.o sycl-app2.o
```

### Use the static archive

Once you have created the archive, you can use it when creating your final
application. The fat archives are treated differently than a regular archive
so the option `-foffload-static-lib` is used to signify the needed behavior.

```console
$ clang++ -fsycl sycl-main.cpp -foffload-static-lib=libsyclapp.a
```

Use of `-foffload-static-lib` is required or the library will be treated as
a normal archive.