[SYCL][DOC] Fix warnings after upgrading sphinx
Newer sphinx/myst versions emit more warnings about bad cross-reference targets.

Warnings like:

:'myst' cross-reference target not found: 'prog-scope-var-decl'
[myst.xref_missing]
jsji committed Oct 27, 2023
1 parent e7c0b89 commit edccb9b
Showing 17 changed files with 47 additions and 37 deletions.
6 changes: 3 additions & 3 deletions sycl/doc/GetStartedGuide.md
@@ -45,7 +45,7 @@ and a wide range of compute accelerators such as GPU and FPGA.
* `ninja` -
[Download](https://github.com/ninja-build/ninja/wiki/Pre-built-Ninja-packages)
* C++ compiler
* See LLVM's [host compiler toolchain requirements](../../llvm/docs/GettingStarted.rst#host-c-toolchain-both-compiler-and-standard-library)
* See LLVM's [host compiler toolchain requirements](https://github.com/intel/llvm/blob/sycl/llvm/docs/GettingStarted.rst#host-c-toolchain-both-compiler-and-standard-library)

Alternatively, you can use a Docker image that has everything you need for
building pre-installed:
@@ -543,7 +543,7 @@ AOT compiler for each device type:
#### CPU
* CPU AOT compiler `opencl-aot` is enabled by default. For more, see
[opencl-aot documentation](../../opencl/opencl-aot/README.md).
[opencl-aot documentation](https://github.com/intel/llvm/blob/sycl/opencl/opencl-aot/README.md).
#### Accelerator
@@ -709,7 +709,7 @@ ONEAPI_DEVICE_SELECTOR=cuda:* ./simple-sycl-app-cuda.exe
**NOTE**: oneAPI DPC++/SYCL developers can specify SYCL device for execution
using device selectors (e.g. `sycl::cpu_selector_v`, `sycl::gpu_selector_v`,
[Intel FPGA selector(s)](extensions/supported/sycl_ext_intel_fpga_device_selector.md))
[Intel FPGA selector(s)](extensions/supported/sycl_ext_intel_fpga_device_selector.asciidoc))
as explained in following section
[Code the program for a specific GPU](#code-the-program-for-a-specific-gpu).
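
The first two hunks in this file replace relative links that leave `sycl/doc` with absolute GitHub URLs. A minimal sketch of that rewrite is shown below; the helper name `to_absolute_link` is hypothetical, and the URL prefix is taken from the links in this diff rather than from any project tooling.

```python
import posixpath

# Prefix used by the absolute links in this commit (repository and branch "sycl").
REPO_BLOB_ROOT = "https://github.com/intel/llvm/blob/sycl/"


def to_absolute_link(doc_path: str, relative_link: str) -> str:
    """Resolve a relative Markdown link against the repo-relative path of its document.

    doc_path is the document's path from the repository root, for example
    "sycl/doc/GetStartedGuide.md". Any "#fragment" suffix is preserved.
    """
    target, sep, fragment = relative_link.partition("#")
    resolved = posixpath.normpath(
        posixpath.join(posixpath.dirname(doc_path), target)
    )
    # A result that still starts with ".." points outside the repository.
    if resolved == ".." or resolved.startswith("../"):
        raise ValueError(f"link escapes the repository root: {relative_link}")
    return REPO_BLOB_ROOT + resolved + sep + fragment


print(to_absolute_link("sycl/doc/GetStartedGuide.md",
                       "../../opencl/opencl-aot/README.md"))
```

Resolving against the document's own directory, and not the repository root, is what collapses the `../..` segments correctly.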
2 changes: 1 addition & 1 deletion sycl/doc/conf.py
@@ -37,7 +37,7 @@
]

# Implicit targets for cross reference
myst_heading_anchors = 4
myst_heading_anchors = 5

# The name of the Pygments (syntax highlighting) style to use.
pygments_style = 'friendly'
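
This hunk raises `myst_heading_anchors` from 4 to 5, so level-5 headings also receive implicit anchors. Several other hunks in this commit lowercase in-page fragments (for example `#Requirements` to `#requirements`), which is consistent with the GitHub-style slugs MyST generates by default. The sketch below only approximates that behaviour: `github_style_slug` and `anchored_headings` are hypothetical helpers, not MyST APIs, and the parsing ignores fenced code blocks.

```python
import re


def github_style_slug(heading: str) -> str:
    """Approximate a GitHub-style anchor: lowercase, drop punctuation, spaces to hyphens."""
    text = heading.strip().lower()
    # Keep word characters, hyphens and spaces; drop everything else.
    text = re.sub(r"[^\w\- ]", "", text)
    return text.replace(" ", "-")


def anchored_headings(markdown: str, max_level: int) -> dict[str, int]:
    """Map slug -> heading level for ATX headings no deeper than max_level."""
    anchors = {}
    for line in markdown.splitlines():
        match = re.match(r"^(#{1,6})\s+(.*?)\s*#*\s*$", line)
        if match and len(match.group(1)) <= max_level:
            anchors[github_style_slug(match.group(2))] = len(match.group(1))
    return anchors


doc = "## Requirements\n##### Specialization constants lowering\n"
print(anchored_headings(doc, 4))  # the level-5 heading gets no anchor
print(anchored_headings(doc, 5))  # both headings get anchors
```

Under this approximation a fragment such as `#ODR-violations` never matches, because the generated slug is always lowercase.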
2 changes: 1 addition & 1 deletion sycl/doc/cuda/opencl-subgroup-vs-cuda-crosslane-op.md
@@ -1,7 +1,7 @@
# CUDA crosslane vs OpenCL sub-groups

## Sub-group function mapping
This document describes the mapping of the SYCL subgroup operations (based on the proposal [SYCL subgroup proposal](../extensions/sub_group_ndrange/sub_group_ndrange.md)) to CUDA (queries responses and PTX instruction mapping)
This document describes the mapping of the SYCL subgroup operations (based on the proposal SYCL subgroup proposal) to CUDA (queries responses and PTX instruction mapping)

### Sub-group device Queries

5 changes: 3 additions & 2 deletions sycl/doc/design/Assert.md
@@ -41,7 +41,7 @@ int main() {
In this use-case every work-item with even index along 0 dimension will trigger
assertion failure. Assertion failure should trigger a call to `std::abort()` at
host as described in
[extension](../extensions/supported/SYCL_EXT_ONEAPI_ASSERT.asciidoc).
[extension](../extensions/supported/sycl_ext_oneapi_assert.asciidoc).
Even though multiple failures of the same or different assertions can happen in
multiple work-items, implementation is required to deliver at least one
assertion. The assertion failure message is printed to `stderr` by DPCPP
@@ -81,7 +81,7 @@ practical cases.
## How it works?
`assert(expr)` macro ends up in call to `__devicelib_assert_fail`. This function
is part of [Device library extension](DeviceLibExtensions.rst#cl_intel_devicelib_cassert).
is part of [Device library extension](https://github.com/intel/llvm/blob/sycl/doc/design/DeviceLibExtensions.rst#cl_intel_devicelib_cassert).
The format of the assert message is unspecified, but it will always include the
text of the failing expression, the values of the standard macros `__FILE__` and
@@ -168,6 +168,7 @@ image. All of them should have `extern` declaration of program scope variable
available. Definition of the variable is only available within devicelib in the
same binary image where fallback `__devicelib_assert_fail` resides.
(prog-scope-var-decl)=
<a name="prog-scope-var-decl">The variable has the following structure and
declaration:</a>
6 changes: 3 additions & 3 deletions sycl/doc/design/CommandGraph.md
@@ -1,7 +1,7 @@
# Command-Graph Extension

This document describes the implementation design of the
[SYCL Graph Extension](../extensions/proposed/sycl_ext_oneapi_graph.asciidoc).
[SYCL Graph Extension](../extensions/experimental/sycl_ext_oneapi_graph.asciidoc).

A related presentation can be found
[here](https://www.youtube.com/watch?v=aOTAmyr04rM).
@@ -121,14 +121,14 @@ proposal. Memory operations will be supported subsequently by the current
implementation starting with `memcpy`.

Buffers and accessors are supported in a command-graph. There are
[spec restrictions](../extensions/proposed/sycl_ext_oneapi_graph.asciidoc#storage-lifetimes)
[spec restrictions](../extensions/experimental/sycl_ext_oneapi_graph.asciidoc#storage-lifetimes)
on buffer usage in a graph so that their lifetime semantics are compatible with
a lazy work execution model. However these changes to storage lifetimes have not
yet been implemented.

## Backend Implementation

Implementation of [UR command-buffers](#UR-command-buffer-experimental-feature)
Implementation of UR command-buffers
for each of the supported SYCL 2020 backends.

This is currently only Level Zero but more sub-sections will be added here as
4 changes: 2 additions & 2 deletions sycl/doc/design/CompileTimeProperties.md
@@ -40,7 +40,7 @@ One use for compile-time properties is with types that are used exclusively
for declaring global variables. One such example is the
[sycl\_ext\_oneapi\_device\_global][2] extension:

[2]: <../extensions/proposed/sycl_ext_oneapi_device_global.asciidoc>
[2]: <../extensions/experimental/sycl_ext_oneapi_device_global.asciidoc>

```
namespace sycl::ext::oneapi {
@@ -271,7 +271,7 @@ proposed in the [sycl\_ext\_oneapi\_kernel\_properties][8] extension. There
are two ways the application can specify these properties. The first is by
passing a `properties` parameter to the function that submits the kernel:

[8]: <../extensions/proposed/sycl_ext_oneapi_kernel_properties.asciidoc>
[8]: <../extensions/experimental/sycl_ext_oneapi_kernel_properties.asciidoc>

```
namespace sycl {
6 changes: 3 additions & 3 deletions sycl/doc/design/CompilerAndRuntimeDesign.md
@@ -484,7 +484,7 @@ list coming either from `llvm-spirv` or from the AOT backend.
Targeting PTX currently only accepts a single input file for processing, so
`file-table-tform` is used to extract the code file from the file table, which
is then processed by the
["PTX target processing" step](#device-code-post-link-step-for-CUDA).
["PTX target processing" step](#device-code-post-link-step-for-cuda).
The resulting device binary is inserted back into the file table in place of the
extracted code file using `file-table-tform`. If `-fno-sycl-rdc` is specified,
all shown tools are invoked multiple times, once per translation unit rather than
@@ -556,7 +556,7 @@ TBD

##### Specialization constants lowering

See [corresponding documentation](SpecializationConstants.md)
See corresponding documentation

#### CUDA support

@@ -1011,4 +1011,4 @@ with any other address space (including default).
## DPC++ Language extensions to SYCL
List of language extensions can be found at [extensions](../extensions)
List of language extensions can be found at [extensions](https://github.com/intel/llvm/blob/sycl/doc/extensions/)
2 changes: 1 addition & 1 deletion sycl/doc/design/DeviceAspectTraitDesign.md
@@ -125,6 +125,6 @@ This relies on the fact that unspecialized variants of `any_device_has` and

[1]: <https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:device-aspects>
[2]: <../extensions/proposed/sycl_ext_oneapi_device_if.asciidoc>
[3]: <../extensions/proposed/sycl_ext_oneapi_device_architecture.asciidoc>
[3]: <../extensions/experimental/sycl_ext_oneapi_device_architecture.asciidoc>
[4]: <DeviceIf.md>
[5]: <OptionalDeviceFeatures.md>
4 changes: 2 additions & 2 deletions sycl/doc/design/DeviceConfigFile.md
@@ -274,7 +274,7 @@ in more detail.

### Changes to Build Infrastructure
We need the information about the targets in multiple tools and compiler
modules listed in [Requirements](#Requirements). Thus, we need to make sure
modules listed in [Requirements](#requirements). Thus, we need to make sure
that the generation of the `.inc` file out of the `.td` file is done in time
for all the consumers. The command we need to run for TableGen is `llvm-tblgen
-gen-dynamic-tables -I /llvm-root/llvm/include/ input.td -o output.inc`.
@@ -302,7 +302,7 @@ the Device Configuration File (e.g. `sycl-post-link`) so that each of the
tools can modify the map according to the user extensions described in the
`.yaml` file.

As mentioned in [Requirements](#Requirements), there is an auto-detection
As mentioned in [Requirements](#requirements), there is an auto-detection
mechanism for `aot-toolchain` and `aot-toolchain-options` that is able to
infer these from the target name. In the `.yaml` example shown above the target
name is `intel_gpu_skl`. From that name, we can infer that `aot-toolchain` is
2 changes: 1 addition & 1 deletion sycl/doc/design/DeviceGlobal.md
@@ -4,7 +4,7 @@ This document describes the implementation design for the DPC++ extension
[sycl\_ext\_oneapi\_device\_global][1], which allows applications to declare
global variables in device code.

[1]: <../extensions/proposed/sycl_ext_oneapi_device_global.asciidoc>
[1]: <../extensions/experimental/sycl_ext_oneapi_device_global.asciidoc>


## Requirements
2 changes: 1 addition & 1 deletion sycl/doc/design/DeviceIf.md
@@ -5,7 +5,7 @@ This document describes the design for the DPC++ implementation of the
[sycl\_ext\_oneapi\_device\_architecture][2] extensions.

[1]: <../extensions/proposed/sycl_ext_oneapi_device_if.asciidoc>
[2]: <../extensions/proposed/sycl_ext_oneapi_device_architecture.asciidoc>
[2]: <../extensions/experimental/sycl_ext_oneapi_device_architecture.asciidoc>


## Phased implementation
9 changes: 9 additions & 0 deletions sycl/doc/design/KernelProgramCache.md
@@ -81,6 +81,7 @@ predefined HW configuration(s). As a general solution it is reasonable to have
program persistent cache which works between application restarts (e.g. cache
on disk for device code built for specific HW/SW configuration).

(what-is-program)=
<a name="what-is-program">1</a>: Here "program" means an internal SYCL runtime
object corresponding to a device code module or native binary defining a set of
SYCL kernels and/or device functions.
@@ -112,9 +113,11 @@ The kernels map's key consists of two components:
- the program the kernel belongs to,
- kernel name<sup>[3](#what-is-kname)</sup>.

(what-is-ksid)=
<a name="what-is-ksid">1</a>: Kernel set id is an ordinal number of the device
binary image the kernel is contained in.

(what-is-bopts)=
<a name="what-is-bopts">2</a>: The concatenation of build options (both compile
and link options) set in application or environment variables. There are three
sources of build options that the cache is aware of:
@@ -131,6 +134,7 @@ values (e.g. IGC has
which affect JIT process). Changing such configuration will invalidate cache and
manual cache cleanup should be done.

(what-is-kname)=
<a name="what-is-kname">3</a>: Kernel name is a kernel ID mangled class' name
which is provided to methods of `sycl::handler` (e.g. `parallel_for` or
`single_task`).
@@ -162,9 +166,11 @@ stored on disk (in every <n>.src file located in the cache item directory):
containing 2 files: <max_n+1>.src for key values and <max_n+1>.bin for
built image.

(what-is-diid)=
<a name="what-is-diid">1</a>: Hash out of the device code image used as input
for the build.

(what-is-did)=
<a name="what-is-did">2</a>: Hash out of the string which is concatenation of
values for `info::platform::name`, `info::device::name`,
`info::device::version`, `info::device::driver_version` parameters to
@@ -321,9 +327,11 @@ condition variable. We employ them to signal waiting threads that the build
process for this kernel/program is finished (either successfully or with a
failure).

(remove-pointer)=
<a name="remove-pointer">1</a>: The use of `std::remove_pointer` was omitted for
the sake of simplicity here.

(exception-data)=
<a name="exception-data">2</a>: Actually, we store contents of the exception:
its message and error code.

@@ -387,6 +395,7 @@ in a directory, the directory should be locked until file creation is done.
Advisory locking <sup>[1](#advisory-lock)</sup> is used to ensure that the
user/OS tools are able to manage files.

(advisory-lock)=
<a name="advisory-lock">1.</a> Advisory locks work only when a process
explicitly acquires and releases locks, and are ignored if a process is not
aware of locks.
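
The hunks in this file add a MyST explicit target, written as `(label)=` on its own line, in front of each `<a name="...">` footnote so that fragments such as `#advisory-lock` resolve. A rough sketch of a checker for this pattern follows; the function names are hypothetical, and the regular expressions are simplifications, not the MyST parser's actual rules.

```python
import re


def explicit_targets(markdown: str) -> set[str]:
    """Collect MyST explicit targets written as `(label)=` on their own line."""
    return set(re.findall(r"^\(([^)\s]+)\)=\s*$", markdown, flags=re.MULTILINE))


def fragment_links(markdown: str) -> set[str]:
    """Collect in-page link fragments such as `[1](#advisory-lock)`."""
    return set(re.findall(r"\]\(#([^)\s]+)\)", markdown))


def unresolved_fragments(markdown: str, heading_slugs=()) -> set[str]:
    """Return fragments that match neither an explicit target nor a known heading slug."""
    known = explicit_targets(markdown) | set(heading_slugs)
    return fragment_links(markdown) - known


sample = """Advisory locking <sup>[1](#advisory-lock)</sup> is used.
kernel name<sup>[3](#what-is-kname)</sup>.

(advisory-lock)=
<a name="advisory-lock">1.</a> Advisory locks work only when ...
"""
print(unresolved_fragments(sample))  # only the fragment without a target remains
```

Heading slugs are passed in separately because they depend on the configured `myst_heading_anchors` depth.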
2 changes: 1 addition & 1 deletion sycl/doc/design/OffloadDesign.md
@@ -7,7 +7,7 @@ the DPC++ Compiler. This leverages the existing community Offloading
design [OffloadingDesign][1] which covers the Clang driver and code generation
steps for creating offloading applications.

[1]: <../../../clang/docs/OffloadingDesign.rst>
[1]: <https://github.com/intel/llvm/blob/clang/docs/OffloadingDesign.rst>

The current offloading model is completely encapsulated within the Clang
Compiler Driver requiring the driver to perform all of the additional steps
6 changes: 3 additions & 3 deletions sycl/doc/design/OptionalDeviceFeatures.md
@@ -266,7 +266,7 @@ non-FPGA users may want to use the `device_global` property
[`device_image_scope`][5], which requires even non-FPGA users to have precise
control over the way kernels are bundled into device images.

[5]: <../extensions/proposed/sycl_ext_oneapi_device_global.asciidoc#properties-for-device-global-variables>
[5]: <../extensions/experimental/sycl_ext_oneapi_device_global.asciidoc#properties-for-device-global-variables>

The new definition of `-fsycl-device-code-split` is as follows:

@@ -1091,10 +1091,10 @@ The "name" column in this table lists the possible target names. Since not all
targets have a corresponding enumerator in the `architecture` enumeration, the
second column tells when there is such an enumerator. The last row in this
table corresponds to all of the architecture names listed in the
[sycl\_ext\_intel\_device\_architecture][8] extension whose name starts with
[sycl\_ext\_oneapi\_device\_architecture][8] extension whose name starts with
`intel_gpu_`.

[8]: <../extensions/proposed/sycl_ext_intel_device_architecture.asciidoc>
[8]: <../extensions/experimental/sycl_ext_oneapi_device_architecture.asciidoc>

TODO: This table needs to be filled out for the CPU variants supported by the
`opencl-aot` tool (avx512, avx2, avx, sse4.2) and for the FPGA targets. We
8 changes: 4 additions & 4 deletions sycl/doc/design/SYCLNativeCPU.md
@@ -31,7 +31,7 @@ In order to execute kernels compiled for `native-cpu`, we provide a PI Plugin. T

# Supported features and current limitations

The SYCL Native CPU flow is still WIP, not optimized and several core SYCL features are currently unsupported. Currently `barrier` and several math builtins are not supported, and attempting to use those will most likely fail with an `undefined reference` error at link time. Examples of supported applications can be found in the [runtime tests](sycl/test/native_cpu).
The SYCL Native CPU flow is still WIP, not optimized and several core SYCL features are currently unsupported. Currently `barrier` and several math builtins are not supported, and attempting to use those will most likely fail with an `undefined reference` error at link time. Examples of supported applications can be found in the [runtime tests](https://github.com/intel/llvm/blob/sycl/sycl/test/native_cpu).


To execute the `e2e` tests on the Native CPU, configure the test suite with:
@@ -93,13 +93,13 @@ entry:
}
```

For the Native CPU target, the device compiler is in charge of materializing the SPIRV builtins (such as `@__spirv_BuiltInGlobalInvocationId`), so that they can be correctly updated by the runtime when executing the kernel. This is performed by the [PrepareSYCLNativeCPU pass](llvm/lib/SYCLLowerIR/PrepareSYCLNativeCPU.cpp).
For the Native CPU target, the device compiler is in charge of materializing the SPIRV builtins (such as `@__spirv_BuiltInGlobalInvocationId`), so that they can be correctly updated by the runtime when executing the kernel. This is performed by the [PrepareSYCLNativeCPU pass](https://github.com/intel/llvm/blob/sycl/llvm/lib/SYCLLowerIR/PrepareSYCLNativeCPU.cpp).
The PrepareSYCLNativeCPUPass also emits a `subhandler` function, which receives the kernel arguments from the SYCL runtime (packed in a vector), unpacks them, and forwards only the used ones to the actual kernel.


## PrepareSYCLNativeCPU Pass

This pass will add a pointer to a `nativecpu_state` struct as kernel argument to all the kernel functions, and it will replace all the uses of SPIRV builtins with the return value of appropriately defined functions, which will read the requested information from the `__nativecpu_state` struct. The `__nativecpu_state` struct and the builtin functions are defined in [native_cpu.hpp](sycl/include/sycl/detail/native_cpu.hpp).
This pass will add a pointer to a `nativecpu_state` struct as kernel argument to all the kernel functions, and it will replace all the uses of SPIRV builtins with the return value of appropriately defined functions, which will read the requested information from the `__nativecpu_state` struct. The `__nativecpu_state` struct and the builtin functions are defined in [native_cpu.hpp](https://github.com/intel/llvm/blob/sycl/sycl/include/sycl/detail/native_cpu.hpp).


The resulting IR is:
@@ -160,7 +160,7 @@ Each entry in the array contains the kernel name as a string, and a pointer to t

## Kernel lowering and execution

The information produced by the device compiler is then employed to correctly lower the kernel LLVM-IR module to the target ISA (this is performed by the driver when `-fsycl-targets=native_cpu` is set). The object file containing the kernel code is linked with the host object file (and libsycl and any other needed library) and the final executable is ran using the Native CPU PI Plug-in, defined in [pi_native_cpu.cpp](sycl/plugins/native_cpu/pi_native_cpu.cpp).
The information produced by the device compiler is then employed to correctly lower the kernel LLVM-IR module to the target ISA (this is performed by the driver when `-fsycl-targets=native_cpu` is set). The object file containing the kernel code is linked with the host object file (and libsycl and any other needed library) and the final executable is ran using the Native CPU PI Plug-in, defined in [pi_native_cpu.cpp](https://github.com/intel/llvm/blob/sycl/sycl/plugins/native_cpu/pi_native_cpu.cpp).

## Ongoing work

2 changes: 1 addition & 1 deletion sycl/doc/design/SharedLibraries.md
@@ -351,7 +351,7 @@ of defined symbols. If this assumption is not correct, there can be two cases:
device image is taken to use duplicated symbol
- Same symbols have different definitions. In this case ODR violation takes
place, such situation leads to undefined behaviour. For more details refer
to [ODR violations](#ODR-violations) section.
to [ODR violations](#odr-violations) section.
- The situation when two device images of different formats define the same
symbols with two different definitions is not considered as ODR violation.
In this case the suitable device image will be picked.