
Commit 25bfb0b

[SYCL][Doc] Extension spec for "work_group_memory" (#13725)
Add a proposed extension specification for `work_group_memory`, a lighter-weight API to allocate device local memory for an nd-range kernel. Also, add a related list of restrictions that, when followed, guarantee that a kernel written in the free-function kernel syntax can be launched directly via Level Zero or OpenCL.
1 parent 4e36825 commit 25bfb0b


2 files changed: +609 -0 lines changed


sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc

Lines changed: 56 additions & 0 deletions
@@ -773,6 +773,62 @@ int main() {
```


== {dpcpp} guaranteed compatibility with Level Zero and OpenCL backends

The contents of this section are non-normative and apply only to the {dpcpp}
implementation.
Kernels written using the free function kernel syntax can be submitted to a
device directly through the Level Zero or OpenCL backends, without going
through the SYCL host runtime APIs.
This works only when the kernel is compiled ahead of time (AOT) to native
device code using the `-fsycl-targets` compiler option.

The interface to the kernel in the native device code module is guaranteed
only when the kernel adheres to all of the following restrictions:

* The kernel is written in the free function kernel syntax;
* The kernel function is declared as `extern "C"`;
* Each formal argument to the kernel has either a {cpp} trivially copyable type
or the `work_group_memory` type (see
link:../proposed/sycl_ext_oneapi_work_group_memory.asciidoc[
sycl_ext_oneapi_work_group_memory]); and
* The translation unit containing the kernel is compiled with the
`-fno-sycl-dead-args-optimization` option.
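
The argument-type restriction above can be checked at compile time.
The following is a minimal sketch under stated assumptions:
`fake_work_group_memory` is a hypothetical stand-in for the real
`work_group_memory` class so that the snippet builds with any {cpp}17 compiler,
and the trait names and the `scale` and `bad_kernel` declarations are
illustrative only.
It checks only the argument types, not the other restrictions.

```c++
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical stand-in for the work_group_memory class defined by the
// sycl_ext_oneapi_work_group_memory extension. This empty struct is itself
// trivially copyable; the explicit check below matters for the real class,
// whose copy semantics this sketch does not assume.
template <typename T>
struct fake_work_group_memory {};

template <typename T>
struct is_fake_work_group_memory : std::false_type {};

template <typename T>
struct is_fake_work_group_memory<fake_work_group_memory<T>> : std::true_type {};

// True when T satisfies the argument-type restriction: either trivially
// copyable or the (stand-in) work_group_memory type.
template <typename T>
inline constexpr bool is_native_compatible_arg_v =
    std::is_trivially_copyable_v<T> || is_fake_work_group_memory<T>::value;

// True when every parameter type of a function type satisfies the restriction.
template <typename Fn>
struct all_args_native_compatible;

template <typename R, typename... Args>
struct all_args_native_compatible<R(Args...)>
    : std::bool_constant<(is_native_compatible_arg_v<Args> && ...)> {};

// Hypothetical kernel signatures (declarations only, used via decltype).
extern "C" void scale(float *data, float factor,
                      fake_work_group_memory<float[64]> scratch);
void bad_kernel(std::string name);  // std::string is not trivially copyable

static_assert(all_args_native_compatible<decltype(scale)>::value);
static_assert(!all_args_native_compatible<decltype(bad_kernel)>::value);
```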

Both Level Zero and OpenCL identify a kernel via a _name_ string.
(See `zeKernelCreate` and `clCreateKernel` in their respective specifications.)
When a kernel is defined according to the restrictions above, the _name_ is
guaranteed to be the name of the kernel's function in the {cpp} source code,
prefixed with "++__sycl_kernel_++".
For example, if the function name is "foo", the kernel's name in the native
device code module is "++__sycl_kernel_foo++".
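
A minimal sketch of how host code might derive the native kernel name from the
{cpp} function name.
The helper `native_kernel_name` is hypothetical and is not part of SYCL,
Level Zero, or OpenCL.

```c++
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper: returns the C++ function name with the
// "__sycl_kernel_" prefix, which is the native kernel name guaranteed for an
// extern "C" free function kernel that follows the restrictions above.
std::string native_kernel_name(std::string_view function_name) {
  std::string name = "__sycl_kernel_";
  name += function_name;
  return name;
}
```

The resulting string would then be passed as the `kernel_name` argument of
`clCreateKernel`, or through the `pKernelName` member of the `ze_kernel_desc_t`
that is passed to `zeKernelCreate`.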

Both Level Zero and OpenCL set kernel argument values using three pieces of
information:

* The index of the argument;
* The size (in bytes) of the value; and
* A pointer to the start of the value.

(See `zeKernelSetArgumentValue` and `clSetKernelArg` in their respective
specifications.)

When a kernel is defined according to the restrictions above, the argument
indices are the same as the positions of the formal kernel arguments in the
{cpp} source code.
The first argument has index 0, the next has index 1, and so on.

If an argument has a trivially copyable type, the size must be the size of that
type, and the pointer must point to a memory region that has the same size and
representation as an object of that type.
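
The following sketch shows how the index, size, and pointer could be derived
for arguments of trivially copyable type, assuming the kernel's argument values
are available on the host as ordinary {cpp} objects in declaration order.
The `kernel_arg` struct and the `describe_args` function are hypothetical
illustrations, not part of any API.

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical (index, size, pointer) triple matching the parameters of
// zeKernelSetArgumentValue and clSetKernelArg.
struct kernel_arg {
  std::uint32_t index;  // position of the formal argument, starting at 0
  std::size_t size;     // sizeof the argument's type
  const void *value;    // points at an object of that type
};

// Hypothetical helper: describes each argument in declaration order.
// The returned pointers refer to the caller's objects, so those objects must
// stay alive until the calls that set the kernel arguments have returned.
template <typename... Ts>
std::vector<kernel_arg> describe_args(const Ts &...args) {
  static_assert((std::is_trivially_copyable_v<Ts> && ...),
                "each argument must have a trivially copyable type");
  std::vector<kernel_arg> out;
  std::uint32_t index = 0;
  // The comma fold evaluates left to right, so indices follow source order.
  (out.push_back(kernel_arg{index++, sizeof(Ts), &args}), ...);
  return out;
}
```

Each entry could then be passed as
`clSetKernelArg(kernel, a.index, a.size, a.value)` or
`zeKernelSetArgumentValue(kernel, a.index, a.size, a.value)`.
OpenCL copies the argument data during `clSetKernelArg`, so the host object
can be reused once that call returns.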

If an argument has the type `work_group_memory`, the size must be the size (in
bytes) of the device local memory that is represented by the
`work_group_memory` argument.
The pointer passed to `zeKernelSetArgumentValue` or `clSetKernelArg` must be
NULL in this case.
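
A corresponding sketch for a `work_group_memory` argument, using a hypothetical
`kernel_arg` struct that holds the (index, size, pointer) triple.
The byte count is an assumption supplied by the caller; for example, a
`work_group_memory` argument representing 256 `float` values would use
`256 * sizeof(float)` bytes.

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical (index, size, pointer) triple matching the parameters of
// zeKernelSetArgumentValue and clSetKernelArg.
struct kernel_arg {
  std::uint32_t index;  // position of the formal argument, starting at 0
  std::size_t size;     // number of bytes
  const void *value;    // pointer to the value, or nullptr
};

// Hypothetical helper for an argument of type work_group_memory: the size is
// the number of bytes of device local memory the argument represents, and the
// pointer is null because no value is copied to the device.
kernel_arg describe_local_arg(std::uint32_t index, std::size_t bytes) {
  return kernel_arg{index, bytes, nullptr};
}
```

Passing a byte count with a null pointer matches the existing OpenCL convention
for kernel arguments declared with the `__local` qualifier, for which
`clSetKernelArg` requires `arg_value` to be NULL.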

== Implementation notes

=== Compiler diagnostics
