Skip to content

Commit b9ee235

Browse files
committed
[SYCL] Apply more suggestions from code review
Signed-off-by: Mariya Podchishchaeva <mariya.podchishchaeva@intel.com>
1 parent 49d19d3 commit b9ee235

File tree

1 file changed

+51
-36
lines changed

1 file changed

+51
-36
lines changed

sycl/doc/SYCL_compiler_and_runtime_design.md

Lines changed: 51 additions & 36 deletions
Original file line numberDiff line numberDiff line change
@@ -42,20 +42,21 @@ back-end. Today middle-end transformations include just a couple of passes:
4242
- **Back-end** - produces native "device" code in ahead-of-time compilation
4343
mode.
4444

45-
### SYCL support in clang frontend
45+
### SYCL support in Clang front-end
4646

47-
SYCL support in clang frontend can be split into the following components:
47+
SYCL support in Clang front-end can be split into the following components:
4848

49-
- Device code outlining. Since SYCL is a single source programming model
50-
compiler should be able to separate device code from host code.
49+
- Device code outlining. This component is responsible for identifying and
50+
outlining "device code" in the single source.
5151

52-
- Lowering of lambda function objects and named function objects ("SYCL kernel
53-
functions"). To execute "SYCL kernel functions" on OpenCL devices some
54-
transformations are required.
52+
- SYCL kernel function object (functor or lambda) lowering. This component
53+
creates an OpenCL kernel function interface for SYCL kernels.
5554

56-
- Device code diagnostics.
55+
- Device code diagnostics. This component enforces language restrictions on
56+
device code.
5757

58-
- Integration header generation.
58+
- Integration header generation. This component emits information required for
59+
binding host and device parts of the SYCL code via OpenCL API.
5960

6061
#### Device code outlining
6162

@@ -64,11 +65,11 @@ work:
6465

6566
```C++
6667
int foo(int x) { return ++x; }
67-
int bar(int x) { throw std::exception("CPU code only!"); }
68+
int bar(int x) { throw std::exception{"CPU code only!"}; }
6869
...
6970
using namespace cl::sycl;
7071
queue Q;
71-
buffer<int, 1> a(range<1>{1024});
72+
buffer<int, 1> a{range<1>{1024}};
7273
Q.submit([&](handler& cgh) {
7374
auto A = a.get_access<access::mode::write>(cgh);
7475
cgh.parallel_for<init_a>(range<1>{1024}, [=](id<1> index) {
@@ -78,51 +79,65 @@ Q.submit([&](handler& cgh) {
7879
...
7980
```
8081
81-
SYCL compiler needs to compile lambda exression passed to
82-
`cl::sycl::handler::parallel_for` method and function `foo` called from this
83-
lambda function. Compiler also must ignore bar function when we compile the "device" part
84-
of the single source code.
85-
86-
Current approach is to use the SYCL kernel atttribute in SYCL runtime to mark code
87-
passed to `cl::sycl::handler::parallel_for` as "kernel functions".
88-
Obviously runtime library can't mark foo as "device" code - this is a compiler
82+
In this example, the SYCL compiler needs to compile the lambda expression passed
83+
to the `cl::sycl::handler::parallel_for` method, as well as the function `foo`
84+
called from the lambda expression for the device.
85+
The compiler must also ignore the `bar` function when we compile the
86+
"device" part of the single source code, as it's unused inside the device
87+
portion of the source code (the contents of the lambda expression passed to the
88+
`cl::sycl::handler::parallel_for` and any function called from this lambda
89+
expression).
90+
91+
The current approach is to use the SYCL kernel attribute in the SYCL runtime to
92+
mark code passed to `cl::sycl::handler::parallel_for` as "kernel functions".
93+
The SYCL runtime library can't mark foo as "device" code - this is a compiler
8994
job: to traverse all symbols accessible from kernel functions and add them to
9095
the "device part" of the code marking them with the new SYCL device attribute.
9196
9297
#### Lowering of lambda function objects and named function objects
9398
9499
All SYCL memory objects shared between host and device (buffers/images,
95100
these objects map to OpenCL buffers and images) must be accessed through special
96-
`accessor` classes. The "device" side implementation of these classes contain pointers to the device memory. There is no
97-
way in OpenCL to pass structures with pointers inside as kernel arguments.
98-
SYCL also has special mechanism for passing kernel arguments from host to
99-
device, if in OpenCL you need to call `clSetKernelArg`, in SYCL all
100-
kernel arguments are captures/fields of lambda/functor which is passed to
101-
`parallel_for` (in code snippet above one kernel argument - `accessor A`).
102-
To map to OpenCL kernel arguments setting mechanism we added generation of
103-
"kernel wrapper" function inside the compiler. "Kernel wrapper" function
104-
contains body of SYCL kernel function, receives OpenCL like parameters and
105-
additionally does some manipulation to initialize captured lambda/functor
106-
fields with these parameters. In some pseudo code "kernel wrapper" looks like
107-
this:
101+
`accessor` classes. The "device" side implementation of these classes contain
102+
pointers to the device memory. As there is no way in OpenCL to pass structures
103+
with pointers inside as kernel arguments all memory objects shared between host
104+
and device must be passed to the kernel as raw pointers.
105+
SYCL also has a special mechanism for passing kernel arguments from host to
106+
the device. In OpenCL you need to call `clSetKernelArg`, in SYCL all the
107+
kernel arguments are captures/fields of lambda/functor SYCL functions for
108+
invoking kernels (such as `parallel_for`). For example, in the previous code
109+
snippet above `accessor` `A` is one such captured kernel argument.
110+
111+
To facilitate the mapping of the captures/fields of lambdas/functors to OpenCL
112+
kernel and overcome OpenCL limitations we added the generation of a "kernel
113+
wrapper" function inside the compiler. A "kernel wrapper" function contains the
114+
body of the SYCL kernel function, receives OpenCL like parameters and
115+
additionally does some manipulation to initialize captured lambda/functor fields
116+
with these parameters. In some pseudo code the "kernel wrapper" for the previous
117+
code snippet above looks like this:
108118
109119
```C++
110-
// SYCL kernel is defined in SYCL headers
111-
__attribute__((sycl_kernel)) someSYCLKernel(lambda) {
120+
121+
// Let the lambda expression passed to the parallel_for declare unnamed
122+
// function object with "Lambda" type.
123+
124+
// SYCL kernel is defined in SYCL headers:
125+
__attribute__((sycl_kernel)) someSYCLKernel(Lambda lambda) {
112126
lambda();
113127
}
114128
115129
// Kernel wrapper
116130
__kernel wrapper(global int* a) {
117-
lambda; // Actually lambda declaration doesn't have a name in AST
118-
// Let the lambda have one captured field - accessor A. We need to init it with
119-
// global pointer from arguments:
131+
Lambda lambda; // Actually lambda declaration doesn't have a name in AST
132+
// Let the lambda have one captured field - accessor A. We need to init it
133+
// with global pointer from arguments:
120134
lambda.A.__init(a);
121135
// Body of SYCL kernel from SYCL headers:
122136
{
123137
lambda();
124138
}
125139
}
140+
126141
```
127142

128143
"Kernel wrapper" is generated by the compiler inside the Sema using AST nodes.

0 commit comments

Comments
 (0)