@@ -42,20 +42,21 @@ back-end. Today middle-end transformations include just a couple of passes:
42
42
- ** Back-end** - produces native "device" code in ahead-of-time compilation
43
43
mode.
44
44
45
- ### SYCL support in clang frontend
45
+ ### SYCL support in Clang front-end
46
46
47
- SYCL support in clang frontend can be split into the following components:
47
+ SYCL support in Clang front-end can be split into the following components:
48
48
49
- - Device code outlining. Since SYCL is a single source programming model
50
- compiler should be able to separate device code from host code .
49
+ - Device code outlining. This component is responsible for identifying and
50
+ outlining " device code" in the single source .
51
51
52
- - Lowering of lambda function objects and named function objects ("SYCL kernel
53
- functions"). To execute "SYCL kernel functions" on OpenCL devices some
54
- transformations are required.
52
+ - SYCL kernel function object (functor or lambda) lowering. This component
53
+ creates an OpenCL kernel function interface for SYCL kernels.
55
54
56
- - Device code diagnostics.
55
+ - Device code diagnostics. This component enforces language restrictions on
56
+ device code.
57
57
58
- - Integration header generation.
58
+ - Integration header generation. This component emits information required for
59
+ binding host and device parts of the SYCL code via OpenCL API.
59
60
60
61
#### Device code outlining
61
62
@@ -64,11 +65,11 @@ work:
64
65
65
66
``` C++
66
67
int foo (int x) { return ++x; }
67
- int bar(int x) { throw std::exception( "CPU code only!") ; }
68
+ int bar(int x) { throw std::exception{ "CPU code only!"} ; }
68
69
...
69
70
using namespace cl::sycl;
70
71
queue Q;
71
- buffer<int, 1> a( range<1>{1024}) ;
72
+ buffer<int, 1> a{ range<1>{1024}} ;
72
73
Q.submit([ &] (handler& cgh) {
73
74
auto A = a.get_access< access::mode::write > (cgh);
74
75
cgh.parallel_for<init_a>(range<1>{1024}, [ =] (id<1> index) {
@@ -78,51 +79,65 @@ Q.submit([&](handler& cgh) {
78
79
...
79
80
```
80
81
81
- SYCL compiler needs to compile lambda exression passed to
82
- `cl::sycl::handler::parallel_for` method and function `foo` called from this
83
- lambda function. Compiler also must ignore bar function when we compile the "device" part
84
- of the single source code.
85
-
86
- Current approach is to use the SYCL kernel atttribute in SYCL runtime to mark code
87
- passed to `cl::sycl::handler::parallel_for` as "kernel functions".
88
- Obviously runtime library can't mark foo as "device" code - this is a compiler
82
+ In this example, the SYCL compiler needs to compile the lambda expression passed
83
+ to the `cl::sycl::handler::parallel_for` method, as well as the function `foo`
84
+ called from the lambda expression for the device.
85
+ The compiler must also ignore the `bar` function when we compile the
86
+ "device" part of the single source code, as it's unused inside the device
87
+ portion of the source code (the contents of the lambda expression passed to the
88
+ `cl::sycl::handler::parallel_for` and any function called from this lambda
89
+ expression).
90
+
91
+ The current approach is to use the SYCL kernel attribute in the SYCL runtime to
92
+ mark code passed to `cl::sycl::handler::parallel_for` as "kernel functions".
93
+ The SYCL runtime library can't mark foo as "device" code - this is a compiler
89
94
job: to traverse all symbols accessible from kernel functions and add them to
90
95
the "device part" of the code marking them with the new SYCL device attribute.
91
96
92
97
#### Lowering of lambda function objects and named function objects
93
98
94
99
All SYCL memory objects shared between host and device (buffers/images,
95
100
these objects map to OpenCL buffers and images) must be accessed through special
96
- `accessor` classes. The "device" side implementation of these classes contain pointers to the device memory. There is no
97
- way in OpenCL to pass structures with pointers inside as kernel arguments.
98
- SYCL also has special mechanism for passing kernel arguments from host to
99
- device, if in OpenCL you need to call `clSetKernelArg`, in SYCL all
100
- kernel arguments are captures/fields of lambda/functor which is passed to
101
- `parallel_for` (in code snippet above one kernel argument - `accessor A`).
102
- To map to OpenCL kernel arguments setting mechanism we added generation of
103
- "kernel wrapper" function inside the compiler. "Kernel wrapper" function
104
- contains body of SYCL kernel function, receives OpenCL like parameters and
105
- additionally does some manipulation to initialize captured lambda/functor
106
- fields with these parameters. In some pseudo code "kernel wrapper" looks like
107
- this:
101
+ `accessor` classes. The "device" side implementation of these classes contain
102
+ pointers to the device memory. As there is no way in OpenCL to pass structures
103
+ with pointers inside as kernel arguments all memory objects shared between host
104
+ and device must be passed to the kernel as raw pointers.
105
+ SYCL also has a special mechanism for passing kernel arguments from host to
106
+ the device. In OpenCL you need to call `clSetKernelArg`, in SYCL all the
107
+ kernel arguments are captures/fields of lambda/functor SYCL functions for
108
+ invoking kernels (such as `parallel_for`). For example, in the previous code
109
+ snippet above `accessor` `A` is one such captured kernel argument.
110
+
111
+ To facilitate the mapping of the captures/fields of lambdas/functors to OpenCL
112
+ kernel and overcome OpenCL limitations we added the generation of a "kernel
113
+ wrapper" function inside the compiler. A "kernel wrapper" function contains the
114
+ body of the SYCL kernel function, receives OpenCL like parameters and
115
+ additionally does some manipulation to initialize captured lambda/functor fields
116
+ with these parameters. In some pseudo code the "kernel wrapper" for the previous
117
+ code snippet above looks like this:
108
118
109
119
```C++
110
- // SYCL kernel is defined in SYCL headers
111
- __attribute__((sycl_kernel)) someSYCLKernel(lambda) {
120
+
121
+ // Let the lambda expression passed to the parallel_for declare unnamed
122
+ // function object with "Lambda" type.
123
+
124
+ // SYCL kernel is defined in SYCL headers:
125
+ __attribute__((sycl_kernel)) someSYCLKernel(Lambda lambda) {
112
126
lambda();
113
127
}
114
128
115
129
// Kernel wrapper
116
130
__kernel wrapper(global int* a) {
117
- lambda; // Actually lambda declaration doesn't have a name in AST
118
- // Let the lambda have one captured field - accessor A. We need to init it with
119
- // global pointer from arguments:
131
+ Lambda lambda; // Actually lambda declaration doesn't have a name in AST
132
+ // Let the lambda have one captured field - accessor A. We need to init it
133
+ // with global pointer from arguments:
120
134
lambda.A.__init(a);
121
135
// Body of SYCL kernel from SYCL headers:
122
136
{
123
137
lambda();
124
138
}
125
139
}
140
+
126
141
```
127
142
128
143
"Kernel wrapper" is generated by the compiler inside the Sema using AST nodes.
0 commit comments