Description
I am trying the cuda branch (currently at 38ec8bf8f2c) and I can compile and run the simple application from the "getting started" guide.
However, the behaviour of the plugin system does not seem to match what is documented.
According to the guide, the CUDA backend must be selected at runtime using the SYCL_BE environment variable:
SYCL_BE=PI_CUDA ./simple-sycl-app-cuda.exe
However, setting the SYCL_BE variable does not seem to make any difference.
Let's build the simple application for the OpenCL (SPIR-V) target (I've added a printout of the device being used):
$ build/bin/clang++ -fsycl -fsycl-targets=spir64-unknown-linux-sycldevice -Wno-unknown-cuda-version simple-sycl-app.cpp -o simple-sycl-app-opencl
Check the available devices:
$ clinfo -l
Platform #0: Intel(R) OpenCL HD Graphics
`-- Device #0: Intel(R) Gen9 HD Graphics NEO
Platform #1: Intel(R) OpenCL
`-- Device #0: Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
(there's a Tesla K40 as well, but the NVIDIA ICD has been removed to avoid even more confusion)
And run with no SYCL_BE variable:
$ ./simple-sycl-app-opencl
Running on SYCL device Intel(R) Gen9 HD Graphics NEO, driver version 20.06.15619
The results are correct!
Set the OpenCL backend:
$ SYCL_BE=PI_OPENCL ./simple-sycl-app-opencl
Running on SYCL device Intel(R) Gen9 HD Graphics NEO, driver version 20.06.15619
The results are correct!
OK, so far so good. Now with the CUDA backend:
$ SYCL_BE=PI_CUDA ./simple-sycl-app-opencl
Running on SYCL device Intel(R) Gen9 HD Graphics NEO, driver version 20.06.15619
The results are correct!
Ehm, what? The CUDA backend was requested, yet the application still ran on the Intel GPU.
Something similar happens with the CUDA backend.
Let's rebuild the application with CUDA support:
$ build/bin/clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda-sycldevice -Wno-unknown-cuda-version simple-sycl-app.cpp -o simple-sycl-app-cuda
Try with the default selector:
$ ./simple-sycl-app-cuda
Running on SYCL device Intel(R) Gen9 HD Graphics NEO, driver version 20.06.15619
terminate called after throwing an instance of 'cl::sycl::runtime_error'
what(): OpenCL API failed. OpenCL API returns: -42 (CL_INVALID_BINARY) -42 (CL_INVALID_BINARY)
Aborted (core dumped)
That didn't work... maybe setting the plugin explicitly?
$ SYCL_BE=PI_CUDA ./simple-sycl-app-cuda
Running on SYCL device Intel(R) Gen9 HD Graphics NEO, driver version 20.06.15619
terminate called after throwing an instance of 'cl::sycl::runtime_error'
what(): OpenCL API failed. OpenCL API returns: -42 (CL_INVALID_BINARY) -42 (CL_INVALID_BINARY)
Aborted (core dumped)
Nope.
OK, remove all ICDs and try again:
$ OCL_ICD_VENDORS= SYCL_BE=PI_CUDA ./simple-sycl-app-cuda
Running on SYCL device Tesla K40c, driver version CUDA 10.20
The results are correct!
So, it does work.
And without the SYCL_BE variable?
$ OCL_ICD_VENDORS= ./simple-sycl-app-cuda
Running on SYCL device Tesla K40c, driver version CUDA 10.20
The results are correct!
Still works.
What if I force the OpenCL plugin?
$ OCL_ICD_VENDORS= SYCL_BE=PI_OPENCL ./simple-sycl-app-cuda
Running on SYCL device Tesla K40c, driver version CUDA 10.20
The results are correct!
It still works!
So it seems that the SYCL_BE variable is no longer needed, and in fact it is mostly ignored, even when set to a value that does not exist:
$ OCL_ICD_VENDORS= SYCL_BE=PI_IS_314 ./simple-sycl-app-cuda
Running on SYCL device Tesla K40c, driver version CUDA 10.20
The results are correct!
The only value that has any effect seems to be PI_OTHER:
$ OCL_ICD_VENDORS= SYCL_BE=PI_OTHER ./simple-sycl-app-cuda
pi_die: Unknown SYCL_BE
terminate called without an active exception
Aborted (core dumped)
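For anyone who wants to reproduce the SYCL_BE runs above in one go, here is a minimal sketch of a helper script. It only wraps the invocations already shown in this report; the function name try_backends is my own, and the binary path is whatever you pass as the first argument. It prints the first line of output for each setting (the "Running on SYCL device ..." line, or the error).

```shell
#!/bin/sh
# Sketch: run one binary under several SYCL_BE settings and print the
# first line of its output for each. try_backends is a hypothetical
# helper name, not part of any toolchain.
try_backends() {
    app="$1"
    # Run once with SYCL_BE unset (subshell keeps the caller's environment intact).
    printf '%-12s %s\n' "(unset)" "$( (unset SYCL_BE; "$app" 2>&1) | head -n 1)"
    # Then once per value tried in this report.
    for be in PI_OPENCL PI_CUDA PI_IS_314 PI_OTHER; do
        printf '%-12s %s\n' "$be" "$(SYCL_BE="$be" "$app" 2>&1 | head -n 1)"
    done
}

# Only run when a binary path is given on the command line.
if [ "$#" -gt 0 ]; then
    try_backends "$1"
fi
```

Saved as, say, try-backends.sh (the file name is arbitrary), it can be run as `sh try-backends.sh ./simple-sycl-app-cuda`, optionally prefixed with `OCL_ICD_VENDORS=` to repeat the runs without the OpenCL ICDs.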
Another interesting and possibly undocumented result: it looks like it is possible to build a binary that supports both the OpenCL and CUDA backends:
$ build/bin/clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda-sycldevice,spir64-unknown-linux-sycldevice -Wno-unknown-cuda-version simple-sycl-app.cpp -o simple-sycl-app
$ ./simple-sycl-app
Running on SYCL device Intel(R) Gen9 HD Graphics NEO, driver version 20.06.15619
The results are correct!
$ OCL_ICD_VENDORS= ./simple-sycl-app
Running on SYCL device Tesla K40c, driver version CUDA 10.20
The results are correct!
Magic!
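Since the device that gets picked seems to depend on whether the OpenCL ICDs are visible rather than on SYCL_BE, here is a small sketch that runs each binary both ways and prints the first output line. Again it only repeats the invocations from this report; compare_icd_settings is a name I made up, and "inherited" means whatever OCL_ICD_VENDORS is (or is not) set to in the calling shell.

```shell
#!/bin/sh
# Sketch: for each binary given, print the first output line with the
# inherited environment and with OCL_ICD_VENDORS set to the empty string.
# compare_icd_settings is a hypothetical helper name.
compare_icd_settings() {
    for app in "$@"; do
        printf '%s [inherited ICDs]: %s\n' "$app" "$("$app" 2>&1 | head -n 1)"
        printf '%s [no ICDs]: %s\n' "$app" "$(OCL_ICD_VENDORS= "$app" 2>&1 | head -n 1)"
    done
}

# Only run when at least one binary path is given on the command line.
if [ "$#" -gt 0 ]; then
    compare_icd_settings "$@"
fi
```

For example: `sh compare-icds.sh ./simple-sycl-app-opencl ./simple-sycl-app-cuda ./simple-sycl-app` (the script file name is arbitrary).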