Description
When I build any SYCL-enabled project with CMake and the compiler built from the sycl branch of intel/llvm (see below), the build succeeds, but the resulting binary throws an error along the lines of "No kernel named (mangled name) was found -46 (PI_ERROR_INVALID_KERNEL_NAME)" as soon as it tries to execute any SYCL kernel.
I can reproduce the error even with the vector-add-buffers example in the oneAPI-samples repository.
However, if I use an installed copy of the oneAPI Base Toolkit (version 2023.2.1), then the example (and indeed other SYCL-enabled projects) works fine!
After digging around, the consensus is that I should be using the compiler (clang-cl.exe) to link, while the default build process uses lld-link.exe. Okay, so I did that and used clang-cl.exe to link with the appropriate options instead, but it does not fix the issue.
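For readers unfamiliar with this approach, here is a minimal sketch of one way to route the link step through the compiler driver in CMake. It is an illustration under stated assumptions, not the contents of the attached CMakeLists.txt: the project name, the target name, and the exact flag set are hypothetical, and CMake is assumed to have been configured with CMAKE_CXX_COMPILER pointing at the self-built clang-cl.exe. As noted above, linking this way did not resolve the error in this case.

```cmake
cmake_minimum_required(VERSION 3.20)

# Hypothetical project name, used only for illustration.
project(vector_add_buffers LANGUAGES CXX)

# Compile every translation unit with SYCL enabled.
add_compile_options(-fsycl /EHsc)

# With an MSVC-style compiler front end, CMake's default rule invokes the
# linker (e.g. lld-link.exe) directly, so the driver never sees -fsycl at
# link time. Overriding the rule variable makes the compiler driver perform
# the link instead. The flag set here is an assumption, not a verified fix.
set(CMAKE_CXX_LINK_EXECUTABLE
    "<CMAKE_CXX_COMPILER> -fsycl <OBJECTS> -o <TARGET> <LINK_LIBRARIES>")

# Hypothetical target built from the sample's single source file.
add_executable(vector-add-buffers vector-add-buffers.cpp)
```

CMAKE_CXX_LINK_EXECUTABLE is a standard CMake rule variable, and the angle-bracket placeholders are expanded by CMake when it generates the link command. Setting it after project() matters, because project() loads the platform defaults that would otherwise overwrite it.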
After carefully comparing the build process for the oneAPI toolkit versus the self-built clang/llvm from this repo, it would seem that the linking step of the oneAPI toolkit build (which uses icx.exe) does many more things than when I link using the self-built clang-cl.exe or lld-link.exe (e.g. icx.exe makes multiple calls to clang-offload-bundler).
Another big indicator that things are different is that the executable built with the oneAPI toolkit is almost 4 times larger than the one produced by the self-built clang compiler.
Important note: In the case of the vector-add-buffers oneAPI sample, if I forego CMake and compile just the single source code file in a one-liner (e.g. clang-cl -fsycl /EHsc vector-add-buffers.cpp -o vector-add-buffers.exe), then it does work as expected.
Can anyone tell me what additional steps I need to take to get the linking step of the self-built clang to behave like the oneAPI toolkit?
An alternative way to phrase this might be: how do I deploy the self-built clang/DPC++ toolchain on my local system?
Environment:
- OS: Windows 10 Pro
- Target device: Any device I have it seems, but let's stick with an Intel Core i7-10875H for simplicity.
- DPC++ version: clang version 18.0.0, commit 47083f8 (tag: nightly-2023-09-28)
- clang was built locally with no additional options to configure.py or compile.py following the Getting Started instructions.
- Using set ONEAPI_DEVICE_SELECTOR=opencl:cpu to force calculation on the CPU.
P.S. If you're wondering why I'd want to use SYCL to target CPUs, it is only a stepping stone to NVIDIA GPU offloading...once I can get things working...
P.P.S. I have tried other commits, including a nightly from late last week, and one from the end of June matching the last release date of the oneAPI toolkit, but the problem persists.
Thank you in advance for any help or suggestions you can give!
My slightly modified vector-add-buffers.cpp and corresponding CMakeLists.txt are attached.
vector-add-buffers.txt
CMakeLists.txt