Closed
Describe the bug
A simple program with device code in multiple translation units fails at runtime with CUDA_ERROR_INVALID_IMAGE as of #3735.
To Reproduce
h.hpp:
#include <CL/sycl.hpp>
void submit_kernelB();
b.cpp:
#include "h.hpp"
class KernelNameB;
void submit_kernelB() {
sycl::queue q;
q.submit([&](sycl::handler &cgh) { cgh.single_task<KernelNameB>([]() {}); });
}
main.cpp:
#include "h.hpp"
#include <CL/sycl.hpp>
class KernelNameA;
void submit_kernelA() {
sycl::queue q;
q.submit([&](sycl::handler &cgh) { cgh.single_task<KernelNameA>([]() {}); });
}
int main() { submit_kernelA(); }
clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda-sycldevice main.cpp b.cpp
./a.out
This reproducer fails with CUDA_ERROR_INVALID_IMAGE. Note that compiling it produces two device images as of #3735, but only one with that change reverted. The error disappears once the number of device images in the application is reduced to one (by moving submit_kernelB to the same translation unit as submit_kernelA, by using -fsycl-device-code-split=off, or by reverting #3735).
Environment:
- OS: Linux
- Target device and vendor: CUDA, Titan RTX.
- DPC++ version: e9d308e
- Dependencies version: CUDA 10.1