Program with device code in multiple translation units fails on CUDA

**Describe the bug**
A simple program with device code in multiple translation units fails at runtime with CUDA_ERROR_INVALID_IMAGE as of https://github.com/intel/llvm/pull/3735.

**To Reproduce**
h.hpp:

```
#include <CL/sycl.hpp>

void submit_kernelB();
```
b.cpp:
```
#include "h.hpp"

class KernelNameB;

void submit_kernelB() {
  sycl::queue q;
  q.submit([&](sycl::handler &cgh) { cgh.single_task<KernelNameB>([]() {}); });
}
```
main.cpp:
```
#include "h.hpp"
#include <CL/sycl.hpp>

class KernelNameA;
void submit_kernelA() {
  sycl::queue q;
  q.submit([&](sycl::handler &cgh) { cgh.single_task<KernelNameA>([]() {}); });
}

int main() { submit_kernelA(); }
```
Build and run:
```
clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda-sycldevice main.cpp b.cpp
./a.out
```
This reproducer fails with CUDA_ERROR_INVALID_IMAGE. Compiling it produces two device images as of https://github.com/intel/llvm/pull/3735, but only one with that PR reverted. The error disappears once the number of device images in the application is reduced to one. Any of the following does that:

- moving `submit_kernelB` into the same translation unit as `submit_kernelA`;
- passing `-fsycl-device-code-split=off`;
- reverting https://github.com/intel/llvm/pull/3735.
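For reference, the flag-based workaround can be sketched as the original build command with `-fsycl-device-code-split=off` added. The flag's position on the command line is an assumption; this only avoids the failure by producing a single device image and does not fix the underlying problem.

```shell
# Same reproducer as above, but with device code splitting disabled so that
# main.cpp and b.cpp end up in a single device image.
clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda-sycldevice \
    -fsycl-device-code-split=off main.cpp b.cpp
./a.out
```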

**Environment:**

- OS: Linux
- Target device and vendor: CUDA, Titan RTX.
- DPC++ version: e9d308e5fc70fd4ca80dfb47a72fbc0b375c11ee
- Dependencies version: CUDA 10.1
