
Program with device code in multiple translation units fails on CUDA #4156

Closed
@sergey-semenov

Description

Describe the bug
A simple program with device code in multiple translation units fails at runtime with CUDA_ERROR_INVALID_IMAGE since #3735.

To Reproduce
h.hpp:

#pragma once
#include <CL/sycl.hpp>

void submit_kernelB();

b.cpp:

#include "h.hpp"

class KernelNameB;

void submit_kernelB() {
  sycl::queue q;
  q.submit([&](sycl::handler &cgh) { cgh.single_task<KernelNameB>([]() {}); });
}

main.cpp:

#include "h.hpp"
#include <CL/sycl.hpp>

class KernelNameA;
void submit_kernelA() {
  sycl::queue q;
  q.submit([&](sycl::handler &cgh) { cgh.single_task<KernelNameA>([]() {}); });
}

int main() { submit_kernelA(); }

Build and run:

clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda-sycldevice main.cpp b.cpp
./a.out

This reproducer fails with CUDA_ERROR_INVALID_IMAGE. Note that, as of #3735, compiling it produces two device images, whereas with #3735 reverted it produces only one. The error disappears once the application contains a single device image, which can be achieved in any of three ways: by moving submit_kernelB into the same translation unit as submit_kernelA, by compiling with -fsycl-device-code-split=off, or by reverting #3735.

Environment:

  • OS: Linux
  • Target device and vendor: NVIDIA Titan RTX (CUDA)
  • DPC++ version: e9d308e
  • Dependencies version: CUDA 10.1

Labels: bug (Something isn't working), cuda (CUDA back-end)
