
[SYCL][CUDA] Fatal error: error in backend: Cannot select: intrinsic %llvm.nvvm.atomic.add.shared.i.cta  #5008

Closed
@ghost

Description

Describe the bug
After merging #4820, compilation fails with the error "Cannot select: intrinsic %llvm.nvvm.atomic.add.shared.i.cta".

fatal error: error in backend: Cannot select: intrinsic %llvm.nvvm.atomic.add.shared.i.cta
llvm-foreach:
clang++: error: clang frontend command failed with exit code 70 (use -v to see invocation)
clang version 14.0.0 (https://github.com/intel/llvm.git c855fd177a88d9fc6db87bc2a6c8001f0398ec9a)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: .../build/bin
clang++: note: diagnostic msg: Error generating preprocessed source(s).

To Reproduce

Let's compile the following example for the CUDA backend

#include <sycl/sycl.hpp>

int main() {
  constexpr static size_t buffer_size = 4;
  sycl::queue stream;
  sycl::buffer<int, 1> buffer(buffer_size);
  stream.submit([&](sycl::handler &cgh) {
    // Local (work-group) memory accessor; on the CUDA backend this is shared memory.
    sycl::accessor<int, 1, sycl::access_mode::read_write, sycl::access::target::local> acc(sycl::range<1>(buffer_size), cgh);
    auto acc_write = buffer.get_access<sycl::access::mode::write>(cgh);

    cgh.parallel_for<class demo_kernel>(buffer_size, [=](sycl::id<1> item) {
      // Atomic increment in the local address space. This is the only atomic in the
      // kernel, so presumably it is what lowers to llvm.nvvm.atomic.add.shared.i.cta.
      sycl::atomic<int, sycl::access::address_space::local_space>(sycl::local_ptr<int>(acc.get_pointer())).fetch_add(1);
    });
  });

  return 0;
}

using the following command:

clang++ -fsycl -fsycl-unnamed-lambda -fsycl-targets=nvptx64-nvidia-cuda -O2 example.cpp -o example  -lOpenCL

or with the -nocudalib flag added, which is also enough to trigger the error:

clang++ -fsycl -fsycl-unnamed-lambda -fsycl-targets=nvptx64-nvidia-cuda -O2 example.cpp -o example  -lOpenCL -nocudalib

The error described above (fatal error: error in backend: Cannot select: intrinsic %llvm.nvvm.atomic.add.shared.i.cta) occurs, whereas compilation is expected to succeed.

Environment:

  • OS: Ubuntu 20.04
  • Target device and vendor: GPU: TITAN RTX
  • DPC++ version:
    c855fd1
  • Dependency versions:
    NVIDIA-SMI 470.57.02
    CUDA 11.4

Additional context

Suspected culprit commit:
commit 2ebde5f
Author: Tadej Ciglarič <tadej.ciglaric@codeplay.com>
Date: Thu Nov 18 05:45:46 2021 +0100

[SYCL][CUDA][libclc] Added atomics with scopes and memory orders (#4820)

Added libclc implementations for CUDA atomics, including for various scopes and memory orders. They are implemented using LLVM intrinsics and exposed as clang builtins, which are then used to implement functions in libclc.

Labels

  • blocker (Blocks important functionality)
  • bug (Something isn't working)
  • compiler (Compiler related issue)
  • cuda (CUDA back-end)