
Description
Describe the bug
After merging #4820, compilation fails with the error "Cannot select: intrinsic %llvm.nvvm.atomic.add.shared.i.cta".
fatal error: error in backend: Cannot select: intrinsic %llvm.nvvm.atomic.add.shared.i.cta
llvm-foreach:
clang++: error: clang frontend command failed with exit code 70 (use -v to see invocation)
clang version 14.0.0 (https://github.com/intel/llvm.git c855fd177a88d9fc6db87bc2a6c8001f0398ec9a)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: .../build/bin
clang++: note: diagnostic msg: Error generating preprocessed source(s).
To Reproduce
Compile the following example for the CUDA backend:
#include <sycl/sycl.hpp>
int main() {
  constexpr static size_t buffer_size = 4;
  sycl::queue stream;
  sycl::buffer<int, 1> buffer(buffer_size);
  stream.submit(
    [&](sycl::handler &cgh) {
      sycl::accessor<int, 1, sycl::access_mode::read_write, sycl::access::target::local> acc(sycl::range<1>(buffer_size), cgh);
      auto acc_write = buffer.get_access<sycl::access::mode::write>(cgh);
      cgh.parallel_for<class demo_kernel>(buffer_size, [=](sycl::id<1> item) {
        sycl::atomic<int, sycl::access::address_space::local_space>(sycl::local_ptr<int>(acc.get_pointer())).fetch_add(1);
      });
    });
  return 0;
}
using the following command:
clang++ -fsycl -fsycl-unnamed-lambda -fsycl-targets=nvptx64-nvidia-cuda -O2 example.cpp -o example -lOpenCL
or with the -nocudalib parameter, which is enough to reproduce the error:
clang++ -fsycl -fsycl-unnamed-lambda -fsycl-targets=nvptx64-nvidia-cuda -O2 example.cpp -o example -lOpenCL -nocudalib
The error described above (fatal error: error in backend: Cannot select: intrinsic %llvm.nvvm.atomic.add.shared.i.cta) occurs, whereas compilation is expected to succeed.
Environment (please complete the following information):
- OS: Ubuntu 20.04
- Target device and vendor: GPU: TITAN RTX
- DPC++ version: c855fd1
- Dependencies version: NVIDIA-SMI 470.57.02, CUDA 11.4
Additional context
Potentially guilty commit:
commit 2ebde5f
Author: Tadej Ciglarič <tadej.ciglaric@codeplay.com>
Date: Thu Nov 18 05:45:46 2021 +0100
[SYCL][CUDA][libclc] Added atomics with scopes and memory orders (#4820)
Added libclc implementations for CUDA atomics, including for various scopes and memory orders. They are implemented using LLVM intrinsics and exposed as clang builtins, which are then used to implement functions in libclc.
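For context, the "memory orders" in the commit description presumably refer to the usual C++-style ordering constraints on atomic operations. Below is a minimal host-side sketch of an atomic add with an explicit memory order. It uses std::atomic as a stand-in, not the SYCL or libclc atomics, so it does not exercise the NVPTX backend and does not reproduce this failure. Memory scopes have no standard C++ counterpart and are not shown. The helper name count_with_relaxed_adds is hypothetical.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Host-side analogy only: std::atomic stands in for the device atomics.
// count_with_relaxed_adds is a hypothetical helper, not part of SYCL or libclc.
int count_with_relaxed_adds(int num_threads, int adds_per_thread) {
  std::atomic<int> counter{0};
  std::vector<std::thread> workers;
  workers.reserve(num_threads);
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&counter, adds_per_thread] {
      for (int i = 0; i < adds_per_thread; ++i) {
        // Relaxed order: the increment itself is atomic, but it imposes
        // no ordering on surrounding memory operations.
        counter.fetch_add(1, std::memory_order_relaxed);
      }
    });
  }
  // Joining a thread synchronizes with its completion, so every
  // increment is visible to the load below.
  for (auto &w : workers) w.join();
  return counter.load(std::memory_order_seq_cst);
}
```

Calling count_with_relaxed_adds(4, 1000) should return 4000, since no increment is lost even under the weakest ordering. A stronger order such as std::memory_order_seq_cst would yield the same count and only add ordering guarantees relative to other memory operations.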