Describe the bug
The DPC++ compiler for the NVIDIA PTX backend does not inline the atomic_ref::fetch_add call, which can lead to a ~20% performance degradation vs. nvcc on the same device.
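For reference, here is a minimal sketch (not the attached reproducer; sizes and names are illustrative) of the kind of code affected, written with the SYCL 2020 spellings (<sycl/sycl.hpp>, sycl::atomic_ref, sycl::gpu_selector_v); older DPC++ releases may need <CL/sycl.hpp> and sycl::ext::oneapi::atomic_ref instead. The float histogram update below goes through fetch_add, which is the call that ends up not being inlined on the PTX backend:

#include <sycl/sycl.hpp>
#include <cstddef>
#include <vector>

int main() {
  constexpr std::size_t N = 1 << 20;   // illustrative size, not the attached benchmark
  constexpr std::size_t Bins = 256;
  std::vector<unsigned> input(N, 7u);
  std::vector<float> hist(Bins, 0.f);

  sycl::queue q{sycl::gpu_selector_v};
  {
    sycl::buffer<unsigned> inBuf{input};
    sycl::buffer<float> histBuf{hist};
    q.submit([&](sycl::handler &cgh) {
      sycl::accessor in{inBuf, cgh, sycl::read_only};
      sycl::accessor out{histBuf, cgh, sycl::read_write};
      cgh.parallel_for(sycl::range<1>{N}, [=](sycl::id<1> i) {
        sycl::atomic_ref<float, sycl::memory_order::relaxed,
                         sycl::memory_scope::device,
                         sycl::access::address_space::global_space>
            bin(out[in[i] % Bins]);
        bin.fetch_add(1.0f);   // lowered to __spirv_AtomicFAddEXT on the PTX backend
      });
    });
  } // buffer destruction copies the histogram back to the host vector
  return 0;
}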
To Reproduce
histograms.zip
Steps to reproduce:
$ unzip histograms.zip
$ cd histograms/cuda
$ export PATH=$PATH:/usr/local/cuda/bin
$ ./build.sh
$ ./hist
Size 23013896 => min: 2.516 ms; avg: 2.637 ms; max: 3.149 ms
$ cd ../dpcpp
$ source <intel/llvm compiler>
$ ./build.sh
$ ./hist
Size 23013896 => min: 2.949 ms; avg: 3.155 ms; max: 3.767 ms
The performance degradation is about 17-20%.
Environment (please complete the following information):
- OS: Ubuntu 20.04
- Target device and vendor: NVIDIA RTX 2070
- DPC++ version: clang version 14.0.0 (https://github.com/intel/llvm e15ac50)
- Dependencies version: CUDA 11.4
Additional context
Based on initial investigation, the major difference between the two binaries is that nvcc fully inlines the atomic operation, while DPC++/clang does not and emits a call/ret sequence instead, which leads to more executed instructions and a higher number of branches.
For DPC++ it was found that the __spirv_AtomicFAddEXT function call is not inlined (in spirv.hpp):
template <typename T, access::address_space AddressSpace>
__attribute__((always_inline))
typename detail::enable_if_t<std::is_floating_point<T>::value, T>
AtomicFAdd(multi_ptr<T, AddressSpace> MPtr, memory_scope Scope,
           memory_order Order, T Value) {
  auto *Ptr = MPtr.get();
  auto SPIRVOrder = getMemorySemanticsMask(Order);
  auto SPIRVScope = getScope(Scope);
  return __spirv_AtomicFAddEXT(Ptr, SPIRVScope, SPIRVOrder, Value); // <--- HERE
}
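Note that always_inline above applies only to the AtomicFAdd wrapper itself; the __spirv_AtomicFAddEXT call inside it targets a builtin whose definition is supplied by a separately linked device library (libclc/libspirv for the PTX backend), so it is only inlined if the optimizer does so after that library is linked in. A simplified illustration of this pattern (names are hypothetical, not the actual SYCL headers):

// Hypothetical names, for illustration only.
extern float external_atomic_add(float *Ptr, float Value); // definition linked in separately

__attribute__((always_inline)) inline float wrapper(float *Ptr, float Value) {
  // The wrapper body is inlined into its callers, but this inner call still
  // compiles to a call/ret sequence unless external_atomic_add itself is
  // inlined once its definition becomes visible to the optimizer.
  return external_atomic_add(Ptr, Value);
}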