
[SYCL][CUDA] __CUDA_ARCH__ defined when compiling for CUDA backend since sycl-nightly/20221129 #7722

Closed
AuroraPerego opened this issue Dec 9, 2022 · 10 comments
Labels
cuda CUDA back-end

Comments

@AuroraPerego

While trying to compile my application with different nightlies, at some point the compilation started to fail.
I found out that the reason is that __CUDA_ARCH__ is now defined, while it wasn't before, and I was wondering whether this is intended or not.

How to reproduce:

clang++ -fsycl -std=c++17 dummy.cpp -dM -E -fsycl-targets=nvptx64-nvidia-cuda -fno-bundle-offload-arch --cuda-path=/usr/local/cuda -Wno-unknown-cuda-version -Wno-linker-warnings | grep CUDA_ARCH

dummy.cpp is just:

#include <CL/sycl.hpp>

int main(){
    return 0;
}

The commit that led to this change is this.

  • OS: Linux
  • CUDA version: 11.7
@hdelan
Contributor

hdelan commented Dec 9, 2022

Can you share the error that occurred when compiling this dummy.cpp please?

@AuroraPerego
Author

This file compiles; it was just a quick way to see which macros are defined. What is not compiling is the project I'm working on right now, which uses Eigen. Since __CUDA_ARCH__ is defined, it falls into an #ifdef branch (here) where it should not go and raises this error:

In file included from /afs/cern.ch/user/a/aperego/public/pixeltrack-standalone/external/eigen/Eigen/Core:170:
/afs/cern.ch/user/a/aperego/public/pixeltrack-standalone/external/eigen/Eigen/src/Core/MathFunctions.h:988:15: fatal error: no member named 'isfinite' in the global namespace
    return (::isfinite)(x);
            ~~^

If you want, I can share the steps to reproduce this error, but it is directly related to the fact that __CUDA_ARCH__ is defined, and it wasn't happening before sycl-nightly/20221129. If this is expected, I'll pass Eigen a flag that makes it ignore __CUDA_ARCH__ so I can compile anyway.
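For context, the check in Eigen that goes wrong is roughly of the following shape; this is a simplified sketch rather than Eigen's exact source, but the macro and the ::isfinite call match the error above:

#include <cmath>

// Simplified sketch of the kind of guard Eigen uses (not Eigen's exact source).
// When __CUDA_ARCH__ is defined, the library assumes a CUDA device compiler is
// running and calls the CUDA math built-in ::isfinite, which does not exist in
// a SYCL device compilation, hence "no member named 'isfinite' in the global
// namespace".
template <typename T>
bool is_finite_sketch(const T& x) {
#if defined(__CUDA_ARCH__)
  return (::isfinite)(x);   // CUDA device path
#else
  return std::isfinite(x);  // host / non-CUDA path
#endif
}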

@AuroraPerego
Author

AuroraPerego commented Dec 9, 2022

The definition of __CUDA_ARCH__ has been likely introduced by #6524 via https://github.com/intel/llvm/pull/6524/files#diff-84691cf70860f37e55bcd6c5275edb7d31c60e43d9c9f310a2356be3a78e8418

@AlexeySachkov AlexeySachkov added the cuda CUDA back-end label Dec 12, 2022
@hdelan
Contributor

hdelan commented Dec 12, 2022

__CUDA_ARCH__ plays a role in SYCL to allow compile-time branching of device code for the CUDA backend, and a lot of code relies on this. So it seems that Eigen/src/Core/util/Macros.h needs to be updated to allow for __CUDA_ARCH__ being defined in SYCL as well, i.e. instead of treating __CUDA_ARCH__ alone as a sign that the CUDA API is being used, it might be worthwhile to also check that __SYCL_DEVICE_ONLY__ is not defined.
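A minimal sketch of that suggested guard (hypothetical, not an actual Eigen patch; __CUDA_ARCH__ and __SYCL_DEVICE_ONLY__ are real macros, the derived macro name is made up):

// Hypothetical guard combining both macros, as suggested above: only take the
// CUDA path when this is a real CUDA device compilation, not a SYCL device
// compilation that also happens to define __CUDA_ARCH__.
#if defined(__CUDA_ARCH__) && !defined(__SYCL_DEVICE_ONLY__)
  #define USING_REAL_CUDA_DEVICE_COMPILER 1  // hypothetical helper macro
#endif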

@fwyzard
Contributor

fwyzard commented Dec 13, 2022

In terms of "don't break existing code", wouldn't it be a better approach not to define __CUDA_ARCH__ during SYCL device compilation, and instead use a different preprocessor macro to identify the SYCL target architecture?

@abagusetty
Contributor

abagusetty commented Dec 20, 2022

@fwyzard
Contributor

fwyzard commented Dec 21, 2022

@abagusetty thanks for the link.

I do like the idea of having a dedicated macro for the SYCL backends, instead of reusing __CUDA_ARCH__.

However, if the possible values are defined by an enum, I don't see how they could be used in preprocessor conditional compilation?

Of course they can be used with if constexpr, traits, etc. - but sometimes you really want to #if something in or out.

Maybe the extensions could be made more useful by also defining some corresponding macro names?
Like SYCL_EXT_ONEAPI_DEVICE_ARCHITECTURE_NVIDIA_GPU_SM_50 with the same value as sycl::ext::oneapi::experimental::architecture::nvidia_gpu_sm_50, etc.

And some aliases to identify the first and last entry of each architecture, so one could do

if constexpr (SYCL_EXT_ONEAPI_DEVICE_ARCHITECTURE >= architecture::nvidia_gpu_first &&
              SYCL_EXT_ONEAPI_DEVICE_ARCHITECTURE <= architecture::nvidia_gpu_last) {
  ...
}

etc.
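For the pure preprocessor case mentioned above, such hypothetical macros could be used roughly like this (all SYCL_EXT_ONEAPI_DEVICE_ARCHITECTURE* names below follow the proposal in this comment and are not an existing API):

// Hypothetical use of the proposed macros in conditional compilation.
// Neither SYCL_EXT_ONEAPI_DEVICE_ARCHITECTURE nor the *_NVIDIA_GPU_SM_50 name
// exists today; they mirror the enum values of the device-architecture
// extension as suggested above.
#if defined(SYCL_EXT_ONEAPI_DEVICE_ARCHITECTURE) && \
    SYCL_EXT_ONEAPI_DEVICE_ARCHITECTURE >= SYCL_EXT_ONEAPI_DEVICE_ARCHITECTURE_NVIDIA_GPU_SM_50
  // code compiled only for NVIDIA sm_50-or-newer SYCL device targets
#endif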

WeiqunZhang pushed a commit to AMReX-Codes/amrex that referenced this issue Jan 9, 2023
## Summary

The changes in these two commits, although apparently redundant, guard
SYCL compilations targeting CUDA or HIP devices by preventing the
compiler from seeing the CUDA/HIP code already in AMReX.

## Additional background

The 'vanilla' version of LLVM, which is used by hipSYCL for example,
enables `__CUDA_ARCH__` and `__HIP_DEVICE_COMPILE__` when compiling
device code, for both
[CUDA](https://llvm.org/docs/CompileCudaWithLLVM.html#detecting-clang-vs-nvcc-from-code)
and [HIP](https://reviews.llvm.org/D45441), respectively. Intel's
version of LLVM is also moving in the same direction with
[CUDA](intel/llvm#7722) and
[HIP](intel/llvm#7720), although the latter is
off to a rocky start.
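A hypothetical sketch of what such a guard can look like (not AMReX's actual code; SYCL_LANGUAGE_VERSION is the standard macro defined by SYCL compilers):

// Hypothetical guard: treat __CUDA_ARCH__ as a "real" CUDA device compilation
// only when we are not in a SYCL compilation, so that the SYCL device passes
// do not see the existing CUDA device code.
#if defined(__CUDA_ARCH__) && !defined(SYCL_LANGUAGE_VERSION)
  // safe to use CUDA-only intrinsics and headers here
#endif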
@GeorgeWeb
Contributor

Hi @AuroraPerego

We expect this issue to soon be resolved with the changes from PR #8257.

TL;DR: We define a unique __SYCL_CUDA_ARCH__ macro, use it for SYCL device targets, and leave __CUDA_ARCH__ as before for CUDA targets.
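Based on that description, code that needs to distinguish the two compilation modes can then, roughly, branch like this (a sketch, not taken from the PR itself):

// Sketch of the post-#8257 situation, based on the description above.
#if defined(__SYCL_CUDA_ARCH__)
  // SYCL device compilation targeting the CUDA backend
#elif defined(__CUDA_ARCH__)
  // plain CUDA device compilation (nvcc or clang CUDA)
#else
  // host code, or a device compilation for another backend
#endif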

steffenlarsen pushed a commit that referenced this issue Feb 21, 2023
[SYCL][CUDA] Define __SYCL_CUDA_ARCH__ instead of __CUDA_ARCH__ for SYCL (#8257)

This PR addresses an issue where using `__CUDA_ARCH__` causes
intrinsics not to be defined in the CUDA include files.
- Replace `__CUDA_ARCH__` with `__SYCL_CUDA_ARCH__` for SYCL device
- Update the `sycl-macro.cpp` test to check the appropriate macro.

---

As far as I could find, the original issue was introduced by PR
[#6524](7b47ebb),
which enabled the bfloat16 support by moving it out of the experimental
extension, and it breaks some codebases with CUDA interop calls.
Current reports include GitHub issues
[#7722](#7722),
[#8133](#8133) and
[oneapi-src/oneMKL#257](oneapi-src/oneMKL#257).

For that reason we define a unique `__SYCL_CUDA_ARCH__` macro and use it
instead for SYCL device targets and leave `__CUDA_ARCH__` as before for
CUDA targets.
zhimingwang36 added a commit to oneapi-src/SYCLomatic that referenced this issue Feb 23, 2023
(Squash pull-down merge into SYCLomatic; among other unrelated changes it carries the "[SYCL][CUDA] Define __SYCL_CUDA_ARCH__ instead of __CUDA_ARCH__ for SYCL (#8257)" commit described above.)
@GeorgeWeb
Contributor

Hi @AuroraPerego. Can we close this issue now if there are no more problems to resolve?

@AuroraPerego
Author

Hi, yes it's fine for me.
Thanks!
