-
Notifications
You must be signed in to change notification settings - Fork 730
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL][CUDA] __CUDA_ARCH__
defined when compiling for CUDA backend since sycl-nightly/20221129
#7722
Comments
Can you share the error that occurred when compiling this |
This file compiles, it was just a quick way to see which macros are defined. What is not compiling is the project I'm working on right now that uses Eigen. Since In file included from /afs/cern.ch/user/a/aperego/public/pixeltrack-standalone/external/eigen/Eigen/Core:170:
/afs/cern.ch/user/a/aperego/public/pixeltrack-standalone/external/eigen/Eigen/src/Core/MathFunctions.h:988:15: fatal error: no member named 'isfinite' in the global namespace
return (::isfinite)(x);
~~^ If you want I can share the steps to reproduce this error, but it's directly related to the fact the |
The definition of |
|
In terms of "don't break existing code", wouldn't it be a better approach to not define |
@abagusetty thanks for the link. I do like the idea of having a dedicated macro for the SYCL backends, instead of reusing However, if the possible values are defined by an Of course they can be used with Maybe the extensions could be made more useful by also defining some corresponding macro names ? And some aliases to identify the first and last entry of each architecture, so one could do if constexpr(SYCL_EXT_ONEAPI_DEVICE_ARCHITECTURE >= architecture::nvidia_gpu_first && SYCL_EXT_ONEAPI_DEVICE_ARCHITECTURE <= architecture::nvidia_gpu_last) {
... etc. |
## Summary The changes in these two commits, although apparently redundant, guard SYCL compilations targeting CUDA or HIP devices by preventing the compiler from seeing the CUDA/HIP code already in AMReX. ## Additional background The 'vanilla' version of LLVM, which is used by hipSYCL for example, enables `__CUDA_ARCH__` and `__HIP_DEVICE_COMPILE__` when compiling device code, for both [CUDA](https://llvm.org/docs/CompileCudaWithLLVM.html#detecting-clang-vs-nvcc-from-code) and [HIP](https://reviews.llvm.org/D45441), respectively. Intel's version of LLVM is also moving in the same direction with [CUDA](intel/llvm#7722) and [HIP](intel/llvm#7720), although the latter is off to a rocky start.
We expect this issue to soon be resolved with the changes from PR #8257. TLDR: We define a unique |
…YCL (#8257) This PR addresses an issue where if we use `__CUDA_ARCH__` causes intrinsics not to be defined in the CUDA include files. - Replace `__CUDA_ARCH__` with `__SYCL_CUDA_ARCH__` for SYCL device - Update the `sycl-macro.cpp` test to check the appropriate macro. --- As far as I could find the original issue was introduced from PR [#6524](7b47ebb) for enabling the bfloat16 support moving it from the experimental extension, and it breaks some codebases with CUDA interop calls. Current reports include github issues [#7722](#7722), [#8133](#8133) and [oneapi-src/oneMKL#257](oneapi-src/oneMKL#257). For that reason we define a unique `__SYCL_CUDA_ARCH__` macro and use it instead for SYCL device targets and leave `__CUDA_ARCH__` as before for CUDA targets.
Co-authored-by: Wenju He <wenju.he@intel.com> Co-authored-by: Alexey Bader <alexey.bader@intel.com> * [CI] Use ubuntu-20.04 image Using ubuntu-latest still causes long delays due to missing runners. * [SYCL][NFC] Use reducer-access helper function instead of deduction guide (#8411) The use of deduction guides in the `ReducerAccess` helper class causes problems when building with a compiler that does not support them. This commit changes the implementation to use a helper function instead. Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com> * [SYCL][CUDA] Support host-device memcpy2D (#8181) Addresses to support host-device memcpy2D copies * [SYCL][CUDA] Define __SYCL_CUDA_ARCH__ instead of __CUDA_ARCH__ for SYCL (#8257) This PR addresses an issue where if we use `__CUDA_ARCH__` causes intrinsics not to be defined in the CUDA include files. - Replace `__CUDA_ARCH__` with `__SYCL_CUDA_ARCH__` for SYCL device - Update the `sycl-macro.cpp` test to check the appropriate macro. --- As far as I could find the original issue was introduced from PR [#6524](intel/llvm@7b47ebb) for enabling the bfloat16 support moving it from the experimental extension, and it breaks some codebases with CUDA interop calls. Current reports include github issues [#7722](intel/llvm#7722), [#8133](intel/llvm#8133) and [oneapi-src/oneMKL#257](oneapi-src/oneMKL#257). For that reason we define a unique `__SYCL_CUDA_ARCH__` macro and use it instead for SYCL device targets and leave `__CUDA_ARCH__` as before for CUDA targets. * [NFC][clang][SYCL] Refine the test check by adding `:` (#8416) The test can fail if wokring directory where the test was launched has a `error` substring in its path. * [ESIMD] Reduce number of bit-casts generated for lsc_block_load/store operations (#8385) * [SYCL] Add sub-group functions emulation for vector of doubles. (#8252) intel/llvm-test-suite#1603 * [SYCL][PI][CUDA][HIP] Fix bugs that can cause events not to be waited on (#8374) Fixes two bug in CUDA PI and HIP PI that can cause waiting for events to do nothing: - The first one is an off-by-one error when checking if an event needs to be waited on - The second one is `last_sync_compute_streams_` / `last_sync_transfer_streams_` to a new value before checking the streams which can read these variables, expecting the old values. Both of these are synchronization related and therefore hard to test for. * [SYCL][Fusion] Do not internalize stored argument pointers (#8376) When a pointer to be promoted is stored, internalization is no longer safe to perform. In this case, simply bail out and do not promote the given pointer. Signed-off-by: Victor Perez <victor.perez@codeplay.com> Co-authored-by: Alexey Bader <alexey.bader@intel.com> * [SYCL] Avoid optimizing out integer conversion (#8409) This commit fixes and issue where an integer conversion happening inside an assert would cause the conversion to not happen when assertions were disabled. Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com> * [SYCLomatic] Fix sycl to SYCLomatic pull down failed. Signed-off-by: Tang, Jiajun jiajun.tang@intel.com --------- Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com> Signed-off-by: Victor Perez <victor.perez@codeplay.com> Signed-off-by: Tang, Jiajun jiajun.tang@intel.com Co-authored-by: Alexey Bader <alexey.bader@intel.com> Co-authored-by: Chunyang Dai <chunyang.dai@intel.com> Co-authored-by: Wenju He <wenju.he@intel.com> Co-authored-by: Steffen Larsen <steffen.larsen@intel.com> Co-authored-by: Abhishek Bagusetty <59661409+abagusetty@users.noreply.github.com> Co-authored-by: Georgi Mirazchiyski <georgimweb@gmail.com> Co-authored-by: Mariya Podchishchaeva <mariya.podchishchaeva@intel.com> Co-authored-by: fineg74 <61437305+fineg74@users.noreply.github.com> Co-authored-by: Maksim Sabianin <maksim.sabianin@intel.com> Co-authored-by: Tadej Ciglarič <tadej.ciglaric@codeplay.com> Co-authored-by: vic <victor.perez@codeplay.com>
Hi @AuroraPerego. Can we close this issue now if there are no more problems to resolve? |
Hi, yes it's fine for me. |
While trying to compile my application with different nightlies at some point the compilation started to fail.
I found out that the reason is that
__CUDA_ARCH__
is defined, while it wasn't before and I was wondering whether this is wanted or not.How to reproduce:
clang++ -fsycl -std=c++17 dummy.cpp -dM -E -fsycl-targets=nvptx64-nvidia-cuda -fno-bundle-offload-arch --cuda-path=/usr/local/cuda -Wno-unknown-cuda-version -Wno-linker-warnings | grep CUDA_ARCH
dummy.cpp
is just:sycl-nightly/20221128
(clang version 16.0.0 (https://github.com/intel/llvm 85f2b97)) this command gives no outputsycl-nightly/20221129
(clang version 16.0.0 (https://github.com/intel/llvm 142d9f5)) the output is:#define __CUDA_ARCH__ 500
The commit that led to this change is this.
The text was updated successfully, but these errors were encountered: