[cuBLAS][DPCPP] Build issue with cuBLAS backend and DPCPP #257
Comments
@abagusetty Thanks for reporting this issue. |
@mmeterel Only with 11.x. |
Do you mean the error doesn't appear with 10.2, but appears with 11.x (and 12.0, as shown in your logs)? |
Yes, that is correct. Any version from 11.x onward, including 12.0, shows this issue; I had no problems with 10.2.89. This happens only with the cuBLAS backend; no issues with the rng domain (i.e., cuBLAS disabled and cuRAND enabled). I wasn't sure if the above issue is related to a simple CMake path or something more involved, so I tried to output some of the CMake path variables used by the cuBLAS CMakeLists. |
Thanks for the information. Let me try to reproduce it on my side. |
Hi @mmeterel, just curious if you had a chance to reproduce the issue. I was also wondering if this has anything to do with how DPC++ is built (i.e., the bf16 support). I was just using the standard CUDA-plugin instructions for DPC++. |
I was able to fix the above build issue when I played with |
@abagusetty Terribly sorry for not getting back to you earlier. :( I tried the combination below and did not see the build error you mentioned: oneMKL interfaces commit 45c43ed. One difference is that I am using an old llvm commit (I'm having difficulty detecting the GPU device with a newer version); I will try a newer version when I figure out the issue. Also note that I did not specify 'CUDA_LIB_PATH'. I noticed you have '-DENABLE_CUBLAS_BACKEND=True -DENABLE_CURAND_BACKEND=True'. Are you purposely trying to build both backends? AFAIK, the repo does not support that currently, although your problem is probably not due to this. Finally, I tried to compare my CMake output with yours, but could not locate mine. How can I find that info? |
@mmeterel Thanks for taking time on this. I was trying to build both CUBLAS and CURAND since TARGET_DOMAINS accepts comma-separated values; I was able to build ROCBLAS and ROCRAND together without any issues. Regarding the CMake output, I just printed the values of the variables listed below manually for debugging purposes (i.e., edited FindcuBLAS.cmake): -- 2. CUDA_TOOLKIT_INCLUDE : /soft/compilers/cudatoolkit/cuda-12.0.0/include
-- 2. CUDA_cublas_LIBRARY : /soft/compilers/cudatoolkit/cuda-12.0.0/lib64/libcublas.so
-- 2. CUDA_LIBRARIES : /soft/compilers/cudatoolkit/cuda-12.0.0/lib64/libcudart_static.a;Threads::Threads;dl;/usr/lib64/librt.so
-- 2. CUDA_CUDART_LIBRARY : /soft/compilers/cudatoolkit/cuda-12.0.0/lib64/libcudart.so
-- 2. CUDA_CUDA_LIBRARY : /soft/compilers/cudatoolkit/cuda-12.0.0/lib64/stubs/libcuda.so |
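A minimal sketch of the kind of debugging edit described above, assuming `message()` calls added to the find module; the variable names come from the output quoted in this thread, and the exact placement inside `cmake/FindcuBLAS.cmake` is illustrative:

```cmake
# Hypothetical debugging lines added to cmake/FindcuBLAS.cmake to print
# the CUDA-related variables the find module resolved.
message(STATUS "CUDA_TOOLKIT_INCLUDE : ${CUDA_TOOLKIT_INCLUDE}")
message(STATUS "CUDA_cublas_LIBRARY  : ${CUDA_cublas_LIBRARY}")
message(STATUS "CUDA_LIBRARIES       : ${CUDA_LIBRARIES}")
message(STATUS "CUDA_CUDART_LIBRARY  : ${CUDA_CUDART_LIBRARY}")
message(STATUS "CUDA_CUDA_LIBRARY    : ${CUDA_CUDA_LIBRARY}")
```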
Thanks for the CMake tip. Here is the info I see: That does not help me much with your error, frankly. Do you see anything? |
Unfortunately, assuming the standard installation paths for CUDA might not give the right setup: all the issues reported in this thread and elsewhere involved non-standard CUDA installation paths. The above hack was on the ALCF Polaris machine, but I can try on NERSC Perlmutter to see if the CUDA_LIB_PATH setup is really the cause of this build issue. If that is the case, a simple note in the documentation is all one might need. I can get back to you on this. Thanks again. |
Thanks for following up. Yeah, I am curious what the problem here is :) Also, I just noticed you are not building the functional tests (-DBUILD_FUNCTIONAL_TESTS=False). In this case, you can build multiple backends; the issue will appear when you build the tests along with multiple backends. |
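A hedged sketch of the configuration being discussed, using only the flags quoted in this thread (source directory, compiler, and remaining options omitted); whether it builds depends on the backend/test combination noted above:

```shell
# Sketch: configure oneMKL interfaces with two CUDA backends enabled while
# the functional tests are disabled (the combination reported to build).
cmake .. -DENABLE_CUBLAS_BACKEND=True \
         -DENABLE_CURAND_BACKEND=True \
         -DBUILD_FUNCTIONAL_TESTS=False
```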
@abagusetty Any objections closing this issue? |
No, sorry. This can be closed. Thanks again. |
@npmiller Just reopened for tracking |
@abagusetty What do you mean by "tracking"? Can you please give more detail? |
@mmeterel Sorry about the lack of context. @npmiller also noted some build issues when enabling multiple CU* backends, with similar symptoms using CUDA-12. I run into the same issues when building multiple backends (with functional tests and examples disabled). The above trick never worked when I had CUSOLVER, CUBLAS and CURAND enabled; it fails with the above ARCH issues defining the bf16 headers. It only works when CUBLAS alone is enabled. |
There are a couple of DPC++ tickets reporting similar issues:
Hi @abagusetty We expect this issue to soon be resolved with the changes from PR #8257. I saw your comment on the PR - thanks a lot for verifying this! |
[SYCL][CUDA] Define __SYCL_CUDA_ARCH__ instead of __CUDA_ARCH__ for SYCL (#8257) This PR addresses an issue where using `__CUDA_ARCH__` causes intrinsics not to be defined in the CUDA include files. - Replace `__CUDA_ARCH__` with `__SYCL_CUDA_ARCH__` for the SYCL device - Update the `sycl-macro.cpp` test to check the appropriate macro. --- As far as I could find, the original issue was introduced by PR [#6524](7b47ebb), which enabled the bfloat16 support by moving it out of the experimental extension, and it breaks some codebases with CUDA interop calls. Current reports include GitHub issues [#7722](#7722), [#8133](#8133) and [oneapi-src/oneMKL#257](oneapi-src/oneMKL#257). For that reason we define a unique `__SYCL_CUDA_ARCH__` macro and use it instead for SYCL device targets, leaving `__CUDA_ARCH__` as before for CUDA targets.
Closing as it was addressed in intel/llvm#8257 |
Summary
Build fails with the CUBLAS backend and DPC++ with the following errors; however, the build is fine with the CURAND backend. Any pointers on how to navigate this?
cuda_fp16.hpp:690:1: error: unknown type name 'CUDA_FP16_DECL'
Debug info from CMake output: (cmake/FindcuBLAS.cmake)
Version
oneMKL: d217915
llvm/dpcpp: b2d6fdfd63fe8dd7be98e113cde9bc7c3d9a21d8
CUDA: 12.0
Build instructions
Error