[cuBLAS][DPCPP] Build issue with cuBLAS backend and DPCPP #257

Closed
abagusetty opened this issue Dec 22, 2022 · 20 comments
@abagusetty

Summary

The build fails with the cuBLAS backend and DPC++ with the following error, while the same build succeeds with the cuRAND backend. Any pointers on how to resolve this?

cuda_fp16.hpp:690:1: error: unknown type name '__CUDA_FP16_DECL__'

Debug info from CMake output: (cmake/FindcuBLAS.cmake)

-- Found CUDA: /soft/compilers/cudatoolkit/cuda-12.0.0 (found suitable version "12.0", minimum required is "10.0") 
-- Found cuBLAS: /soft/compilers/cudatoolkit/cuda-12.0.0/include  
-- 2. CUDA_TOOLKIT_INCLUDE : /soft/compilers/cudatoolkit/cuda-12.0.0/include
-- 2. CUDA_cublas_LIBRARY  : /soft/compilers/cudatoolkit/cuda-12.0.0/lib64/libcublas.so
-- 2. CUDA_LIBRARIES  : /soft/compilers/cudatoolkit/cuda-12.0.0/lib64/libcudart_static.a;Threads::Threads;dl;/usr/lib64/librt.so
-- 2. CUDA_CUDART_LIBRARY  : /soft/compilers/cudatoolkit/cuda-12.0.0/lib64/libcudart.so
-- 2. CUDA_CUDA_LIBRARY  : /soft/compilers/cudatoolkit/cuda-12.0.0/lib64/stubs/libcuda.so

Version

oneMKL: d217915
llvm/dpcpp: b2d6fdfd63fe8dd7be98e113cde9bc7c3d9a21d8
CUDA: 12.0

Build instructions

CUDA_LIB_PATH=/soft/compilers/cudatoolkit/cuda-12.0.0/lib64/stubs cmake .. -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DENABLE_CUBLAS_BACKEND=True -DENABLE_CURAND_BACKEND=True -DENABLE_ROCBLAS_BACKEND=False -DENABLE_MKLCPU_BACKEND=False -DENABLE_MKLGPU_BACKEND=False -DBUILD_FUNCTIONAL_TESTS=False -DBUILD_EXAMPLES=False

Error

[ 26%] Building CXX object bin/blas/backends/cublas/CMakeFiles/onemkl_blas_cublas_obj.dir/cublas_level1.cpp.o
cd /lus/eagle/projects/UINTAH_aesp/abagusetty/oneMKL/build/bin/blas/backends/cublas && /lus/eagle/projects/UINTAH_aesp/abagusetty/llvm_sycl/build_PrngEnvGnu_cuda1200_12212022/install/bin/clang++ -DCUDA_NO_HALF -I/lus/eagle/projects/UINTAH_aesp/abagusetty/oneMKL/include -I/lus/eagle/projects/UINTAH_aesp/abagusetty/oneMKL/src/include -I/lus/eagle/projects/UINTAH_aesp/abagusetty/oneMKL/src -isystem /lus/eagle/projects/UINTAH_aesp/abagusetty/llvm_sycl/build_PrngEnvGnu_cuda1200_12212022/install/include/sycl -isystem /soft/compilers/cudatoolkit/cuda-12.0.0/include -DSYCL2020_DISABLE_DEPRECATION_WARNINGS -O3 -DNDEBUG -fPIC -fsycl -fsycl-targets=nvptx64-nvidia-cuda -fsycl-unnamed-lambda -MD -MT bin/blas/backends/cublas/CMakeFiles/onemkl_blas_cublas_obj.dir/cublas_level1.cpp.o -MF CMakeFiles/onemkl_blas_cublas_obj.dir/cublas_level1.cpp.o.d -o CMakeFiles/onemkl_blas_cublas_obj.dir/cublas_level1.cpp.o -c /lus/eagle/projects/UINTAH_aesp/abagusetty/oneMKL/src/blas/backends/cublas/cublas_level1.cpp
clang-16: warning: CUDA version is newer than the latest partially supported version 11.8 [-Wunknown-cuda-version]
In file included from /lus/eagle/projects/UINTAH_aesp/abagusetty/oneMKL/src/blas/backends/cublas/cublas_level1.cpp:20:
In file included from /lus/eagle/projects/UINTAH_aesp/abagusetty/oneMKL/src/blas/backends/cublas/cublas_task.hpp:13:
In file included from /lus/eagle/projects/UINTAH_aesp/abagusetty/oneMKL/src/blas/backends/cublas/cublas_scope_handle.hpp:27:
/lus/eagle/projects/UINTAH_aesp/abagusetty/llvm_sycl/build_PrngEnvGnu_cuda1200_12212022/install/bin/../include/sycl/backend/cuda.hpp:17:2: warning: sycl/backend/cuda.hpp is deprecated and not required anymore [-W#warnings]
#warning sycl/backend/cuda.hpp is deprecated and not required anymore
 ^
1 warning generated.
In file included from /lus/eagle/projects/UINTAH_aesp/abagusetty/oneMKL/src/blas/backends/cublas/cublas_level1.cpp:19:
In file included from /lus/eagle/projects/UINTAH_aesp/abagusetty/oneMKL/src/blas/backends/cublas/cublas_helper.hpp:31:
In file included from /soft/compilers/cudatoolkit/cuda-12.0.0/include/cublas_v2.h:69:
In file included from /soft/compilers/cudatoolkit/cuda-12.0.0/include/cublas_api.h:77:
In file included from /soft/compilers/cudatoolkit/cuda-12.0.0/include/cuda_fp16.h:4006:
/soft/compilers/cudatoolkit/cuda-12.0.0/include/cuda_fp16.hpp:690:1: error: unknown type name '__CUDA_FP16_DECL__'
__CUDA_FP16_DECL__ __half2 __internal_device_float2_to_half2_rn(const float a, const float b) {
^
/soft/compilers/cudatoolkit/cuda-12.0.0/include/cuda_fp16.hpp:690:27: error: expected ';' after top level declarator
__CUDA_FP16_DECL__ __half2 __internal_device_float2_to_half2_rn(const float a, const float b) {
                          ^
@abagusetty changed the title from "Build issue with cuBLAS backend and DPCPP" to "[cuBLAS][DPCPP] Build issue with cuBLAS backend and DPCPP" on Dec 22, 2022
@mmeterel
Contributor

mmeterel commented Jan 9, 2023

@abagusetty Thanks for reporting this issue.
Do you know if the same problem occurs with CUDA versions 10.2 or 11.x?

@abagusetty
Author

@mmeterel Only with 11.x.

@mmeterel
Contributor

mmeterel commented Jan 9, 2023

Do you mean the error doesn't appear with 10.2, but does appear with 11.x (and 12.0, as shown in your logs)?

@abagusetty
Author

Yes, that is correct. Only versions from 11.x onward, including 12.0, show this issue. I had no problems with 10.2.89. It happens only with the cuBLAS backend; there are no issues with the rng domain (i.e., with cuBLAS disabled and cuRAND enabled).

I wasn't sure whether the above issue comes from a simple CMake path problem or something more involved, so I tried printing some of the CMake path variables used by the cuBLAS CMakeLists.

@mmeterel
Contributor

mmeterel commented Jan 9, 2023

Thanks for the information. Let me try to reproduce it on my side.

@mmeterel mmeterel self-assigned this Jan 9, 2023
@abagusetty
Author

Hi @mmeterel, just curious whether you had a chance to reproduce the issue. I was also wondering if this has anything to do with how DPC++ is built (e.g., the bf16 support). I was just using the standard CUDA-plugin instructions for DPC++.

@abagusetty
Author

I was able to fix the above build issue by changing the CUDA_LIB_PATH stubs location. Not sure if that makes sense.

CUDA_LIB_PATH=/soft/compilers/cudatoolkit/cuda-12.0.0/lib64/stubs to CUDA_LIB_PATH=/soft/compilers/cudatoolkit/cuda-12.0.0/targets/x86_64-linux/lib/stubs

@mmeterel
Contributor

@abagusetty Terribly sorry for not getting back to you earlier. :(

I tried the below combination and did not see the build error you mentioned.

oneMKL interfaces commit 45c43ed
Machine: A100
CUDA version: cuda_12.0.r12.0

One difference is that I am using an older llvm commit (I am having difficulty detecting the GPU device with a newer version). I will try a newer version once I figure that out.

Also note that I did not specify 'CUDA_LIB_PATH'.

I noticed you have '-DENABLE_CUBLAS_BACKEND=True -DENABLE_CURAND_BACKEND=True'. Are you purposely trying to build both backends? AFAIK, the repo does not support that currently, although your problem is probably not due to this.

Finally, I tried to compare my CMake output with yours, but could not locate mine. How can I find that info?

@abagusetty
Author

@mmeterel Thanks for taking the time to look into this. I was trying to build both CUBLAS and CURAND because TARGET_DOMAINS accepts a comma-separated list of values.

On the other hand, I was able to build ROCBLAS and ROCRAND without any issues. Regarding the CMake output, I printed the values of the variables listed below manually for debugging purposes (i.e., I edited the FindcuBLAS.cmake file).

-- 2. CUDA_TOOLKIT_INCLUDE : /soft/compilers/cudatoolkit/cuda-12.0.0/include
-- 2. CUDA_cublas_LIBRARY  : /soft/compilers/cudatoolkit/cuda-12.0.0/lib64/libcublas.so
-- 2. CUDA_LIBRARIES  : /soft/compilers/cudatoolkit/cuda-12.0.0/lib64/libcudart_static.a;Threads::Threads;dl;/usr/lib64/librt.so
-- 2. CUDA_CUDART_LIBRARY  : /soft/compilers/cudatoolkit/cuda-12.0.0/lib64/libcudart.so
-- 2. CUDA_CUDA_LIBRARY  : /soft/compilers/cudatoolkit/cuda-12.0.0/lib64/stubs/libcuda.so

@mmeterel
Contributor

Thanks for the CMake tip. Here is the info I see:
-- CUDA_TOOLKIT_INCLUDE: /usr/local/cuda/include
-- CUDA_cublas_LIBRARY: /usr/local/cuda/lib64/libcublas.so
-- CUDA_LIBRARIES: /usr/local/cuda/lib64/libcudart_static.a;dl;/usr/lib/x86_64-linux-gnu/librt.a
-- CUDA_CUDART_LIBRARY: /usr/local/cuda/lib64/libcudart.so
-- CUDA_CUDA_LIBRARY: /usr/lib/x86_64-linux-gnu/libcuda.so

That does not help me much with your error frankly. Do you see anything?

@abagusetty
Author

Unfortunately, a standard CUDA installation path might not provide the right setup to reproduce this, since all the issues reported in this thread and others involved non-standard CUDA installation paths.

The above hack was on the ALCF Polaris machine, but I can try on NERSC Perlmutter to see if the CUDA_LIB_PATH setup is really the cause of this build issue. If so, a simple note in the documentation may be all that is needed. I will get back to you on this. Thanks again.

@mmeterel
Contributor

Thanks for following up. Yeah, I am curious what the problem is here :)

Also, I just noticed you are not building functional tests (-DBUILD_FUNCTIONAL_TESTS=False). In this case, you can build multiple backends. The issue will appear when you build the tests along with multiple backends.

@mmeterel
Contributor

mmeterel commented Feb 1, 2023

@abagusetty Any objections to closing this issue?

@abagusetty
Author

No, sorry. This can be closed. Thanks again.

@abagusetty abagusetty reopened this Feb 7, 2023
@abagusetty
Author

@npmiller Just reopened for tracking

@mmeterel
Contributor

mmeterel commented Feb 7, 2023

@abagusetty What do you mean by "tracking"? Can you please give more detail?

@abagusetty
Author

@mmeterel Sorry about the lack of context. @npmiller also noted build issues when enabling multiple CU* backends, similar to the ones seen here with CUDA 12. I run into the same issues when building multiple backends (with functional tests and examples disabled). The above trick never worked when I enabled CUSOLVER, CUBLAS and CURAND together; the build fails with the above ARCH issues involving the bf16 headers.

It only works when CUBLAS is the only backend enabled.

@GeorgeWeb

Hi @abagusetty

We expect this issue to soon be resolved with the changes from PR #8257.

I saw your comment on the PR - thanks a lot for verifying this!

steffenlarsen pushed a commit to intel/llvm that referenced this issue Feb 21, 2023
…YCL (#8257)

This PR addresses an issue where using `__CUDA_ARCH__` causes
intrinsics not to be defined in the CUDA include files.
- Replace `__CUDA_ARCH__` with `__SYCL_CUDA_ARCH__` for SYCL device
- Update the `sycl-macro.cpp` test to check the appropriate macro.

---

As far as I could find, the original issue was introduced by PR
[#6524](7b47ebb),
which enabled bfloat16 support by moving it out of the experimental
extension, and it breaks some codebases with CUDA interop calls.
Current reports include github issues
[#7722](#7722),
[#8133](#8133) and
[oneapi-src/oneMKL#257](oneapi-src/oneMKL#257).

For that reason we define a unique `__SYCL_CUDA_ARCH__` macro and use it
instead for SYCL device targets and leave `__CUDA_ARCH__` as before for
CUDA targets.
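
A minimal sketch of how a project could check which architecture macro a given compilation pass sees. It assumes, which the thread does not confirm, that the CUDA headers break when `__CUDA_ARCH__` is defined by a compiler that is not in CUDA mode (`__CUDACC__` unset). All helper names are hypothetical and not part of oneMKL, DPC++, or CUDA.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: reports which CUDA architecture macro, if any, is
// visible in the current compilation pass.
inline std::string visible_cuda_arch_macro() {
#if defined(__SYCL_CUDA_ARCH__)
    return "__SYCL_CUDA_ARCH__";  // SYCL device pass after intel/llvm#8257
#elif defined(__CUDA_ARCH__)
    return "__CUDA_ARCH__";       // CUDA device pass, or SYCL before the rename
#else
    return "none";                // plain host pass
#endif
}

// Hypothetical helper: true when __CUDA_ARCH__ is defined although the
// compiler is not in CUDA mode. Assumption: this is the combination that
// makes CUDA headers reference declarations they never set up.
inline bool cuda_arch_defined_without_cuda_compiler() {
#if defined(__CUDA_ARCH__) && !defined(__CUDACC__)
    return true;
#else
    return false;
#endif
}

// Hypothetical host-side sanity check a build could run once at startup.
inline void check_cuda_interop_macros() {
    assert(!cuda_arch_defined_without_cuda_compiler());
}
```

On a plain host compile neither macro is defined, so `visible_cuda_arch_macro()` returns `"none"` and the check passes. The more interesting cases are the device passes, which depend on the DPC++ commit in use.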
zhimingwang36 added a commit to oneapi-src/SYCLomatic that referenced this issue Feb 23, 2023
Co-authored-by: Wenju He <wenju.he@intel.com>
Co-authored-by: Alexey Bader <alexey.bader@intel.com>

* [CI] Use ubuntu-20.04 image

Using ubuntu-latest still causes long delays due to missing runners.

* [SYCL][NFC] Use reducer-access helper function instead of deduction guide (#8411)

The use of deduction guides in the `ReducerAccess` helper class causes
problems when building with a compiler that does not support them. This
commit changes the implementation to use a helper function instead.

Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>

* [SYCL][CUDA] Support host-device memcpy2D  (#8181)

Adds support for host-device memcpy2D copies.

* [SYCL][CUDA] Define __SYCL_CUDA_ARCH__ instead of __CUDA_ARCH__ for SYCL (#8257)

* [NFC][clang][SYCL] Refine the test check by adding `:` (#8416)

The test can fail if the working directory where the test was launched has an
`error` substring in its path.

* [ESIMD] Reduce number of bit-casts generated for lsc_block_load/store operations (#8385)

* [SYCL] Add sub-group functions emulation for vector of doubles. (#8252)

intel/llvm-test-suite#1603

* [SYCL][PI][CUDA][HIP] Fix bugs that can cause events not to be waited on (#8374)

Fixes two bugs in CUDA PI and HIP PI that can cause waiting for events to
do nothing:
- The first one is an off-by-one error when checking if an event needs
to be waited on
- The second one is setting `last_sync_compute_streams_` /
`last_sync_transfer_streams_` to a new value before checking the streams,
which can read these variables expecting the old values.

Both of these are synchronization related and therefore hard to test
for.

* [SYCL][Fusion] Do not internalize stored argument pointers (#8376)

When a pointer to be promoted is stored, internalization is no longer
safe to perform. In this case, simply bail out and do not promote the
given pointer.

Signed-off-by: Victor Perez <victor.perez@codeplay.com>
Co-authored-by: Alexey Bader <alexey.bader@intel.com>

* [SYCL] Avoid optimizing out integer conversion (#8409)

This commit fixes an issue where an integer conversion happening inside
an assert would cause the conversion to not happen when assertions were
disabled.

Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>

* [SYCLomatic] Fix sycl to SYCLomatic pull down failed.

Signed-off-by: Tang, Jiajun jiajun.tang@intel.com

---------

Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
Signed-off-by: Victor Perez <victor.perez@codeplay.com>
Signed-off-by: Tang, Jiajun jiajun.tang@intel.com
Co-authored-by: Alexey Bader <alexey.bader@intel.com>
Co-authored-by: Chunyang Dai <chunyang.dai@intel.com>
Co-authored-by: Wenju He <wenju.he@intel.com>
Co-authored-by: Steffen Larsen <steffen.larsen@intel.com>
Co-authored-by: Abhishek Bagusetty <59661409+abagusetty@users.noreply.github.com>
Co-authored-by: Georgi Mirazchiyski <georgimweb@gmail.com>
Co-authored-by: Mariya Podchishchaeva <mariya.podchishchaeva@intel.com>
Co-authored-by: fineg74 <61437305+fineg74@users.noreply.github.com>
Co-authored-by: Maksim Sabianin <maksim.sabianin@intel.com>
Co-authored-by: Tadej Ciglarič <tadej.ciglaric@codeplay.com>
Co-authored-by: vic <victor.perez@codeplay.com>
@abagusetty
Author

Closing as it was addressed in intel/llvm#8257
