Description
Hi team!
I've been trying to build vLLM from source for ROCm 6.3, targeting gfx1100, on Arch Linux with gcc 14, following the instructions in the official documentation. I kept running into a compile error at the hipify step during the build.
Excerpt from the error:
```
...
In file included from <built-in>:1:
In file included from /opt/rocm/lib/llvm/lib/clang/18/include/__clang_hip_runtime_wrapper.h:145:
In file included from /opt/rocm/lib/llvm/lib/clang/18/include/cuda_wrappers/algorithm:55:
In file included from /usr/lib64/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../include/c++/14.2.1/algorithm:61:
/usr/lib64/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../include/c++/14.2.1/bits/stl_algo.h:3626:7: error: reference to __host__ function '__glibcxx_assert_fail' in __host__ __device__ function
 3626 |       __glibcxx_assert(!(__hi < __lo));
      |       ^
/usr/lib64/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../include/c++/14.2.1/x86_64-pc-linux-gnu/bits/c++config.h:614:12: note: expanded from macro '__glibcxx_assert'
  614 |       std::__glibcxx_assert_fail(); \
      |            ^
/home/<username>/Documents/sources/vllm/build/temp.linux-x86_64-cpython-312/csrc/quantization/compressed_tensors/int8_quant_kernels.hip:35:14: note: called by 'float_to_int8_rn'
   35 |   dst = std::clamp(dst, i8_min, i8_max);
      |              ^
/home/<username>/Documents/sources/vllm/build/temp.linux-x86_64-cpython-312/csrc/quantization/compressed_tensors/int8_quant_kernels.hip:119:14: note: called by 'static_scaled_int8_quant_kernel<float, float>'
  119 |     out[i] = float_to_int8_rn(static_cast<float>(input[i]) / scale);
      |              ^
/usr/lib64/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../include/c++/14.2.1/x86_64-pc-linux-gnu/bits/c++config.h:608:3: note: '__glibcxx_assert_fail' declared here
  608 |   __glibcxx_assert_fail()
      |   ^
1 error generated when compiling for gfx1100.
...
```
On further investigation, `std::clamp` turned out to be the culprit: it fails to compile in `__host__ __device__` code when building against gcc 14's libstdc++ headers with hip-clang, because the assertion macro it expands to calls a host-only function.
A bit more digging showed this is a known issue in the PyTorch and LLVM projects, see:
- PyTorch: Strange clamp assert error when building on Fedora 40/gcc 14 in IndexKernel.hip pytorch/pytorch#127666
- LLVM: [CUDA] std::clamp doesn't compile with latest clang and gcc llvm/llvm-project#95183
The fix/workaround PyTorch went with was replacing the `std::clamp` usage with equivalent logic (see commit).
I applied the same workaround here, and after sorting out all the offending files the build completed successfully!
Will submit a PR with the changes soon ✌🏻