ROCm error: ggml-cuda.cu:6246: invalid device function #3320
Comments
Please post the full build command you used. And check if you can run […]
I managed to trigger the same error when building with CMake the following way: […]

I also have the following environment variables set: […]

Calling the resulting binary reproduces the error.

(Using a different model here since I only have this one at hand currently, sorry! If necessary I can re-test with a supported model later.) Surprisingly, the error does not occur when I build it with make (LLAMA_HIPBLAS=1).
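For reference, a CMake build along these lines (the exact command from this comment was lost in formatting; the compiler paths, flags, and gfx1100 target below are assumptions based on a stock ROCm install, not the poster's verbatim invocation) would look roughly like:

```shell
# Hedged reconstruction, not the original command.
# CC/CXX point at ROCm's bundled clang; adjust paths for your install.
export CC=/opt/rocm/llvm/bin/clang
export CXX=/opt/rocm/llvm/bin/clang++
cmake -B build -DLLAMA_HIPBLAS=ON -DAMDGPU_TARGETS=gfx1100
cmake --build build --config Release -j
```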
Can you try cmake without forcing CC/CXX? If […]
Building from CMake without forcing CC/CXX fails entirely, since CMake tries to build with my regular system gcc (which is why I assume this project's docs say to force CC/CXX in the first place).
Looking over the Makefile, it seems like it specifically compiles ggml-cuda.cu on its own (Lines 404 to 424 in c091cdf). However, it does use hipcc there.
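In other words, the cited Makefile section boils down to compiling just the CUDA translation unit with hipcc while everything else keeps the regular compiler. A paraphrase with illustrative flags (not the verbatim rule; the defines and offload arch are assumptions from llama.cpp builds of roughly this era):

```shell
# Illustrative paraphrase of the Makefile's LLAMA_HIPBLAS rule:
# only ggml-cuda.cu goes through hipcc; the rest of the tree uses CC/CXX.
/opt/rocm/bin/hipcc -x hip -c ggml-cuda.cu -o ggml-cuda.o \
    -DGGML_USE_HIPBLAS -DGGML_USE_CUBLAS \
    --offload-arch=gfx1100
```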
I can confirm make LLAMA_HIPBLAS=1 builds and inference works without problems. Thanks staviq for your support.
Unfortunately some strange issue still exists:

Instruction: write hello world in c
### Assistant: Here is a simple C program that prints "Hello World":

#include <stdio.h>
int main() {
    printf("Hello, World!") ;
}

This code includes the standard input and output library (stdio.h) which contains the basic building basic syntax for the the the the [output degenerates into "the" repeated hundreds of times, followed by a long run of 9s]
You should have […]
OK, that word puke at the end does come from the model. Can you try with this one? https://huggingface.co/SlyEcho/open_llama_3b_v2_gguf/tree/main
Not sure the output from the phind-codellama-34b-v2.Q4_K_M.gguf model itself is wrong, since it works perfectly on pure CPU. This model generates a simple hello world in C, but a more complicated query generated a 265 MB log:
Tested some query inference with ROCm 5.6 in a new Docker container instead of 5.7. phind-codellama-34b-v2.Q4_K_M.gguf seems to work without errors on GPU. open-llama-3b-v2-q5_1.gguf seems fine as well. The remaining llama.cpp issue is to patch CMakeLists.txt to use hipcc instead of clang++ for ggml-cuda.cu, as is done in the plain make build. Note that ROCm 5.7 has some bug that makes model output invalid in the more complicated cases.
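A minimal sketch of what such a CMakeLists.txt patch could look like. This is illustrative only, not a tested patch: the target name ggml-rocm is an assumption about the CMake layout of that era, and the idea is simply to hand ggml-cuda.cu to the HIP toolchain the same way the Makefile hands it to hipcc:

```cmake
# Illustrative sketch, not a verbatim patch. Assumes the HIP build
# collects ggml-cuda.cu under a target named ggml-rocm.
if (LLAMA_HIPBLAS)
    list(APPEND CMAKE_PREFIX_PATH /opt/rocm)
    # Compile the .cu file as HIP instead of CUDA.
    set_source_files_properties(ggml-cuda.cu PROPERTIES LANGUAGE CXX)
    target_compile_options(ggml-rocm PRIVATE -x hip)
endif()
```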
The Makefile calls hipcc for ggml-cuda.cu.
Can confirm that for me at least the […]
I have a similar experience with gfx1035. I can compile like this: […]

And run like this: […]

And it runs, apparently using the GPU according to rocm-smi. Now if I can just figure out how to offload more memory and compute to the GPU. It seems to use the VRAM only as scratch space? I can't set -ngl 1; I get an OOM on the GPU.
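For unsupported chips like gfx1035, a common workaround (an assumption on my part; the commands in the comment above were lost in formatting, so this is not necessarily what that poster did) is to make the HSA runtime report a supported architecture:

```shell
# gfx1035 (RDNA2-based APU) has no official ROCm kernel builds;
# overriding the reported arch to gfx1030 lets the runtime load the
# closest compatible code objects. 10.3.0 is the value commonly used
# for RDNA2-class chips, not something confirmed in this thread.
export HSA_OVERRIDE_GFX_VERSION=10.3.0
```

After exporting, run ./main as usual (e.g. ./main -ngl 1 -m open-llama-3b-v2-q5_1.gguf -p "write hello"); rocm-smi should then show GPU activity.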
When running inference with

./main -ngl 1 -m llama-2-13b-chat.Q5_K_M.gguf -p "write hello"

the following error appears:
Log start
main: build = 1268 (bedb92b)
main: built with AMD clang version 17.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-5.7.0 23352 d1e13c532a947d0cbfc94759c00dcf152294aa13) for x86_64-unknown-linux-gnu
main: seed = 1695481826
ggml_init_cublas: found 1 ROCm devices:
Device 0: , compute capability 11.0
... ...
CUDA error 98 at /code/llama.cpp/ggml-cuda.cu:6246: invalid device function
Running with Ubuntu 22, ROCm 5.7, and an AMD Radeon 7900 XTX in a Docker environment, with run command:
docker run -it --network=host --device=/dev/kfd --device=/dev/dri --ipc=host --shm-size 24G --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined
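For context on the error itself: 98 is the HIP/CUDA code for "invalid device function", meaning the binary contains no kernel compiled for the GPU it is running on. The usual fix is to (re)build with the card's architecture in the offload target list; for the 7900 XTX that is gfx1100. A hedged example (GPU_TARGETS is the Makefile variable name as I recall it from builds of roughly this era, and may differ in other revisions; the CMake build uses AMDGPU_TARGETS instead):

```shell
# Rebuild targeting gfx1100 (7900 XTX) so device code exists for the
# card at runtime. Variable name is an assumption, see lead-in above.
make clean
make LLAMA_HIPBLAS=1 GPU_TARGETS=gfx1100
```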