How to use CUDA or BLAS #1070
-
There are no pre-built binaries with cuBLAS at the moment; you have to build it yourself.
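For anyone landing here: at the time of this thread, the build flag added by #1044 was LLAMA_CUBLAS (newer trees renamed it to GGML_CUDA). A minimal sketch, assuming the CUDA Toolkit is already installed:

```sh
# Makefile build (flag from #1044; newer versions use GGML_CUDA=1)
make clean && make LLAMA_CUBLAS=1

# CMake equivalent
cmake -B build -DLLAMA_CUBLAS=ON
cmake --build build --config Release
```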
-
I did a git pull with this version: master-02d6988.
It turns out that in nvcc --help, the allowed values for --gpu-architecture don't include 'native'.
This is my nvcc --version:
First I tried testing it with just ./main -m ./vicuna-7B-1.1-GPTQ-4bit-128g-GGML.bin; it works, but the output is gibberish.
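(Side note: only newer CUDA releases accept native as a --gpu-architecture value. On an older toolkit you can ask nvcc what it actually supports and pin one of those values instead:)

```sh
# list the compute architectures your nvcc actually supports,
# then use one of these in place of 'native'
nvcc --list-gpu-arch
```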
-
I did a git pull with this version: master (02d6988).
Then I updated the Makefile to fix this issue (see the sketch below).
I'm not sure if this is a suitable way.
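A sketch of that kind of fix, assuming the failure came from the -arch=native flag in the Makefile's NVCCFLAGS (the sm_86 value is just an example for an RTX 30-series card; pick your own GPU's compute capability):

```sh
# replace 'native' (unsupported by older nvcc) with an explicit arch
sed -i 's/-arch=native/-arch=sm_86/' Makefile
make clean && make LLAMA_CUBLAS=1
```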
-
Maybe add a CUDA version binary to a release?
-
Trying to compile with BLAS support was very painful for me on Windows. I spent a few hours trying to make it work. I tried the Intel MKL / oneAPI version and OpenBLAS, but I could never get CMake to recognize my BLAS libraries no matter what I did. I eventually found this repository, which provides a pre-compiled llama.cpp with BLAS already enabled:
-
Okay, I spent several hours trying to make it work, so here are a few ideas. Make sure your VS tools are the ones CUDA was integrated with during install. The cleanest solution is to remove both VS and CUDA entirely, then delete any CMakeCache.txt you might still have lying around, then install VS first and CUDA after it; at that point it should basically start working. It didn't for me, though: I have BLAS = 1 but no performance increase at all. My GPU isn't running any calculations either; I see a very short spike of activity, but it doesn't affect eval time at all. No idea why.
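For what it's worth, that symptom matches how the initial cuBLAS integration behaved: it only offloaded the large matrix multiplications of prompt processing, so it improves the "prompt eval time" line rather than the per-token "eval time", and GPU usage shows up as short bursts. One way to see the effect is a prompt-heavy run (the model path and prompt file below are placeholders):

```sh
# compare "prompt eval time" with and without cuBLAS;
# per-token "eval time" is expected to stay roughly the same
./main -m ./models/7B/ggml-model-q4_0.bin -f long-prompt.txt -n 64
```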
-
For those who struggle with the Windows build:
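A typical recipe from that era looked roughly like this (a sketch, assuming Visual Studio with the C++ workload plus the CUDA Toolkit are installed, and the pre-rename LLAMA_CUBLAS flag):

```sh
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DLLAMA_CUBLAS=ON
cmake --build build --config Release
```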
-
Do the binary releases now contain the cuBLAS code? It looks like it was all merged into releases starting last night:
If so, is there a command-line switch or an environment variable to get the binary to notice cuBLAS?
-
I recommend you follow this article: https://medium.com/@piyushbatra1999/installing-llama-cpp-python-with-nvidia-gpu-acceleration-on-windows-a-short-guide-0dfac475002d but change the terminal commands for reinstalling llama-cpp-python to the following, or else it doesn't work (at least it didn't for me):
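The commonly cited variant of those commands was of this shape (a sketch: CMAKE_ARGS and FORCE_CMAKE are the environment variables llama-cpp-python's build reads, and LLAMA_CUBLAS was the flag before the GGML_CUDA rename; on Windows cmd, use `set VAR=value` on separate lines instead of the inline form):

```sh
# force a from-source rebuild of llama-cpp-python with cuBLAS enabled
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 \
  pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir
```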
-
Ugh. So CUDA is something else than cuBLAS... why not -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=CUBLAS?

cmake -B build -DGGML_CUDA=ON -DLLAMA_CURL=ON -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS

fails at 4:00 AM with:

Use GGML_CUDA instead
Call Stack (most recent call first):
CMakeLists.txt:105
-- Configuring incomplete, errors occurred!
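For the record, the error itself says what to do: CUDA/cuBLAS support is enabled with GGML_CUDA, while GGML_BLAS and GGML_BLAS_VENDOR are only for CPU BLAS backends such as OpenBLAS. Dropping the BLAS flags lets it configure (a sketch, reusing the flags from the failing command):

```sh
# CUDA backend only; no GGML_BLAS flags are needed for the GPU path
cmake -B build -DGGML_CUDA=ON -DLLAMA_CURL=ON
cmake --build build --config Release
```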
-
I don't know, is the author a teenager who doesn't remember how people used to do make and cmake? I'm from 1987, which makes me 37 years old, and why...
-
With master-8944a13 - Add NVIDIA cuBLAS support (#1044) I was looking forward to seeing any differences.
Sadly, I don't.
I can't even see that my RTX 3060 is being used in any way at all by llama.cpp's main.exe on Windows, using the win-avx2 version.
Is there anything that needs to be switched on to use CUDA?
The system_info line of main.exe shows this:
So why is BLAS = 0?
Is there anything needed to use BLAS?
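A likely explanation, for what it's worth: the win-avx2 release zip is a CPU-only build, so its system_info line reports BLAS = 0 regardless of what hardware is present. A binary built with cuBLAS (see the build sketches earlier in this thread, via #1044) reports BLAS = 1 in the same line. A quick sanity check with a placeholder model path:

```sh
# a cuBLAS-enabled build prints "BLAS = 1" in its system_info line
./main -m ./model.bin -p "hello" -n 8
```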