[User] nonsense responses with q2_k llama in Termux when using GPU #1909
Comments
I had this problem. First, check that you have the newest build; I had those two problems. |
Yes. [86c7571] and I tested with 2 different q2_k models. |
Have you tried a q4 model as a test? |
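For anyone following along, a minimal version of that sanity check might look like this (the model path and prompt are illustrative; LLAMA_CLBLAST=1 matches the Makefile builds of that era):
# rebuild from the latest master with CLBlast enabled
git pull && make clean && make LLAMA_CLBLAST=1
# run the same prompt against a q4_0 model; if this works while q2_K garbles, the bug is quant-specific
./main -m models/open-llama-7B-open-instruct.ggmlv3.q4_0.bin -ngl 1 -p "Hello, how are you?" |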
Could it be that -ngl 1 works only on Apple Silicon systems? |
Thanks for your response. I tested it now with open-llama-7B-open-instruct.ggmlv3.q4_0 and it's functional, working as expected. The issue is with q2_k models specifically. |
The ngl parameter functions with OpenCL through CLBlast even on my device: Android with Termux. |
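For context, a rough sketch of the Termux + CLBlast setup being described (package names and steps are assumptions from memory; check pkg search opencl for the real ones):
# install a toolchain plus OpenCL headers/loader in Termux
pkg install clang cmake git ocl-icd opencl-headers clinfo
# build and install CLBlast into the Termux prefix
git clone https://github.com/CNugteren/CLBlast && cd CLBlast
cmake -B build -DCMAKE_INSTALL_PREFIX=$PREFIX && cmake --build build && cmake --install build
# build llama.cpp against it
cd ../llama.cpp && make LLAMA_CLBLAST=1 |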
It seems k-quant models are not fully supported on ARM (Linux?) devices... |
I'm downloading a 3_k_s model now, but I can't test until later tonight, so I'll let you know how it goes. It's an Android device with Termux. |
Are you sure your q2_k model is not broken? :P |
I cannot reproduce it on a PC using OpenCL. Here is what I get, looks perfectly reasonable:
Btw, using |
Hi, I don't see a reproduction in your message. Are you saying you're able to produce the nonsense with a q2_k model on PC?
Increasing -ngl # slows inference: #1718 |
Sorry, typo. I meant "cannot", not "can". |
Thanks for clarifying. I'm thinking it may be an ARM device specific issue, like mirek190 mentioned.
Yes, q2_k functions normally through CLBlast without offload. |
Small update, same results: built ba4e85a with CLBlast, using open-llama-13b-q2_k.
I didn't expect a change, but wanted to provide additional information. Per the results, even 13B q2_k produces nonsense. Thank you. |
Same issue. Using Termux on an sm8250 (Snapdragon 870) with 8 GB memory, built from the latest commit on the master branch, I get gibberish output with offloading (-ngl 1 to 35) with the llama-7b.ggmlv3.q2_K.bin model. |
Thanks for your response. My device has 8 GB of RAM, but there's also 8 GB of virtual RAM in the settings. Edit: to clarify, it's stock Android, no root. When a model needs more than 8 GB of RAM it's quite slow, but yes, it functions. The 13B q2_k max RAM is 8.01 GB vs. 9.82 GB for 13B q4_0, which is significant for inference speed with a model that size on a device like mine. |
I noticed the same actually, I was using the Orange Pi 5B which ships with some custom Android and vendor OpenCL. |
Solved by a recent pull. Sample output from llama-7b.ggmlv3.q2_K.bin:
training a neural network is done in following steps:
1. Preparing the training data sets (Inputs - outputs)
2. Training the Neural Network
3. Testing the accuracy of Neural Network
Neural Network Algorithm:
The basic approach towards learning by Artificial Intelligence, the most
successful one up to now is called neural...
|
I pulled today. Here's my result with -ins:
with --prompt:
Edit: the Samantha model really highlights the error:
Edit 2: I've noticed that a prompt template significantly improves the quality of the responses from 2_k models. Here's ./server with Samantha:
The model consistently starts with garbled text, but it's definitely improved since I first posted. |
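For anyone trying to reproduce the improvement: the thread doesn't show the exact template, but an illustrative Vicuna-style wrapper along these lines is the kind of thing meant (the persona line and wording are assumptions, not the model's documented template):
# bash $'...' quoting turns \n into real newlines in the prompt
./main -m ~/llama.cpp/models/samantha-1.1-llama-7b.ggmlv3.q2_K.bin --color -c 2048 -ngl 1 \
  -p $'You are Samantha, a helpful assistant.\nUSER: How are you today?\nASSISTANT:' |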
#2133 shows perplexity for GPU on Android is bugged. |
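For reference, a sketch of the kind of measurement #2133 describes (the test-file path is an assumption):
# compare the reported perplexity with and without -ngl; a GPU bug shows up as a much higher number
./perplexity -m models/llama-7b.ggmlv3.q2_K.bin -f wikitext-2-raw/wiki.test.raw -ngl 1 |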
More on this: latest build, Snapdragon 8 Gen 2, Termux.
Separately, GPU offloading, when it works, decreases performance. Probably a memory bandwidth issue. |
Hello, were you able to fix this issue? |
More on this: recent koboldcpp build, Snapdragon 8 Gen 1, Termux. Every quant produces garbled output with GGUF models, k-quant or not, offloaded layers or not. Tried with Mistral-7B GGUF and Marx-3B GGML. The problem occurs with CuBLAS or OpenBLAS, no difference. |
Do you see performance degradation in terms of speed on the 8 Gen 1 GPU compared to running the model on the CPU? |
By my tests the prompt processing is way faster, but token generation is indeed slower. I'm just using it to process the prompt. |
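If I understand the CLBlast path correctly, that usage pattern corresponds to running with no layers offloaded: batched prompt evaluation still goes through the GPU BLAS routines, while generation stays on the CPU. A sketch, with that caveat:
# -ngl 0: all layers stay on the CPU; with a CLBlast build the large
# prompt-eval matmuls should still be routed through the GPU
./main -m models/llama-7b.ggmlv3.q2_K.bin -t 4 -ngl 0 -f prompt.txt |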
This issue was closed because it has been inactive for 14 days since being marked as stale. |
./main -m ~/llama.cpp/models/samantha-1.1-llama-7b.ggmlv3.q2_K.bin --color -c 2048 --keep -1 -t 3 -b 7 -i -ins -ngl 1
runs, but produces nonsense responses. To clarify, without -ngl it works as expected. I tested open-llama-7B-open-instruct.ggmlv3.q2_K and had the same result.
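For comparison, the working control is the same command with the offload flag removed:
./main -m ~/llama.cpp/models/samantha-1.1-llama-7b.ggmlv3.q2_K.bin --color -c 2048 --keep -1 -t 3 -b 7 -i -ins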
Environment and Context
Here's clinfo (native OpenCL):
lscpu:
Architecture: aarch64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Vendor ID: Qualcomm
Model name: Kryo-4XX-Silver
Model: 14
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
Stepping: 0xd
CPU(s) scaling MHz: 62%
CPU max MHz: 1785.6000
CPU min MHz: 300.0000
BogoMIPS: 38.40
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
Model name: Kryo-4XX-Gold
Model: 14
Thread(s) per core: 1
Core(s) per socket: 2
Socket(s): 2
Stepping: 0xd
CPU(s) scaling MHz: 71%
CPU max MHz: 2841.6001
CPU min MHz: 710.4000
BogoMIPS: 38.40
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
Vulnerabilities:
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Vulnerable
Spec store bypass: Vulnerable
Spectre v1: Mitigation; __user pointer sanitization
Spectre v2: Mitigation; Branch predictor hardening
Srbds: Not affected
Tsx async abort: Not affected
uname -a
Linux localhost 4.14.190-23725627-abG975WVLS8IWD1 #2 SMP PREEMPT Mon Apr 10 18:16:39 KST 2023 aarch64 Android
Steps to Reproduce
1. Build llama.cpp with CLBlast on Android/Termux.
2. Run the command above against a q2_K model with -ngl 1: the output is nonsense.
3. Run the same command without -ngl: the output is coherent.
Thank you!