metal : use F32 prec for K*Q in vec FA #9595

ggerganov · 2024-09-22T19:21:00Z

Noticed that Qwen2-7B-Instruct produces garbage output when Metal FA (flash attention) is enabled:

./llama-cli -m ./models/qwen2-7b-instruct/ggml-model-f16.gguf -p "I believe the meaning of life is" -n 16 -s 1 --temp 0.0 -fa

master:

I believe the meaning of life is tobedtls_x509_crt_new_from_pem() create a new

PR:

I believe the meaning of life is to be happy and to be healthy. I believe that the best way to achieve

This again appears to be insufficient precision in the K*Q multiplication, this time only in the vec FA kernel (the non-vec kernel works correctly). K*Q should always use F32 precision; it has proven to be a very sensitive operation.
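To illustrate why the accumulation precision of K*Q matters, here is a minimal Python sketch. It is not the Metal kernel. It simulates a single K*Q dot product with F16 inputs and accumulates either in F16 or in F32. The helper names `to_f16`, `to_f32` and `dot` and the toy input values are hypothetical. Half-precision rounding is emulated with the standard `struct` `"e"` format. Rounding once per multiply and once per add is an assumption about how half arithmetic behaves on the GPU.

```python
import struct

def to_f16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (F16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_f32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision (F32) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def dot(k, q, rnd):
    """Dot product where every product and every partial sum is rounded with rnd.

    Assumption: this models a kernel whose arithmetic type is rnd's format.
    """
    acc = 0.0
    for a, b in zip(k, q):
        acc = rnd(acc + rnd(a * b))
    return acc

# Hypothetical 128-dim K and Q rows, stored as F16 (as in an f16 model).
head_dim = 128
k = [to_f16(1.0 + i / head_dim) for i in range(head_dim)]
q = list(k)

exact = sum(a * b for a, b in zip(k, q))  # float64 reference; exact for these inputs
f32 = dot(k, q, to_f32)                   # F32 accumulation
f16 = dot(k, q, to_f16)                   # F16 accumulation

print(f"exact={exact} f32={f32} f16={f16}")
print(f"F32 error: {abs(f32 - exact)}  F16 error: {abs(f16 - exact)}")
```

With these toy inputs, every product and partial sum fits exactly in F32, so the F32 result equals the reference 297.16796875. F16 has a spacing of 0.25 in [256, 512), so the F16 sum must be a multiple of 0.25 and is off by at least about 0.08. In a real model the size of the error depends on the data. Because these scores then go through softmax, errors in them are not averaged away.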

I have read the contributing guidelines
Self-reported review complexity:
- Low
- Medium
- High

ggml-ci

metal : use F32 prec for K*Q in vec FA

5d888c4

ggml-ci

ggerganov merged commit bf9c101 into master Sep 23, 2024
55 checks passed

ggerganov deleted the gg/metal-fa-f32-qk branch September 23, 2024 08:27

dsx1986 pushed a commit to dsx1986/llama.cpp that referenced this pull request Oct 29, 2024

metal : use F32 prec for K*Q in vec FA (ggerganov#9595)

f103238

ggml-ci
