Prompt processing is extremely slow with a 70B model partially offloaded to the GPU (`-ngl 20`, SYCL backend).
```
llama-bench.exe -ngl 20 -m "D:\models\lzlv_70b_fp16_hf.Q4_K_M.gguf"
Using device 0 (Intel(R) Arc(TM) A770 Graphics) as main device
```
| model | size | params | backend | ngl | test | t/s |
|---|---|---|---|---|---|---|
| llama 70B Q4_K - Medium | 38.58 GiB | 68.98 B | SYCL | 20 | pp 512 | 2.14 ± 0.28 |
| llama 70B Q4_K - Medium | 38.58 GiB | 68.98 B | SYCL | 20 | tg 128 | 1.03 ± 0.01 |
build: a28c5ef (2045)
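To put the table in wall-clock terms, the sketch below converts the reported mean throughput into seconds per test. All figures come from the benchmark output above, except the total layer count of 80, which is an assumption based on the Llama 2 70B architecture and is not something llama-bench printed.

```python
# Convert the llama-bench throughput figures above into wall-clock time.
# Throughput numbers are copied from the table (means only, ignoring the ± spread).

PP_TOKENS = 512   # "pp 512": prompt-processing test size
PP_TPS = 2.14     # measured prompt throughput, tokens/s
TG_TOKENS = 128   # "tg 128": text-generation test size
TG_TPS = 1.03     # measured generation throughput, tokens/s

OFFLOADED_LAYERS = 20      # from -ngl 20
ASSUMED_TOTAL_LAYERS = 80  # ASSUMPTION: Llama 2 70B has 80 transformer layers


def seconds_for(tokens: int, tokens_per_second: float) -> float:
    """Wall-clock seconds needed to process `tokens` at the given rate."""
    return tokens / tokens_per_second


pp_seconds = seconds_for(PP_TOKENS, PP_TPS)
tg_seconds = seconds_for(TG_TOKENS, TG_TPS)
gpu_fraction = OFFLOADED_LAYERS / ASSUMED_TOTAL_LAYERS
pp_over_tg = PP_TPS / TG_TPS

print(f"pp 512: {pp_seconds:.0f} s")
print(f"tg 128: {tg_seconds:.0f} s")
print(f"layers on GPU (assumed 80 total): {gpu_fraction:.0%}")
print(f"prompt vs. generation throughput: {pp_over_tg:.1f}x")
```

At these rates a 512-token prompt takes roughly four minutes to process, and prompt throughput is only about twice the generation rate.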