Excessively slow prompt processing time with 70B partially offloaded in SYCL #5272

Closed
@Jacoby1218

Prompt processing is extremely slow when a 70B model is partially offloaded (-ngl 20) with the SYCL backend:
llama-bench.exe -ngl 20 -m "D:\models\lzlv_70b_fp16_hf.Q4_K_M.gguf"
Using device 0 (Intel(R) Arc(TM) A770 Graphics) as main device

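| model                   | size      | params  | backend | ngl | test   | t/s         |
| ----------------------- | --------- | ------- | ------- | --- | ------ | ----------- |
| llama 70B Q4_K - Medium | 38.58 GiB | 68.98 B | SYCL    | 20  | pp 512 | 2.14 ± 0.28 |
| llama 70B Q4_K - Medium | 38.58 GiB | 68.98 B | SYCL    | 20  | tg 128 | 1.03 ± 0.01 |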

build: a28c5ef (2045)
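
For a sense of scale, here is a rough sketch that converts the reported rates into wall-clock time. It is only arithmetic on the two t/s figures from the llama-bench run above; the helper name `seconds_for` is made up for illustration and is not part of llama.cpp.

```python
# Figures copied from the llama-bench results above (tokens per second).
PP_TOKENS, PP_RATE = 512, 2.14   # prompt processing test "pp 512"
TG_TOKENS, TG_RATE = 128, 1.03   # text generation test "tg 128"


def seconds_for(tokens: int, tokens_per_second: float) -> float:
    """Wall-clock seconds needed to process `tokens` at the given rate.

    Hypothetical helper for illustration only.
    """
    return tokens / tokens_per_second


pp_seconds = seconds_for(PP_TOKENS, PP_RATE)
tg_seconds = seconds_for(TG_TOKENS, TG_RATE)

print(f"pp 512: {pp_seconds:.0f} s")
print(f"tg 128: {tg_seconds:.0f} s")
print(f"pp/tg throughput ratio: {PP_RATE / TG_RATE:.2f}x")
```

At these rates a 512-token prompt takes roughly four minutes, and prompt processing is only about twice as fast as generation. Prompt processing is batched, so it would normally be expected to run well ahead of token generation, which is why the pp figure looks wrong here.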
