Low performance with Sycl Backend #5480

Closed
@chsasank

I am working on ollama/ollama#2458 and ran some benchmarks to test performance. I compiled at commit 3bdc4cd0; the build segfaults on master, as in #5469.

I tested Mistral 7B (int4) on an M2 Air, an Intel 12400, and an Arc 770 16GB. I used llama-bench and the Mistral 7B model from here to measure prompt processing and text generation speed in tok/s. My llama-bench command is:

./build/bin/llama-bench -m models/mistral-7b-v0.1.Q4_0.gguf -p 128,256,512 -n 128,256,512

On M2 Air

| model | size | params | backend | ngl | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| llama 7B Q4_0 | 3.83 GiB | 7.24 B | Metal | 99 | pp 128 | 144.47 ± 0.22 |
| llama 7B Q4_0 | 3.83 GiB | 7.24 B | Metal | 99 | pp 256 | 142.95 ± 1.17 |
| llama 7B Q4_0 | 3.83 GiB | 7.24 B | Metal | 99 | pp 512 | 141.36 ± 0.67 |
| llama 7B Q4_0 | 3.83 GiB | 7.24 B | Metal | 99 | tg 128 | 20.06 ± 0.66 |
| llama 7B Q4_0 | 3.83 GiB | 7.24 B | Metal | 99 | tg 256 | 20.26 ± 0.17 |
| llama 7B Q4_0 | 3.83 GiB | 7.24 B | Metal | 99 | tg 512 | 13.96 ± 1.62 |

On Intel 12400 (compiled with SYCL, but with num-gpu-layers (ngl) set to 0)

| model | size | params | backend | ngl | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| llama 7B Q4_0 | 3.83 GiB | 7.24 B | SYCL | 0 | pp 128 | 18.60 ± 3.07 |
| llama 7B Q4_0 | 3.83 GiB | 7.24 B | SYCL | 0 | pp 256 | 20.82 ± 0.14 |
| llama 7B Q4_0 | 3.83 GiB | 7.24 B | SYCL | 0 | pp 512 | 22.48 ± 0.16 |
| llama 7B Q4_0 | 3.83 GiB | 7.24 B | SYCL | 0 | tg 128 | 10.78 ± 0.02 |
| llama 7B Q4_0 | 3.83 GiB | 7.24 B | SYCL | 0 | tg 256 | 10.76 ± 0.02 |
| llama 7B Q4_0 | 3.83 GiB | 7.24 B | SYCL | 0 | tg 512 | 10.69 ± 0.01 |

On Arc 770

| model | size | params | backend | ngl | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| llama 7B Q4_0 | 3.83 GiB | 7.24 B | SYCL | 99 | pp 128 | 407.14 ± 58.05 |
| llama 7B Q4_0 | 3.83 GiB | 7.24 B | SYCL | 99 | pp 256 | 583.57 ± 78.24 |
| llama 7B Q4_0 | 3.83 GiB | 7.24 B | SYCL | 99 | pp 512 | 757.99 ± 1.48 |
| llama 7B Q4_0 | 3.83 GiB | 7.24 B | SYCL | 99 | tg 128 | 24.74 ± 0.27 |
| llama 7B Q4_0 | 3.83 GiB | 7.24 B | SYCL | 99 | tg 256 | 24.65 ± 0.20 |
| llama 7B Q4_0 | 3.83 GiB | 7.24 B | SYCL | 99 | tg 512 | 21.46 ± 2.39 |

The good news is that prompt processing speed on the Arc 770 is fairly high. The bad news is that text generation on Arc GPUs is very slow.

This is much slower than I expected, because the Arc 770 is significantly faster than both the M2 and the 12400. You can see benchmarks of FLOPS and memory bandwidth here: https://github.com/chsasank/device-benchmarks
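One rough way to put the text generation numbers in context is to convert them into an implied weight-read rate. The sketch below assumes that generating one token requires roughly one full pass over the model weights (a common simplification for memory-bound decoding, not something measured here). The `implied_bandwidth_gib_s` helper and the dictionary labels are illustrative names; the t/s values and the 3.83 GiB size are copied from the tg 128 rows above.

```python
# Model size as reported in the llama-bench "size" column.
MODEL_SIZE_GIB = 3.83

# Measured tg 128 rates (tokens/s) copied from the benchmark tables.
measured_tg128 = {
    "M2 Air (Metal, ngl=99)": 20.06,
    "Intel 12400 (SYCL, ngl=0)": 10.78,
    "Arc 770 (SYCL, ngl=99)": 24.74,
}


def implied_bandwidth_gib_s(tokens_per_s, model_size_gib=MODEL_SIZE_GIB):
    """Effective weight-read rate in GiB/s.

    Assumption: each generated token reads all weights once, so
    rate = tokens/s * model size. KV-cache traffic is ignored.
    """
    return tokens_per_s * model_size_gib


implied = {
    name: implied_bandwidth_gib_s(tps) for name, tps in measured_tg128.items()
}

for name, bw in implied.items():
    print(f"{name}: ~{bw:.1f} GiB/s effective")
```

Under that assumption the Arc 770 run works out to roughly 95 GiB/s, against about 77 GiB/s for the M2 Air and 41 GiB/s for the 12400. Comparing each figure with the bandwidth measured for the same device in the device-benchmarks repository would show how much of the available bandwidth each backend is actually using.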
