Misc. bug: Vulkan performance regression on Iris Xe #12754

Closed
@EricGrange

Description

Name and Version

llama-cli b5050 vs b5017

Operating systems

Windows

Which llama.cpp modules do you know to be affected?

llama-cli

Command line

llama-cli -m "modelname.gguf" -p "prompt" -ngl 50

Problem description & steps to reproduce

Under Windows 11, taking Mistral-Nemo-Instruct_2407 Q4_K_M as reference, performance went down from 4.7 tok/s with b5017 to 4.3 tok/s with b5050 (same Intel drivers, 32.0.1016651).
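For context, the reported numbers correspond to roughly an 8.5% drop in token throughput. A minimal sketch of that arithmetic (the tok/s figures are taken from the report above; nothing else is assumed):

```python
# Token-generation rates reported for the two builds (tok/s).
rate_b5017 = 4.7
rate_b5050 = 4.3

# Relative regression as a percentage of the earlier build's rate.
regression_pct = (rate_b5017 - rate_b5050) / rate_b5017 * 100

print(f"{regression_pct:.1f}% slower")  # roughly 8.5% slower
```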

For gemma-3-4b-it-Q6-K, performance is the same in both builds, at 9.7 tok/s.

While this may be anecdotal given that Iris Xe is a rather basic integrated GPU, b5017 was the first version where I noticed llama.cpp's Vulkan backend ran faster than AVX2 on an i7-1165G7 (this did not use to be the case; Vulkan was noticeably more sluggish before), and running inference on the Iris Xe is quite energy efficient: the laptop can run inference at 100% GPU with the fans off.

Reporting this in case the Vulkan wizards can take advantage of that data point!

With all the recent enhancements to Vulkan in llama.cpp, it's now rather comfortable to run small-ish models on regular laptops.

First Bad Commit

No response

Relevant log output
