Description
Name and Version
llama-cli b5050 vs b5017
Operating systems
Windows
Which llama.cpp modules do you know to be affected?
llama-cli
Command line
llama-cli -m "modelname.gguf" -p "prompt" -ngl 50
Problem description & steps to reproduce
Under Windows 11, taking Mistral-Nemo-Instruct_2407 Q4_K_M as reference, performance went down from 4.7 tok/s with b5017 to 4.3 tok/s with b5050 (same Intel drivers, 32.0.1016651).
For gemma-3-4b-it-Q6-K, performance is the same in both builds at 9.7 tok/s.
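For anyone reproducing this, a more controlled comparison between the two builds can be made with llama-bench, which averages over repeated runs (the model path and offload count below are placeholders matching the command line above):

llama-bench -m "modelname.gguf" -ngl 50 -p 512 -n 128 -r 3

Running this with both the b5017 and b5050 binaries and comparing the reported t/s should show the same gap.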
While this may be anecdotal given that Iris Xe is a rather basic integrated GPU, b5017 was the first version where I noticed llama.cpp's Vulkan backend running faster than AVX2 on an i7-1165G7 (this did not use to be the case; Vulkan was noticeably slower before). Running inference on the Iris Xe is also quite energy efficient: the laptop can run inference at 100% GPU with the fans off.
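For that Vulkan-vs-CPU data point, a minimal sketch of how to repeat the comparison on the same machine (assuming a Vulkan build; -ngl 0 keeps all layers on the CPU while -ngl 50 offloads them to the Iris Xe):

llama-bench -m "modelname.gguf" -ngl 0,50

Note that -ngl 0 on a Vulkan build is only an approximation of the pure AVX2 path; a CPU-only build is the cleaner baseline.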
Reporting this in case the Vulkan wizards can take advantage of that data point!
With all the recent enhancements to Vulkan in llama.cpp, it's now rather comfortable to run small-ish models on regular laptops.
First Bad Commit
No response