The Vulkan llama.cpp (Windows) runtime v1.51.0 incorrectly uses the CPU #1041

**Which version of LM Studio?**
LM Studio 0.3.27

**Which operating system?**
Windows 11

**What is the bug?**
I'm using an AMD Ryzen AI Max+ 395 with 16 GB assigned as VRAM, running qwen3-30b-a3b through the Vulkan llama.cpp runtime. The older version, 1.50.2, was fast at nearly 50 tokens/s, but the testing version, 1.51.0, is slow at only about 20 tokens/s. According to Task Manager, version 1.51.0 loads the model into system RAM and computes on the CPU, whereas 1.50.2 loads the model into VRAM and computes on the GPU.
