Description
Which version of LM Studio?
LM Studio 0.3.27
Which operating system?
Arch Linux
What is the bug?
Vulkan v1.50.2 uses VRAM to load models, as expected. When loading gpt-oss 120B, I can clearly see that ~60 GB of VRAM gets allocated, and GPU load sits at ~50% while text is being generated (observed with btop).
Vulkan v1.52.0 doesn't use VRAM: it loads the model into RAM + swap and then runs compute on the CPU. VRAM usage sits at 1/96 GB and GPU usage at 1-2%.
This can be reproduced consistently just by switching the engine to v1.50.2 and back; VRAM usage can be checked with btop, radeontop, or amdgpu_top --smi.
I suspect this bug is also present in v1.51.0, per #1041.
To Reproduce
Steps to reproduce the behavior:
- Set the engine to Vulkan v1.50.2, load any model, confirm VRAM is being used (with the commands above), generate some text, and note the tokens/s.
- Set the engine to Vulkan v1.52.0, load the same model, and re-generate the response. Observe roughly 3x worse performance; note that VRAM is not used and CPU load is higher than before.
TL;DR workaround
Set the engine back to Vulkan v1.50.2.