Running llama.cpp build #5832 (commit 9731134).
I'm trying to load a model across two GPUs with Vulkan; the GPUs have 20 GB and 11 GB of VRAM.
Loading a Q6_K quant of size 26.27 GiB (6.56 BPW) with -ts "20,11" -c 512 yields:
ggml ctx size = 0.62 MiB
offloading 60 repeating layers to GPU
offloading non-repeating layers to GPU
offloaded 61/61 layers to GPU
Vulkan0 buffer size = 17458.44 MiB
Vulkan1 buffer size = 9088.14 MiB
CPU buffer size = 358.90 MiB
Vulkan0 KV buffer size = 80.00 MiB
Vulkan1 KV buffer size = 40.00 MiB
KV self size = 120.00 MiB, K (f16): 60.00 MiB, V (f16): 60.00 MiB
Vulkan_Host input buffer size = 16.01 MiB
Vulkan0 compute buffer size = 113.00 MiB
Vulkan1 compute buffer size = 139.00 MiB
Vulkan_Host compute buffer size = 14.00 MiB
ggml_vulkan: Device memory allocation of size 120422400 failed.
ggml_vulkan: vk::Device::allocateMemory: ErrorOutOfDeviceMemory
The math doesn't seem to add up.
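To show why I think so, here is a rough tally of the per-device numbers from the log above (a back-of-the-envelope sketch that only sums the buffers the log reports and ignores any driver/allocator overhead or fragmentation):

```python
# Rough tally of the per-device allocations reported in the log (all values in MiB).
vulkan0 = 17458.44 + 80.00 + 113.00   # model buffer + KV buffer + compute buffer
vulkan1 = 9088.14 + 40.00 + 139.00

print(f"Vulkan0 total: {vulkan0:.2f} MiB (~{vulkan0 / 1024:.2f} GiB of 20 GB)")
print(f"Vulkan1 total: {vulkan1:.2f} MiB (~{vulkan1 / 1024:.2f} GiB of 11 GB)")

# The allocation that actually fails is tiny by comparison:
failed_bytes = 120422400
print(f"Failed allocation: {failed_bytes / 1024**2:.2f} MiB")  # ~114.84 MiB
```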
A Q5_K_M quant at 22.65 GiB (5.66 BPW)
works perfectly fine until I increase the context to 4096.
This can't possibly be the context, right? With HIP on smaller models I have to push much harder to hit an OOM, so 31 GB of VRAM should be plenty here.
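As a quick sanity check on that, assuming the f16 KV cache grows linearly with context length (and setting aside that the compute buffers grow somewhat too):

```python
# KV self size reported at -c 512 is 120 MiB; scale linearly to -c 4096.
kv_at_512 = 120.0                       # MiB, from the log above
kv_at_4096 = kv_at_512 * (4096 / 512)   # = 960 MiB
print(f"Estimated KV cache at -c 4096: {kv_at_4096:.0f} MiB")
# Even allowing for larger compute buffers on top of that, this is still small
# relative to the remaining VRAM, so context alone doesn't seem to explain the OOM.
```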
Any idea why this happens?