MoE Experts on AMD iGPU (Vulkan) while Layers & KV Cache on CUDA dGPU? #19006

Interpause · 2026-01-22T06:45:13Z

Correct me if I am wrong. Would token generation speed (after prompt processing) increase if the MoE experts were offloaded to the iGPU rather than to the CPU? This assumes there is a way to increase the iGPU's memory allocation.

pebaryan · 2026-01-22T13:43:37Z

IIRC, iGPU memory is just system RAM. Because that region is specially allocated and isolated by the OS, transferring or switching the stored expert weights over to the dGPU carries more overhead than leaving them with the CPU.
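To make the "just system RAM" point concrete, here is a rough back-of-the-envelope sketch. It assumes token generation is limited by how fast the active expert weights can be read from memory, which is the usual situation for decode but is an assumption here, not something measured in this thread. The function name and both figures are hypothetical and chosen only for illustration.

```python
def max_decode_tokens_per_s(bandwidth_gb_s: float, active_expert_gb: float) -> float:
    """Upper bound on decode speed if every token reads the active expert weights once."""
    if bandwidth_gb_s <= 0 or active_expert_gb <= 0:
        raise ValueError("both arguments must be positive")
    return bandwidth_gb_s / active_expert_gb


# Hypothetical figures, not measurements from any real system.
SYSTEM_RAM_GB_S = 80.0   # assumed system RAM bandwidth
ACTIVE_EXPERT_GB = 2.0   # assumed expert weights read per generated token

# The CPU and the iGPU both read the experts from the same system RAM,
# so under this simple model they share the same ceiling.
cpu_bound = max_decode_tokens_per_s(SYSTEM_RAM_GB_S, ACTIVE_EXPERT_GB)
igpu_bound = max_decode_tokens_per_s(SYSTEM_RAM_GB_S, ACTIVE_EXPERT_GB)

print(f"CPU ceiling:  {cpu_bound:.1f} tok/s")
print(f"iGPU ceiling: {igpu_bound:.1f} tok/s")
```

Under this model the iGPU can only come out ahead if the CPU is not already saturating memory bandwidth, and any extra copy or synchronization cost between the Vulkan and CUDA devices would eat into that gain.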
