MoE Experts on AMD iGPU (Vulkan) while Layers & KV Cache on CUDA dGPU? #19006
Interpause started this conversation in Ideas
Replies: 1 comment
IIRC, iGPU memory is just system RAM, and because the OS allocates and isolates it specially, there is more overhead in transferring/switching the stored expert weights over to the dGPU.
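To put that reasoning into rough numbers, here is a back-of-envelope sketch in Python. It assumes (my assumption, not something measured) that per-token time for the offloaded experts is the time to read the active expert weights from system RAM plus a fixed hand-off overhead to the dGPU. The function name and every number below are hypothetical placeholders, not benchmarks or llama.cpp APIs.

```python
def seconds_per_token(expert_bytes: float,
                      ram_bandwidth: float,
                      handoff_overhead_s: float) -> float:
    """Toy model: time to read the active expert weights from system RAM
    plus a fixed per-token overhead for handing results to the dGPU."""
    return expert_bytes / ram_bandwidth + handoff_overhead_s


GB = 1e9

# Hypothetical values, for illustration only.
active_expert_bytes = 2.0 * GB   # expert weights touched per token (assumed)
system_ram_bw = 80.0 * GB        # bytes/s, shared by CPU and iGPU (assumed)
cpu_overhead_s = 0.001           # CPU -> dGPU hand-off per token (assumed)
igpu_overhead_s = 0.003          # iGPU -> dGPU hand-off per token (assumed larger)

cpu_time = seconds_per_token(active_expert_bytes, system_ram_bw, cpu_overhead_s)
igpu_time = seconds_per_token(active_expert_bytes, system_ram_bw, igpu_overhead_s)

print(f"CPU experts:  {1.0 / cpu_time:.1f} tok/s")
print(f"iGPU experts: {1.0 / igpu_time:.1f} tok/s")
```

In this toy model the RAM read term is identical for both paths, so the iGPU only comes out ahead if its hand-off overhead is lower than the CPU's. Whether that holds on real hardware would need to be measured.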
Enlighten me if I am wrong. Would token generation speed (after prompt processing) increase if the experts were offloaded to the iGPU instead of the CPU? This assumes there is a way to increase the iGPU memory allocation.