vulkan: extend topk_moe to handle sigmoid w/exp_probs_b for nemotron #18295
base: master
ggerganov
left a comment
Ack on the ggml-backend changes
From my side it looks fine, but the Vulkan Mac CI is reporting an issue. Can you look into that?
Also handle GGML_OP_SCALE at the end (nemotron, deepseek2).
Fewer pipeline variants and spec constants, just use push constants.
In test_topk_moe, change exp_probs_b to be 1D, matching real networks.
Update test-backend-ops and ggml-backend to allow verifying multiple outputs in a fusion test (topk_moe has two outputs). Previously only the final node was verified.
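For readers unfamiliar with the op chain being fused, here is a minimal CPU-side sketch of sigmoid routing with `exp_probs_b` and a trailing scale. It is not the Vulkan shader or the ggml graph. The function name `topk_moe_reference`, its parameters, the tie-breaking rule, and the choice to take the weights from the unbiased probabilities are assumptions made for illustration. The two return values are meant to correspond to the "two outputs" mentioned above.

```python
import math

def topk_moe_reference(logits, exp_probs_b, k, scale=1.0, normalize=True):
    """Route one token (hypothetical helper): returns (expert_ids, expert_weights)."""
    # Sigmoid gating: each expert gets an independent probability.
    probs = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    # Assumption: the 1D bias only influences which experts are selected.
    biased = [p + b for p, b in zip(probs, exp_probs_b)]
    # Top-k by biased score; ties go to the lower index (arbitrary choice here).
    ids = sorted(range(len(biased)), key=lambda i: (-biased[i], i))[:k]
    # Assumption: weights are gathered from the unbiased probabilities.
    weights = [probs[i] for i in ids]
    if normalize:
        total = sum(weights)
        weights = [w / total for w in weights]
    # Trailing scale, standing in for the GGML_OP_SCALE at the end of the chain.
    weights = [w * scale for w in weights]
    return ids, weights
```

For example, with `logits = [0.0, 2.0, -1.0, 1.0]` and a bias of `0.5` on expert 0 only, expert 0 is selected ahead of expert 3 even though its unbiased probability is lower. A fusion test that compares only the weights or only the ids would miss errors in the other output.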
4a17402 to 75bcc84
I don't know why the Mac system is failing. It's in test cases that should be fused so I don't think it's a fluke, but I can't reproduce it locally on NVIDIA or lavapipe and I can't find anything from code inspection. While trying I did find that sometimes there are ties that lead to spurious failures, so I've updated the tests to avoid that. I doubt this is related to the Mac failures. If it still fails in CI I'll probably need to just disable this fusion for MoltenVK.
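To illustrate why ties can produce spurious test failures, the sketch below shows two equally valid top-k selections that disagree on tied scores, plus one possible way to generate tie-free inputs. All three helper names are hypothetical, and the tie-free generator is only an assumption about how such a test could be built, not the change actually made to the tests.

```python
import random

def topk_low_index_first(scores, k):
    """Top-k indices by descending score; ties go to the lower index."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]

def topk_high_index_first(scores, k):
    """Top-k indices by descending score; ties go to the higher index."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], -i))[:k]

def tie_free_scores(n, seed=0):
    """Distinct, evenly spaced values in shuffled order, so no two scores tie."""
    values = [i / n for i in range(n)]
    random.Random(seed).shuffle(values)
    return values
```

With `scores = [0.5, 0.9, 0.5, 0.1]` and `k = 2`, the first function returns `[1, 0]` and the second returns `[1, 2]`. Both are correct top-2 selections, yet an exact comparison of expert ids between a backend and a reference would fail. On inputs from `tie_free_scores`, the two functions always agree.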
bfbd40e to 03b18c9
03b18c9 to 86df563
I tried a couple of experiments through CI, but don't have a workaround for the MoltenVK failures. I've disabled the new fusion for MoltenVK.