deps: bump llama.cpp — RDNA3/RDNA4 MMQ tile override (+6–8% decode) #439
Open
DeanoC wants to merge 1 commit into
Bumps server/deps/llama.cpp (luce-dflash) to pick up the smaller 48x64/4-warp MMQ tile for DFlash spec-decode verify batches on consumer RDNA. Output is bit-identical; decode at --ddtree-budget=22, Qwen3.6-27B Q4_K_M:

gfx1201 (R9700): 54.65 -> 59.37 tok/s (+8.3%)
gfx1100 (RX 7900 XTX): 56.78 -> 60.18 tok/s (+6.0%)

Depends on Luce-Org/llama.cpp-dflash-ggml#18; the submodule SHA should be repointed to the luce-dflash merge commit before this lands.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Contributor
1 issue found across 2 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="server/deps/llama.cpp">
<violation number="1" location="server/deps/llama.cpp:1">
P1: Submodule `server/deps/llama.cpp` is pinned to a transient PR-head commit (`c5c9989d9fc4f2b1467979fb67b320eb808bab3d`) instead of a durable merge commit. This makes CI checkouts and fresh clones brittle if the upstream branch is force-pushed or rebased.</violation>
</file>
@@ -1 +1 @@
-Subproject commit 574be6132bba97e864b16e3fd2fd4fcfaf52a742
+Subproject commit c5c9989d9fc4f2b1467979fb67b320eb808bab3d
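The pinning concern above can be checked mechanically. The following is a minimal sketch, not part of this PR: it pulls the old and new SHAs out of a submodule pointer diff and asks git whether the new SHA is reachable from a long-lived ref. The helper names and the ref `origin/luce-dflash` are assumptions for illustration; `git merge-base --is-ancestor` is a standard git command that exits 0 when the first commit is an ancestor of the second.

```python
import re
import subprocess

# Matches the "-Subproject commit <sha>" / "+Subproject commit <sha>" lines
# that git emits when a submodule pointer changes.
SHA_RE = re.compile(r"^([-+])Subproject commit ([0-9a-f]{40})$", re.MULTILINE)


def parse_subproject_bump(diff_text):
    """Return (old_sha, new_sha) from a submodule pointer diff (hypothetical helper)."""
    found = {sign: sha for sign, sha in SHA_RE.findall(diff_text)}
    return found.get("-"), found.get("+")


def is_reachable_from(repo_dir, sha, ref):
    """True if `sha` is an ancestor of `ref` in the git repo at `repo_dir`.

    Assumes `ref` (for example "origin/luce-dflash", an assumed name) has
    already been fetched into the submodule checkout.
    """
    result = subprocess.run(
        ["git", "-C", repo_dir, "merge-base", "--is-ancestor", sha, ref],
        check=False,
    )
    return result.returncode == 0


DIFF = """@@ -1 +1 @@
-Subproject commit 574be6132bba97e864b16e3fd2fd4fcfaf52a742
+Subproject commit c5c9989d9fc4f2b1467979fb67b320eb808bab3d
"""

old_sha, new_sha = parse_subproject_bump(DIFF)
```

A CI step could then call `is_reachable_from("server/deps/llama.cpp", new_sha, "origin/luce-dflash")` and fail the build when it returns `False`, which would catch a pin to a PR-head commit that has not been merged yet.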
Bumps the server/deps/llama.cpp submodule (luce-dflash) to pick up the RDNA3/RDNA4 MMQ tile override: a 48×64/4-warp tile for DFlash's small spec-decode verify batches in place of the stock 128×128/8.

Impact (Qwen3.6-27B Q4_K_M, --ddtree-budget=22, 10-prompt HE mean, n_gen=256, output bit-identical)

Dependency / draft status

luce-dflash merge commit of Add OpenAI-compatible agent-ready server + Blackwell (sm_120/121) support #18.

Benchmarks measured on the two GPUs above; AI-assisted (Claude Code).
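One plausible reason a smaller tile suits small verify batches is padding: a batch narrower than the tile leaves most of the tile idle. The sketch below illustrates only that arithmetic and is not taken from the llama.cpp change. It assumes the batch dimension maps to the first tile dimension (128 for the stock tile, 48 for the override) and uses a hypothetical batch width of 22, borrowed from `--ddtree-budget=22`; the real verify-batch width may differ.

```python
import math


def tile_utilization(n_cols, tile_cols):
    """Fraction of tile columns holding real data when n_cols is padded up to whole tiles."""
    tiles = math.ceil(n_cols / tile_cols)
    return n_cols / (tiles * tile_cols)


# Hypothetical verify-batch width; an assumption, not a measured value.
batch = 22

for tile_cols in (128, 48):
    print(f"tile width {tile_cols}: {tile_utilization(batch, tile_cols):.3f} utilized")
```

Under these assumptions the 128-wide tile is about 17% utilized and the 48-wide tile about 46%. The model ignores occupancy, shared-memory use, and the 8-warp to 4-warp change, so it indicates the direction of the effect rather than predicting the measured speedup.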