deps: bump llama.cpp — RDNA3/RDNA4 MMQ tile override (+6–8% decode)#439

Open
DeanoC wants to merge 1 commit into Luce-Org:main from GeometricAGI:feat/rdna-mmq-tile-submodule-bump

@DeanoC DeanoC commented Jun 23, 2026

Bumps the server/deps/llama.cpp submodule (luce-dflash) to pick up the RDNA3/RDNA4 MMQ tile override: a 48×64 tile with 4 warps, used for DFlash's small spec-decode verify batches in place of the stock 128×128 tile with 8 warps.

Impact (Qwen3.6-27B Q4_K_M, --ddtree-budget=22, 10-prompt HE mean, n_gen=256, output bit-identical)

| GPU | before (tok/s) | after (tok/s) | gain |
|---|---|---|---|
| gfx1201 (R9700) | 54.65 | 59.37 | +8.3% |
| gfx1100 (RX 7900 XTX) | 56.78 | 60.18 | +6.0% |
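
For reviewers who want to sanity-check the gain column, here is a minimal sketch that recomputes the relative change from the two means in the table. The `RESULTS` dict and `relative_gain` helper are illustrative names, not part of the PR. Computed this way, gfx1100 comes out at +6.0% and gfx1201 at about +8.6% rather than the reported +8.3%; one possible explanation (an assumption, not confirmed by the PR) is that the reported figure averages per-prompt gains instead of taking the ratio of the means.

```python
# Decode throughput in tok/s, copied from the table above: (before, after).
RESULTS = {
    "gfx1201 (R9700)": (54.65, 59.37),
    "gfx1100 (RX 7900 XTX)": (56.78, 60.18),
}


def relative_gain(before: float, after: float) -> float:
    """Return the percentage change from `before` to `after`."""
    return (after - before) / before * 100.0


for gpu, (before, after) in RESULTS.items():
    gain = relative_gain(before, after)
    print(f"{gpu}: {before:.2f} -> {after:.2f} tok/s ({gain:+.1f}%)")
```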

Dependency / draft status

Benchmarks measured on the two GPUs above; AI-assisted (Claude Code).

Bumps server/deps/llama.cpp (luce-dflash) to pick up the smaller 48x64/4-warp
MMQ tile for DFlash spec-decode verify batches on consumer RDNA. Output is
bit-identical; decode at --ddtree-budget=22, Qwen3.6-27B Q4_K_M:
  gfx1201 (R9700):       54.65 -> 59.37 tok/s (+8.3%)
  gfx1100 (RX 7900 XTX): 56.78 -> 60.18 tok/s (+6.0%)

Depends on Luce-Org/llama.cpp-dflash-ggml#18; the submodule SHA should be
repointed to the luce-dflash merge commit before this lands.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@DeanoC DeanoC marked this pull request as ready for review June 23, 2026 09:41

@cubic-dev-ai cubic-dev-ai Bot left a comment

1 issue found across 2 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="server/deps/llama.cpp">

<violation number="1" location="server/deps/llama.cpp:1">
P1: Submodule `server/deps/llama.cpp` is pinned to a transient PR-head commit (`c5c9989d9fc4f2b1467979fb67b320eb808bab3d`) instead of a durable merge commit. This makes CI checkouts and fresh clones brittle if the upstream branch is force-pushed or rebased.</violation>
</file>

Comment thread server/deps/llama.cpp
@@ -1 +1 @@
-Subproject commit 574be6132bba97e864b16e3fd2fd4fcfaf52a742
+Subproject commit c5c9989d9fc4f2b1467979fb67b320eb808bab3d

P1: Submodule server/deps/llama.cpp is pinned to a transient PR-head commit (c5c9989d9fc4f2b1467979fb67b320eb808bab3d) instead of a durable merge commit. This makes CI checkouts and fresh clones brittle if the upstream branch is force-pushed or rebased.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At server/deps/llama.cpp, line 1:

<comment>Submodule `server/deps/llama.cpp` is pinned to a transient PR-head commit (`c5c9989d9fc4f2b1467979fb67b320eb808bab3d`) instead of a durable merge commit. This makes CI checkouts and fresh clones brittle if the upstream branch is force-pushed or rebased.</comment>

<file context>
@@ -1 +1 @@
-Subproject commit 574be6132bba97e864b16e3fd2fd4fcfaf52a742
+Subproject commit c5c9989d9fc4f2b1467979fb67b320eb808bab3d
</file context>
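
One way to check the concern raised above before merging is to confirm that the new submodule SHA is reachable from a long-lived branch. The sketch below is only an illustration: `parse_submodule_bump` and `is_on_branch` are hypothetical helpers, not part of this repository, and the second one relies on `git merge-base --is-ancestor`, which exits with status 0 when the first commit is an ancestor of the second.

```python
import re
import subprocess

# Matches the "-Subproject commit <sha>" / "+Subproject commit <sha>" lines of a
# submodule pointer diff.
SUBPROJECT_RE = re.compile(r"^([+-])Subproject commit ([0-9a-f]{40})$", re.MULTILINE)


def parse_submodule_bump(diff_text: str) -> tuple[str, str]:
    """Return (old_sha, new_sha) extracted from a submodule pointer diff."""
    found = dict(SUBPROJECT_RE.findall(diff_text))
    return found["-"], found["+"]


def is_on_branch(repo_dir: str, sha: str, branch: str) -> bool:
    """Return True if `sha` is an ancestor of `branch` in the repo at `repo_dir`.

    Any non-zero exit status (not an ancestor, or unknown commit/ref) is
    treated as False.
    """
    result = subprocess.run(
        ["git", "-C", repo_dir, "merge-base", "--is-ancestor", sha, branch],
        capture_output=True,
    )
    return result.returncode == 0


DIFF = """\
-Subproject commit 574be6132bba97e864b16e3fd2fd4fcfaf52a742
+Subproject commit c5c9989d9fc4f2b1467979fb67b320eb808bab3d
"""
old_sha, new_sha = parse_submodule_bump(DIFF)
print(old_sha, "->", new_sha)
```

From a checkout with the submodule fetched, a call such as `is_on_branch("server/deps/llama.cpp", new_sha, "origin/luce-dflash")` would then tell whether the pin is on the target branch; the ref name `origin/luce-dflash` is an assumption based on the PR description and may differ in practice.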
