deps: bump llama.cpp — RDNA3/RDNA4 MMQ tile override (+6–8% decode) by DeanoC · Pull Request #439 · Luce-Org/lucebox-hub

DeanoC · 2026-06-23T09:40:07Z

Bumps the server/deps/llama.cpp submodule (luce-dflash) to pick up the RDNA3/RDNA4 MMQ tile override: a 48×64/4-warp tile for DFlash's small spec-decode verify batches, replacing the stock 128×128/8-warp tile.

Impact (Qwen3.6-27B Q4_K_M, `--ddtree-budget=22`, 10-prompt HE mean, n_gen=256, output bit-identical)

| GPU | before (tok/s) | after (tok/s) | gain |
|---|---|---|---|
| gfx1201 (R9700) | 54.65 | 59.37 | +8.3% |
| gfx1100 (RX 7900 XTX) | 56.78 | 60.18 | +6.0% |
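As a sanity check on the gain column, the relative change can be recomputed from the before/after throughput. The sketch below is illustrative only: `relative_gain` is a hypothetical helper, not part of this PR, and it assumes the gain is the ratio of the two reported means. Under that assumption the gfx1100 row reproduces (+6.0%), while the gfx1201 row comes out nearer +8.6% than the reported +8.3%, so the reported figure may have been averaged differently (for example per prompt).

```python
def relative_gain(before: float, after: float) -> float:
    """Percentage change in throughput from `before` to `after` (hypothetical helper)."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (after - before) / before * 100.0


# Decode throughput in tok/s, copied from the reported results.
results = {
    "gfx1201 (R9700)": (54.65, 59.37),
    "gfx1100 (RX 7900 XTX)": (56.78, 60.18),
}

for gpu, (before, after) in results.items():
    print(f"{gpu}: {relative_gain(before, after):+.1f}%")
```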

Dependency / draft status

- Depends on Luce-Org/llama.cpp-dflash-ggml#18, "ggml-cuda: smaller MMQ tile for RDNA3/RDNA4 spec-decode batches (+6–8%)" (the kernel change). This PR currently points the submodule at that PR's head commit; before merging, repoint the SHA to the luce-dflash merge commit of llama.cpp-dflash-ggml#18.
- Draft until llama.cpp-dflash-ggml#18 merges. CI submodule checkout may not resolve the PR-head SHA until then.
- gfx1151/RDNA3.5 is intentionally left on the default tile (not benchmarked).
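Before repointing, it can help to confirm that the chosen SHA is actually reachable from the long-lived branch. The following is a minimal sketch, not part of this PR: the function names are hypothetical, and the remote-tracking ref `origin/luce-dflash` is an assumption about how the fork's branch is named locally. It relies on `git merge-base --is-ancestor`, which exits 0 when the commit is an ancestor, 1 when it is not, and with another code on error.

```python
import subprocess


def ancestor_check_cmd(repo: str, sha: str, branch: str) -> list[str]:
    """Build the git command that tests whether `sha` is an ancestor of `branch`."""
    return ["git", "-C", repo, "merge-base", "--is-ancestor", sha, branch]


def is_reachable(repo: str, sha: str, branch: str) -> bool:
    """True if `sha` is reachable from `branch` in the checkout at `repo`."""
    result = subprocess.run(
        ancestor_check_cmd(repo, sha, branch), capture_output=True, text=True
    )
    if result.returncode == 0:
        return True
    if result.returncode == 1:
        return False
    # Any other exit code means git itself failed (unknown SHA, missing ref, ...).
    raise RuntimeError(result.stderr.strip() or "git merge-base failed")


# Example call (assumes the submodule is checked out and fetched):
# is_reachable("server/deps/llama.cpp",
#              "c5c9989d9fc4f2b1467979fb67b320eb808bab3d",
#              "origin/luce-dflash")
```

A `False` result for the current pin would be expected while the dependency is still an open PR; the check becomes meaningful once the merge commit exists.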

Benchmarks measured on the two GPUs above; AI-assisted (Claude Code).

Bumps server/deps/llama.cpp (luce-dflash) to pick up the smaller 48x64/4-warp MMQ tile for DFlash spec-decode verify batches on consumer RDNA.

Output is bit-identical; decode at --ddtree-budget=22, Qwen3.6-27B Q4_K_M:
  gfx1201 (R9700): 54.65 -> 59.37 tok/s (+8.3%)
  gfx1100 (RX 7900 XTX): 56.78 -> 60.18 tok/s (+6.0%)

Depends on Luce-Org/llama.cpp-dflash-ggml#18; the submodule SHA should be repointed to the luce-dflash merge commit before this lands.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

cubic-dev-ai

1 issue found across 2 files

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="server/deps/llama.cpp">

<violation number="1" location="server/deps/llama.cpp:1">
P1: Submodule `server/deps/llama.cpp` is pinned to a transient PR-head commit (`c5c9989d9fc4f2b1467979fb67b320eb808bab3d`) instead of a durable merge commit. This makes CI checkouts and fresh clones brittle if the upstream branch is force-pushed or rebased.</violation>
</file>


cubic-dev-ai · 2026-06-23T09:43:40Z

@@ -1 +1 @@
-Subproject commit 574be6132bba97e864b16e3fd2fd4fcfaf52a742
+Subproject commit c5c9989d9fc4f2b1467979fb67b320eb808bab3d


P1: Submodule server/deps/llama.cpp is pinned to a transient PR-head commit (c5c9989d9fc4f2b1467979fb67b320eb808bab3d) instead of a durable merge commit. This makes CI checkouts and fresh clones brittle if the upstream branch is force-pushed or rebased.
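To audit a bump like this one in a script, the old and new pins can be read straight from the submodule pointer hunk. This is a small sketch under the assumption that the hunk contains exactly one removed and one added `Subproject commit` line with full 40-character SHAs; `parse_submodule_bump` is a hypothetical helper, not something shipped by git or this repository.

```python
import re

# Matches the "-Subproject commit <sha>" / "+Subproject commit <sha>" lines
# that git emits when a submodule pointer changes.
GITLINK = re.compile(r"^([+-])Subproject commit ([0-9a-f]{40})$", re.MULTILINE)


def parse_submodule_bump(diff: str) -> tuple[str, str]:
    """Return (old_sha, new_sha) from a submodule pointer diff hunk."""
    found = {sign: sha for sign, sha in GITLINK.findall(diff)}
    if set(found) != {"-", "+"}:
        raise ValueError("expected one removed and one added 'Subproject commit' line")
    return found["-"], found["+"]


hunk = """@@ -1 +1 @@
-Subproject commit 574be6132bba97e864b16e3fd2fd4fcfaf52a742
+Subproject commit c5c9989d9fc4f2b1467979fb67b320eb808bab3d
"""
old, new = parse_submodule_bump(hunk)
print(f"{old[:12]} -> {new[:12]}")
```

The extracted new SHA is the value a reviewer would compare against the eventual merge commit on the dependency's branch.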


DeanoC marked this pull request as ready for review June 23, 2026 09:41

cubic-dev-ai Bot reviewed Jun 23, 2026

DeanoC wants to merge 1 commit into Luce-Org:main from GeometricAGI:feat/rdna-mmq-tile-submodule-bump
