
feat(llm-core): shared decoder transformer body + DecoderModelMetadata #109

Merged: michalharakal merged 1 commit into develop from feat/decoder-body-shared on May 4, 2026

@michalharakal
Contributor

Summary

  • Adds an architecture-neutral decoder-only transformer body builder (decoderTransformerNetwork) and a DecoderModelMetadata interface to :llm-core. Every per-model xNetwork(metadata) function can now compose it with that model's specific knobs (RoPE base, RMSNorm eps, QK-norm) instead of duplicating the transformer DAG.
  • Today qwenNetwork() is a 3-line stub that delegates to llamaNetwork(), and llamaNetwork() hardcodes eps = 1e-5f, qkNorm = false, and RoPE base = 10_000, none of which are right for Qwen3. This PR is the foundation for collapsing both *NetworkDef.kt files into thin callers of the shared builder; that collapse and the Llama-named loader rename ship as separate follow-up PRs, per the no-model-duplication plan.

Scope

  • Purely additive — three new files in :llm-core; no existing source modified.
  • DecoderModelMetadata.kt (interface) — common shape fields + ropeFreqBase, rmsNormEps, BOS/EOS that every decoder LLM in this repo carries.
  • DecoderTransformerNetwork.kt (builder) — Embedding → N × (RMSNorm → MHA(RoPE, KVCache, [QK-norm]) → Residual → RMSNorm → SwiGLU FFN → Residual) → RMSNorm → output Dense. Knobs (ropeBase, eps, qkNorm) default from metadata so callers can override per-architecture.
  • DecoderTransformerNetworkTest.kt — module-tree-shape tests with a synthetic in-test DecoderModelMetadata impl. The full integration with real LlamaModelMetadata lives in the follow-up *NetworkDef collapse PR (which adds : DecoderModelMetadata to that data class).
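
To make the "knobs default from metadata" idea concrete, here is a minimal, self-contained sketch. It is not the code in this PR. Only `ropeFreqBase` and `rmsNormEps` are names taken from the description above; `blockCount`, `bosTokenId`, `eosTokenId`, `SyntheticMetadata`, `ResolvedKnobs`, `decoderBodySketch`, and the two `*NetworkSketch` callers are hypothetical. The `qkNorm = false` default is also an assumption, since the description does not name a metadata field for it.

```kotlin
// Sketch only: apart from ropeFreqBase and rmsNormEps, every name here is a placeholder.
interface DecoderModelMetadata {
    val blockCount: Int      // number of transformer layers (assumed field name)
    val ropeFreqBase: Float
    val rmsNormEps: Float
    val bosTokenId: Int      // assumed field name
    val eosTokenId: Int      // assumed field name
}

// Synthetic implementation, in the spirit of the in-test impl mentioned above.
data class SyntheticMetadata(
    override val blockCount: Int = 2,
    override val ropeFreqBase: Float = 1_000_000f,
    override val rmsNormEps: Float = 1e-6f,
    override val bosTokenId: Int = 1,
    override val eosTokenId: Int = 2,
) : DecoderModelMetadata

// Stand-in for the built network: it only records which knobs were resolved.
data class ResolvedKnobs(
    val layers: Int,
    val ropeBase: Float,
    val eps: Float,
    val qkNorm: Boolean,
)

// Knobs fall back to the metadata values unless the caller overrides them.
fun decoderBodySketch(
    metadata: DecoderModelMetadata,
    ropeBase: Float = metadata.ropeFreqBase,
    eps: Float = metadata.rmsNormEps,
    qkNorm: Boolean = false, // assumed default
): ResolvedKnobs = ResolvedKnobs(metadata.blockCount, ropeBase, eps, qkNorm)

// Thin per-model callers: each passes only what is specific to its architecture.
fun llamaNetworkSketch(metadata: DecoderModelMetadata): ResolvedKnobs =
    decoderBodySketch(metadata, qkNorm = false)

fun qwenNetworkSketch(metadata: DecoderModelMetadata): ResolvedKnobs =
    decoderBodySketch(metadata, qkNorm = true)

fun main() {
    val meta = SyntheticMetadata()
    println(llamaNetworkSketch(meta))
    println(qwenNetworkSketch(meta))
}
```

With this shape, a per-model function would no longer need to hardcode `eps` or the RoPE base, because both come from the metadata it is handed, and an explicit argument still wins when an architecture needs a different value.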

Test plan

  • ./gradlew :llm-core:compileKotlinJvm :llm-core:compileCommonMainKotlinMetadata — clean compile, only pre-existing warnings.
  • ./gradlew :llm-core:jvmTest — 9 suites / 85 tests pass, including the 3 new ones in DecoderTransformerNetworkTest.
  • CI green on PR.

Refs the closed #46. No behavior change for downstream consumers — nothing imports the new code yet.

🤖 Generated with Claude Code

…lMetadata

Adds an architecture-neutral decoder-only transformer body builder that
each per-model `xNetwork(metadata)` can compose with its own knobs
(RoPE base, RMSNorm eps, QK-norm) instead of duplicating the transformer
DAG.

Today `qwenNetwork()` is a 3-line stub delegating to `llamaNetwork()`,
and `llamaNetwork()` hardcodes `eps = 1e-5f`, `qkNorm = false`, and the
default RoPE base of 10000 — none of which are right for Qwen3. The
intended fix is to collapse both `*NetworkDef.kt` files into thin
callers of the shared `decoderTransformerNetwork` introduced here, with
each passing its architecture-specific knobs explicitly. That collapse
ships in the next PR; this PR is purely additive in :llm-core.

The `DecoderModelMetadata` interface captures the shape fields plus the
common defaults (`ropeFreqBase`, `rmsNormEps`, BOS/EOS) that every
decoder LLM in this repo carries. `LlamaModelMetadata` adopts it in the
follow-up so it can be passed directly.

Tests verify:
- module tree shape (token_embd / blk.N / output_norm / output) honors
  the layer count
- `qkNorm` flips real `q_norm` / `k_norm` submodules into the MHA tree
- `ropeBase` and `eps` defaults flow from metadata without code change
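
For readers who want a feel for the first two checks, the following is a rough sketch of a module-tree-shape assertion. It does not reproduce `DecoderTransformerNetworkTest`. The top-level names (`token_embd`, `blk.N`, `output_norm`, `output`) and the `q_norm` / `k_norm` names come from the list above; `ModuleNode`, `moduleTreeSketch`, the helper functions, and the per-block names `attn_norm`, `attn`, `ffn_norm`, and `ffn` are assumptions made for illustration.

```kotlin
// Hypothetical stand-in for a module tree: a name plus child modules.
data class ModuleNode(val name: String, val children: List<ModuleNode> = emptyList())

// Builds a name-only tree with one blk.N entry per layer.
// q_norm / k_norm appear under the attention node only when qkNorm is true.
fun moduleTreeSketch(layerCount: Int, qkNorm: Boolean): ModuleNode {
    val blocks = List(layerCount) { i ->
        val attention = ModuleNode(
            "attn",
            if (qkNorm) listOf(ModuleNode("q_norm"), ModuleNode("k_norm")) else emptyList(),
        )
        ModuleNode(
            "blk.$i",
            listOf(ModuleNode("attn_norm"), attention, ModuleNode("ffn_norm"), ModuleNode("ffn")),
        )
    }
    val children = listOf(ModuleNode("token_embd")) + blocks +
        ModuleNode("output_norm") + ModuleNode("output")
    return ModuleNode("decoder", children)
}

fun topLevelNames(root: ModuleNode): List<String> = root.children.map { it.name }

fun attentionChildNames(root: ModuleNode, layer: Int): List<String> =
    root.children.first { it.name == "blk.$layer" }
        .children.first { it.name == "attn" }
        .children.map { it.name }

fun main() {
    val plain = moduleTreeSketch(layerCount = 2, qkNorm = false)
    // The tree honors the layer count.
    check(topLevelNames(plain) == listOf("token_embd", "blk.0", "blk.1", "output_norm", "output"))
    // Without qkNorm there are no norm submodules under attention.
    check(attentionChildNames(plain, 0).isEmpty())

    val normed = moduleTreeSketch(layerCount = 2, qkNorm = true)
    // With qkNorm the submodules show up in every block.
    check(attentionChildNames(normed, 1) == listOf("q_norm", "k_norm"))
    println(topLevelNames(normed)) // prints [token_embd, blk.0, blk.1, output_norm, output]
}
```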

Refs the no-model-duplication architectural plan (see
~/.claude/plans/snazzy-wibbling-dewdrop.md), and the closed #46.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

Wire QwenNetworkLoader into CLI for proper Qwen3 inference
