feat(llm-core): shared decoder transformer body + DecoderModelMetadata by michalharakal · Pull Request #109 · SKaiNET-developers/SKaiNET-transformers

michalharakal · 2026-05-04T09:33:43Z

Summary

Adds an architecture-neutral decoder-only transformer body builder (decoderTransformerNetwork) and a DecoderModelMetadata interface to :llm-core. Every per-model xNetwork(metadata) function can now compose it with that model's specific knobs (RoPE base, RMSNorm eps, QK-norm) instead of duplicating the transformer DAG.
Today qwenNetwork() is a 3-line stub that delegates to llamaNetwork(), and llamaNetwork() hardcodes eps = 1e-5f, qkNorm = false, and RoPE base = 10_000, none of which are right for Qwen3. This PR is the foundation for collapsing both *NetworkDef.kt files into thin callers of the shared builder. That collapse and the rename of the Llama-named loader ship as separate follow-up PRs, per the no-model-duplication plan.
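The "thin callers" idea can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `BodyKnobs`, `DecoderBody`, `sharedDecoderBody`, `llamaBody` and `qwenBody` are hypothetical stand-ins, and the Qwen values are illustrative placeholders (a real caller would read them from the model's metadata).

```kotlin
// Hypothetical stand-ins: none of these types are the real SKaiNET API.
data class BodyKnobs(val ropeBase: Float, val rmsNormEps: Float, val qkNorm: Boolean)

// Stand-in for the shared builder result: it only records what was requested.
data class DecoderBody(val layers: Int, val knobs: BodyKnobs)

fun sharedDecoderBody(layers: Int, knobs: BodyKnobs): DecoderBody =
    DecoderBody(layers, knobs)

// Thin per-model caller: states its own knobs explicitly.
// These are the values the PR text says llamaNetwork() hardcodes today.
fun llamaBody(layers: Int): DecoderBody =
    sharedDecoderBody(layers, BodyKnobs(ropeBase = 10_000f, rmsNormEps = 1e-5f, qkNorm = false))

// Illustrative values only (assumption); real ones would come from model metadata.
fun qwenBody(layers: Int): DecoderBody =
    sharedDecoderBody(layers, BodyKnobs(ropeBase = 1_000_000f, rmsNormEps = 1e-6f, qkNorm = true))

fun main() {
    // Same shared body, different knobs per architecture.
    println(llamaBody(layers = 2))
    println(qwenBody(layers = 2))
}
```

The point of the shape is that a Qwen caller no longer inherits Llama's constants by delegation; each caller passes its own values into one shared body.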

Scope

Purely additive — three new files in :llm-core; no existing source modified.
DecoderModelMetadata.kt (interface) — common shape fields + ropeFreqBase, rmsNormEps, BOS/EOS that every decoder LLM in this repo carries.
DecoderTransformerNetwork.kt (builder) — Embedding → N × (RMSNorm → MHA(RoPE, KVCache, [QK-norm]) → Residual → RMSNorm → SwiGLU FFN → Residual) → RMSNorm → output Dense. Knobs (ropeBase, eps, qkNorm) default from metadata so callers can override per-architecture.
DecoderTransformerNetworkTest.kt — module-tree-shape tests with a synthetic in-test DecoderModelMetadata impl. The full integration with real LlamaModelMetadata lives in the follow-up *NetworkDef collapse PR (which adds : DecoderModelMetadata to that data class).
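As a rough sketch of how these three pieces fit together, the following self-contained example models the builder as a function that returns a flat list of module names. It is an assumption-laden illustration, not the real implementation: `blockCount`, `SyntheticMetadata`, `NetworkSketch`, `decoderTransformerNetworkSketch`, the per-block submodule names (`attn_norm`, `attn`, `ffn_norm`, `ffn`) and the `qkNorm = false` default are all hypothetical. Only `ropeFreqBase`, `rmsNormEps`, and the names `token_embd`, `blk.N`, `output_norm`, `output`, `q_norm` and `k_norm` come from the PR text.

```kotlin
// Hypothetical shape: only ropeFreqBase and rmsNormEps are named in the PR text.
interface DecoderModelMetadata {
    val blockCount: Int
    val ropeFreqBase: Float
    val rmsNormEps: Float
}

// Synthetic metadata, in the spirit of the in-test implementation the PR describes.
data class SyntheticMetadata(
    override val blockCount: Int,
    override val ropeFreqBase: Float = 10_000f,
    override val rmsNormEps: Float = 1e-5f,
) : DecoderModelMetadata

// Simplified result: the resolved knobs plus a flat list of module names.
data class NetworkSketch(val ropeBase: Float, val eps: Float, val modules: List<String>)

fun decoderTransformerNetworkSketch(
    metadata: DecoderModelMetadata,
    // Knobs default from metadata; callers may override them per architecture.
    ropeBase: Float = metadata.ropeFreqBase,
    eps: Float = metadata.rmsNormEps,
    qkNorm: Boolean = false, // assumed default
): NetworkSketch {
    val modules = buildList {
        add("token_embd")
        for (i in 0 until metadata.blockCount) {
            add("blk.$i.attn_norm")
            add("blk.$i.attn")
            if (qkNorm) {
                // QK-norm adds two extra submodules under attention.
                add("blk.$i.attn.q_norm")
                add("blk.$i.attn.k_norm")
            }
            add("blk.$i.ffn_norm")
            add("blk.$i.ffn")
        }
        add("output_norm")
        add("output")
    }
    return NetworkSketch(ropeBase, eps, modules)
}

fun main() {
    val meta = SyntheticMetadata(blockCount = 2)
    val plain = decoderTransformerNetworkSketch(meta)
    val withQkNorm = decoderTransformerNetworkSketch(meta, eps = 1e-6f, qkNorm = true)
    println(plain.modules)
    println(withQkNorm.modules)
}
```

A tree-shape test in this style would assert on the module names and on the resolved knob values rather than on any numerical output, which keeps it independent of real model weights.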

Test plan

./gradlew :llm-core:compileKotlinJvm :llm-core:compileCommonMainKotlinMetadata — clean compile, only pre-existing warnings.
./gradlew :llm-core:jvmTest — 9 suites / 85 tests pass, including the 3 new ones in DecoderTransformerNetworkTest.
CI green on PR.

Refs the closed #46. No behavior change for downstream consumers — nothing imports the new code yet.

🤖 Generated with Claude Code

…lMetadata

Adds an architecture-neutral decoder-only transformer body builder that each per-model `xNetwork(metadata)` can compose with its own knobs (RoPE base, RMSNorm eps, QK-norm) instead of duplicating the transformer DAG.

Today `qwenNetwork()` is a 3-line stub delegating to `llamaNetwork()`, and `llamaNetwork()` hardcodes `eps = 1e-5f`, `qkNorm = false`, and the default RoPE base of 10000, none of which are right for Qwen3. The intended fix is to collapse both `*NetworkDef.kt` files into thin callers of the shared `decoderTransformerNetwork` introduced here, with each passing its architecture-specific knobs explicitly. That collapse ships in the next PR; this PR is purely additive in :llm-core.

The `DecoderModelMetadata` interface captures the shape fields plus the common defaults (`ropeFreqBase`, `rmsNormEps`, BOS/EOS) that every decoder LLM in this repo carries. `LlamaModelMetadata` adopts it in the follow-up so it can be passed directly.

Tests verify:
- module tree shape (token_embd / blk.N / output_norm / output) honors the layer count
- `qkNorm` flips real `q_norm` / `k_norm` submodules into the MHA tree
- `ropeBase` and `eps` defaults flow from metadata without code change

Refs the no-model-duplication architectural plan (see ~/.claude/plans/snazzy-wibbling-dewdrop.md), and the closed #46.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

michalharakal merged commit 862c1c5 into develop May 4, 2026
2 checks passed

michalharakal deleted the feat/decoder-body-shared branch May 4, 2026 09:36

michalharakal merged 1 commit into develop from feat/decoder-body-shared
