refactor: collapse Llama/Qwen/Voxtral NetworkDef onto shared decoder body by michalharakal · Pull Request #110 · SKaiNET-developers/SKaiNET-transformers

michalharakal · 2026-05-04T09:43:47Z

Summary

Each architecture's xNetwork(metadata) is now a thin (~5 line) caller of decoderTransformerNetwork from #109 (feat(llm-core): shared decoder transformer body + DecoderModelMetadata), instead of duplicating the transformer DAG or stub-delegating across modules.
qwenNetwork() is no longer a stub: it now applies QK-norm and metadata-driven RoPE base / eps, the architectural difference from Llama that issue #46 (Wire QwenNetworkLoader into CLI for proper Qwen3 inference) originally called out.
voxtralBackboneNetwork no longer imports llamaNetwork from :llm-inference:llama, removing the last cross-model import flagged by the no-model-duplication plan.
Net diff: -33 lines, no behavior change for existing consumers.
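
For readers unfamiliar with the codebase, the thin-caller shape described above can be sketched roughly as follows. This is a simplified illustration, not the actual SKaiNET code: the `layerCount` field, the default values, the `QwenModelMetadata` class, and the list-of-names stand-in for the network are all assumptions made for the example. Only the names `DecoderModelMetadata`, `LlamaModelMetadata`, `decoderTransformerNetwork`, `llamaNetwork`, `qwenNetwork`, `ropeFreqBase`, `rmsNormEps`, and `qkNorm` come from this PR.

```kotlin
// Hypothetical, simplified stand-ins; the real SKaiNET types and signatures may differ.
interface DecoderModelMetadata {
    val layerCount: Int       // assumed field name, for illustration only
    val ropeFreqBase: Float
    val rmsNormEps: Float
}

// Default values below are illustrative, not taken from the repository.
data class LlamaModelMetadata(
    override val layerCount: Int,
    override val ropeFreqBase: Float = 10_000f,
    override val rmsNormEps: Float = 1e-5f,
) : DecoderModelMetadata

data class QwenModelMetadata(
    override val layerCount: Int,
    override val ropeFreqBase: Float = 1_000_000f,
    override val rmsNormEps: Float = 1e-6f,
) : DecoderModelMetadata

// Shared decoder body. The "network" here is just an ordered list of module
// names, which is enough to show where qkNorm and the metadata knobs enter.
fun decoderTransformerNetwork(metadata: DecoderModelMetadata, qkNorm: Boolean): List<String> =
    buildList {
        add("token_embd")
        repeat(metadata.layerCount) { i ->
            add("blk.$i.attn_norm(eps=${metadata.rmsNormEps})")
            if (qkNorm) {
                // Extra per-layer norms on Q and K, the Qwen-specific difference.
                add("blk.$i.attn_q_norm")
                add("blk.$i.attn_k_norm")
            }
            add("blk.$i.attn(ropeBase=${metadata.ropeFreqBase})")
            add("blk.$i.ffn_norm")
            add("blk.$i.ffn")
        }
        add("output_norm")
        add("output")
    }

// Per-architecture entry points reduce to one-line callers of the shared body.
fun llamaNetwork(metadata: LlamaModelMetadata): List<String> =
    decoderTransformerNetwork(metadata, qkNorm = false)

fun qwenNetwork(metadata: QwenModelMetadata, qkNorm: Boolean = true): List<String> =
    decoderTransformerNetwork(metadata, qkNorm = qkNorm)
```

In this sketch, `qwenNetwork(meta, qkNorm = false)` produces the same module sequence as `llamaNetwork` for an equal layer count, which is the property that lets one body serve both architectures.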

Changes per file

LlamaModelMetadata: now implements DecoderModelMetadata. Override-only diff; field names were already aligned.
llamaNetwork: collapsed from ~50-line sequential{} block to a thin decoderTransformerNetwork(..., qkNorm = false) call. Same module tree as before; eps and ropeBase now flow through metadata defaults.
qwenNetwork: was a 3-line stub delegating to llamaNetwork; now a real function exposing qkNorm: Boolean = true. Per-architecture knobs (ropeFreqBase, rmsNormEps) propagate via metadata.
QwenNetworkLoader: auto-detects QK-norm from *.attn_q_norm.weight presence in loaded weights — same pattern Gemma uses. Real Qwen3 GGUFs always carry these tensors; synthetic test fixtures don't, so both keep working.
voxtralBackboneNetwork: drops the cross-model import; calls decoderTransformerNetwork directly.
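
The QK-norm auto-detection described for QwenNetworkLoader amounts to a presence check on tensor names. A minimal sketch, assuming the loader can see the names of all loaded tensors as strings; the helper name `detectQkNorm` is hypothetical and the real loader logic may differ:

```kotlin
// Hypothetical helper, not the actual QwenNetworkLoader code.
// Returns true when any loaded tensor name matches *.attn_q_norm.weight,
// i.e. when the checkpoint ships per-layer Q-norm weights.
fun detectQkNorm(weightNames: Collection<String>): Boolean =
    weightNames.any { it.endsWith(".attn_q_norm.weight") }
```

With illustrative tensor names, `detectQkNorm(listOf("blk.0.attn_q.weight", "blk.0.attn_q_norm.weight"))` would enable QK-norm, while a fixture containing only `blk.0.attn_q.weight` would not. That is why real Qwen3 GGUFs and synthetic fixtures can share one loader path.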

Intentionally NOT touched

Gemma 4 — legitimate divergence (geGluFFN, sandwich norms, layer output scale, KV sharing, PLE hooks, custom GemmaModel wrapper).
Apertus — legitimate divergence (xIELU activation, ungated FFN with separate dense up/down).
Voxtral acoustic — 3-layer flow-matching transformer with no embedding/output projection, different beast.
BERT — encoder-only, bidirectional, LayerNorm not RMSNorm, GELU FFN.

Test plan

./gradlew :llm-core:jvmTest :llm-inference:llama:jvmTest :llm-inference:qwen:jvmTest :llm-inference:voxtral:jvmTest
./gradlew :llm-inference:apertus:jvmTest :llm-inference:gemma:jvmTest :llm-inference:bert:jvmTest
./gradlew :llm-runtime:kllama:jvmTest :llm-runtime:kqwen:jvmTest :llm-api:jvmTest :llm-agent:jvmTest
All pass.
CI green on PR.

DSL-vs-LlamaRuntime numerical parity test deferred to a focused follow-up PR with careful tolerance choice. Refs the closed #46.
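
As a rough idea of what "careful tolerance choice" involves, a parity check between two runtimes typically compares output logits element-wise with a combined absolute and relative tolerance. The sketch below is an assumption about how such a test might be structured, not the deferred test itself; the function names and default tolerances are hypothetical and would need tuning against real model outputs.

```kotlin
import kotlin.math.abs
import kotlin.math.max

// Hypothetical helper: element-wise closeness check.
// Passes when |expected - actual| <= absTol + relTol * |expected| for every element.
fun allClose(
    expected: FloatArray,
    actual: FloatArray,
    absTol: Float = 1e-5f,   // illustrative default, not a recommendation
    relTol: Float = 1e-4f,   // illustrative default, not a recommendation
): Boolean {
    if (expected.size != actual.size) return false
    for (i in expected.indices) {
        val e = expected[i]
        val a = actual[i]
        if (!e.isFinite() || !a.isFinite()) {
            // NaN never matches; infinities must match exactly.
            if (e != a) return false
            continue
        }
        if (abs(e - a) > absTol + relTol * abs(e)) return false
    }
    return true
}

// Largest absolute element-wise difference, useful for reporting how far
// two implementations diverge before picking a tolerance.
fun maxAbsDiff(expected: FloatArray, actual: FloatArray): Float {
    require(expected.size == actual.size) { "size mismatch" }
    var worst = 0f
    for (i in expected.indices) worst = max(worst, abs(expected[i] - actual[i]))
    return worst
}
```

Reporting `maxAbsDiff` alongside the pass/fail result makes it easier to tell an accumulated rounding difference from a genuine architectural mismatch.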

🤖 Generated with Claude Code

…body

Each architecture's `xNetwork(metadata)` is now a thin (~5 line) caller of the shared `decoderTransformerNetwork` builder added in #109, instead of either duplicating the transformer DAG or stub-delegating across modules.

Wire-level changes:

- LlamaModelMetadata implements DecoderModelMetadata (override-only diff; field names already aligned).
- llamaNetwork: collapsed from a ~50-line sequential{} block to a thin decoderTransformerNetwork call with qkNorm = false. Same module tree as before; eps and ropeBase now flow from metadata defaults instead of being hardcoded.
- qwenNetwork: was a 3-line stub delegating to llamaNetwork; now a real function passing qkNorm through from the loader. Qwen3-specific knobs (ropeFreqBase, rmsNormEps) propagate via the shared metadata defaults.
- QwenNetworkLoader auto-detects QK-norm from `*.attn_q_norm.weight` presence in the loaded weights, the same pattern Gemma uses (GemmaNetworkLoader.fromWeights). Real Qwen3 GGUFs always carry these tensors, synthetic test fixtures don't, so both keep working.
- voxtralBackboneNetwork: dropped the cross-model import of llamaNetwork; now calls decoderTransformerNetwork directly. No model module imports another model's network builder.

Architectures intentionally NOT touched:

- Gemma 4: legitimate divergence (geGluFFN, sandwich norms, layer output scale, KV sharing, PLE hooks, custom GemmaModel wrapper).
- Apertus: legitimate divergence (xIELU activation, ungated FFN with separate up/down dense + xielu, no swiGluFFN).
- Voxtral acoustic: 3-layer flow-matching transformer, no embedding / output projection, custom name prefix; different beast.
- BERT: encoder-only, bidirectional, LayerNorm not RMSNorm, GELU FFN.

Tests pass across :llm-core, :llm-inference:{llama,qwen,voxtral,apertus,gemma,bert}, :llm-runtime:{kllama,kqwen}, :llm-api, :llm-agent.

DSL-vs-LlamaRuntime numerical parity test deferred to a follow-up PR.

Refs the no-model-duplication plan and the closed #46.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

michalharakal mentioned this pull request May 4, 2026

refactor: rename Llama* generic loaders to Decoder* #111

Merged

2 tasks

michalharakal merged commit 594d489 into develop May 4, 2026
2 checks passed

michalharakal deleted the refactor/collapse-network-def branch May 4, 2026 10:05

This was referenced May 4, 2026

feat(qwen): DSL native-quantized GGUF entry point + Q8 smoke test #113

Merged

Phase 4 readiness: DSL Qwen and legacy LlamaRuntime diverge numerically on identical weights #114

Closed

michalharakal merged 1 commit into develop from refactor/collapse-network-def
