feat(kllama-cli): swap Llama GGUF + SafeTensors branches to DSL path by michalharakal · Pull Request #122 · SKaiNET-developers/SKaiNET-transformers

michalharakal · 2026-05-04T17:54:46Z

Companion to #121 — swaps the kllama CLI's remaining Llama-family branches off LlamaRuntime and onto the DSL path. After this merge all GGUF and SafeTensors paths in the kllama CLI run through the DSL.

Changes

- Llama / Mistral GGUF: DecoderGgufWeightLoader(NATIVE_OPTIMIZED, LLAMA_COMPATIBLE_ARCHITECTURES) → DecoderGgufMemSegConverter.convert → LlamaNetworkLoader.fromWeights → OptimizedLLMRuntime DIRECT mode. Same packed Q4_0/Q8_0 SIMD path the Qwen swap uses.
- Llama SafeTensors: DecoderSafeTensorsLoader<FP32>(ctx, FP32::class, metadata, tiedEmbeddings).loadToMap { … } → LlamaNetworkLoader.fromWeights → OptimizedLLMRuntime. Drops the legacy LlamaIngestion SafeTensors path.
- BIN (Karpathy llama2.c format): kept on legacy LlamaRuntime for now. The .bin loader returns LlamaRuntimeWeights directly and the DSL path requires DecoderGgufWeights. Either migrate Llama2DotCWeightLoader or drop .bin support; that is a separate followup.
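The routing described above can be pictured with a small, self-contained sketch. It is not the CLI's actual code: `ModelFormat`, `LoadPath`, `Route`, `detectFormat`, and `routeFor` are hypothetical names, detection by file extension is an assumption, and the stage names are carried only as strings copied from this description.

```kotlin
// Hypothetical names throughout; none of these types exist in the kllama CLI.
enum class ModelFormat { GGUF, SAFETENSORS, BIN }

enum class LoadPath { DSL, LEGACY }

// Stage names are plain strings taken from the PR description, not real calls.
data class Route(val path: LoadPath, val stages: List<String>)

// Assumption: the format is inferred from the file extension.
fun detectFormat(fileName: String): ModelFormat? = when {
    fileName.endsWith(".gguf", ignoreCase = true) -> ModelFormat.GGUF
    fileName.endsWith(".safetensors", ignoreCase = true) -> ModelFormat.SAFETENSORS
    fileName.endsWith(".bin", ignoreCase = true) -> ModelFormat.BIN
    else -> null
}

// GGUF and SafeTensors go through the DSL path; BIN stays on the legacy runtime.
fun routeFor(format: ModelFormat): Route = when (format) {
    ModelFormat.GGUF -> Route(
        LoadPath.DSL,
        listOf(
            "DecoderGgufWeightLoader",
            "DecoderGgufMemSegConverter.convert",
            "LlamaNetworkLoader.fromWeights",
            "OptimizedLLMRuntime",
        ),
    )
    ModelFormat.SAFETENSORS -> Route(
        LoadPath.DSL,
        listOf(
            "DecoderSafeTensorsLoader.loadToMap",
            "LlamaNetworkLoader.fromWeights",
            "OptimizedLLMRuntime",
        ),
    )
    ModelFormat.BIN -> Route(
        LoadPath.LEGACY,
        listOf("Llama2DotCWeightLoader", "LlamaRuntime"),
    )
}

fun main() {
    // Example file names are made up for illustration.
    for (name in listOf("mistral.Q4_0.gguf", "model.safetensors", "stories.bin")) {
        val format = detectFormat(name) ?: error("unsupported model file: $name")
        val route = routeFor(format)
        println("$name -> ${route.path}: ${route.stages.joinToString(" -> ")}")
    }
}
```

Both DSL routes converge on `LlamaNetworkLoader.fromWeights`, which is why only the weight-loading front end differs between GGUF and SafeTensors.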

What's still on legacy after this PR

- BIN format in this CLI (above).
- KLlamaJava (Java facade) and KLlamaSession.
- LlamaIngestionBlocking.
- :llm-apps:skainet-cli/Main.kt.
- :llm-runtime:kqwen/QwenIngestion.kt.
- :llm-performance benchmark engines (JVM + native).
- Wasm/native kllama browser/cli Main.kt.

Each is a focused migration PR; deletion of LlamaRuntime / LlamaIngestion / MemSegWeightConverter / CpuAttentionBackend family comes after they're all migrated.

Why it's safe

Numerical parity with LlamaRuntime is pinned by QwenDslLegacyParityTest (#120, closes #114). Same LlamaNetworkLoader.fromWeights codepath, just exercised via Qwen — Llama produces equivalent output by the same construction. Q8 round-trip equivalence is pinned by QwenDslQuantizedTest (#113).
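For readers unfamiliar with what such tests typically assert, here is a minimal sketch of the two kinds of check named above. It does not reproduce `QwenDslLegacyParityTest` or `QwenDslQuantizedTest`; every name, tolerance, and input below is hypothetical. The Q8_0 part follows the usual GGML layout (blocks of 32 signed 8-bit values sharing one scale), with the simplifying assumption that the scale is held as a `Float` rather than fp16.

```kotlin
import kotlin.math.abs
import kotlin.math.roundToInt

// All names below are hypothetical; none come from the SKaiNET test suites.

// Largest element-wise gap between two equally sized vectors.
fun maxAbsDiff(a: FloatArray, b: FloatArray): Float {
    require(a.size == b.size) { "size mismatch: ${a.size} vs ${b.size}" }
    var worst = 0f
    for (i in a.indices) worst = maxOf(worst, abs(a[i] - b[i]))
    return worst
}

const val Q8_BLOCK = 32 // Q8_0 quantizes weights in blocks of 32 values

// One block: a shared scale and 32 signed 8-bit quants.
// Assumption: the scale is kept as Float here; GGUF stores it as fp16.
class Q8Block(val scale: Float, val quants: ByteArray)

fun quantizeQ8(x: FloatArray): Q8Block {
    require(x.size == Q8_BLOCK) { "expected $Q8_BLOCK values, got ${x.size}" }
    val scale = x.maxOf { abs(it) } / 127f
    val inverse = if (scale == 0f) 0f else 1f / scale
    val quants = ByteArray(Q8_BLOCK) { i ->
        (x[i] * inverse).roundToInt().coerceIn(-127, 127).toByte()
    }
    return Q8Block(scale, quants)
}

fun dequantizeQ8(block: Q8Block): FloatArray =
    FloatArray(Q8_BLOCK) { i -> block.quants[i] * block.scale }

fun main() {
    // Parity: two runtimes fed the same weights should agree within a tolerance.
    val legacyLogits = floatArrayOf(0.10f, 2.50f, -1.00f)
    val dslLogits = floatArrayOf(0.10f, 2.50001f, -1.00f)
    check(maxAbsDiff(legacyLogits, dslLogits) <= 1e-4f) { "logits diverge" }

    // Round trip: each value should land within half a quantization step.
    val weights = FloatArray(Q8_BLOCK) { i -> (i - 16) / 8f }
    val block = quantizeQ8(weights)
    val error = maxAbsDiff(weights, dequantizeQ8(block))
    check(error <= block.scale / 2f + 1e-5f) { "round-trip error $error too large" }
    println("parity and Q8_0 round trip hold")
}
```

The half-step bound is the reason a quantized path can be compared against a reference with a fixed tolerance: rounding to the nearest quant moves each weight by at most `scale / 2`.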

Imports cleaned

Removed LlamaIngestion, LlamaLoadConfig, MemSegWeightConverter, LlamaWeightMapper — all unused after the swap. LlamaRuntime + CpuAttentionBackend stay for the BIN fallback.

Test plan

- :llm-runtime:kllama:jvmTest, :llm-core:jvmTest, :llm-inference:qwen:jvmTest, :llm-inference:llama:jvmTest: all pass.
- Compiles clean (only one pre-existing @Deprecated warning on the LlamaRuntime BIN-fallback ctor).
- CI green on PR.
- Manual (post-merge): kllama-cli with a real Llama / Mistral GGUF, plus a SafeTensors checkpoint; verify coherent output.

🤖 Generated with Claude Code

Mirrors the Qwen swap from #121 for the Llama / Mistral GGUF branch and the Llama SafeTensors branch:

- **GGUF**: `DecoderGgufWeightLoader(NATIVE_OPTIMIZED, LLAMA_COMPATIBLE_ARCHITECTURES)` → `DecoderGgufMemSegConverter.convert` → `LlamaNetworkLoader.fromWeights` → `OptimizedLLMRuntime` DIRECT mode. Same packed Q4_0/Q8_0 SIMD path the Qwen swap uses; no behavior change for quantized models.
- **SafeTensors**: `DecoderSafeTensorsLoader<FP32>(...).loadToMap` → `LlamaNetworkLoader.fromWeights` → `OptimizedLLMRuntime`. Drops the legacy `LlamaIngestion` SafeTensors path entirely.
- **BIN** (Karpathy llama2.c format): kept on legacy `LlamaRuntime` for now. The .bin loader returns `LlamaRuntimeWeights` directly, and the DSL path requires `DecoderGgufWeights`. Either migrate `Llama2DotCWeightLoader` or drop .bin support; that is a separate followup.

After this merge the kllama CLI's Llama / Qwen / Mistral GGUF + Llama SafeTensors paths all run through the DSL. Only BIN format and a handful of other consumers (`KLlamaJava`, `:llm-apps:skainet-cli`, `:llm-performance` benchmark engines) still depend on `LlamaRuntime` / `LlamaIngestion` / `MemSegWeightConverter` / `CpuAttentionBackend`. Those migrations + deletion of the legacy stack are subsequent PRs.

Imports cleaned: removed `LlamaIngestion`, `LlamaLoadConfig`, `MemSegWeightConverter`, `LlamaWeightMapper`, all unused after the swap. `LlamaRuntime` and `CpuAttentionBackend` stay (BIN path).

Numerical parity with the legacy LlamaRuntime path on identical weights is pinned by `QwenDslLegacyParityTest` (#120): same `LlamaNetworkLoader.fromWeights` codepath, just exercised via Qwen. The Llama branch produces equivalent output by the same construction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

michalharakal merged commit 7905c97 into develop May 4, 2026
2 checks passed

michalharakal deleted the feat/llama-dsl-cli-swap branch May 5, 2026 08:13

michalharakal merged 1 commit into develop from feat/llama-dsl-cli-swap
