Wire QwenNetworkLoader into CLI for proper Qwen3 inference #46

@michalharakal

Context

The CLI (Main.kt) always routes GGUF models through LlamaIngestion → LlamaRuntime, which works for Llama-architecture models. Qwen3 models load successfully because they use the same tensor names, but they produce garbage output because LlamaRuntime doesn't handle these Qwen3-specific features:

  • QK-norm (query/key normalization via attn_q_norm.weight / attn_k_norm.weight)
  • RoPE base frequency (1,000,000 vs Llama's 10,000)
  • BOS token differences

The correct loader (QwenNetworkLoader in llm-inference:qwen) exists but isn't wired into the CLI.
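
To illustrate why the first two differences matter numerically, here is a minimal, self-contained sketch. It is not the code in QwenNetworkLoader or LlamaRuntime; the function names `ropeInvFreq` and `rmsNorm` and the default epsilon are assumptions made for illustration. It shows how the RoPE base changes the rotation frequencies, and what an RMS-style normalization of a single query/key head vector looks like.

```kotlin
import kotlin.math.pow
import kotlin.math.sqrt

// Hypothetical helper: inverse RoPE frequencies base^(-2i/headDim)
// for i in 0 until headDim / 2. A larger base yields slower rotations.
fun ropeInvFreq(headDim: Int, base: Double): DoubleArray =
    DoubleArray(headDim / 2) { i -> base.pow(-2.0 * i / headDim) }

// Hypothetical helper: RMS-normalizes one head vector and scales it by a
// learned weight, as a QK-norm step would do with attn_q_norm.weight or
// attn_k_norm.weight. The eps default is an assumption.
fun rmsNorm(x: FloatArray, weight: FloatArray, eps: Float = 1e-6f): FloatArray {
    require(x.size == weight.size) { "weight must match head dimension" }
    var sumSq = 0f
    for (v in x) sumSq += v * v
    val scale = 1f / sqrt(sumSq / x.size + eps)
    return FloatArray(x.size) { i -> x[i] * scale * weight[i] }
}

fun main() {
    // Same head dimension, different base: every frequency except the first differs.
    println(ropeInvFreq(8, 10_000.0).joinToString())
    println(ropeInvFreq(8, 1_000_000.0).joinToString())
}
```

A runtime that rotates with base 10,000 or skips the normalization step feeds the attention layers inputs they were not trained on, which is consistent with the garbage output described above.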

Scope

  • Add :llm-inference:qwen dependency to :llm-runtime:kllama
  • Detect qwen* architecture from GGUF metadata in Main.kt
  • Route to QwenNetworkLoader.fromGguf() for Qwen models
  • Wire the Qwen DSL network module into a runtime compatible with AgentLoop
  • Validate end-to-end with Qwen3-1.7B-Q8_0.gguf --demo
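
The detection and routing steps could be sketched roughly as follows. This is a hedged illustration only: `ModelFamily`, `detectFamily`, and the plain `Map` standing in for parsed GGUF metadata are hypothetical and do not reflect the project's actual types. The only external fact relied on is that GGUF files carry the architecture name under the `general.architecture` metadata key.

```kotlin
// Hypothetical model families the CLI would need to distinguish.
enum class ModelFamily { LLAMA, QWEN }

// Hypothetical helper: picks a family from already-parsed GGUF metadata.
// Anything whose architecture starts with "qwen" (e.g. "qwen2", "qwen3")
// is routed to the Qwen path; everything else keeps the existing Llama path.
fun detectFamily(metadata: Map<String, Any?>): ModelFamily {
    val arch = (metadata["general.architecture"] as? String).orEmpty().lowercase()
    return if (arch.startsWith("qwen")) ModelFamily.QWEN else ModelFamily.LLAMA
}

fun main() {
    val family = detectFamily(mapOf("general.architecture" to "qwen3"))
    // In Main.kt this is where the branch would call QwenNetworkLoader.fromGguf()
    // for QWEN and keep the current Llama route otherwise.
    println(family)
}
```

Defaulting to the Llama path when the key is missing or unrecognized keeps existing Llama models on their current route, in line with the acceptance criteria.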

Related

Acceptance Criteria

  • Qwen3-1.7B-Q8_0.gguf --demo produces coherent output
  • Tool calling works through the Qwen chat template
  • Llama models continue working unchanged
