[Bug] v0.5.0: Refinement model selection ignores user choice, always loads Qwen3 0.6B instead of the selected 1.7B/4B #711

@longerwood

Description

When I select Qwen3 1.7B (or 4B) as the refinement LLM in Voicebox settings, the backend still loads Qwen3 0.6B every time I trigger the "Speak in character" feature. The model selection in the UI appears to have no effect on the actual refinement model being used.

Environment

| Item | Value |
| -- | -- |
| Voicebox Version | v0.5.0 |
| OS | Windows 11 |
| GPU | NVIDIA RTX A4000 (16GB VRAM) |
| Installation Type | Windows installer (.exe) |

Steps to Reproduce

  1. Open Voicebox Settings → Model Management

  2. Download Qwen3 1.7B (Refinement) model

  3. Select Qwen3 1.7B as the active refinement model

  4. Close settings, ensure "Speak in character" toggle is enabled

  5. Enter any text (e.g., "Hello, this is a test.") and click generate

  6. Check the backend logs

Expected Result

The backend should load Qwen3 1.7B on cuda... as the refinement LLM.

Actual Result

The backend consistently loads Qwen3 0.6B on cuda..., ignoring the user's selection.

Logs

```text
2026-05-27 00:11:21,391 - backend.utils.hf_offline_patch - INFO - [offline-guard] qwen3-0.6b is cached - forcing offline mode
2026-05-27 00:19:41,004 - backend.backends.qwen_llm_backend - INFO - Loading Qwen3 0.6B on cuda...
```

Full log excerpt showing download behavior:

```text
model.safetensors:   0%|          | 0.00/1.50G [00:00<?, ?B/s]
model.safetensors: 100%|██████████| 1.50G/1.50G [00:09<00:00, 1.28MB/s]
```
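For anyone trying to reproduce this, the `Loading Qwen3 ... on cuda...` line from the backend log can be checked with a small script instead of by eye. This is only a sketch: the regular expression is modelled on the single log line quoted in this issue, and the function name `loaded_refinement_models` is made up for illustration, not part of Voicebox.

```python
import re

# Pattern modelled on the log line quoted in this issue, e.g.
#   "... - backend.backends.qwen_llm_backend - INFO - Loading Qwen3 0.6B on cuda..."
# Assumption: other model sizes are logged in the same format.
LOAD_LINE = re.compile(
    r"qwen_llm_backend - INFO - Loading Qwen3 (\d+(?:\.\d+)?B) on (\w+)"
)


def loaded_refinement_models(log_text):
    """Return a (size, device) pair for every refinement-model load in the log."""
    return [match.groups() for match in LOAD_LINE.finditer(log_text)]


sample = (
    "2026-05-27 00:19:41,004 - backend.backends.qwen_llm_backend - INFO - "
    "Loading Qwen3 0.6B on cuda..."
)
print(loaded_refinement_models(sample))  # [('0.6B', 'cuda')]
```

If 1.7B is selected in the UI and the selection is respected, the size in the returned pairs should read `1.7B` rather than `0.6B`.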

Key observation: Even after I delete the 0.6B model cache, Voicebox automatically re-downloads the entire 1.5GB 0.6B model file on first use of "Speak in character", which strongly suggests that 0.6B is hardcoded as the refinement engine.

Additional Context

  • The TTS model selection works correctly (I can switch between Qwen3-TTS 0.6B/1.7B/4B without issue)

  • The refinement model management UI shows 1.7B as "downloaded" but the backend ignores it

  • This appears similar to a known bug from earlier versions (v0.1.13) where the opposite behavior occurred (0.6B selection loaded 1.7B)

Expected Fix

When a user selects a refinement model (0.6B, 1.7B, or 4B) in the settings, the qwen_llm_backend should respect that choice and load the selected model instead of hardcoding to 0.6B.
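To make the expected behavior concrete, here is a rough sketch of what I mean by respecting the selection. I have not read the Voicebox source, so everything here is hypothetical: the `REFINEMENT_MODELS` mapping, the `resolve_refinement_model` function, and the `qwen3-1.7b` / `qwen3-4b` identifiers are placeholders (only `qwen3-0.6b` appears in my logs). The point is that 0.6B should be a fallback used when nothing is selected, not the value used unconditionally.

```python
# Hypothetical mapping from the size shown in the UI to a model identifier.
# Only "qwen3-0.6b" is taken from the logs above; the others are assumed.
REFINEMENT_MODELS = {
    "0.6B": "qwen3-0.6b",
    "1.7B": "qwen3-1.7b",
    "4B": "qwen3-4b",
}
DEFAULT_SIZE = "0.6B"


def resolve_refinement_model(selected_size=None):
    """Pick the refinement model from the user's setting.

    Falls back to the default only when no selection was stored, and fails
    loudly on an unknown value instead of silently loading the default.
    """
    if selected_size is None:
        return REFINEMENT_MODELS[DEFAULT_SIZE]
    try:
        return REFINEMENT_MODELS[selected_size]
    except KeyError:
        raise ValueError(
            f"Unknown refinement model size: {selected_size!r}"
        ) from None


print(resolve_refinement_model("1.7B"))  # qwen3-1.7b
print(resolve_refinement_model())        # qwen3-0.6b
```

Raising on an unknown value rather than quietly falling back would also make this kind of mismatch visible in the logs.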
