📝 Issue Title
[Bug] Refinement model selection ignores user choice, always loads Qwen3 0.6B instead of selected 1.7B/4B
📋 Issue Body
Description
When I select Qwen3 1.7B (or 4B) as the refinement LLM in Voicebox settings, the backend still loads Qwen3 0.6B every time I trigger the "Speak in character" feature. The model selection in the UI appears to have no effect on the refinement model actually used.
Environment
Item | Value
-- | --
Voicebox Version | v0.5.0
OS | Windows 11
GPU | NVIDIA RTX A4000 (16GB VRAM)
Installation Type | Windows installer (.exe)
Steps to Reproduce
1. Open Voicebox Settings → Model Management
2. Download the Qwen3 1.7B (Refinement) model
3. Select Qwen3 1.7B as the active refinement model
4. Close settings and make sure the "Speak in character" toggle is enabled
5. Enter any text (e.g., "Hello, this is a test.") and click generate
6. Check the backend logs
Expected Result
The backend should load Qwen3 1.7B on cuda... as the refinement LLM.
Actual Result
The backend consistently loads Qwen3 0.6B on cuda..., ignoring the user's selection.
Logs
2026-05-27 00:11:21,391 - backend.utils.hf_offline_patch - INFO - [offline-guard] qwen3-0.6b is cached - forcing offline mode
2026-05-27 00:19:41,004 - backend.backends.qwen_llm_backend - INFO - Loading Qwen3 0.6B on cuda...
Full log excerpt showing download behavior:
model.safetensors: 0%| | 0.00/1.50G [00:00<?, ?B/s]
model.safetensors: 100%|██████████| 1.50G/1.50G [00:09<00:00, 1.28MB/s]
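To make the check in the last reproduction step repeatable, the backend log can be scanned for the "Loading Qwen3 ... on ..." line. The snippet below is only a sketch: the function name `loaded_refinement_sizes` is my own, and the pattern assumes the log format shown in the excerpt above.

```python
import re

# Matches lines such as:
#   ... - backend.backends.qwen_llm_backend - INFO - Loading Qwen3 0.6B on cuda...
# The format is taken from the log excerpt in this report.
LOAD_LINE = re.compile(
    r"qwen_llm_backend - INFO - Loading Qwen3 (?P<size>[\d.]+B) on (?P<device>\w+)"
)


def loaded_refinement_sizes(log_text: str) -> list[str]:
    """Return the model sizes the refinement backend reported loading, in order."""
    return [match.group("size") for match in LOAD_LINE.finditer(log_text)]


if __name__ == "__main__":
    sample = (
        "2026-05-27 00:19:41,004 - backend.backends.qwen_llm_backend - INFO - "
        "Loading Qwen3 0.6B on cuda..."
    )
    print(loaded_refinement_sizes(sample))
```

If 1.7B is selected in the UI and this still reports only 0.6B, the selection is not reaching the backend.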
Key observation: Even after deleting the 0.6B model cache, Voicebox automatically re-downloads the entire 1.5 GB 0.6B model file on first use of "Speak in character", which strongly suggests that 0.6B is hardcoded as the refinement engine.
Additional Context
- The TTS model selection works correctly (I can switch between Qwen3-TTS 0.6B/1.7B/4B without issue)
- The refinement model management UI shows 1.7B as "downloaded" but the backend ignores it
- This appears similar to a known bug from earlier versions (v0.1.13) where the opposite behavior occurred (0.6B selection loaded 1.7B)
Expected Fix
When a user selects a refinement model (0.6B, 1.7B, or 4B) in the settings, the qwen_llm_backend should respect that choice and load the selected model instead of hardcoding to 0.6B.
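To illustrate the behavior I would expect, here is a minimal sketch of how the backend could resolve the user's choice before loading. I have not seen the actual `qwen_llm_backend` code, so everything here is an assumption: the settings key `refinement_model_size`, the function `resolve_refinement_model`, and the mapping to the public Hugging Face repos `Qwen/Qwen3-0.6B`, `Qwen/Qwen3-1.7B`, and `Qwen/Qwen3-4B` are hypothetical stand-ins for whatever Voicebox uses internally.

```python
from typing import Mapping

# Hypothetical mapping from the size shown in the UI to a model repo id.
# Voicebox may use different identifiers internally.
REFINEMENT_MODELS = {
    "0.6B": "Qwen/Qwen3-0.6B",
    "1.7B": "Qwen/Qwen3-1.7B",
    "4B": "Qwen/Qwen3-4B",
}

# Used only when the user has never made a selection.
DEFAULT_SIZE = "0.6B"


def resolve_refinement_model(settings: Mapping[str, str]) -> str:
    """Return the repo id for the refinement model the user selected.

    `settings` stands in for the persisted Voicebox settings; the key
    "refinement_model_size" is an assumed name, not the real one.
    """
    size = settings.get("refinement_model_size", DEFAULT_SIZE)
    if size not in REFINEMENT_MODELS:
        # Fail loudly instead of silently falling back to 0.6B.
        raise ValueError(f"Unknown refinement model size: {size!r}")
    return REFINEMENT_MODELS[size]


if __name__ == "__main__":
    print(resolve_refinement_model({"refinement_model_size": "1.7B"}))
    print(resolve_refinement_model({}))
```

The point of the sketch is that the default applies only when no selection exists, and that an unrecognized value raises an error rather than quietly loading 0.6B.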