[Bug] v0.5.0-Refinement model selection ignores user choice, always loads Qwen3 0.6B instead of selected 1.7B/4B

<html><body>
<h2><span class="">📝 Issue Title</span></h2><p class="ds-markdown-paragraph"><strong><span class="">[Bug] Refinement model selection ignores user choice, always loads Qwen3 0.6B instead of selected 1.7B/4B</span></strong></p><hr><h2><span class="">📋 Issue Body</span></h2><h3><span class="">Description</span></h3><p class="ds-markdown-paragraph"><span class="">When I select </span><strong><span class="">Qwen3 1.7B</span></strong><span class=""> (or 4B) as the refinement LLM in Voicebox settings, the backend still loads </span><strong><span class="">Qwen3 0.6B</span></strong><span class="">
 every time I trigger the "Speak in character" feature. The model 
selection in the UI appears to have no effect on the actual refinement 
model being used.</span></p><h3><span class="">Environment</span></h3><div class="ds-scroll-area ds-scroll-area--show-on-focus-within _1210dd7 c03cafe9">
Item | Value
-- | --
Voicebox Version | v0.5.0
OS | Windows 11
GPU | NVIDIA RTX A4000 (16GB VRAM)
Installation Type | Windows installer (.exe)

</div><h3><span class="">Steps to Reproduce</span></h3><ol start="1"><li><p class="ds-markdown-paragraph"><span class="">Open Voicebox Settings → Model Management</span></p></li><li><p class="ds-markdown-paragraph"><span class="">Download the </span><strong><span class="">Qwen3 1.7B (Refinement)</span></strong><span class=""> model</span></p></li><li><p class="ds-markdown-paragraph"><span class="">Select </span><strong><span class="">Qwen3 1.7B</span></strong><span class=""> as the active refinement model</span></p></li><li><p class="ds-markdown-paragraph"><span class="">Close Settings and make sure the "Speak in character" toggle is enabled</span></p></li><li><p class="ds-markdown-paragraph"><span class="">Enter any text (e.g., "Hello, this is a test.") and click Generate</span></p></li><li><p class="ds-markdown-paragraph"><span class="">Check the backend logs</span></p></li></ol><h3><span class="">Expected Result</span></h3><p class="ds-markdown-paragraph"><span class="">The backend should log </span><code>Loading Qwen3 1.7B on cuda...</code><span class=""> when it loads the refinement LLM.</span></p><h3><span class="">Actual Result</span></h3><p class="ds-markdown-paragraph"><span class="">The backend consistently logs </span><code>Loading Qwen3 0.6B on cuda...</code><span class="">, ignoring the user's selection.</span></p><h3><span class="">Logs</span></h3><div class="md-code-block md-code-block-light"><div class="md-code-block-banner-wrap"><div class="md-code-block-banner md-code-block-banner-lite"><div class="_121d384"><div class="d2a24f03"><span class="d813de27">text</span></div></div></div></div><pre><span>2026-05-27 00:11:21,391 - backend.utils.hf_offline_patch - INFO - [offline-guard] qwen3-0.6b is cached - forcing offline mode</span>
<span>2026-05-27 00:19:41,004 - backend.backends.qwen_llm_backend - INFO - Loading Qwen3 0.6B on cuda...</span></pre></div><p class="ds-markdown-paragraph"><span class="">Log excerpt showing the re-download:</span></p><div class="md-code-block md-code-block-light"><div class="md-code-block-banner-wrap"><div class="md-code-block-banner md-code-block-banner-lite"><div class="_121d384"><div class="d2a24f03"><span class="d813de27">text</span></div></div></div></div><pre><span>model.safetensors:   0%|          | 0.00/1.50G [00:00&lt;?, ?B/s]</span>
<span>model.safetensors: 100%|██████████| 1.50G/1.50G [00:09&lt;00:00, 1.28MB/s]</span></pre></div><p class="ds-markdown-paragraph"><strong><span class="">Key observation:</span></strong><span class=""> Even after I delete the 0.6B model cache, Voicebox automatically re-downloads the full 1.5 GB 0.6B model file the first time "Speak in character" is used, which strongly suggests that 0.6B is hardcoded as the refinement model.</span></p><h3><span class="">Additional Context</span></h3><ul><li><p class="ds-markdown-paragraph"><span class="">TTS model selection works correctly (I can switch between Qwen3-TTS 0.6B/1.7B/4B without issue)</span></p></li><li><p class="ds-markdown-paragraph"><span class="">The refinement model management UI shows 1.7B as "downloaded", but the backend ignores it</span></p></li><li><p class="ds-markdown-paragraph"><span class="">This
 appears similar to a known bug from an earlier version (v0.1.13) where the opposite behavior occurred (selecting 0.6B loaded 1.7B)</span></p></li></ul><h3><span class="">Expected Fix</span></h3><p class="ds-markdown-paragraph"><span class="">When a user selects a refinement model (0.6B, 1.7B, or 4B) in the settings, the </span><code>qwen_llm_backend</code><span class=""> should respect that choice and load the selected model instead of always loading 0.6B.</span></p>
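To make the requested behavior concrete, here is a minimal sketch of how the selected size could be resolved before the model is loaded. I have not read the Voicebox source, so every name below (`REFINEMENT_MODELS`, `resolve_refinement_model`, the `refinement_model_size` settings key) is hypothetical, and the Hugging Face repo ids are my assumption about which checkpoints the three sizes map to.

```python
# Hypothetical sketch: none of these names are taken from the Voicebox codebase.

# Assumed mapping from the size shown in the UI to a Hugging Face repo id.
REFINEMENT_MODELS = {
    "0.6B": "Qwen/Qwen3-0.6B",
    "1.7B": "Qwen/Qwen3-1.7B",
    "4B": "Qwen/Qwen3-4B",
}
DEFAULT_REFINEMENT_SIZE = "0.6B"


def resolve_refinement_model(settings: dict) -> str:
    """Return the repo id for the user's selected refinement model.

    Falls back to the default only when nothing has been selected, and
    rejects unknown sizes instead of silently loading the default.
    """
    size = settings.get("refinement_model_size") or DEFAULT_REFINEMENT_SIZE
    try:
        return REFINEMENT_MODELS[size]
    except KeyError:
        raise ValueError(f"Unknown refinement model size: {size!r}") from None


if __name__ == "__main__":
    # A saved selection of 1.7B resolves to the 1.7B checkpoint.
    print(resolve_refinement_model({"refinement_model_size": "1.7B"}))
    # With no selection saved, the default (0.6B) is used.
    print(resolve_refinement_model({}))
```

The point of the sketch is that 0.6B should only be a fallback when no selection exists. An unknown value raises an error rather than quietly loading the default, which would also make a mismatch like the one in this report visible in the logs.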
</body>
</html>

[Bug] v0.5.0-Refinement model selection ignores user choice, always loads Qwen3 0.6B instead of selected 1.7B/4B #711
