-
Notifications
You must be signed in to change notification settings - Fork 30
Description
Is your feature request related to a problem? Please describe.
Qwen35ChatHandler currently requires the clip_model_path argument. This makes it impossible to use the handler for text-only use cases without providing a vision encoder. Although Qwen3.5 is a multimodal model, many real-world scenarios are strictly text-only, and in those cases loading a CLIP encoder is unnecessary overhead. In practice, even without specifying any chat_handler, Qwen3.5 can produce reasonable text outputs, but it appears to force “thinking mode” by default.
Describe the solution you'd like
clip_model_path should not be a mandatory parameter for Qwen35ChatHandler. It should be optional, and the handler should support a text-only mode where no vision encoder is loaded.
Describe alternatives you've considered
It should be possible to control whether the “think” template is enabled on a per-request basis (e.g., at create_chat_completion time), rather than requiring a full model reload to switch modes. Ideally, this would be exposed via something like chat_template_kwargs={"enable_thinking": False} (similar to the official behavior).
Additional context
No.