For the Qwen3-VL-MoE models (e.g., Qwen/Qwen3-VL-30B-A3B-Instruct), the fused MoE architecture (similar to Llama 4 and GPT-OSS) requires additional quantization support.
Because the expert weights account for most of the model's parameters, leaving them unquantized sharply limits the achievable compression ratio.
To ensure compatibility, the quantized expert weights should be exported in the tensor layout that vLLM expects, so the checkpoint loads correctly.
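
For illustration, below is a minimal sketch of the kind of oneshot recipe this support would enable, modeled on existing llm-compressor MoE examples. The model class name, the ignore patterns (router gate, vision tower), and the calibration dataset are assumptions, not a confirmed recipe for Qwen3-VL-MoE:

```python
# Sketch only: assumes fused-expert quantization support exists.
# Qwen3VLMoeForConditionalGeneration is the assumed transformers class
# for this model; the ignore patterns mirror other MoE recipes.
from transformers import AutoProcessor, Qwen3VLMoeForConditionalGeneration

from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

MODEL_ID = "Qwen/Qwen3-VL-30B-A3B-Instruct"

model = Qwen3VLMoeForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(MODEL_ID)

# Quantize all Linear layers to W4A16, but skip the parts that usually
# stay in full precision: the LM head, the per-layer MoE router gate,
# and the vision tower.
recipe = GPTQModifier(
    targets="Linear",
    scheme="W4A16",
    ignore=["lm_head", "re:.*mlp.gate$", "re:visual.*"],
)

oneshot(
    model=model,
    recipe=recipe,
    dataset="open_platypus",  # text-only calibration; a multimodal set may be preferable
    max_seq_length=2048,
    num_calibration_samples=256,
)

save_dir = MODEL_ID.split("/")[-1] + "-W4A16"
model.save_pretrained(save_dir)
processor.save_pretrained(save_dir)
```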
Reference quantized model and example:
Model: QuantTrio/Qwen3-VL-30B-A3B-Instruct-AWQ
Example: qwen3-vl-30b-a3b-Instruct-example.py in vllm-project/llm-compressor
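
As a quick compatibility check, the quantized checkpoint should load in vLLM without shape mismatches on the expert weights. A small sketch using the reference checkpoint above (the max_model_len value is an arbitrary choice for the test):

```python
# Load-and-generate smoke test against the reference AWQ checkpoint.
from vllm import LLM, SamplingParams

llm = LLM(model="QuantTrio/Qwen3-VL-30B-A3B-Instruct-AWQ", max_model_len=4096)
out = llm.generate(
    ["Describe what a mixture-of-experts layer does."],
    SamplingParams(max_tokens=64),
)
print(out[0].outputs[0].text)
```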