
[Usage] Qwen3 Usage Guide #17327

Open

@simon-mo

Description

vLLM v0.8.4 and higher natively support all Qwen3 and Qwen3MoE models. Example command:
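The command itself appears to have been lost in the page extraction. A minimal sketch of what it likely looked like, assuming the standard `vllm serve` entry point (the model name below is just an example, not the one from the original issue):

```shell
# Launch vLLM's OpenAI-compatible server for a Qwen3 checkpoint.
# Substitute the model you actually want to serve.
vllm serve Qwen/Qwen3-8B
```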

  • If you are seeing the following error, the tensor parallel degree is too large for the model's weight shapes, so a QKV weight shard cannot be sliced. Reduce `--tensor-parallel-size`:

```
File ".../vllm/model_executor/parameter.py", line 149, in load_qkv_weight
    param_data = param_data.narrow(self.output_dim, shard_offset,
IndexError: start out of range (expected to be in range of [-18, 18], but got 2048)
```
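This class of error can be caught before launch: each tensor-parallel rank gets an equal slice of the attention heads, so the head counts must divide evenly by the tensor parallel size. A hedged sketch of that pre-flight check (the head counts below are illustrative, not read from any real Qwen3 config):

```python
def tp_compatible(num_attention_heads: int, num_kv_heads: int, tp_size: int) -> bool:
    """Return True if both head counts shard evenly across tp_size ranks."""
    return num_attention_heads % tp_size == 0 and num_kv_heads % tp_size == 0

# Example: 32 query heads and 8 KV heads split cleanly across 8 GPUs...
print(tp_compatible(32, 8, 8))   # True
# ...but not across 16 — the kind of mismatch behind a bad QKV shard slice.
print(tp_compatible(32, 8, 16))  # False
```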
  • If you are seeing the following error when running MoE models with fp8, the tensor parallel degree is too high and the weights are not evenly divisible across ranks. Consider `--tensor-parallel-size 4` or `--tensor-parallel-size 8 --enable-expert-parallel`:

```
File ".../vllm/vllm/model_executor/layers/quantization/fp8.py", line 477, in create_weights
    raise ValueError(
ValueError: The output_size of gate's and up's weight = 192 is not divisible by weight quantization block_n = 128.
```
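The constraint behind this ValueError can also be checked up front. A hedged sketch: fp8 block quantization here uses a weight block size of 128 along the output dimension, so each rank's shard of the MoE intermediate size must be a whole number of blocks (1536 below is an illustrative `moe_intermediate_size`, chosen because it reproduces the 192 from the error message at TP=8):

```python
BLOCK_N = 128  # fp8 weight-quantization block size along the output dim

def shard_ok(moe_intermediate_size: int, tp_size: int) -> bool:
    """Check that each tensor-parallel shard of the gate/up projection
    is a whole number of quantization blocks."""
    shard = moe_intermediate_size // tp_size
    return moe_intermediate_size % tp_size == 0 and shard % BLOCK_N == 0

# TP=8 gives 192-wide shards (192 % 128 != 0), matching the error above;
# TP=4 gives 384-wide shards (3 blocks of 128), which loads fine.
print(shard_ok(1536, 8))  # False
print(shard_ok(1536, 4))  # True
```

This is why the suggested fixes work: dropping to `--tensor-parallel-size 4` widens each shard to a block multiple, while `--enable-expert-parallel` shards across experts instead of splitting each expert's weights.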
