Description
Your current environment
vllm main
🐛 Describe the bug
This issue is related to #18843.
Issue
vllm-ascend has a quantization extension that adds an extra quantization choice:
https://github.com/vllm-project/vllm-ascend/blob/92bc5576d8899cff4e041e20af11a4f1d46aa066/vllm_ascend/platform.py#L76-L80
But after 3c49dbd, `quant_action.choices` is `None`. The problem was found in vllm-project/vllm-ascend#1042:
```
  File "/__w/vllm-ascend/vllm-ascend/vllm_ascend/platform.py", line 79, in pre_register_and_update
    if ASCEND_QUATIZATION_METHOD not in quant_action.choices:
TypeError: argument of type 'NoneType' is not iterable
```
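The failure mode is easy to reproduce with plain argparse: when `add_argument` is called without `choices`, the resulting action's `choices` attribute is `None`, and a membership test on it raises exactly this `TypeError`. A minimal sketch (only the `--quantization`/`-q` flag mirrors vllm; the membership check stands in for the vllm-ascend hook):

```python
import argparse

parser = argparse.ArgumentParser()
# No choices= passed, as happens when vllm fails to derive them
# from the type hint: action.choices stays None.
action = parser.add_argument("--quantization", "-q")
print(action.choices)  # None

try:
    # What the vllm-ascend pre_register_and_update hook effectively does:
    "ascend" in action.choices
except TypeError as exc:
    print(exc)  # argument of type 'NoneType' is not iterable
```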
Investigation:
Line 449 in 432ec99
- Expected (without `SkipValidation`):

```
type_hints {<class 'NoneType'>, typing.Literal['aqlm', 'awq', 'deepspeedfp', 'tpu_int8', 'fp8', 'ptpc_fp8', 'fbgemm_fp8', 'modelopt', 'modelopt_fp4', 'marlin', 'bitblas', 'gguf', 'gptq_marlin_24', 'gptq_marlin', 'gptq_bitblas', 'awq_marlin', 'gptq', 'compressed-tensors', 'bitsandbytes', 'qqq', 'hqq', 'experts_int8', 'neuron_quant', 'ipex', 'quark', 'moe_wna16', 'torchao', 'auto-round']}
```
Lines 205 to 206 in 432ec99 then set the choices successfully:
https://github.com/vllm-project/vllm/blob/main/vllm/engine/arg_utils.py#L136-L145

- Unexpected (with `SkipValidation`):

After adding `SkipValidation` (Lines 239 to 240 in 432ec99):

```
type_hints {typing.Optional[typing.Literal['aqlm', 'awq', 'deepspeedfp', 'tpu_int8', 'fp8', 'ptpc_fp8', 'fbgemm_fp8', 'modelopt', 'modelopt_fp4', 'marlin', 'bitblas', 'gguf', 'gptq_marlin_24', 'gptq_marlin', 'gptq_bitblas', 'awq_marlin', 'gptq', 'compressed-tensors', 'bitsandbytes', 'qqq', 'hqq', 'experts_int8', 'neuron_quant', 'ipex', 'quark', 'moe_wna16', 'torchao', 'auto-round']]}
```
So the quantization choices are not set as expected.
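My understanding (an assumption about how the choice extraction works, based on the two `type_hints` dumps above) is that it only fires when it sees a bare `Literal` among the resolved hints; once the annotation is reported as `Optional[Literal[...]]`, the `Literal` is nested one level down and is missed. A minimal sketch with a shortened stand-in for `QuantizationMethods`:

```python
import typing
from typing import Literal, Optional, get_args, get_origin

# Shortened stand-in for vllm's QuantizationMethods Literal.
QM = Literal["awq", "gptq", "fp8"]

# Bare Literal: origin is Literal and args are the argparse choices.
print(get_origin(QM) is Literal)  # True
print(get_args(QM))               # ('awq', 'gptq', 'fp8')

# Optional[Literal[...]]: origin is Union, so a check like
# `get_origin(hint) is Literal` no longer matches, and the
# Literal args are nested inside the Union's args.
wrapped = Optional[QM]
print(get_origin(wrapped) is Literal)      # False
print(get_origin(wrapped) is typing.Union) # True
```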
Workaround:
I tried to fix it in vllm, and it works:
```diff
diff --git a/vllm/engine/arg_utils.py b/vllm/engine/arg_utils.py
index 555532526..354452ff7 100644
--- a/vllm/engine/arg_utils.py
+++ b/vllm/engine/arg_utils.py
@@ -476,6 +476,7 @@ class EngineArgs:
         model_group.add_argument("--max-model-len",
                                  **model_kwargs["max_model_len"])
         model_group.add_argument("--quantization", "-q",
+                                 choices=list(get_args(QuantizationMethods)),
                                  **model_kwargs["quantization"])
         model_group.add_argument("--enforce-eager",
                                  **model_kwargs["enforce_eager"])
```
But I think there should be a better way to set up the choices by resolving the typing annotation; I am not very familiar with it.
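One possible direction, sketched below: a hypothetical helper (not part of vllm, names are mine) that unwraps `Optional`/`Union` before looking for `Literal` args, so the argparse `choices` could still be derived from the annotation even when it is wrapped by `SkipValidation`:

```python
from typing import Literal, Optional, Union, get_args, get_origin

def literal_choices(hint):
    """Recover Literal choices from a type hint, unwrapping Optional/Union.

    Hypothetical sketch: returns the Literal values if `hint` is a bare
    Literal or an Optional/Union containing one, else None.
    """
    if get_origin(hint) is Literal:
        return list(get_args(hint))
    if get_origin(hint) is Union:
        for arg in get_args(hint):
            if get_origin(arg) is Literal:
                return list(get_args(arg))
    return None

# Shortened stand-in for vllm's QuantizationMethods Literal.
QM = Literal["awq", "gptq", "fp8"]
print(literal_choices(QM))            # ['awq', 'gptq', 'fp8']
print(literal_choices(Optional[QM]))  # ['awq', 'gptq', 'fp8']
```

With something like this, the wrapped and unwrapped annotations would yield the same `choices` list.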
Could you give some suggestions? Maybe cc @hmellor
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.