
[Bug]: quantization CLI choices not set up after SkipValidation #19004

Closed
@Yikun

Description


Your current environment

vllm main

🐛 Describe the bug

This is an issue related to #18843.

Issue

vllm-ascend has a quantization extension that extends the quantization choices:
https://github.com/vllm-project/vllm-ascend/blob/92bc5576d8899cff4e041e20af11a4f1d46aa066/vllm_ascend/platform.py#L76-L80
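
Roughly, that hook looks up the already-registered --quantization argparse action and appends the Ascend method to its choices. A simplified sketch (reconstructed from the linked file and the traceback below; the string value and the argparse access are assumptions for illustration):

ASCEND_QUATIZATION_METHOD = "ascend"  # name copied from the traceback; value assumed

def pre_register_and_update(parser):
    # Look up the already-registered --quantization action
    # (argparse internals, used here only for illustration).
    quant_action = parser._option_string_actions["--quantization"]
    # This membership test is where the TypeError below comes from
    # once quant_action.choices is None instead of a list.
    if ASCEND_QUATIZATION_METHOD not in quant_action.choices:
        quant_action.choices.append(ASCEND_QUATIZATION_METHOD)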

But after 3c49dbd, the quantization choices are None.

The problem was found in vllm-project/vllm-ascend#1042:

  File "/__w/vllm-ascend/vllm-ascend/vllm_ascend/platform.py", line 79, in pre_register_and_update
    if ASCEND_QUATIZATION_METHOD not in quant_action.choices:
TypeError: argument of type 'NoneType' is not iterable

Investigation:

model_kwargs = get_kwargs(ModelConfig)

  • Expected (without SkipValidation):

    type_hints:
    {<class 'NoneType'>, typing.Literal['aqlm', 'awq', 'deepspeedfp', 'tpu_int8', 'fp8', 'ptpc_fp8', 'fbgemm_fp8', 'modelopt', 'modelopt_fp4', 'marlin', 'bitblas', 'gguf', 'gptq_marlin_24', 'gptq_marlin', 'gptq_bitblas', 'awq_marlin', 'gptq', 'compressed-tensors', 'bitsandbytes', 'qqq', 'hqq', 'experts_int8', 'neuron_quant', 'ipex', 'quark', 'moe_wna16', 'torchao', 'auto-round']}

    The Literal branch is taken:

    elif contains_type(type_hints, Literal):
        kwargs[name].update(literal_to_kwargs(type_hints))

    and the choices are finally set successfully:
    https://github.com/vllm-project/vllm/blob/main/vllm/engine/arg_utils.py#L136-L145

  • Unexpected (with SkipValidation):

    After adding SkipValidation:

    type_hints:
    {typing.Optional[typing.Literal['aqlm', 'awq', 'deepspeedfp', 'tpu_int8', 'fp8', 'ptpc_fp8', 'fbgemm_fp8', 'modelopt', 'modelopt_fp4', 'marlin', 'bitblas', 'gguf', 'gptq_marlin_24', 'gptq_marlin', 'gptq_bitblas', 'awq_marlin', 'gptq', 'compressed-tensors', 'bitsandbytes', 'qqq', 'hqq', 'experts_int8', 'neuron_quant', 'ipex', 'quark', 'moe_wna16', 'torchao', 'auto-round']]}

    The Literal branch no longer matches, so this fallback is taken instead:

    or any(is_not_builtin(th) for th in type_hints)):
        kwargs[name]["type"] = str

    So the quantization choices are not set as expected (see the sketch after this list).
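
Here is a minimal, self-contained illustration of why the Literal check stops matching (a toy Literal and a rough analogue of contains_type, not vLLM's actual helpers):

from typing import Literal, Optional, Union, get_args, get_origin

# Toy stand-in for QuantizationMethods.
Quant = Literal["awq", "gptq", "fp8"]

def contains_literal(type_hints: set) -> bool:
    # Rough analogue of contains_type(type_hints, Literal): it only
    # matches a Literal that sits directly in the set.
    return any(get_origin(th) is Literal for th in type_hints)

# Without SkipValidation the Optional is flattened into its members,
# so the Literal is found and literal_to_kwargs can set the choices.
flattened = set(get_args(Optional[Quant]))   # {Literal['awq', 'gptq', 'fp8'], NoneType}
print(contains_literal(flattened))           # True

# With SkipValidation the hint survives as one Optional[Literal[...]]
# whose origin is Union, so the Literal branch is never taken and the
# code falls through to kwargs[name]["type"] = str with no choices.
wrapped = {Optional[Quant]}
print(get_origin(Optional[Quant]) is Union)  # True
print(contains_literal(wrapped))             # False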

Workaround:

I tried a fix in vLLM, and it works:

diff --git a/vllm/engine/arg_utils.py b/vllm/engine/arg_utils.py
index 555532526..354452ff7 100644
--- a/vllm/engine/arg_utils.py
+++ b/vllm/engine/arg_utils.py
@@ -476,6 +476,7 @@ class EngineArgs:
         model_group.add_argument("--max-model-len",
                                  **model_kwargs["max_model_len"])
         model_group.add_argument("--quantization", "-q",
+                                 choices=list(get_args(QuantizationMethods)),
                                  **model_kwargs["quantization"])
         model_group.add_argument("--enforce-eager",
                                  **model_kwargs["enforce_eager"])

But I think there should be a better way to set up the choices by using the typing machinery to resolve them; I am not very familiar with it. A rough sketch of what I mean is below.
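
For what it's worth, the direction would be to recursively unwrap Annotated/Optional/Union until a Literal is found, then derive the choices from it (find_literal here is a hypothetical helper, not existing vLLM API):

from typing import Annotated, Literal, Optional, Union, get_args, get_origin

def find_literal(hint):
    """Hypothetical helper: drill through Annotated/Optional/Union
    wrappers and return the inner Literal, or None if there is none."""
    origin = get_origin(hint)
    if origin is Literal:
        return hint
    if origin in (Union, Annotated):
        for arg in get_args(hint):
            found = find_literal(arg)
            if found is not None:
                return found
    return None

# With such a helper, the choices could be derived even from the
# wrapped hint that SkipValidation leaves behind:
Quant = Optional[Literal["awq", "gptq", "fp8"]]  # toy stand-in
print(list(get_args(find_literal(Quant))))       # ['awq', 'gptq', 'fp8']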

Could you give some suggestions? Maybe cc @hmellor.

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
