Open
Description
Your current environment
vllm 0.5.4
🐛 Describe the bug
autoawq marlin must with no zero point, but vllm:
def query_marlin_supported_quant_types(has_zp: bool,
min_capability: Optional[int] = None):
if min_capability is None:
major, minor = current_platform.get_device_capability()
min_capability = major * 10 + minor
if min_capability < 80:
return []
if has_zp:
# AWQ style, unsigned + runtime zero-point
return [scalar_types.uint4, scalar_types.uint8]
else:
# GPTQ style, unsigned + symmetric bias
# TODO: once fp8_marlin is merged into "gptq_marlin" we should be able
# to add `scalar_types.float8_e4m3fn` here
return [scalar_types.uint4b8, scalar_types.uint8b128]`
this would error### ###