[Bug]: The new version (v0.5.4) cannot load the gptq model, but the old version (vllm=0.5.3.post1) can do it. #7240
Comments
+1
Marlin
+1
@LucasWilkinson will take a look
Explicitly setting quantization="gptq" works around it. We will look into the issue.
+1, does anyone have a good workaround for this?
You can pip install vllm==0.5.3.post1 to go back to the old version, or, if you are on a T4, use the quantization="gptq" setting someone replied with above.
Closing because this is fixed by #7264
vllm v0.5.4, shuyuej/Mistral-Nemo-Instruct-2407-GPTQ-INT8: add "--quantization gptq" and then it works.
Hello, what does this mean?
Hello, still the same error on a T4 with 'neuralmagic/Mistral-Nemo-Instruct-2407-quantized.w4a16' |
Your current environment
🐛 Describe the bug