[Bugfix] Fix gptq failure on T4s #7264

LucasWilkinson · 2024-08-07T14:52:55Z

Fixes #7240. The bug was introduced by a8d604c: min_capability was refactored incorrectly, which led override_quantization_method to think it was always running on an Ampere system.

Tested: #7240 on a T4 GCP instance.
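For readers unfamiliar with this class of bug, here is a minimal sketch of the pattern described above. It is not the actual vLLM code or the diff in this PR: the function names, the `major * 10 + minor` encoding, and the `"gptq"` to `"gptq_marlin"` override are assumptions made for illustration. The only hardware facts relied on are that a T4 has compute capability 7.5 and Ampere starts at 8.0.

```python
# Illustrative sketch only; every name here is hypothetical, not vLLM's API.

AMPERE_CAPABILITY = 80  # compute capability 8.0, encoded as major * 10 + minor
T4_CAPABILITY = 75      # NVIDIA T4 is compute capability 7.5


def encode_capability(major: int, minor: int) -> int:
    """Encode a (major, minor) compute capability as a single integer."""
    return major * 10 + minor


def kernel_supported_buggy(min_capability: int, device_capability: int) -> bool:
    # Bug pattern: the required minimum is used where the device's own
    # capability belongs, so the check passes on every GPU.
    assumed_device = min_capability
    return assumed_device >= min_capability


def kernel_supported_fixed(min_capability: int, device_capability: int) -> bool:
    # Compare the capability of the GPU actually present against the minimum.
    return device_capability >= min_capability


def choose_method(requested: str, supported: bool) -> str:
    # Only switch to the Ampere-only kernel when the device supports it.
    if requested == "gptq" and supported:
        return "gptq_marlin"
    return requested


if __name__ == "__main__":
    t4 = encode_capability(7, 5)
    buggy = kernel_supported_buggy(AMPERE_CAPABILITY, t4)
    fixed = kernel_supported_fixed(AMPERE_CAPABILITY, t4)
    print(choose_method("gptq", buggy))
    print(choose_method("gptq", fixed))
```

With the buggy check, a T4 is treated as if it met the Ampere minimum and the method is overridden; with the fixed check, the requested method is left alone on a T4.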

github-actions · 2024-08-07T14:53:09Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which consists of a small, essential subset of CI tests to quickly catch errors. You can run other CI tests on top of the default ones by unblocking the steps in your fast-check build on the Buildkite UI.

Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge).

To run full CI, you can do one of these:

Comment /ready on the PR
Add ready label to the PR
Enable auto-merge.

🚀

LucasWilkinson · 2024-08-07T18:27:14Z

/ready

Signed-off-by: Alvant <alvasian@yandex.ru>

fix min_capability being used incorrectly

4ea65d4

robertgshaw2-redhat approved these changes Aug 7, 2024

LucasWilkinson marked this pull request as ready for review August 7, 2024 18:27

github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) Aug 7, 2024

robertgshaw2-redhat enabled auto-merge (squash) August 7, 2024 18:31

robertgshaw2-redhat merged commit 311f743 into vllm-project:main Aug 7, 2024
54 checks passed

robertgshaw2-redhat mentioned this pull request Aug 8, 2024

[Bug]: The new version (v0.5.4) cannot load the gptq model, but the old version (vllm=0.5.3.post1) can do it. #7240

Closed

ShangmingCai mentioned this pull request Aug 9, 2024

[Misc] Add quantization config support for speculative model. #7343

Merged

sfc-gh-mkeralapura pushed a commit to sfc-gh-mkeralapura/vllm that referenced this pull request Aug 12, 2024

[Bugfix] Fix gptq failure on T4s (vllm-project#7264)

c3e5a3a

robertgshaw2-redhat mentioned this pull request Aug 14, 2024

[Bug]: AutoAWQ marlin methods error #7517

Open

kylesayrs pushed a commit to neuralmagic/vllm that referenced this pull request Aug 17, 2024

[Bugfix] Fix gptq failure on T4s (vllm-project#7264)

208ac67

fialhocoelho pushed a commit to opendatahub-io/vllm that referenced this pull request Aug 22, 2024

[Bugfix] Fix gptq failure on T4s (vllm-project#7264)

5af0dfb

Alvant pushed a commit to compressa-ai/vllm that referenced this pull request Oct 26, 2024

[Bugfix] Fix gptq failure on T4s (vllm-project#7264)

a24b850

Signed-off-by: Alvant <alvasian@yandex.ru>

KuntaiDu pushed a commit to KuntaiDu/vllm that referenced this pull request Nov 20, 2024

[Bugfix] Fix gptq failure on T4s (vllm-project#7264)

bbbe326
