Add quantized mixtral support #2673

WoosukKwon · 2024-01-30T21:46:02Z

This is a hacky way to restore quantization support for Mixtral, which was broken by the optimization in #2542. NOTE: This is a temporary hack, and we need to fix it properly in the future.

vllm/model_executor/model_loader.py

AlpinDale · 2024-01-30T22:15:20Z

vllm/model_executor/models/mixtral_quant.py

@@ -0,0 +1,412 @@
+# coding=utf-8


Instead of inserting a new file just for this, would it be better to add the MLP class back to mixtral.py and fall back to the original code for instances where the linear method is None?

Hi @AlpinDale, thank you for the comment. We are doing this to resolve a release blocker. What you mentioned will be the right fix for the future.
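
For readers following this thread, the approach taken in this PR (a separate quantized model file, selected when a quantization config is present) might look roughly like the sketch below. This is only an illustration under stated assumptions: `FusedMixtral`, `QuantMixtral`, `REGISTRY`, and `resolve_model_class` are hypothetical names and do not reproduce the actual vLLM loader code.

```python
from typing import Optional


# Hypothetical stand-ins for the two Mixtral implementations discussed above.
class FusedMixtral:
    """Optimized path, assumed not to support quantized weights."""


class QuantMixtral:
    """Separate fallback implementation that supports quantized weights."""


# Hypothetical registry mapping architecture names to model classes.
REGISTRY = {
    "MixtralForCausalLM": FusedMixtral,
    "QuantMixtralForCausalLM": QuantMixtral,
}


def resolve_model_class(architecture: str, quantization: Optional[str]) -> type:
    """Pick the model class, redirecting quantized Mixtral to the fallback."""
    # Temporary special case: only Mixtral with a quantization method set
    # is routed to the separate quantized implementation.
    if architecture == "MixtralForCausalLM" and quantization is not None:
        architecture = "QuantMixtralForCausalLM"
    return REGISTRY[architecture]
```

The alternative suggested in the review would keep a single model file and branch inside it on whether a linear method is set, which avoids duplicating the model definition at the cost of a larger change.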

Add quantized mixtral support

d9dde90

simon-mo reviewed Jan 30, 2024

vllm/model_executor/model_loader.py

Check quant_config

d39519c

AlpinDale reviewed Jan 30, 2024

WoosukKwon requested review from simon-mo and zhuohan123 January 31, 2024 00:27

simon-mo approved these changes Jan 31, 2024

simon-mo merged commit 3dad944 into main Jan 31, 2024

WoosukKwon deleted the mixtral-quant branch January 31, 2024 00:42

NikolaBorisov pushed a commit to deepinfra/vllm that referenced this pull request Jan 31, 2024

Add quantized mixtral support (vllm-project#2673)

1507a4c

pcmoritz mentioned this pull request Feb 1, 2024

Output Garbage Text in Mixtral 8x7b Post Upgrade to 0.3.0 #2714

Closed

hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024

Add quantized mixtral support (vllm-project#2673)

89bc76f

alexm-redhat pushed a commit to neuralmagic/nm-vllm that referenced this pull request Feb 13, 2024

Add quantized mixtral support (vllm-project#2673)

ad3b74c

andy-neuma mentioned this pull request Feb 23, 2024

andy/bump main to v0.3.2 neuralmagic/nm-vllm#49

Closed

tristanleclercq mentioned this pull request Apr 25, 2025

[Feature]: Inflight BNB quantization for Mixtral models #17199

Closed

1 task

tristanleclercq mentioned this pull request May 23, 2025

[Bugfix] Fix transformers model impl ignored for mixtral quant #18602

Merged
