Change load format for Mixtral #2028
Conversation
It still gives error #2024 when trying to load the GPTQ version of Mixtral, and I'm unable to specify loading it with pt; it says to load it with safetensors.
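For context, forcing the load format from the Python API looks roughly like the sketch below. This is a hedged reconstruction, not the exact command used: the model name, the gptq quantization flag, and the load_format value are assumptions about the setup being described.

```python
# Rough sketch of the failing setup: asking vLLM to load a GPTQ Mixtral repo
# with PyTorch (.pt/.bin) weights. Model name and arguments are assumptions
# about the configuration discussed above, not a verified reproduction.
from vllm import LLM

llm = LLM(
    model="TheBloke/Mixtral-8x7B-v0.1-GPTQ",
    quantization="gptq",
    load_format="pt",  # the repo only ships safetensors, hence the error
)
```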
Yes, I'll do it later today/tomorrow.
The problem there is that the model is only in Safetensors format, not PyTorch. You'll have to either wait for vLLM to support Mixtral in Safetensors or download the full-size model and quantize it yourself.
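For anyone considering the second option, quantizing locally with AutoGPTQ would look roughly like the sketch below. The calibration text, output directory, and whether AutoGPTQ handles Mixtral's MoE layers correctly are assumptions, not something verified in this thread.

```python
# Hedged sketch: quantize the full-size Mixtral with AutoGPTQ and save the
# result as a non-safetensors (PyTorch pickle) checkpoint.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_id = "mistralai/Mixtral-8x7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)

quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)
model = AutoGPTQForCausalLM.from_pretrained(model_id, quantize_config)

# Tiny placeholder calibration set; real quantization needs representative text.
examples = [tokenizer("vLLM is a fast and easy-to-use library for LLM inference.", return_tensors="pt")]
model.quantize(examples)

# use_safetensors=False writes PyTorch-format weights instead of safetensors.
model.save_quantized("Mixtral-8x7B-v0.1-GPTQ-pt", use_safetensors=False)
```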
Yes, right. The issue seems to persist with the TheBloke/Mixtral-8x7B-v0.1-GPTQ model. @TheBloke, any chance you could provide the pt weights for the same?
I've not uploaded non-safetensors GPTQs for 6+ months. I'm not sure if AutoGPTQ even supports it any more; I guess Transformers does. That's a major pain though, and I'm certainly not going to upload PT for models on an ongoing basis.

I'm willing to do it as a once-off for Mixtral 8x7B and the Instruct version, for testing purposes. I will upload them in separate branches of the main repos.

Is this a problem unique to Mixtral GPTQ, or all vLLM GPTQs? I've not yet tested vLLM GPTQ support myself - planning to do so soon, then I'll add mention of it to my GPTQ READMEs.
It's unique to Mixtral, as far as I know. I believe I've read somewhere that it's caused by the layer names being different between the PyTorch and Safetensors versions, but don't take my word for it. Someone did upload a GPTQ version of Mixtral, but only the base model: https://huggingface.co/IbuNai/Mixtral-8x7B-v0.1-gptq-4bit-pth
Hmm, that sounds different then. If I make .pt versions of my GPTQs, the layer names will be identical. This issue sounds like vLLM only supports the non-HF version of Mixtral, ie the version distributed as

Looks like that IbuNai guy converted my Mixtral GPTQ to pth - maybe he renamed the layers at the same time. Let me know if that works, and if it does I'll see about doing an Instruct version.

But hopefully vLLM will support the HF version of Mixtral as GPTQ soon? I know it will support the Mixtral AWQ versions soon, as Casper Hansen has made a PR for that - so that might be the easier option for you to use.
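If that IbuNai conversion really is just a format change, the safetensors-to-.pt step itself is mechanical. A minimal sketch is below; the filenames are placeholders, and any layer renaming that repo may have done is not reproduced here.

```python
# Hedged sketch: re-save GPTQ safetensors weights as a PyTorch pickle checkpoint.
# Filenames are placeholders; a sharded checkpoint would need this per shard.
import torch
from safetensors.torch import load_file

state_dict = load_file("model.safetensors")
torch.save(state_dict, "pytorch_model.bin")
```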
Just checked, their model doesn't work, though seemingly for a different reason; while your version
Checked the other variants as well, but it seems only 4-bit models are supported right now, with the 3-bit and 8-bit variants getting the following error:
Closes #2018
This is a band-aid solution for Mixtral.
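For anyone who wants to try it, loading the HF Mixtral checkpoint (which ships safetensors only) should look roughly like the sketch below. The arguments are standard vLLM engine args as I understand them, not taken from this diff.

```python
# Hedged sketch of what this change is meant to enable: loading the
# safetensors-only HF Mixtral checkpoint through the usual vLLM entry point.
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mixtral-8x7B-v0.1", tensor_parallel_size=2)
outputs = llm.generate(["Hello, Mixtral!"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```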