
Feature Request: Quantized model support #10965

Open
4 tasks done
wyhfc123 opened this issue Dec 24, 2024 · 1 comment

Labels
enhancement New feature or request

Comments

@wyhfc123

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

When I convert a bitsandbytes (bnb) model to GGUF, I get ValueError: Can not map tensor 'Model.layers.0.mlp.down_proj.weight.absmax'. According to the documentation, the current version of llama.cpp does not support converting quantized models to GGUF. In production there are many cases where quantized models (AWQ, BNB, GPTQ) need to be converted to GGUF for Ollama deployment, so I hope the authors can add this capability.

Motivation

Converting quantized models to GGUF is not currently supported. Once this feature is implemented, users will be able to convert any quantized model to GGUF and deploy it through Ollama.

Possible Implementation

No response

wyhfc123 added the enhancement (New feature or request) label Dec 24, 2024

Mushoz commented Dec 24, 2024

I am not sure I understand the use case for this. Why would you want to convert an already quantized model to GGUF? llama.cpp doesn't support AWQ, BNB, or GPTQ, so to create a GGUF you would have to requantize to a supported quantization method, losing more information on top of what was already lost the first time the model was quantized to AWQ, BNB, or GPTQ.

In what scenarios is the FP16 model not available, such that you could simply use that model to create a GGUF version?
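
For reference, here is a minimal sketch of the dequantize-then-convert route described above. It assumes a transformers version that exposes model.dequantize() for bitsandbytes checkpoints (roughly v4.41 or later), an installed bitsandbytes with a CUDA device, and placeholder paths; this is not an official llama.cpp workflow.

```python
# Workaround sketch (not an official llama.cpp feature): dequantize a
# bitsandbytes checkpoint back to FP16 with transformers, then run the normal
# GGUF conversion on the plain checkpoint. Assumes transformers >= 4.41
# (which provides PreTrainedModel.dequantize() for bnb models), bitsandbytes,
# and a CUDA device; the paths are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/bnb-quantized-model"  # placeholder local path

# Load the quantized checkpoint; the bnb quantization config stored with the
# weights is picked up automatically.
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype=torch.float16
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Dequantize back to regular FP16 tensors. The precision lost during the
# original bnb quantization is not recovered; only the storage format changes.
model = model.dequantize()

# Save a plain FP16 checkpoint, which llama.cpp's converter understands.
model.save_pretrained("dequantized-fp16")
tokenizer.save_pretrained("dequantized-fp16")

# Then, from the llama.cpp repository:
#   python convert_hf_to_gguf.py dequantized-fp16 --outfile model-f16.gguf
```

Requantizing the resulting GGUF to a llama.cpp type (for example with llama-quantize) would then stack a second round of quantization error on top of the original one, which is the trade-off noted in the comment above.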
