
Feature Request: Quantized model support #10965

Open
4 tasks done
wyhfc123 opened this issue Dec 24, 2024 · 1 comment

Labels
enhancement New feature or request

Comments

@wyhfc123

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

When I convert a bitsandbytes (bnb) model to GGUF, I get ValueError: Can not map tensor 'Model.layers.0.mlp.down_proj.weight.absmax'. According to the documentation, the current version of llama.cpp does not support converting quantized models to GGUF. In production there are many cases where quantized models (AWQ, BNB, GPTQ) need to be converted to GGUF for Ollama deployment, so I hope the authors can add this capability.

Motivation

Converting quantized models to GGUF is not currently supported. Once this feature is implemented, users will be able to convert any quantized model to GGUF and deploy it through Ollama.

Possible Implementation

No response

wyhfc123 added the enhancement (New feature or request) label Dec 24, 2024

Mushoz commented Dec 24, 2024

I am not sure I understand the use case for this. Why would you want to convert an already quantized model to GGUF? llama.cpp doesn't support AWQ, BNB, or GPTQ, so to create a GGUF you would have to requantize to a supported quantization method, losing more information on top of what was already lost the first time the model was quantized to AWQ, BNB, or GPTQ.

In what scenarios is the FP16 model not available, such that you could simply use that model to create a GGUF version?
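
For reference, here is a minimal sketch of the dequantize-then-convert route described above. It assumes a transformers version that exposes model.dequantize() for bitsandbytes checkpoints (roughly v4.41 or later), an installed bitsandbytes with a CUDA device, and placeholder paths; this is not an official llama.cpp workflow.

```python
# Workaround sketch (not an official llama.cpp feature): dequantize a
# bitsandbytes checkpoint back to FP16 with transformers, then run the normal
# GGUF conversion on the plain checkpoint. Assumes transformers >= 4.41
# (which provides PreTrainedModel.dequantize() for bnb models), bitsandbytes,
# and a CUDA device; the paths are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/bnb-quantized-model"  # placeholder local path

# Load the quantized checkpoint; the bnb quantization config stored with the
# weights is picked up automatically.
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype=torch.float16
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Dequantize back to regular FP16 tensors. The precision lost during the
# original bnb quantization is not recovered; only the storage format changes.
model = model.dequantize()

# Save a plain FP16 checkpoint, which llama.cpp's converter understands.
model.save_pretrained("dequantized-fp16")
tokenizer.save_pretrained("dequantized-fp16")

# Then, from the llama.cpp repository:
#   python convert_hf_to_gguf.py dequantized-fp16 --outfile model-f16.gguf
```

Requantizing the resulting GGUF to a llama.cpp type (for example with llama-quantize) would then stack a second round of quantization error on top of the original one, which is the trade-off noted in the comment above.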
