Explore the representation of finer quantization granularities than per-axis #1569
Comments
Also AWQ, which is apparently better than GPTQ: https://arxiv.org/abs/2306.00978
@powderluv Thanks for providing the link! Indeed an interesting read. This is what I understood from the paper: there is a small fraction of salient weights that are much more important for the LLM's performance than the others. To find the salient weight channels, they refer to the activation distribution rather than the weight distribution: weight channels corresponding to larger activation magnitudes are more salient, since they process more important features. The paper relies on the input activation magnitude to pick out the salient weight channels and their corresponding scales; for the non-salient weight channels it uses the weight magnitude to get the corresponding scales.

One thing I was wondering about: from the POV of expressing the quantization parameters in StableHLO, this seems similar to the per-axis scaling scheme, which StableHLO currently supports. IMO the novelty here lies in how the scales are computed for each channel, as sketched below. Please let me know if I am missing something.
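(For concreteness, here is a minimal NumPy sketch of the idea described above. It is purely illustrative, not the paper's actual recipe: `salient_fraction`, the channel-selection rule, and the scale formula for the salient channels are all stand-in assumptions.)

```python
import numpy as np

def per_channel_scales(weight, activations, salient_fraction=0.01, num_bits=8):
    """Toy sketch: derive one scale per input channel of `weight`.

    weight:      [out_channels, in_channels]
    activations: [num_calibration_samples, in_channels]
    """
    qmax = 2 ** (num_bits - 1) - 1

    # Salient channels = the ones that see the largest input activations.
    act_mag = np.abs(activations).mean(axis=0)                  # [in_channels]
    num_salient = max(1, int(round(salient_fraction * weight.shape[1])))
    salient = np.argsort(act_mag)[-num_salient:]

    # Default: scales derived from the weight magnitude.
    scales = np.abs(weight).max(axis=0) / qmax                  # [in_channels]
    # Stand-in rule for salient channels: derive the scale from the
    # activation magnitude instead (NOT the paper's exact formula).
    scales[salient] = act_mag[salient] / qmax
    return scales
```

Whichever way the per-channel values are chosen, the end product is a rank-1 scale vector along one axis of the weight, which is exactly what the existing per-axis representation can carry.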
Do we have any news on this topic?
Hello @sunshinemyson |
Hi @sdasgup3, thanks for your reply. Actually, we are working on GPTQ too; according to our experiments with the Llama model, GPTQ can achieve promising results. Looking forward to updates. Thanks
Hi @sdasgup3, would you mind sharing your progress on this topic? Recently, I have found AWQ/GPTQ to be very popular while building my own LLM application locally. You can find many quantized models at https://huggingface.co/TheBloke. Thanks
Hi @sunshinemyson
Thanks for sharing the use-cases! We are gathering requirements from stakeholders to use as a basis for the spec changes for multi-dimensional per-axis and sub-channel support (see the sketch below). I strongly hope to come up with a plan by early next month. Please stay tuned.
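(As a point of reference for the discussion, here is a rough NumPy sketch contrasting today's per-axis granularity with a sub-channel / block-wise granularity. The `block_size`, the function names, and the assumption of a 2-D weight are illustrative only and do not reflect the eventual spec.)

```python
import numpy as np

def quantize_per_axis(w, axis=0, num_bits=8):
    """One scale per slice along `axis` (what StableHLO can express today)."""
    qmax = 2 ** (num_bits - 1) - 1
    reduce_axes = tuple(i for i in range(w.ndim) if i != axis)
    scales = np.abs(w).max(axis=reduce_axes) / qmax             # [w.shape[axis]]
    shaped = scales.reshape([-1 if i == axis else 1 for i in range(w.ndim)])
    return np.round(w / shaped).astype(np.int8), scales

def quantize_sub_channel(w, block_size=128, num_bits=8):
    """One scale per `block_size`-wide block of the inner axis of a 2-D weight,
    i.e. finer than per-axis. Assumes in_channels is divisible by block_size."""
    qmax = 2 ** (num_bits - 1) - 1
    out_ch, in_ch = w.shape
    blocks = w.reshape(out_ch, in_ch // block_size, block_size)
    scales = np.abs(blocks).max(axis=-1) / qmax                 # [out_ch, num_blocks]
    q = np.round(blocks / scales[..., None]).astype(np.int8)
    return q.reshape(out_ch, in_ch), scales

w = np.random.randn(256, 512).astype(np.float32)
_, s_axis  = quantize_per_axis(w, axis=0)       # one scale per output channel
_, s_block = quantize_sub_channel(w)            # one scale per 128-wide block
print(s_axis.shape, s_block.shape)              # (256,) (256, 4)
```

The representational question for StableHLO is how to carry that `[out_ch, num_blocks]` scale tensor (plus the block size) in the quantized type, since the current per-axis encoding only allows a flat list of scales along a single dimension.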
Just wanted to note here that we still plan to take this on in Q2'24. I will keep posting updates.
This issue is based on the discussion and the interest shown by the community in exploring the representation of finer granularities for quantization parameters in StableHLO.
Please refer to the discussion link for additional references.