
Clamp out of range values in K quantizer #6888

Draft · wants to merge 1 commit into master
Conversation

jart (Contributor) commented Apr 25, 2024

This assertion fails when quantizing Mixtral 8x7b as Q5_K_M, because I used `convert.py --outtype f32` and the Mixtral weights use bf16, which has a much larger exponent range than the K quantizer expects. If `--outtype f16` is used, the assert doesn't fail.

See #2982

@mofosyne added the labels bugfix (fixes an issue or bug), Review Complexity: Medium (generally requires more time to grok but manageable by beginner to medium expertise level), and model (model specific) on May 9, 2024
@mofosyne marked this pull request as draft on May 18, 2024 05:21