Remove Q4_3 which is no better than Q5 #1218
^ Suggesting leaving a reminder so people aren’t confused about why numbers are skipped, and so an informed decision can be made in the future if the format is to be reused.
I wasn't sure whether the x86 timings follow the same pattern as the ones I reported for M1. Btw, here is an ongoing evaluation with RWKV: ggerganov/ggml#89 (comment)
The case against Q4_3 is even stronger on x86; at least on my 4-core AVX2 machine, Q4_3 is slower than both Q5 formats.
I think we should do this in 2 stages, IF it comes to this.
@Green-Sky no need for this, since the notice for no backward compatibility for
Oh, very true.
I hope this isn't too controversial...
Q4_3 turns out to be equal to or worse than the Q5 types in all the criteria we have: perplexity, file size, and token generation speed.
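To make the file-size criterion concrete, here is a small sketch that computes storage cost in bits per weight from a block layout. The layouts are assumptions based on my understanding of the formats at the time: Q4_3 packs 16 weights with an fp16 scale and fp16 minimum, Q5_0 packs 32 weights with an fp16 scale plus a 32-bit mask holding each weight's 5th bit, and Q5_1 adds an fp16 minimum to that. The struct and function names are hypothetical, and `uint16_t` stands in for an fp16 value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts (assumed, not copied from ggml). uint16_t stands in
 * for a half-precision (fp16) float. */
struct ex_block_q4_3 {          /* 16 weights per block */
    uint16_t d;                 /* fp16 scale */
    uint16_t m;                 /* fp16 minimum */
    uint8_t  qs[16 / 2];        /* 4-bit quants, two per byte */
};

struct ex_block_q5_0 {          /* 32 weights per block */
    uint16_t d;                 /* fp16 scale */
    uint8_t  qh[4];             /* 5th bit of each of the 32 quants */
    uint8_t  qs[32 / 2];        /* low 4 bits, two per byte */
};

struct ex_block_q5_1 {          /* 32 weights per block */
    uint16_t d;                 /* fp16 scale */
    uint16_t m;                 /* fp16 minimum */
    uint8_t  qh[4];             /* 5th bit of each of the 32 quants */
    uint8_t  qs[32 / 2];        /* low 4 bits, two per byte */
};

/* Fail the build if padding changes the per-block (on-disk) size. */
static_assert(sizeof(struct ex_block_q4_3) == 12, "unexpected q4_3 padding");
static_assert(sizeof(struct ex_block_q5_0) == 22, "unexpected q5_0 padding");
static_assert(sizeof(struct ex_block_q5_1) == 24, "unexpected q5_1 padding");

/* Storage cost in bits per weight for a block of block_bytes holding n weights. */
static double ex_bits_per_weight(size_t block_bytes, size_t n) {
    return (double)(block_bytes * 8) / (double)n;
}
```

Under these assumed layouts, Q4_3 costs 12 * 8 / 16 = 6.0 bits per weight, the same as Q5_1 (24 * 8 / 32 = 6.0) and more than Q5_0 (22 * 8 / 32 = 5.5), so the extra bit of precision in the Q5 types comes at no size penalty relative to Q4_3.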
In the interest of reducing code base complexity, remove the Q4_3 type.
It was only introduced last week, I think, so I doubt many people use it. Of course, I'm ready to be proven wrong on this...
Notes:
GGML_TYPE_COUNT is now somewhat incorrect. I didn't want to change the enum values that are used in model files, but we might move GGML_TYPE_I8 to the now-unused value 5.
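The note about the count can be illustrated with a simplified, hypothetical enum (the `ex_` names and the exact values after 5 are illustrative, not ggml's actual declaration). Values are pinned explicitly because they are stored in model files, so retiring a type leaves a hole instead of shifting its successors, and a trailing COUNT sentinel then means "one past the highest value" rather than "number of types".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified type enum. Values are pinned because they are
 * written into model files; removing a type must not shift the rest. */
enum ex_type {
    EX_TYPE_F32  = 0,
    EX_TYPE_F16  = 1,
    EX_TYPE_Q4_0 = 2,
    EX_TYPE_Q4_1 = 3,
    EX_TYPE_Q4_2 = 4,
    /* 5 was EX_TYPE_Q4_3: support removed, value intentionally left unused */
    EX_TYPE_Q5_0 = 6,
    EX_TYPE_Q5_1 = 7,
    EX_TYPE_Q8_0 = 8,
    EX_TYPE_I8,    /* implicitly 9 */
    EX_TYPE_COUNT, /* one past the highest value, not the number of live types */
};

#define EX_TYPE_RETIRED_Q4_3 5

/* Fail the build if an edit accidentally shifts on-disk values. */
static_assert(EX_TYPE_Q4_2 == 4 && EX_TYPE_Q5_0 == 6, "on-disk type values changed");

/* True if v names a type that still exists. */
static bool ex_type_is_valid(int v) {
    return v >= 0 && v < EX_TYPE_COUNT && v != EX_TYPE_RETIRED_Q4_3;
}

/* Number of live types: EX_TYPE_COUNT minus the retired slot. */
static int ex_type_live_count(void) {
    int n = 0;
    for (int v = 0; v < EX_TYPE_COUNT; v++) {
        if (ex_type_is_valid(v)) {
            n++;
        }
    }
    return n;
}
```

Here EX_TYPE_COUNT is 10 while only 9 types are live, so a table sized by the count has one dead entry and any loop over it must skip value 5. Moving a type that is not stored in model files (EX_TYPE_I8 in this sketch) into the free slot would close the gap, but the count would then have to be set explicitly to 9, since an implicit value following `EX_TYPE_I8 = 5` would be 6.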