Remove Q4_3 which is no better than Q5 #1218

Merged
merged 2 commits into from
Apr 28, 2023

Contributor

@sw sw commented Apr 28, 2023

I hope this isn't too controversial...

Q4_3 turns out to be equal to or worse than the Q5 types on every criterion we have: perplexity, file size, and token generation speed.

In the interest of reducing code base complexity, remove the Q4_3 type.

It was only introduced last week, I think, so I don't think many people use it. Of course I'm ready to be proven wrong on this...

Notes:

  • I haven't tested CUDA or OpenCL
  • GGML_TYPE_COUNT is now somewhat incorrect. I didn't want to change the enum values that are used in model files, but we might move GGML_TYPE_I8 to the now unused value 5.
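To illustrate the second note, here is a minimal sketch of an enum with a retired value. The `demo_` names and the exact list of members are assumptions made for illustration and are not copied from ggml.h. It shows why the count sentinel stops matching the number of valid types once a value in the middle is left unused, and how a comment can mark the gap.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical enum; names and members are illustrative only. */
enum demo_type {
    DEMO_TYPE_F32  = 0,
    DEMO_TYPE_F16  = 1,
    DEMO_TYPE_Q4_0 = 2,
    DEMO_TYPE_Q4_1 = 3,
    DEMO_TYPE_Q4_2 = 4,
    /* 5 was DEMO_TYPE_Q4_3: support removed, value intentionally unused */
    DEMO_TYPE_Q5_0 = 6,
    DEMO_TYPE_Q5_1 = 7,
    DEMO_TYPE_Q8_0 = 8,
    DEMO_TYPE_I8,    /* implicitly 9 */
    DEMO_TYPE_COUNT  /* implicitly 10: one past the last value */
};

/* Values stored in existing files must not shift when a type is removed. */
static_assert(DEMO_TYPE_Q5_0 == 6, "on-disk enum value changed");

/* A value is valid if it is in range and is not the retired slot. */
static bool demo_type_is_valid(int t) {
    return t >= 0 && t < DEMO_TYPE_COUNT && t != 5;
}

/* Counts the valid types by scanning the whole range. */
static int demo_valid_type_count(void) {
    int n = 0;
    for (int t = 0; t < DEMO_TYPE_COUNT; t++) {
        if (demo_type_is_valid(t)) {
            n++;
        }
    }
    return n;
}
```

In this sketch `DEMO_TYPE_COUNT` still works as an array bound, but it is one larger than the number of usable types. Moving another member into the free value, as the note suggests for GGML_TYPE_I8, would close the gap only if that member's value is not itself stored in model files.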

Review comments on ggml.h and llama.h (resolved)
@j-f1
Collaborator

j-f1 commented Apr 28, 2023

^ I'm suggesting leaving a reminder so people aren't confused about why numbers are skipped, and so an informed decision can be made in the future if someone wants to reuse the format.

@ggerganov
Owner

I wasn't sure if the timings for x86 follow the same pattern as the one I reported for M1.
If yes, we can drop Q4_3 support. Overall, the plan will be to drop support for whatever is not needed, but I think we will need to do a few experiments with other models to fully understand which methods are obsolete.

Btw, here is an ongoing evaluation with RWKV: ggerganov/ggml#89 (comment)

@sw
Contributor Author

sw commented Apr 28, 2023

The case against Q4_3 is even stronger on x86; at least on my 4-core AVX2 machine, Q4_3 is slower than both Q5 formats.

@Green-Sky
Collaborator

I think we should do this in 2 stages, IF it comes to this.

  1. deprecate the format with visible messages AND remove the quantization option from the quantize program, so no new model files are put into circulation.
  2. remove the rest of the code.
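The two stages could be sketched roughly as follows. This is only an illustration under stated assumptions: `demo_can_quantize`, `demo_check_load`, the `stage` parameter, and the constant `DEMO_FTYPE_Q4_3` are hypothetical names, not functions or flags from the actual quantize program or loader.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical identifier for the format being retired. */
#define DEMO_FTYPE_Q4_3 5

enum demo_status {
    DEMO_OK,          /* format is fully supported */
    DEMO_DEPRECATED,  /* stage 1: still loads, with a visible warning */
    DEMO_REMOVED      /* stage 2: loading code is gone */
};

/* Stage 1 onward: the quantizer refuses to create new files in the old
   format, so no new model files enter circulation. */
static bool demo_can_quantize(int ftype) {
    return ftype != DEMO_FTYPE_Q4_3;
}

/* Stage 1 warns but still loads existing files; stage 2 rejects them. */
static enum demo_status demo_check_load(int ftype, int stage) {
    if (ftype != DEMO_FTYPE_Q4_3) {
        return DEMO_OK;
    }
    if (stage == 1) {
        fprintf(stderr, "warning: this format is deprecated and will be removed\n");
        return DEMO_DEPRECATED;
    }
    return DEMO_REMOVED;
}
```

The point of the split is that existing files keep working for a while, but the set of such files can no longer grow.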

@ggerganov
Owner

@Green-Sky no need for this, since the notice about no backward compatibility for Q4_2 and Q4_3 was up until 2 days ago: f9be42a

@Green-Sky
Collaborator

Green-Sky commented Apr 28, 2023

Oh, very true.
edit: time feels very elusive these days...

@sw sw merged commit 36d19a6 into ggerganov:master Apr 28, 2023
@sw sw deleted the remove-q4_3 branch April 28, 2023 23:10