Remove Q4_3 which is no better than Q5 #1218

sw · 2023-04-28T17:49:05Z

I hope this isn't too controversial...

Q4_3 turns out to be equal to or worse than the Q5 types on every criterion we have: perplexity, file size, and token generation speed.

In the interest of reducing code base complexity, remove the Q4_3 type.

It was only introduced last week, I think, so I doubt many people use it. Of course, I'm ready to be proven wrong on this...

Notes:

I haven't tested CUDA or OpenCL
GGML_TYPE_COUNT is now somewhat incorrect. I didn't want to change the enum values that are used in model files, but we might move GGML_TYPE_I8 to the now unused value 5.

ggml.h

llama.h

j-f1 · 2023-04-28T17:54:29Z

^ suggesting leaving a reminder so people aren’t confused about why numbers are skipped, and so an informed decision can be made in the future if the format wants to be reused.
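To illustrate the two points above (the gap left at value 5 and why GGML_TYPE_COUNT becomes "somewhat incorrect"), here is a minimal sketch. It is not the actual ggml.h: the `demo_` names are hypothetical, the type list is shortened, and the only value taken from this thread is that 5 is the one left unused.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, shortened stand-in for the real type enum.
 * Values are explicit because they are stored in model files. */
enum demo_type {
    DEMO_TYPE_F32  = 0,
    DEMO_TYPE_F16  = 1,
    DEMO_TYPE_Q4_0 = 2,
    DEMO_TYPE_Q4_1 = 3,
    DEMO_TYPE_Q4_2 = 4,
    /* 5 was DEMO_TYPE_Q4_3; support has been removed.
     * The value is left unused so existing files keep their meaning. */
    DEMO_TYPE_Q5_0 = 6,
    DEMO_TYPE_Q5_1 = 7,
    DEMO_TYPE_Q8_0 = 8,
    DEMO_TYPE_COUNT /* 9: one past the largest value, not the number of types */
};

/* Lookup table indexed by type; the slot for the removed value stays NULL. */
static const char *const demo_type_names[DEMO_TYPE_COUNT] = {
    [DEMO_TYPE_F32]  = "f32",
    [DEMO_TYPE_F16]  = "f16",
    [DEMO_TYPE_Q4_0] = "q4_0",
    [DEMO_TYPE_Q4_1] = "q4_1",
    [DEMO_TYPE_Q4_2] = "q4_2",
    [DEMO_TYPE_Q5_0] = "q5_0",
    [DEMO_TYPE_Q5_1] = "q5_1",
    [DEMO_TYPE_Q8_0] = "q8_0",
};

/* A range check alone is not enough once the enum has a hole. */
static bool demo_type_is_valid(int t) {
    return t >= 0 && t < DEMO_TYPE_COUNT && demo_type_names[t] != NULL;
}

/* Number of types that are actually defined (skips the hole). */
static size_t demo_type_defined_count(void) {
    size_t n = 0;
    for (int t = 0; t < DEMO_TYPE_COUNT; t++) {
        if (demo_type_names[t] != NULL) {
            n++;
        }
    }
    return n;
}
```

In this sketch `DEMO_TYPE_COUNT` still works as an array bound, but code that treats it as "number of supported types" or that only range-checks a value read from a file would accept the removed value 5.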

ggerganov · 2023-04-28T18:41:06Z

I wasn't sure if the timings for x86 follow the same pattern as the one I reported for M1.
If yes, we can drop Q4_3 support. Overall, the plan will be to drop support for whatever is not needed, but I think we will need to do a few experiments with other models to fully understand which methods are obsolete.

Btw, here is an ongoing evaluation with RWKV: ggerganov/ggml#89 (comment)

sw · 2023-04-28T18:56:27Z

The case against Q4_3 is even stronger on x86; at least on my 4-core AVX2 machine, Q4_3 is slower than both Q5 formats.

Green-Sky · 2023-04-28T19:10:27Z

I think we should do this in two stages, if it comes to this.

1. Deprecate the format with visible messages AND remove the quantization option from the quantize program, so no new model files are put into circulation.
2. Remove the rest of the code.
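A rough sketch of what the first stage could look like, assuming hypothetical helper names (`demo_quantize_allowed`, `demo_load_allowed`) that do not exist in the code base: the quantizer refuses the deprecated format, while the loader still accepts it and prints a warning.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical file-type values; only 5 (the removed format) matters here. */
enum demo_ftype {
    DEMO_FTYPE_Q4_2 = 4,
    DEMO_FTYPE_Q4_3 = 5, /* deprecated in stage 1, removed in stage 2 */
    DEMO_FTYPE_Q5_0 = 6
};

static bool demo_ftype_is_deprecated(int ftype) {
    return ftype == DEMO_FTYPE_Q4_3;
}

/* Stage 1, quantize side: refuse to create new files in the old format. */
static bool demo_quantize_allowed(int ftype) {
    if (demo_ftype_is_deprecated(ftype)) {
        fprintf(stderr, "error: this quantization format is deprecated; "
                        "choose another one\n");
        return false;
    }
    return true;
}

/* Stage 1, load side: existing files still load, with a visible warning. */
static bool demo_load_allowed(int ftype) {
    if (demo_ftype_is_deprecated(ftype)) {
        fprintf(stderr, "warning: this model uses a deprecated format that "
                        "will stop loading in a future version\n");
    }
    return true;
}
```

Stage 2 would then delete `DEMO_FTYPE_Q4_3` handling entirely, so that `demo_load_allowed` rejects the value as well.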

ggerganov · 2023-04-28T19:31:21Z

@Green-Sky no need for this, since the notice about no backward compatibility for Q4_2 and Q4_3 was up until 2 days ago: f9be42a

Green-Sky · 2023-04-28T19:51:56Z

Oh, very true.
edit: time feels very elusive these days...

j-f1 reviewed Apr 28, 2023

ggml.h (resolved)

llama.h (resolved)

sw added 2 commits April 28, 2023 21:04

Remove Q4_3 which is no better than Q5

924309a

Review suggestions: comments for removed enum values

f1ec8b4

sw force-pushed the remove-q4_3 branch from 166164d to f1ec8b4 Compare April 28, 2023 19:04

ggerganov approved these changes Apr 28, 2023

sw merged commit 36d19a6 into ggerganov:master Apr 28, 2023

sw deleted the remove-q4_3 branch April 28, 2023 23:10
