I have the implementation ready, but I'm not sure if this is what we want. Use of an importance matrix does improve perplexity for all models I have tried. But on the other hand the "legacy" ggml quants `Q4_0` and `Q5_0` are never very good, but they are also never really bad (`Q4_1` and `Q5_1` have more erratic behavior, for some models being better than `Q4_0`/`Q5_0` and for other models being worse). Hence, one may want to preserve them the way they are as a kind of reference.
Opinions?