
Description
Hello. I ran some perplexity tests while investigating an issue with Llama-3. The initial symptom was that the Llama-3 base 70B model outputs garbage with small iMatrix quants. I can't find any existing reports that ROCm might be causing this kind of corruption.
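For reference, the runs below were invoked roughly like this. This is a minimal sketch, not my exact script: the binary name, model path, and evaluation file are placeholders (newer llama.cpp builds name the binary llama-perplexity), and -m/-f/-ngl are the usual llama.cpp flags.

```python
# Rough reproduction sketch: run the llama.cpp perplexity tool at two
# -ngl settings and print the tail of its output.
import subprocess

MODEL = "Meta-Llama-3-70B.i1-Q2_K.gguf"  # placeholder path
TEXT  = "wiki.test.raw"                  # placeholder evaluation corpus

for ngl in (0, 30):
    proc = subprocess.run(
        ["./perplexity", "-m", MODEL, "-f", TEXT, "-ngl", str(ngl)],
        capture_output=True, text=True)
    # Running estimates look like "[1]4.1839,[2]4.7300,..."; on the ROCm
    # build this model prints "-nan" instead.
    print(f"-ngl {ngl}:", (proc.stdout + proc.stderr)[-400:])
```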
GPU test (RX 7600 + RX 7600 XT)
https://huggingface.co/mradermacher/Meta-Llama-3-70B-i1-GGUF/tree/main
Meta-Llama-3-70B.i1-Q2_K.gguf prints [1]-nan,[2]-nan,[3]-nan,[4]-nan with both -ngl 30 and -ngl 0 (plain text generation also produces garbage unless run with -ngl 0); see the sketch below the GPU results for why one bad chunk makes every later estimate nan.
https://huggingface.co/mradermacher/Meta-Llama-3-70B-GGUF/tree/main
Meta-Llama-3-70B.Q2_K.gguf - seems OK: [1]4.1839,[2]4.7300,[3]4.2751,[4]4.6444,[5]4.6942,[6]5.0426,[7]5.1405,[8]5.4747
Final estimate: PPL = 5.9315 +/- 0.03553
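For context on reading these logs: as far as I understand the perplexity tool, each bracketed value is the running estimate over all chunks evaluated so far, i.e. exp of the mean per-token negative log-likelihood, so a single corrupted chunk turns every subsequent estimate into nan. A toy illustration (the NLL values here are made up):

```python
import math

def running_ppl(chunk_nlls):
    """Yield exp(mean NLL so far) after each chunk, mirroring the tool's
    running "[n]x.xxxx" output (my understanding of it, not its actual code)."""
    total = 0.0
    for i, nll in enumerate(chunk_nlls, 1):
        total += nll
        yield math.exp(total / i)

# Made-up per-chunk NLLs; one nan chunk poisons everything after it.
print(",".join(f"[{i}]{p:.4f}"
               for i, p in enumerate(running_ppl([1.43, 1.55, float("nan"), 1.50]), 1)))
# -> [1]4.1787,[2]4.4371,[3]nan,[4]nan
```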
Pure CPU test
Meta-Llama-3-70B.i1-Q2_K.gguf with a pure-CPU 'perplexity' build (146 seconds per 512-token chunk; ETA 26 hours 55.67 minutes)
[1]6.3962,[2]7.1886,[3]6.9886,[4]7.3853,[5]7.8924,[6]8.2982,[7]8.8956,[8]9.3799, (stopped here; I couldn't wait the many hours a full run would take). Note these values are already far worse than the static Q2_K results below, so the iMatrix quant looks degraded even without ROCm involved.
Meta-Llama-3-70B.Q2_K.gguf (static Q2_K):
[1]4.1675,[2]4.6952,[3]4.2374,[4]4.6452,[5]4.6677,[6]5.0459,[7]5.1258,[8]5.4649,^C
The static Q2_K on CPU is slightly better than on ROCm, but the difference is very small.
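To quantify "very small": the per-chunk relative difference between the ROCm and CPU static-Q2_K runs above (numbers copied from the logs) stays under 1%:

```python
# Chunk-level PPL from the two static Q2_K runs above.
rocm = [4.1839, 4.7300, 4.2751, 4.6444, 4.6942, 5.0426, 5.1405, 5.4747]
cpu  = [4.1675, 4.6952, 4.2374, 4.6452, 4.6677, 5.0459, 5.1258, 5.4649]

for i, (r, c) in enumerate(zip(rocm, cpu), 1):
    print(f"[{i}] {(r - c) / c * 100:+.2f}%")
# Largest gap is about +0.9% (chunk 3), plausibly just rounding
# differences between backends rather than real corruption.
```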
I also found strange holes in the imatrix.dat file that was used, but the author seems uninterested in discussing them.
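For anyone who wants to reproduce the check, here is a rough sketch of how such holes can be found. It assumes the binary layout the llama.cpp imatrix example wrote at the time (int32 entry count; then per entry an int32 name length, the name bytes, int32 ncall, int32 nval, and nval float32 values); if the format has changed, this will need adjusting.

```python
# Hypothetical checker for all-zero "holes" in an imatrix file.
# Format assumption: see the note above; trailing metadata is ignored.
import struct
import sys

def read_i32(f):
    return struct.unpack("<i", f.read(4))[0]

with open(sys.argv[1], "rb") as f:   # usage: python check_imatrix.py imatrix.dat
    for _ in range(read_i32(f)):
        name = f.read(read_i32(f)).decode("utf-8", errors="replace")
        ncall = read_i32(f)
        nval = read_i32(f)
        vals = struct.unpack(f"<{nval}f", f.read(4 * nval))
        zeros = sum(v == 0.0 for v in vals)
        if zeros:
            print(f"{name}: ncall={ncall} nval={nval} zero_values={zeros}")
```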