If I quantize this gguf with this imatrix using this command:
quantize.exe --allow-requantize --imatrix mixtral-8x7b-instruct-v0.1.imatrix mixtral-8x7b-instruct-v0.1.Q8_0.gguf mixtral-8x7b-instruct-v0.1.IQ3_XXS.gguf IQ3_XXS
and I calculate perplexity with this command:
perplexity.exe -f wiki.test.raw --chunks 1000 --seed 42 --threads 8 --log-disable --no-mmap --mlock --ctx-size 512 --n-gpu-layers 999 --model mixtral-8x7b-instruct-v0.1.IQ3_XXS.gguf
I get three noticeably different PPL values from three different versions of quantize.exe, everything else being equal:
b2037 31-1-2024 : 4.7009 +/- 0.02569
b???? 25-2-2024 : 4.7249 +/- 0.02576
b2329 03-3-2024 : 4.8491 +/- 0.02636
I suspect that there have been multiple cumulative regressions in the IQ3_XXS quantization implementation between b2037 and b2329.
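To judge how far apart these numbers really are, here is a rough sketch that compares each later build against b2037 using the reported +/- values. It assumes the +/- figures can be treated as independent standard errors and combined in quadrature, which is my assumption and not something the perplexity tool states; the `compare` helper and the `results` dictionary are hypothetical names used only for illustration.

```python
import math

# PPL results copied from the report above: build label -> (PPL, reported +/- value)
results = {
    "b2037": (4.7009, 0.02569),
    "b???? (25-2-2024)": (4.7249, 0.02576),
    "b2329": (4.8491, 0.02636),
}


def compare(a, b):
    """Return (difference, combined error, ratio) for two (ppl, err) pairs.

    Assumption: the two error estimates are independent, so they are
    combined in quadrature. Runs on the same test text are likely
    correlated, so this probably overstates the uncertainty.
    """
    diff = b[0] - a[0]
    err = math.hypot(a[1], b[1])  # sqrt(a_err**2 + b_err**2)
    return diff, err, diff / err


baseline_label = "b2037"
baseline = results[baseline_label]

for label, result in results.items():
    if label == baseline_label:
        continue
    diff, err, ratio = compare(baseline, result)
    percent = 100.0 * diff / baseline[0]
    print(
        f"{label}: {diff:+.4f} PPL vs {baseline_label} "
        f"({percent:+.2f}%), {ratio:.1f}x the combined error"
    )
```

Under that assumption, the b2329 result sits roughly four combined-error widths above b2037, while the 25-2-2024 build is within one. So the second step looks like a clear regression, and the first one may or may not be noise.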
Tested with the cu12.2.0 (CUDA 12.2.0) build on Windows 10.