
Regressions on IQ3_XXS over time #5856

Open
@GlasslessPizza

Description

If I quantize the Q8_0 GGUF of Mixtral-8x7B-Instruct-v0.1 with its imatrix using this command:

quantize.exe --allow-requantize --imatrix mixtral-8x7b-instruct-v0.1.imatrix mixtral-8x7b-instruct-v0.1.Q8_0.gguf mixtral-8x7b-instruct-v0.1.IQ3_XXS.gguf IQ3_XXS

and I calculate perplexity with this command:

perplexity.exe -f wiki.test.raw --chunks 1000 --seed 42 --threads 8 --log-disable --no-mmap --mlock --ctx-size 512 --n-gpu-layers 999 --model mixtral-8x7b-instruct-v0.1.IQ3_XXS.gguf
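For reference, perplexity is the exponential of the mean per-token negative log-likelihood (NLL) over the evaluated chunks. The sketch below assumes, without checking it against the perplexity.exe source, that the reported "+/-" is the standard error of that mean NLL propagated through exp(), i.e. PPL × SE(NLL). The function name and its input (a plain list of natural-log NLLs) are hypothetical.

```python
import math

def perplexity_with_uncertainty(token_nlls):
    """Return (ppl, uncertainty) from per-token negative log-likelihoods (natural log).

    Assumption: uncertainty = ppl * standard error of the mean NLL.
    """
    n = len(token_nlls)
    if n < 2:
        raise ValueError("need at least two tokens to estimate an uncertainty")
    mean_nll = sum(token_nlls) / n
    mean_sq = sum(x * x for x in token_nlls) / n
    var_pop = mean_sq - mean_nll * mean_nll  # population variance of per-token NLL
    ppl = math.exp(mean_nll)
    # sqrt(var_pop / (n - 1)) equals the sample standard error of the mean;
    # max() guards against tiny negative values from floating-point rounding.
    stderr_nll = math.sqrt(max(var_pop, 0.0) / (n - 1))
    # First-order propagation through exp(): d(ppl) = ppl * d(nll)
    return ppl, ppl * stderr_nll
```

For example, NLLs of [0.0, 2.0] give PPL = e with an uncertainty of e, while a constant NLL of ln 4 gives PPL = 4 with zero uncertainty.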

I get three noticeably different PPL values from three different versions of quantize.exe, with everything else kept equal:

b2037 31-01-2024 : 4.7009 +/- 0.02569
b???? 25-02-2024 : 4.7249 +/- 0.02576
b2329 03-03-2024 : 4.8491 +/- 0.02636
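A quick way to gauge how large these shifts are relative to the reported uncertainties is to divide each difference by the combined error bar. The sketch below treats the two "+/-" values as independent, which is an assumption: both runs score the same wiki.test.raw tokens, so the true noise on the difference is likely smaller and this z-value is conservative.

```python
import math

# PPL and "+/-" as reported above, keyed by quantize.exe build label
results = {
    "b2037": (4.7009, 0.02569),
    "b????": (4.7249, 0.02576),
    "b2329": (4.8491, 0.02636),
}

def compare(base, other):
    """Return (delta, relative_change, z) for `other` minus `base`.

    z assumes independent uncertainties (conservative for paired runs on the same text).
    """
    (p0, s0), (p1, s1) = results[base], results[other]
    delta = p1 - p0
    z = delta / math.hypot(s0, s1)
    return delta, delta / p0, z

for build in ("b????", "b2329"):
    delta, rel, z = compare("b2037", build)
    print(f"b2037 -> {build}: dPPL = {delta:+.4f} ({rel:+.2%}), z = {z:.2f}")
```

By this rough measure the b2329 result sits about 4 combined standard errors above b2037 (+3.2%), while the 25-2-2024 build is within one (+0.5%).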

I suspect that multiple cumulative regressions were introduced into the IQ3_XXS quantization implementation between b2037 and b2329.
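One way to locate a regression between a known-good and a known-bad build is a binary search over build numbers, the same idea as `git bisect start b2329 b2037` on the release tags. The sketch below is generic: `is_bad` is a hypothetical callable you would implement by running the quantize and perplexity commands above with a given build and comparing the PPL to a chosen threshold. It assumes badness is monotonic, so with several cumulative regressions it finds one step per threshold.

```python
from typing import Callable, Sequence

def first_bad_build(builds: Sequence[int], is_bad: Callable[[int], bool]) -> int:
    """Binary-search a sorted sequence of build numbers for the first 'bad' one.

    Assumes builds[0] is known good, builds[-1] is known bad, and that once a
    build regresses every later build is also bad. Calls is_bad O(log n) times
    and never on the two endpoints.
    """
    lo, hi = 0, len(builds) - 1  # invariant: builds[lo] good, builds[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(builds[mid]):
            hi = mid
        else:
            lo = mid
    return builds[hi]
```

For instance, probing with a threshold between 4.7249 and 4.8491 would isolate the larger jump; repeating with a threshold between 4.7009 and 4.7249 would target the earlier, smaller one. Each probe is a full quantize plus perplexity run, so the logarithmic number of calls matters.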

Environment: cu12.2.0 (CUDA 12.2.0) builds on Windows 10.


Labels: bug (Something isn't working)
