
SpQR compression method #240

Open
JianbangZ opened this issue Jun 9, 2023 · 2 comments

Comments

@JianbangZ

How feasible would it be to implement SpQR in ggml?
SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression

@gardner

gardner commented Jun 12, 2023

The paper: https://arxiv.org/pdf/2306.03078.pdf

The code: https://github.com/Vahe1994/SpQR

CCLDArjun pushed a commit to CCLDArjun/ggml that referenced this issue Dec 18, 2023
@PoignardAzur

Given this comment (ggerganov/llama.cpp#1602 (comment)), it seems unlikely that SpQR will be implemented any time soon:

The main idea of the SpQR paper is to separate "outliers". This has been tried as part of k-quants development and has been shown to be less effective; see for instance ggerganov/llama.cpp#1595 (comment).
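For a rough sense of what that outlier-separation idea looks like in practice, here is a minimal numpy sketch. It is not the SpQR reference implementation; the outlier fraction, group size, and bit width are illustrative assumptions. The idea: keep the largest-magnitude weights exactly (a real implementation would store them sparsely) and quantize the dense remainder group-wise with round-to-nearest.

```python
import numpy as np

def split_and_quantize(w, outlier_frac=0.01, group_size=16, bits=3):
    """Split off high-magnitude outliers, quantize the dense remainder."""
    flat = w.astype(np.float32).ravel()
    k = max(1, int(outlier_frac * flat.size))
    # Treat the k largest-magnitude weights as outliers, kept exactly
    # (a real implementation would store them in a sparse structure).
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    outliers = flat[idx].copy()
    dense = flat.copy()
    dense[idx] = 0.0  # zero them out before quantizing the remainder
    # Group-wise symmetric round-to-nearest quantization.
    qmax = 2 ** (bits - 1) - 1
    deq = np.empty_like(dense)
    for g in range(0, dense.size, group_size):
        grp = dense[g:g + group_size]
        scale = np.abs(grp).max() / qmax
        if scale == 0.0:
            scale = 1.0
        q = np.clip(np.round(grp / scale), -qmax - 1, qmax)
        deq[g:g + group_size] = q * scale
    deq[idx] = outliers  # restore exact outlier values
    return deq.reshape(w.shape)

w = np.random.randn(64, 64).astype(np.float32)
err = np.abs(w - split_and_quantize(w)).mean()
print(f"mean abs reconstruction error: {err:.4f}")
```

The trade-off the k-quants experiments point at is that the sparse outlier storage and the extra indirection cost more than simply spending those bits on better per-group scales.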

If we read the SpQR paper more carefully, we find that what they mean by "nearly lossless compression" is arriving at a quantized-model perplexity within 1% of the full model's. The Q4_K_M variant of k-quants already achieves that for ggml; see for instance PR ggerganov/llama.cpp#1684.
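To make that 1% criterion concrete, a tiny illustration with made-up perplexity numbers (hypothetical values, not measurements from any model):

```python
# Hypothetical perplexities, for illustration only.
ppl_fp16 = 5.91   # full-precision model
ppl_quant = 5.94  # quantized model
rel_increase = (ppl_quant - ppl_fp16) / ppl_fp16
print(f"relative perplexity increase: {rel_increase:.2%}")
# ~0.51% -> within the paper's 1% "near-lossless" threshold
```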

We can probably close this issue.
