
feat: add int4 and int8 weight-only quantisation #95

Merged · 8 commits · Mar 15, 2024

Conversation

@vatsalaggarwal (Member) commented Mar 14, 2024

resolves #83

  • int4 tok/s is almost 2x fp16/bf16...
  • int8 is slower than bf16/fp16 for some reason, so a warning has been added (see the sketch after the benchmarks below)...
  • Impact on memory usage is minimal.

A10G

- bf16: 125 tok/s, 8170 MB
- int8: 65 tok/s, 7208 MB
- int4: 225 tok/s, 6660 MB

4090

- bf16: 229 tok/s, 8.75 GB
- int8: 193 tok/s, 7.68 GB
- int4: 431 tok/s, 7.51 GB
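For reference, here's a minimal sketch of what weight-only int8 quantization of an `nn.Linear` can look like in PyTorch. This is an illustration, not the code in this PR: `WeightOnlyInt8Linear` and the per-output-channel scheme are assumptions. Weights are stored as int8 with per-channel scales and dequantized on the fly, which is also one plausible explanation for int8 running slower than bf16 when the dequant cast isn't fused.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightOnlyInt8Linear(nn.Module):
    """Illustrative weight-only int8 linear: int8 weights + per-channel scales."""

    def __init__(self, linear: nn.Linear):
        super().__init__()
        w = linear.weight.data  # (out_features, in_features)
        # Symmetric per-output-channel quantization: scale = max|w| / 127.
        scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
        self.register_buffer("weight_int8", torch.round(w / scale).to(torch.int8))
        self.register_buffer("scale", scale.to(w.dtype))
        self.bias = linear.bias

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Dequantize on each forward pass: halves weight memory vs bf16, but
        # the unfused int8 -> activation-dtype cast adds overhead, consistent
        # with the int8 tok/s numbers above.
        w = self.weight_int8.to(x.dtype) * self.scale
        return F.linear(x, w, self.bias)
```

int4 follows the same idea but packs two 4-bit values per byte (typically with group-wise scales), which is where the larger memory savings come from and why it usually needs custom kernels.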

@vatsalaggarwal marked this pull request as draft Mar 14, 2024 17:47
@vatsalaggarwal self-assigned this Mar 14, 2024
@vatsalaggarwal changed the title from "feat: add int4 and int8 quantisation" to "feat: add int4 and int8 weight-only quantisation" Mar 14, 2024
* main:
  feat: naive finetuning (#93)
  Adding dependency versioning via poetry (#92)
@vatsalaggarwal marked this pull request as ready for review Mar 14, 2024 18:04
@arthurwolf commented Mar 14, 2024

If this fits in 8.75GB, does that mean a 3060 should be able to run this? It didn't fit last time I tried. Would the int4 version have a better chance? (Amazing work, btw.)

@vatsalaggarwal (Member, Author)

> If this fits in 8.75GB, does that mean a 3060 should be able to run this? It didn't fit last time I tried. Would the int4 version have a better chance? (Amazing work, btw.)

I think so... worth trying again!
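As a quick sanity check (standard `torch.cuda` calls, nothing specific to this repo), you can compare your card's total VRAM against the figures above; the common 12GB 3060 clears the ~7.5GB int4 number with room to spare:

```python
import torch

# Print total VRAM on device 0 to compare against the benchmark figures above.
props = torch.cuda.get_device_properties(0)
print(f"{props.name}: {props.total_memory / 2**30:.2f} GiB total")
```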

@arthurwolf commented Mar 14, 2024 via email

@vatsalaggarwal (Member, Author)

@arthurwolf have you tried the instructions in the README? What were the problems you ran into, and how can we make it friendlier?

@lucapericlp (Contributor) previously approved these changes Mar 15, 2024

LGTM
