feat: add int4 and int8 weight-only quantisation #95
If this fits in 8.75GB, does that mean a 3060 should be able to run this? It didn't fit last time I tried. Would the int4 version have a better chance? (Amazing work, btw.) |
I think so... worth trying again! |
When can we expect documentation/instructions on how to run it in a
beginner-friendly way? I'll try it as soon as I have that, for sure.
|
@arthurwolf have you tried the instructions in the README? What were the problems you ran into and how can we make it more friendly? |
LGTM
resolves #83
int4 tok/s is almost 2x fp16/bf16.
A10G
bf16: 125 tok/s, 8170MB
int8: 65 tok/s, 7208MB
int4: 225 tok/s, 6660MB
4090
bf16: 229 tok/s, 8.75GB
int8: 193 tok/s, 7.68GB
int4: 431 tok/s, 7.51GB
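
To sanity-check the "almost 2x" claim, here is a rough sketch that divides each figure above by its bf16 baseline. The `RESULTS` table and the `relative_to_bf16` helper are hypothetical names made up for this illustration; they are not part of the PR. Only ratios are computed, so the memory units do not matter.

```python
# Figures copied from the benchmark above: (tokens per second, memory).
# Memory is kept in whatever unit the comment used; only ratios are computed.
RESULTS = {
    "A10G": {"bf16": (125, 8170), "int8": (65, 7208), "int4": (225, 6660)},
    "4090": {"bf16": (229, 8.75), "int8": (193, 7.68), "int4": (431, 7.51)},
}


def relative_to_bf16(results):
    """Return {gpu: {dtype: (speedup, memory_fraction)}} relative to bf16."""
    out = {}
    for gpu, rows in results.items():
        base_tps, base_mem = rows["bf16"]
        out[gpu] = {
            dtype: (tps / base_tps, mem / base_mem)
            for dtype, (tps, mem) in rows.items()
        }
    return out


if __name__ == "__main__":
    for gpu, rows in relative_to_bf16(RESULTS).items():
        for dtype, (speedup, mem_frac) in rows.items():
            print(f"{gpu} {dtype}: {speedup:.2f}x tok/s, {mem_frac:.0%} of bf16 memory")
```

On these numbers int4 comes out at roughly 1.8x to 1.9x the bf16 throughput on both GPUs, while int8 is slower than bf16 on both.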