
feat: add int4 and int8 weight-only quantisation #95

Merged · 8 commits · Mar 15, 2024

Conversation

@vatsalaggarwal (Member) commented Mar 14, 2024

resolves #83

  • int4 tok/s is almost 2x fp16/bf16...
  • int8 is slower than bf16/fp16 for some reason, so a warning has been added (see the sketch after the benchmarks below)...
  • Impact on memory usage is minimal.

A10G

- bf16: 125 tok/s, 8170 MB
- int8: 65 tok/s, 7208 MB
- int4: 225 tok/s, 6660 MB

4090

- bf16: 229 tok/s, 8.75 GB
- int8: 193 tok/s, 7.68 GB
- int4: 431 tok/s, 7.51 GB
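For reference, here's a minimal sketch of what weight-only int8 quantization of an `nn.Linear` can look like in PyTorch. This is an illustration, not the code in this PR: `WeightOnlyInt8Linear` and the per-output-channel scheme are assumptions. Weights are stored as int8 with per-channel scales and dequantized on the fly, which is also one plausible explanation for int8 running slower than bf16 when the dequant cast isn't fused.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightOnlyInt8Linear(nn.Module):
    """Illustrative weight-only int8 linear: int8 weights + per-channel scales."""

    def __init__(self, linear: nn.Linear):
        super().__init__()
        w = linear.weight.data  # (out_features, in_features)
        # Symmetric per-output-channel quantization: scale = max|w| / 127.
        scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
        self.register_buffer("weight_int8", torch.round(w / scale).to(torch.int8))
        self.register_buffer("scale", scale.to(w.dtype))
        self.bias = linear.bias

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Dequantize on each forward pass: halves weight memory vs bf16, but
        # the unfused int8 -> activation-dtype cast adds overhead, consistent
        # with the int8 tok/s numbers above.
        w = self.weight_int8.to(x.dtype) * self.scale
        return F.linear(x, w, self.bias)
```

int4 follows the same idea but packs two 4-bit values per byte (typically with group-wise scales), which is where the larger memory savings come from and why it usually needs custom kernels.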

@vatsalaggarwal marked this pull request as draft Mar 14, 2024 17:47
@vatsalaggarwal self-assigned this Mar 14, 2024
@vatsalaggarwal changed the title from "feat: add int4 and int8 quantisation" to "feat: add int4 and int8 weight-only quantisation" Mar 14, 2024
* main:
  feat: naive finetuning (#93)
  Adding dependency versioning via poetry (#92)
@vatsalaggarwal marked this pull request as ready for review Mar 14, 2024 18:04
@arthurwolf commented Mar 14, 2024

If this fits in 8.75GB, does that mean a 3060 should be able to run this? It didn't fit last time I tried. Would the int4 version have a better chance? (Amazing work, btw.)

@vatsalaggarwal (Member, Author)

> If this fits in 8.75GB, does that mean a 3060 should be able to run this? It didn't fit last time I tried. Would the int4 version have a better chance? (Amazing work, btw.)

I think so... worth trying again!
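As a quick sanity check (standard `torch.cuda` calls, nothing specific to this repo), you can compare your card's total VRAM against the figures above; the common 12GB 3060 clears the ~7.5GB int4 number with room to spare:

```python
import torch

# Print total VRAM on device 0 to compare against the benchmark figures above.
props = torch.cuda.get_device_properties(0)
print(f"{props.name}: {props.total_memory / 2**30:.2f} GiB total")
```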

@arthurwolf commented Mar 14, 2024 via email

@vatsalaggarwal (Member, Author)

@arthurwolf have you tried the instructions in the README? What were the problems you ran into, and how can we make it friendlier?

@lucapericlp (Contributor) previously approved these changes Mar 15, 2024

LGTM
