GPTQ models load much slower than 0cc4m's fork used to #444

Open
fkiifdjo opened this issue Aug 24, 2023 · 1 comment

Comments

fkiifdjo commented Aug 24, 2023

The slow loading happens every time a new chat message is generated; it's particularly noticeable with 30B models, but also noticeable with 13B ones.

henk717 (Owner) commented Aug 24, 2023

There are currently two issues going on:

  1. Occam's GPTQ version is not fully compatible with newer huggingface builds, so you are falling back to AutoGPTQ in more cases.
  2. AutoGPTQ itself has an outstanding slow-loading issue on some systems, which its maintainers still need to address.

So on the Kobold side, we are waiting for either GPTQ package to get the needed updates before full speed is restored.
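
For illustration, a minimal sketch of the fallback pattern described above. The `load_with_occam_gptq` function is a hypothetical stand-in for the fork's loader (not a real API), and while `AutoGPTQForCausalLM.from_quantized` is the real AutoGPTQ entry point, none of this is KoboldAI's actual loading code:

```python
from auto_gptq import AutoGPTQForCausalLM

def load_with_occam_gptq(model_dir: str):
    # Hypothetical stand-in for the fork's loader; against newer
    # huggingface builds it fails (issue 1 above).
    raise RuntimeError("fork incompatible with this transformers build")

def load_gptq_model(model_dir: str):
    try:
        # Fast path: Occam's GPTQ fork.
        return load_with_occam_gptq(model_dir)
    except Exception:
        # Fallback path: AutoGPTQ, which currently loads slowly on
        # some systems (issue 2 above).
        return AutoGPTQForCausalLM.from_quantized(
            model_dir, device="cuda:0", use_safetensors=True
        )
```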
