GPTQ models load much slower than 0cc4m's fork used to #444

Open
fkiifdjo opened this issue Aug 24, 2023 · 1 comment

Comments

fkiifdjo commented Aug 24, 2023

The slow loading happens every time a new chat message is generated; it's particularly noticeable with 30B models, but also noticeable with 13B ones.

henk717 (Owner) commented Aug 24, 2023

There are currently two issues going on:

  1. Occam's GPTQ version is not fully compatible with newer huggingface builds, so you are falling back to AutoGPTQ in more cases.
  2. AutoGPTQ itself has an outstanding slow-loading issue on some systems, which its maintainers still need to address.

So on the Kobold side, we are waiting for either GPTQ package to get the needed updates before full speed is restored.
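
For illustration, a minimal sketch of the fallback pattern described above. The `load_with_occam_gptq` function is a hypothetical stand-in for the fork's loader (not a real API), and while `AutoGPTQForCausalLM.from_quantized` is the real AutoGPTQ entry point, none of this is KoboldAI's actual loading code:

```python
from auto_gptq import AutoGPTQForCausalLM

def load_with_occam_gptq(model_dir: str):
    # Hypothetical stand-in for the fork's loader; against newer
    # huggingface builds it fails (issue 1 above).
    raise RuntimeError("fork incompatible with this transformers build")

def load_gptq_model(model_dir: str):
    try:
        # Fast path: Occam's GPTQ fork.
        return load_with_occam_gptq(model_dir)
    except Exception:
        # Fallback path: AutoGPTQ, which currently loads slowly on
        # some systems (issue 2 above).
        return AutoGPTQForCausalLM.from_quantized(
            model_dir, device="cuda:0", use_safetensors=True
        )
```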
