feat(server): Add exllama GPTQ CUDA kernel support #553 #666
Conversation
…eneration-inference into gptq-cuda-kernels
Shall we also update
I just cloned the latest commit and installed from local. How can I use exllama for GPTQ? Or if I just launch the server with `--quantize gptq`, will it automatically use this feature?
You need to use the Docker image to use exllama. TGI does not yet include a build script to install exllama for development purposes.
Thanks. Just to confirm: when I use the Docker image for inference, exllama will automatically be used if it is possible with the model, with just the
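For reference, a minimal sketch of launching the TGI Docker image with GPTQ quantization as discussed above. The image tag, port, volume path, and model id (`TheBloke/Llama-2-7B-GPTQ`) are illustrative assumptions, not taken from this PR:

```shell
# Illustrative launch of text-generation-inference with GPTQ quantization.
# The exllama kernels (added in this PR) are picked inside the image when
# the model and hardware support them; no extra flag beyond --quantize gptq.
model=TheBloke/Llama-2-7B-GPTQ   # assumed example model id
volume=$PWD/data                 # cache weights outside the container

docker run --gpus all --shm-size 1g -p 8080:80 \
    -v "$volume:/data" \
    ghcr.io/huggingface/text-generation-inference:latest \
    --model-id "$model" \
    --quantize gptq
```

Once the server is up, requests go to the usual `/generate` endpoint on port 8080.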
Just trying to get the integration tests to pass.
What does this PR do?
Fixes # (issue)
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.