
feat(server): Add exllama GPTQ CUDA kernel support #553 #666

Merged
merged 30 commits into main from gptq-cuda-kernels2
Jul 21, 2023

Conversation

Narsil
Collaborator

@Narsil Narsil commented Jul 20, 2023

Just trying to get the integration tests to pass.

What does this PR do?

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@Narsil Narsil changed the title Superseeds #553 feat(server): Add exllama GPTQ CUDA kernel support #553 Jul 21, 2023
@Narsil Narsil merged commit d5b5bc7 into main Jul 21, 2023
5 checks passed
@Narsil Narsil deleted the gptq-cuda-kernels2 branch July 21, 2023 08:59
@Narsil Narsil mentioned this pull request Jul 21, 2023
@Atry
Contributor

Atry commented Jul 27, 2023

Shall we also update Makefile?

@aoyifei

aoyifei commented Jul 27, 2023

I just cloned the latest commit and installed from local. How can I use exllama for GPTQ? Or do I just launch the server with --quantize gptq, and it will automatically use this feature?

@Atry
Contributor

Atry commented Jul 27, 2023

You need to use the Docker image to use exllama. TGI has not yet included any build script to install exllama for development purposes.
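For reference, a minimal sketch of launching TGI from the Docker image with GPTQ quantization. The model id, image tag, and volume path here are illustrative examples, not taken from this thread:

```shell
# Example only: model id and image tag are illustrative placeholders.
model=TheBloke/Llama-2-7B-GPTQ   # hypothetical GPTQ-quantized model
volume=$PWD/data                 # cache weights locally between runs

docker run --gpus all --shm-size 1g -p 8080:80 -v "$volume:/data" \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id "$model" --quantize gptq
```

With `--quantize gptq`, the server picks the GPTQ path; whether the exllama kernels are used then depends on hardware and model support.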

@OlivierDehaene
Copy link
Member

cd server/exllama_kernels
TORCH_CUDA_ARCH_LIST="8.0;8.6+PTX" python setup.py install
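After building the kernels as above, a quick sanity check (assuming a CUDA-capable environment where the build succeeded) is to import the extension:

```shell
# Should exit cleanly if the extension built and installed correctly
# (requires the same Python environment used for the build).
python -c "import exllama_kernels; print('exllama_kernels import OK')"
```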

@BEpresent

BEpresent commented Jul 28, 2023

You need to use the Docker image to use exllama. TGI has not yet included any build script to install exllama for development purposes.

Thanks, just to confirm: when I use the Docker image for inference, will exllama automatically be used (if the model supports it) with just the --quantize gptq flag?

@SunMarc SunMarc mentioned this pull request Oct 18, 2023