feat(server): Add exllama GPTQ CUDA kernel support #553 #666
Conversation
…eneration-inference into gptq-cuda-kernels
Shall we also update
I just cloned the latest commit and installed from local. How can I use exllama for GPTQ? Or if I just launch the server with `--quantize gptq`, will it automatically use this feature?
You need to use the Docker image to use exllama. TGI does not yet include a build script to install exllama for development purposes.
Thanks. Just to confirm: when I use the Docker image for inference, exllama will automatically be used if it is possible with the model, with just the
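For reference, a minimal sketch of launching the TGI Docker image with GPTQ quantization as discussed above. The image tag, port, volume path, and model id (`TheBloke/Llama-2-7B-GPTQ`) are illustrative assumptions, not taken from this PR:

```shell
# Illustrative launch of text-generation-inference with GPTQ quantization.
# The exllama kernels (added in this PR) are picked inside the image when
# the model and hardware support them; no extra flag beyond --quantize gptq.
model=TheBloke/Llama-2-7B-GPTQ   # assumed example model id
volume=$PWD/data                 # cache weights outside the container

docker run --gpus all --shm-size 1g -p 8080:80 \
    -v "$volume:/data" \
    ghcr.io/huggingface/text-generation-inference:latest \
    --model-id "$model" \
    --quantize gptq
```

Once the server is up, requests go to the usual `/generate` endpoint on port 8080.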
Just trying to get the integration tests to pass.
What does this PR do?
Fixes # (issue)
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.