GPTQ support on ROCm #1489

fxmarty · 2024-01-25T18:09:56Z

Tested with

CUDA_VISIBLE_DEVICES=0 text-generation-launcher --model-id TheBloke/Llama-2-7B-Chat-GPTQ --quantize gptq
EXLLAMA_VERSION=1 CUDA_VISIBLE_DEVICES=0 text-generation-launcher --model-id TheBloke/Llama-2-7B-Chat-GPTQ --quantize gptq
CUDA_VISIBLE_DEVICES="0,1" text-generation-launcher --model-id TheBloke/Llama-2-7B-Chat-GPTQ --quantize gptq

all with good and identical results on MI210.

fxmarty · 2024-01-25T18:10:54Z

Dockerfile_amd

@@ -75,8 +75,8 @@ RUN chmod +x ~/mambaforge.sh && \
    mamba init && \
    rm ~/mambaforge.sh

-# Install PyTorch nightly (2.2.0.dev2023) compiled against RoCm 5.7, as VLLM can not be compiled with RoCm 5.6.
-RUN pip install --pre torch==2.2.0.dev20231106 --index-url https://download.pytorch.org/whl/nightly/rocm5.7


This nightly version was removed from https://download.pytorch.org/whl/nightly/rocm5.7

fxmarty · 2024-01-25T18:11:35Z

server/text_generation_server/utils/gptq/exllamav2.py

+        if torch.equal(g_idx, torch.tensor([i // groupsize for i in range(self.infeatures)], dtype=torch.int32, device=g_idx.device)):
+            self.g_idx = None
+        else:
+            self.g_idx = g_idx


Fixes a bug in the original exllamav2 code.

fxmarty · 2024-01-25T18:11:44Z

server/text_generation_server/utils/gptq/exllamav2.py

+        # We NEED to keep a pointer on Python side, otherwise the garbage collector will mess with us,
+        # and `Memory access fault by GPU node-2` will EAT you.
+        self.temp_dq = temp_dq


This gave me headaches today.

THIS is what was crashing TP>1?

@Narsil can we test this quickly? If that's the case that would be amazing.

OlivierDehaene

Thanks
We should really setup a CI for RocM now

Dockerfile_amd

Co-authored-by: OlivierDehaene <olivier@huggingface.co>

Tested with ``` CUDA_VISIBLE_DEVICES=0 text-generation-launcher --model-id TheBloke/Llama-2-7B-Chat-GPTQ --quantize gptq EXLLAMA_VERSION=1 CUDA_VISIBLE_DEVICES=0 text-generation-launcher --model-id TheBloke/Llama-2-7B-Chat-GPTQ --quantize gptq CUDA_VISIBLE_DEVICES="0,1" text-generation-launcher --model-id TheBloke/Llama-2-7B-Chat-GPTQ --quantize gptq ``` all with good and identical results on MI210. --------- Co-authored-by: Felix Marty <felix@hf.co> Co-authored-by: OlivierDehaene <olivier@huggingface.co> Co-authored-by: OlivierDehaene <23298448+OlivierDehaene@users.noreply.github.com>

fxmarty commented Jan 25, 2024

View reviewed changes

fxmarty requested review from Narsil and OlivierDehaene January 25, 2024 18:15

fxmarty marked this pull request as ready for review January 25, 2024 18:15

OlivierDehaene reviewed Jan 26, 2024

View reviewed changes

Dockerfile_amd Outdated Show resolved Hide resolved

Felix Marty and others added 11 commits January 26, 2024 16:07

wip

33111a0

update torch

d8f33e3

fix

3c93b31

more logs

145c2d6

fix

da00279

cleanup

2909047

cleaning bis

359dd46

clean

059a2d9

update doc

fcee703

Update Dockerfile_amd

051c9c4

Co-authored-by: OlivierDehaene <olivier@huggingface.co>

update doc

7d2bc40

OlivierDehaene force-pushed the gptq-rocm branch from 088f8e7 to 7d2bc40 Compare January 26, 2024 15:11

update doc

4ee87f4

OlivierDehaene merged commit 650fea1 into main Jan 26, 2024
3 of 4 checks passed

OlivierDehaene deleted the gptq-rocm branch January 26, 2024 15:27

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GPTQ support on ROCm #1489

GPTQ support on ROCm #1489

fxmarty commented Jan 25, 2024 •

edited

Loading

fxmarty Jan 25, 2024

fxmarty Jan 25, 2024

Narsil Jan 26, 2024

fxmarty Jan 25, 2024

Narsil Jan 26, 2024

OlivierDehaene Jan 26, 2024

Narsil Jan 26, 2024

OlivierDehaene left a comment

GPTQ support on ROCm #1489

GPTQ support on ROCm #1489

Conversation

fxmarty commented Jan 25, 2024 • edited Loading

fxmarty Jan 25, 2024

Choose a reason for hiding this comment

fxmarty Jan 25, 2024

Choose a reason for hiding this comment

Narsil Jan 26, 2024

Choose a reason for hiding this comment

fxmarty Jan 25, 2024

Choose a reason for hiding this comment

Narsil Jan 26, 2024

Choose a reason for hiding this comment

OlivierDehaene Jan 26, 2024

Choose a reason for hiding this comment

Narsil Jan 26, 2024

Choose a reason for hiding this comment

OlivierDehaene left a comment

Choose a reason for hiding this comment

fxmarty commented Jan 25, 2024 •

edited

Loading