
How to use AutoGPTQ model in tgi #658

Closed

Minami-su opened this issue Jul 20, 2023 · 10 comments


@Minami-su

command:

export GPTQ_BITS=4
export GPTQ_GROUPSIZE=128

text-generation-launcher --model-id Ziya-LLaMA-13B_4bit --disable-custom-kernels --port 6006 --revision gptq-4bit-128g-actorder_True --quantize gptq

result:

Traceback (most recent call last):

File "/root/miniconda3/envs/text-generation-inference/bin/text-generation-server", line 8, in
sys.exit(app())

File "/root/autodl-tmp/text-generation-inference-main/server/text_generation_server/cli.py", line 78, in serve
server.serve(

File "/root/autodl-tmp/text-generation-inference-main/server/text_generation_server/server.py", line 169, in serve
asyncio.run(

File "/root/miniconda3/envs/text-generation-inference/lib/python3.9/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)

File "/root/miniconda3/envs/text-generation-inference/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
return future.result()

File "/root/autodl-tmp/text-generation-inference-main/server/text_generation_server/server.py", line 136, in serve_inner
model = get_model(

File "/root/autodl-tmp/text-generation-inference-main/server/text_generation_server/models/init.py", line 195, in get_model
return CausalLM(

File "/root/autodl-tmp/text-generation-inference-main/server/text_generation_server/models/causal_lm.py", line 477, in init
model = AutoModelForCausalLM.from_pretrained(

File "/root/miniconda3/envs/text-generation-inference/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py", line 467, in from_pretrained
return model_class.from_pretrained(

File "/root/miniconda3/envs/text-generation-inference/lib/python3.9/site-packages/transformers/modeling_utils.py", line 2387, in from_pretrained
raise EnvironmentError(

OSError: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory Ziya-LLaMA-13B_4bit.
rank=0
2023-07-20T08:34:02.453608Z ERROR text_generation_launcher: Shard 0 failed to start
2023-07-20T08:34:02.453654Z INFO text_generation_launcher: Shutting down shards

@OlivierDehaene
Member

What type of model is it?

@Minami-su
Author

First, install https://github.com/PanQiWei/AutoGPTQ
Then I changed the code in text-generation-inference-main/server/text_generation_server/models/causal_lm.py:
[screenshots of the edits to causal_lm.py]
Then it worked:
[screenshot of a successful generation]
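The screenshots are not reproduced here. As a rough sketch of what such an edit might look like, assuming it simply swaps transformers' AutoModelForCausalLM for the AutoGPTQ loader (the exact patch is not shown in this thread; see the commit referenced further down):

# Hypothetical sketch of the edit to causal_lm.py, not the exact patch from the screenshots.
# Original load (roughly):
#   model = AutoModelForCausalLM.from_pretrained(model_id, revision=revision, torch_dtype=dtype, ...)
# Replaced with AutoGPTQ's quantized loader:
from auto_gptq import AutoGPTQForCausalLM

model = AutoGPTQForCausalLM.from_quantized(
    model_id,              # local directory or Hub repo with the GPTQ weights
    device_map="auto",     # let accelerate place the layers
    use_safetensors=True,  # the quantized checkpoint ships as .safetensors, not .bin
).half()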

@Minami-su changed the title from "How to use GPTQ model in tgi" to "How to use AutoGPTQ model in tgi" on Jul 20, 2023
@tallesairan

I have a question I believe you can answer: is it possible to load a model that contains only safetensors instead of .bin?

@OlivierDehaene
Member

Yes, of course, that's the preferred way of loading a model in TGI.

@Minami-su
Author

I have a question I believe you can answer: is it possible to load a model that contains only safetensors instead of .bin?

model = AutoGPTQForCausalLM.from_quantized(repo_id, device_map="auto", use_safetensors=True).half()
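
For readers who want to try this outside of TGI, a minimal standalone sketch (the repo id and prompt are placeholders, not from this thread):

# Standalone sketch: load a safetensors-only GPTQ checkpoint with AutoGPTQ and generate.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

repo_id = "some-user/some-model-GPTQ"  # hypothetical repo that ships only .safetensors weights

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoGPTQForCausalLM.from_quantized(repo_id, device_map="auto", use_safetensors=True).half()

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))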

tallesairan added a commit to tallesairan/text-generation-inference that referenced this issue Jul 31, 2023
AutoGPTQForCausalLM instead of AutoModelForCausalLM

huggingface#658 (comment)
@tallesairan

I have a question I believe you can answer: is it possible to load a model that contains only safetensors instead of .bin?

model = AutoGPTQForCausalLM.from_quantized(repo_id, device_map="auto", use_safetensors=True).half()

Whenever I try to run a GPTQ model like this, it returns the following error:
RuntimeError: weight model.layers.0.self_attn.q_proj.g_idx does not exist

I also set the GPTQ_GROUPSIZE and GPTQ_BITS environment variables based on the model's quantization settings, as described in PR #580.
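
For reference, a sketch of deriving those two values from the checkpoint's quantize_config.json (the file name and keys follow the usual AutoGPTQ convention; the model directory is a placeholder):

# Sketch: derive GPTQ_BITS / GPTQ_GROUPSIZE from the model's quantize_config.json,
# then eval the printed lines in the shell before running text-generation-launcher.
import json

with open("Ziya-LLaMA-13B_4bit/quantize_config.json") as f:  # placeholder path
    cfg = json.load(f)

print(f'export GPTQ_BITS={cfg["bits"]}')            # e.g. 4
print(f'export GPTQ_GROUPSIZE={cfg["group_size"]}') # e.g. 128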

@Narsil
Collaborator

Narsil commented Jul 31, 2023

g_idx is necessary.

@TheBloke shared that some models were converted before, or with some older versions of the tooling, and they don't have a g_idx, which means they do not work with TGI.

However, if I understood correctly, most should.
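
A quick way to check whether a given checkpoint actually carries g_idx tensors (a sketch using the safetensors library; the file name is a placeholder and the key suffix follows the usual GPTQ layout):

# Sketch: list the g_idx tensors in a GPTQ safetensors checkpoint.
from safetensors import safe_open

with safe_open("model.safetensors", framework="pt") as f:  # placeholder file name
    g_idx_keys = [k for k in f.keys() if k.endswith(".g_idx")]

# 0 here means the checkpoint predates g_idx and will fail in TGI with the error above
print(f"found {len(g_idx_keys)} g_idx tensors")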

Which model are you using?

@tallesairan

Thank you, I understand now.
https://huggingface.co/TheBloke/chronos-hermes-13B-GPTQ
I'm trying to use a model that meets our needs. I also tried TheBloke/Wizard-Vicuna-30B-Uncensored-GPTQ, which also fits, but without success. I need to run several inferences simultaneously on a GPTQ model, as is possible in TGI.

@Narsil
Collaborator

Narsil commented Jul 31, 2023

Yes, it's a bit old.

There are workarounds if you absolutely need this (g_idx being absent means it's just increasing groups of size group_size).
However, given that only old models seem to behave that way, and that adding the workaround to the code is likely to be relatively complex (because of sharding), we're currently not going to add it.
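
For reference, a sketch of that fallback for a single quantized linear layer, assuming groups are just consecutive blocks of group_size input channels (i.e. no act-order reordering); the dimensions below are placeholders:

# Sketch: rebuild a missing g_idx as increasing groups of size group_size.
import torch

def default_g_idx(in_features: int, group_size: int) -> torch.Tensor:
    # group index of each input channel: 0,0,...,0,1,1,...,1,2,...
    return torch.arange(in_features, dtype=torch.int32) // group_size

print(default_g_idx(in_features=5120, group_size=128)[:130])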

Any readers stumbling on this: please comment or react; we may well revisit this if there's a lot of demand for those old models.

@Minami-su
Author

What type of model is it?

llama13b
