How to use AutoGPTQ model in tgi #658
Comments
What type of model is it?

First, install https://github.com/PanQiWei/AutoGPTQ

I have a question I believe you know: is it possible to load a model that contains only safetensors instead of .bin?

Yes, of course. That's the preferred way of loading a TGI model.

model = AutoGPTQForCausalLM.from_quantized(repo_id, device_map="auto", use_safetensors=True).half()
Use AutoGPTQForCausalLM instead of AutoModelForCausalLM (huggingface#658 (comment)).
Whenever I try to run a GPTQ model like this, it returns the following error. I also set the GPTQ_GROUPSIZE and GPTQ_BITS environment variables based on the model's quant settings, as described in PR #580.
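The environment-variable route from PR #580 amounts to reading the quantization parameters from the environment before the shards start. A minimal sketch of that idea (the helper name is hypothetical, not TGI's actual code):

```python
import os

def gptq_params_from_env():
    """Read GPTQ quantization settings from environment variables,
    as PR #580 describes. Illustrative helper, not TGI's actual code."""
    bits = int(os.environ["GPTQ_BITS"])            # e.g. 4
    groupsize = int(os.environ["GPTQ_GROUPSIZE"])  # e.g. 128
    return bits, groupsize

# Mirrors the `export` lines in the command below:
os.environ["GPTQ_BITS"] = "4"
os.environ["GPTQ_GROUPSIZE"] = "128"
print(gptq_params_from_env())  # (4, 128)
```

Note that these variables only tell TGI how the weights were quantized; they do nothing about which weight *files* the loader looks for, which is what the traceback below complains about.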
@TheBloke shared that some models were converted before, or with some older versions, and they don't have a … However, if I understood correctly, most should. Which model are you using?

Yes, it's a bit old. There are workarounds if you absolutely need this. Any readers stumbling here: please comment or react; we may well revisit if there's a lot of demand for those old models.
llama13b |
command:
export GPTQ_BITS=4
export GPTQ_GROUPSIZE=128
text-generation-launcher --model-id Ziya-LLaMA-13B_4bit --disable-custom-kernels --port 6006 --revision gptq-4bit-128g-actorder_True --quantize gptq
result:
Traceback (most recent call last):
File "/root/miniconda3/envs/text-generation-inference/bin/text-generation-server", line 8, in <module>
sys.exit(app())
File "/root/autodl-tmp/text-generation-inference-main/server/text_generation_server/cli.py", line 78, in serve
server.serve(
File "/root/autodl-tmp/text-generation-inference-main/server/text_generation_server/server.py", line 169, in serve
asyncio.run(
File "/root/miniconda3/envs/text-generation-inference/lib/python3.9/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/root/miniconda3/envs/text-generation-inference/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
return future.result()
File "/root/autodl-tmp/text-generation-inference-main/server/text_generation_server/server.py", line 136, in serve_inner
model = get_model(
File "/root/autodl-tmp/text-generation-inference-main/server/text_generation_server/models/__init__.py", line 195, in get_model
return CausalLM(
File "/root/autodl-tmp/text-generation-inference-main/server/text_generation_server/models/causal_lm.py", line 477, in __init__
model = AutoModelForCausalLM.from_pretrained(
File "/root/miniconda3/envs/text-generation-inference/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py", line 467, in from_pretrained
return model_class.from_pretrained(
File "/root/miniconda3/envs/text-generation-inference/lib/python3.9/site-packages/transformers/modeling_utils.py", line 2387, in from_pretrained
raise EnvironmentError(
OSError: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory Ziya-LLaMA-13B_4bit.
rank=0
2023-07-20T08:34:02.453608Z ERROR text_generation_launcher: Shard 0 failed to start
2023-07-20T08:34:02.453654Z INFO text_generation_launcher: Shutting down shards
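The OSError above comes from the weight-file discovery step in transformers: `from_pretrained` looks for a fixed set of filenames in the model directory, and a GPTQ export whose quantized safetensors file uses a different name matches none of them. A minimal sketch of that kind of check (filename list taken from the error message, plus `model.safetensors`; the helper is illustrative, not transformers' actual implementation):

```python
import os
import tempfile

# Filenames from the error message, plus the safetensors name that
# newer transformers versions also accept.
WEIGHT_FILES = [
    "pytorch_model.bin",
    "tf_model.h5",
    "model.ckpt.index",
    "flax_model.msgpack",
    "model.safetensors",
]

def find_weight_file(model_dir):
    """Return the first recognized weight file in model_dir, or None.
    Illustrative helper, not transformers' actual code."""
    for name in WEIGHT_FILES:
        path = os.path.join(model_dir, name)
        if os.path.isfile(path):
            return name
    return None

# A directory holding only a non-standard GPTQ filename finds nothing,
# which is when from_pretrained raises the OSError:
with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "gptq_model-4bit-128g.safetensors"), "w").close()
    print(find_weight_file(d))  # None
```

This is consistent with the advice earlier in the thread: going through `AutoGPTQForCausalLM.from_quantized` sidesteps this check, because AutoGPTQ knows how to locate its own quantized safetensors files.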