Failure to launch codegeex4-all-9b using vLLM #11910
We are trying to launch codegeex4-all-9b using vLLM, following the CodeGeeX4 GitHub README:
https://github.com/THUDM/CodeGeeX4?tab=readme-ov-file#vllm
The scripts are as follows:
codegeex_offline_example.py
codegeex_offline_example.sh
When running codegeex_offline_example.sh in Docker, we got an error:
error log:
codegeex_offline_example_error.log
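(The contents of the two scripts were not captured in this thread. For context, a minimal offline example in the style of the CodeGeeX4 README's vLLM section might look like the sketch below; the model path, prompt, and generation parameters are illustrative assumptions, not the reporter's actual script.)

from vllm import LLM, SamplingParams

# Hypothetical offline example in the spirit of the CodeGeeX4 README;
# the model path and prompt here are illustrative assumptions.
model_name = "THUDM/codegeex4-all-9b"
llm = LLM(
    model=model_name,
    tensor_parallel_size=1,
    max_model_len=8192,
    trust_remote_code=True,
    enforce_eager=True,
)
sampling_params = SamplingParams(temperature=0.2, max_tokens=256)
outputs = llm.generate(["# Write a quicksort in Python"], sampling_params)
print(outputs[0].outputs[0].text)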
Comments

Try adding torch_dtype="float16". For instance:

llm = LLM(
    model=model_name,
    tensor_parallel_size=tp_size,
    max_model_len=max_model_len,
    trust_remote_code=True,
    enforce_eager=True,
    torch_dtype="float16",  # adding this
    # If OOM, try using the following parameters
    # enable_chunked_prefill=True,
    # max_num_batched_tokens=8192
)
Unable to recognize torch_dtype.
Sorry, it is dtype="float16".
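(For reference, a minimal sketch of the corrected call; model_name, tp_size, and max_model_len are the placeholders from the snippet above:)

from vllm import LLM

# vLLM's LLM constructor takes `dtype`, not `torch_dtype`.
llm = LLM(
    model=model_name,
    tensor_parallel_size=tp_size,
    max_model_len=max_model_len,
    trust_remote_code=True,
    enforce_eager=True,
    dtype="float16",
)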
Hi @YongZhuIntel, I successfully ran codegeex4-all-9b with vLLM on a single card or two cards of A770. Note that for a single card, […]
@Uxito-Ada I ran codegeex4-all-9b with vLLM on a single card in int4 format, but got an OOM error when running "python vllm_online_benchmark.py codegeex4-all-9b 2".
Hi @YongZhuIntel, with the script you provided, I can successfully start the vLLM server and then execute the inference request in vLLM-Serving's README. What version of ipex-llm is used in your environment? And please also provide […]
@Uxito-Ada I run vLLM on the Docker image intelanalytics/ipex-llm-serving-vllm-xpu-experiment:latest. The vllm_online_benchmark.py: […]
INFO 08-27 09:33:39 gpu_executor.py:100] # GPU blocks: 12587, # CPU blocks: 6553
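(As a rough reading of that log line: the GPU block count bounds the KV-cache capacity. A back-of-the-envelope check, assuming vLLM's default block size of 16 tokens per block:)

# KV-cache capacity implied by the log line above, assuming
# vLLM's default block_size of 16 tokens per block.
gpu_blocks = 12587
block_size = 16
print(gpu_blocks * block_size)  # 201392 tokens of KV cache fit on the GPU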
Hi @YongZhuIntel, GPU memory consumption can be decreased by tuning server parameters, e.g. after lowering […]
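(A sketch of the kind of tuning meant here, using standard vLLM engine arguments; the specific values are illustrative assumptions, not settings confirmed in this thread:)

from vllm import LLM

# Standard vLLM knobs for lowering GPU memory pressure; the values
# below are illustrative assumptions, not recommendations from this thread.
llm = LLM(
    model="codegeex4-all-9b",
    trust_remote_code=True,
    enforce_eager=True,
    dtype="float16",
    gpu_memory_utilization=0.85,    # default is 0.9; lower leaves headroom
    max_num_batched_tokens=8192,    # cap tokens processed per batch
    enable_chunked_prefill=True,    # split long prefills into chunks
)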