TGI crash on multiple GPUs #2207
Comments
Seeing a similar issue on my end.
@RohanSohani30 Can you share the output of TGI when it errors?
There are no errors, but the system crashes while warming up the model.
Yeah, seems related to CUDA graphs and a bug introduced in NCCL 2.20.5. Can you retry with the latest Docker image?
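For anyone landing here, CUDA graphs can be turned off from the launcher; a minimal sketch, assuming the image in use supports the `--cuda-graphs` flag (the model id and other values below are illustrative, not taken from this thread):

```bash
# Sketch: disable CUDA graph capture while keeping tensor parallelism across 8 GPUs.
# --cuda-graphs 0 turns graph capture off entirely (flag availability depends on the TGI version).
docker run --gpus all --shm-size 1g -p 8080:80 \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id meta-llama/Meta-Llama-3-8B-Instruct \
  --num-shard 8 \
  --cuda-graphs 0
```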
I am using the latest Docker image. Still facing the same issue.
If disabling SHM solves the issue, it means that there is a problem in the way your system handles SHM. How much RAM do you have on the machine? @HoKim98 does the …
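Disabling SHM for NCCL is done through an environment variable; a minimal sketch of passing it to the container (the model id is an assumption for illustration):

```bash
# Sketch: NCCL_SHM_DISABLE=1 makes NCCL avoid shared-memory transport between ranks,
# which helps isolate whether host SHM handling is the culprit.
docker run --gpus all -p 8080:80 \
  -e NCCL_SHM_DISABLE=1 \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id meta-llama/Meta-Llama-3-8B-Instruct \
  --num-shard 8
```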
@Hugoch It seems to be working! Ran a 10-minute stress test and no errors were found.
1 TB RAM with 8×16 GB VRAM.
Llama3-8B has a context of 8k, so you probably want to reduce …
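The truncated suggestion presumably refers to the token limits; a hedged example of how they are typically set on the launcher so prompt and generation stay within Llama3-8B's 8k context (the exact numbers are illustrative, not from this thread):

```bash
# Sketch: keep prompt + generation within the 8k context window and cap the warmup batch size.
docker run --gpus all --shm-size 1g -p 8080:80 \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id meta-llama/Meta-Llama-3-8B-Instruct \
  --num-shard 8 \
  --max-input-length 4096 \
  --max-total-tokens 8192 \
  --max-batch-total-tokens 16000
```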
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
System Info
I am trying to run TGI in Docker using 8 GPUs with 16 GB each (in-house server). Docker works fine when using a single GPU.
My server crashes when using all GPUs. Is there any other way to do it?
PS: I need to use all GPUs so I can load big models. With a single GPU I can only use small models with a smaller max-input-length.
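For context, sharding a model across several GPUs in TGI is requested with the `--num-shard` launcher flag; a minimal sketch of a multi-GPU launch (the model id, port, and volume path are assumptions, not taken from this report):

```bash
# Sketch: shard one model across all 8 GPUs; --shm-size gives NCCL room for inter-GPU transfers.
docker run --gpus all --shm-size 1g -p 8080:80 \
  -v $PWD/data:/data \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id meta-llama/Meta-Llama-3-8B-Instruct \
  --num-shard 8
```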
Information
Tasks
Reproduction
Expected behavior
```
INFO text_generation_router: router/src/main.rs:242: Using the Hugging Face API to retrieve tokenizer config
INFO text_generation_router: router/src/main.rs:291: Warming up model
WARN text_generation_router: router/src/main.rs:306: Model does not support automatic max batch total tokens
INFO text_generation_router: router/src/main.rs:328: Setting max batch total tokens to 16000
INFO text_generation_router: router/src/main.rs:329: Connected
```