[Bug]: with_pynccl_for_all_reduce causes GPU OOM #4472
Comments
How did you install vLLM? What command did you run, and as which user? The message you posted in the Slack channel is:
However, normally this should be
Besides, please post the full logging information to help identify the problem.
Sorry, I have two environments and the log got a bit mixed up. It is now updated with a fixed version, and the full log plus error message is uploaded as a gist. We installed vLLM through Docker, but before that we installed AWS's own NCCL plugin. I guess this might be causing the issue?
It is a known issue that NCCL >= 2.19 has memory issues when used with CUDA graphs: NVIDIA/nccl#1234. If you are using multiple users in Docker, be sure to check out the doc https://docs.vllm.ai/en/latest/serving/deploying_with_docker.html.
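(Not from the original thread: a minimal sketch of how one might check which NCCL version their PyTorch build reports, to confirm whether it falls in the >= 2.19 range mentioned above.)

```python
# Sketch: print the NCCL version that PyTorch reports.
# torch.cuda.nccl.version() returns a tuple such as (2, 19, 3).
import torch

major, minor, patch = torch.cuda.nccl.version()
print(f"NCCL version: {major}.{minor}.{patch}")
if (major, minor) >= (2, 19):
    print("NCCL >= 2.19: known memory issues with CUDA graphs (NVIDIA/nccl#1234)")
```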
@youkaichao THANKS A LOT for the information. I took a look at our config and, combined with your information, I think I know what happened. We somehow mount a dir at …
Your current environment
🐛 Describe the bug
I am running the llama2-70b-chat model on a node with 8x A100 40GiB GPUs.
- With `disable_custom_all_reduce=False, enforce_eager=False`, it works fine;
- With `disable_custom_all_reduce=True, enforce_eager=False`, it fails with CUDA OOM;
- With `disable_custom_all_reduce=True, enforce_eager=True`, it works fine.

The error is linked here: https://gist.github.com/sfc-gh-zhwang/5e4cd04d87a1823a316d983289dfbd21
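(Not from the original report: a minimal repro sketch of the failing configuration using vLLM's offline `LLM` API. The Hugging Face model id and `tensor_parallel_size` below are assumptions inferred from the description, not values given by the reporter.)

```python
from vllm import LLM

# Sketch of the failing configuration described above:
# disable_custom_all_reduce=True routes all-reduce through pynccl, and
# enforce_eager=False keeps CUDA graph capture enabled, which is where the OOM appears.
llm = LLM(
    model="meta-llama/Llama-2-70b-chat-hf",  # assumed; the report only says "llama2-70b-chat"
    tensor_parallel_size=8,                  # assumed from the 8x A100 40GiB node
    disable_custom_all_reduce=True,
    enforce_eager=False,
)
```

Setting `enforce_eager=True` in the same sketch corresponds to the third configuration, which the reporter says works fine.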