[Bug]: A bug when running examples/llava_example.py with image_features as input and multiple GPUs enabled #5863
Comments
@xwjiang2010 is this related to sharding being disabled for the vision tower? Personally I haven't run into this problem with tp=2, but it may become an issue with other tp values.
Hi @DarkLight1337, thanks for the comments. For me, even tp=2 doesn't work if you use `image_features`.
Just realized that the problem is specific to `image_features`.
Using `image_features` can bypass the limitation of only allowing one image per input; in other words, we can pass the `image_features` for multiple images. To fix this bug, my understanding is to change this line https://github.com/vllm-project/vllm/blob/v0.5.0.post1/vllm/model_executor/models/llava.py#L218 to something like:

```python
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
image_features = image_input["data"].to(device)
```

Any idea? Thanks!
I doubt this would work, since the error is raised before the model is even called. It seems to be related to how multi-modal data is sent to the individual workers by the model executor.
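As a minimal sketch of the device handling involved — hypothetical names, not vLLM's actual model-runner code — the executor would need to move every multi-modal tensor onto each worker's device before the forward pass:

```python
import torch

def move_multi_modal_kwargs(mm_kwargs: dict, device: torch.device) -> dict:
    """Recursively move every tensor in the multi-modal kwargs to `device`.

    Hypothetical helper: names and structure are assumptions, not vLLM code.
    """
    moved = {}
    for key, value in mm_kwargs.items():
        if isinstance(value, torch.Tensor):
            moved[key] = value.to(device, non_blocking=True)
        elif isinstance(value, dict):
            moved[key] = move_multi_modal_kwargs(value, device)
        else:
            moved[key] = value
    return moved
```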
Hi @DarkLight1337 @yinsong1986. @DarkLight1337, since you are doing the input processing work, can you make sure that this bug is fixed for all three models we support? Thank you!

For my future reference: I am using the main branch at commit 67005a0. Using head gives me shm errors; I believe that is introduced by the recent shm backend changes and is irrelevant to VLM testing.
@yinsong1986 Another note: as @DarkLight1337 mentioned, we are removing support for `image_features` and `pixel_values` in an effort to make the API more user friendly. Instead, we will focus support on `PIL.Image.Image` inputs and embeddings.
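A rough sketch of what that could look like — the `multi_modal_data` argument and prompt format here are assumptions about the planned API, not a finalized interface:

```python
from PIL import Image
from vllm import LLM

# Hypothetical future usage: pass a PIL image directly instead of
# precomputed pixel_values or image_features.
llm = LLM(model="llava-hf/llava-1.5-7b-hf")
image = Image.open("example.jpg")
outputs = llm.generate({
    "prompt": "USER: <image>\nWhat is shown in this image? ASSISTANT:",
    "multi_modal_data": {"image": image},
})
print(outputs[0].outputs[0].text)
```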
Where should I put this code? I thought the tensors were already moved to the correct device inside the model runner?
@DarkLight1337 Yes, I would think the model runner already did that, but apparently that's not the case. Clearly a bug somewhere. I will spend 1-2 hours today looking into it.
Hi @yinsong1986! We have discussed this bug offline. There is already some ongoing effort (#5851) to remove support for image features, and eventually we will support image embeddings instead. The difference is that image features still need to go through a multi-modal projector to generate the image embeddings consumed by the language model, and having a clear cut of whether the vision encoder path is needed in the forward pass is better in our opinion. We will also investigate and fix the bug that breaks TP, but not for image features; that will come in a later PR when we add support for image embeddings.
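To make the features-vs-embeddings distinction concrete, a minimal sketch with hypothetical names (not vLLM's implementation):

```python
import torch
import torch.nn as nn

def get_image_embeddings(image_input: dict, projector: nn.Module) -> torch.Tensor:
    """image_input: {"type": "embeddings" | "features", "data": Tensor}."""
    if image_input["type"] == "embeddings":
        # Embeddings are already in the language model's input space;
        # the vision encoder path can be skipped entirely.
        return image_input["data"]
    # Features from the vision encoder still need the multi-modal
    # projector before the language model can consume them.
    return projector(image_input["data"])
```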
Thanks @DarkLight1337 @xwjiang2010, looking forward to the changes! Also, FYI: I tested a workaround to support `image_features` for tp>1. We can change this line https://github.com/vllm-project/vllm/blob/v0.5.0.post1/vllm/model_executor/models/llava.py#L218 to something like:

```python
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
image_features = image_input["data"].to(device)
```

With this change, it could more or less run successfully with tp=8. Cheers!
Same error with the same run command.
The fix will be included in the upcoming release. You can build from source to try it out earlier.
I'm really looking forward to it and can't wait.
Your current environment
🐛 Describe the bug
When following the example https://github.com/vllm-project/vllm/blob/v0.5.0.post1/examples/llava_example.py with multiple GPUs enabled, passing `image_features` as the argument of `--type` and running `python examples/llava_example.py --type image_features` would report errors.
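For reference, a minimal sketch of the multi-GPU setup the report describes — the vision-config arguments mirror the v0.5.0.post1 example script, and `tensor_parallel_size=4` is only an illustrative value, not taken from the report:

```python
from vllm import LLM

# Sketch: enabling tensor parallelism in the LLaVA example.
llm = LLM(
    model="llava-hf/llava-1.5-7b-hf",
    image_input_type="image_features",
    image_token_id=32000,
    image_input_shape="1,576,1024",
    image_feature_size=576,
    tensor_parallel_size=4,  # assumption: any value > 1 triggers the bug
)
```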