[Misc]: Curious why this is happening: running phi-3-vision on an RTX 3070 (8GB VRAM) works with Transformers but not with vLLM (goes out of memory)
#5883
Closed
chandeldivyam opened this issue on Jun 27, 2024 · 4 comments · Fixed by #5887
Thanks to @ywang96, we have figured out the reason: the model has a 128k context length by default, so it might not fit in your GPU memory. Try passing max_model_len=8192 (or some other value that lets it fit on your GPU) to LLM in the example.
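For reference, a minimal sketch of that fix with vLLM's offline LLM API; the model id (microsoft/Phi-3-vision-128k-instruct) and the trust_remote_code flag are assumptions about the setup, not details from the original report:

```python
from vllm import LLM

# Capping max_model_len shrinks the KV cache that vLLM pre-allocates,
# which is what overflows an 8 GB card at the default 128k context.
llm = LLM(
    model="microsoft/Phi-3-vision-128k-instruct",  # assumed model id
    max_model_len=8192,       # pick the largest value that still fits your GPU
    trust_remote_code=True,   # Phi-3-vision ships custom model code on the Hub
)
```

Lowering max_model_len trades away long-context support for memory headroom, which is usually the right trade on an 8 GB card.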
Anything you want to discuss about vllm.
I was wondering why this happens? I am new to this space and was playing around with different machines, models, and frameworks.
I am able to run inference on a single image (on an RTX 3070) in around 70 seconds using Hugging Face Transformers. I tried the same thing using vLLM (current main branch), and it ran out of memory, which got me curious.