Consider optimizing the FastAPI/OpenAI-compatible API server in vLLM, as the server is widely used and appears to add significant per-request overhead. On a single A100 with Llama 13B, the `LLM` class reaches 90-100% GPU utilization, while the API server only reaches about 50%.
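To make the gap concrete, here is a toy, self-contained sketch (not the vLLM stack) of the kind of per-request overhead an HTTP serving layer adds on top of a direct in-process call. The `generate` function is a hypothetical stand-in for the model's generation step; a real investigation would profile the actual FastAPI server and `LLM.generate`.

```python
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

def generate(prompt: str) -> str:
    # Hypothetical stand-in for model generation; in vLLM this would
    # be the (much more expensive) LLM.generate call.
    return prompt[::-1]

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = generate("hello").encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Silence per-request logging so it does not skew the timing.
        pass

# Serve on an ephemeral port in a background thread.
server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

N = 200

# Direct in-process calls (analogous to using the LLM class directly).
t0 = time.perf_counter()
for _ in range(N):
    generate("hello")
direct = time.perf_counter() - t0

# Same work routed through an HTTP round trip (analogous to the API server).
t0 = time.perf_counter()
for _ in range(N):
    urllib.request.urlopen(f"http://127.0.0.1:{port}/").read()
via_http = time.perf_counter() - t0

server.shutdown()
print(f"direct: {direct * 1e6 / N:.1f} us/call, "
      f"via HTTP: {via_http * 1e6 / N:.1f} us/call")
```

When the per-token GPU work is small relative to request handling, serialization, and event-loop scheduling, this fixed per-request cost can leave the GPU idle between batches, which is consistent with the ~50% utilization observed for the API server.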