Consider optimizing the API server #580

Closed
@imoneoi

Description

Consider optimizing the FastAPI/OpenAI API server in vLLM, as the server is widely used and appears to have significant overhead. On a single A100 with Llama 13B, the LLM class reaches 90~100% GPU utilization, while the API server can only reach about 50%.

Related: #459
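For reference, a minimal sketch of how the two paths could be benchmarked side by side, assuming vLLM is installed and the OpenAI-compatible server has been launched separately (the model id, prompt set, concurrency level, and port below are illustrative, not taken from this issue):

```python
# Sketch: compare offline LLM-class throughput against the API server.
# Wall time is a rough proxy; GPU utilization can be watched with `nvidia-smi`.
# In practice the two paths should run in separate sessions so the LLM class
# and the server are not holding GPU memory at the same time.
import time
from concurrent.futures import ThreadPoolExecutor

import requests
from vllm import LLM, SamplingParams

MODEL = "huggyllama/llama-13b"  # illustrative checkpoint id
PROMPTS = ["Hello, my name is"] * 256
PARAMS = SamplingParams(temperature=0.8, max_tokens=128)

# Path 1: direct offline inference through the LLM class.
llm = LLM(model=MODEL)
start = time.time()
llm.generate(PROMPTS, PARAMS)
print(f"LLM class:  {time.time() - start:.1f}s")

# Path 2: the same workload through the OpenAI-compatible server,
# assumed to be running already, e.g. via
#   python -m vllm.entrypoints.openai.api_server --model huggyllama/llama-13b
def send(prompt: str) -> None:
    requests.post(
        "http://localhost:8000/v1/completions",
        json={"model": MODEL, "prompt": prompt,
              "temperature": 0.8, "max_tokens": 128},
        timeout=600,
    )

start = time.time()
with ThreadPoolExecutor(max_workers=64) as pool:
    list(pool.map(send, PROMPTS))
print(f"API server: {time.time() - start:.1f}s")
```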

Metadata

Labels: performance (Performance-related issues)