Add Splitwise: prompt and token phase separation #2472

Closed as not planned
@goiri

Description

We have built the system described in http://aka.ms/splitwise
Splitwise splits the prompt and token phases of inference so that they run on different servers.
This leverages the differences between these two phases to improve throughput.
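
As a rough illustration of the split (not the prototype's code and not a vLLM API), here is a toy sketch. `PromptMachine`, `TokenMachine`, `KVCache`, and `toy_next_token` are hypothetical names, and the "model" is a deterministic stand-in. It assumes the state handed between servers is the per-request KV cache and that `max_new_tokens >= 1`.

```python
from dataclasses import dataclass, field


@dataclass
class KVCache:
    """Hypothetical stand-in for the per-request state handed between servers."""
    entries: list[int] = field(default_factory=list)


def toy_next_token(cache: KVCache) -> int:
    # Toy replacement for a model forward pass: deterministic, cache-dependent.
    return (sum(cache.entries) + len(cache.entries)) % 100


class PromptMachine:
    """Runs only the prompt phase: consumes the whole prompt in one pass."""

    def run(self, prompt_tokens: list[int]) -> tuple[KVCache, int]:
        cache = KVCache(entries=list(prompt_tokens))
        first_token = toy_next_token(cache)
        return cache, first_token


class TokenMachine:
    """Runs only the token phase: emits one token per step from the cache."""

    def run(self, cache: KVCache, first_token: int, max_new_tokens: int) -> list[int]:
        output = [first_token]
        cache.entries.append(first_token)
        while len(output) < max_new_tokens:
            token = toy_next_token(cache)
            output.append(token)
            cache.entries.append(token)
        return output


def generate_split(prompt_tokens: list[int], max_new_tokens: int) -> list[int]:
    cache, first_token = PromptMachine().run(prompt_tokens)
    # In a real deployment the cache would cross the network at this point.
    return TokenMachine().run(cache, first_token, max_new_tokens)


def generate_single(prompt_tokens: list[int], max_new_tokens: int) -> list[int]:
    """Reference path: both phases on one server."""
    cache = KVCache(entries=list(prompt_tokens))
    output = []
    for _ in range(max_new_tokens):
        token = toy_next_token(cache)
        output.append(token)
        cache.entries.append(token)
    return output
```

The point of the sketch is only that the split path yields the same tokens as the single-server path as long as the handed-over state is complete; it says nothing about scheduling, batching, or the transfer mechanism.
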
We have an internal prototype on top of an internal vLLM branch.
This issue tracks the effort to open-source this prototype and make it part of official vLLM.

This includes:
