
[Misc]: How does the system evenly distribute the requests to multiple micro batches? #14213

Open
@oldcpple

Description


Anything you want to discuss about vllm.

Hi there, we've recently been testing some of vLLM's distributed features. While working with pipeline parallelism, we noticed something interesting: the system automatically and evenly distributes incoming requests across multiple micro batches, which is a good way to avoid pipeline bubbles. However, we couldn't find the related logic in the source code. Could you explain where, or in which component, this distribution actually happens?
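
For reference, the behavior we observed looks like least-loaded (or round-robin) assignment of requests across per-micro-batch schedulers. Below is a minimal sketch of that idea; the class and method names (`MicroBatchScheduler`, `PipelineFrontend`, `add_request`) are our own illustration of the policy we think we are seeing, not vLLM's actual classes or API.

```python
# Minimal sketch of the observed behavior: with pipeline parallelism, keep one
# scheduler (request queue) per micro batch / virtual engine and send each new
# request to the least-loaded one, so micro batches stay balanced and pipeline
# bubbles are reduced. All names here are illustrative, not vLLM's real code.

class MicroBatchScheduler:
    def __init__(self) -> None:
        self.unfinished: list[str] = []  # request ids still being processed

    def num_unfinished(self) -> int:
        return len(self.unfinished)

    def add(self, request_id: str) -> None:
        self.unfinished.append(request_id)


class PipelineFrontend:
    def __init__(self, pipeline_parallel_size: int) -> None:
        # one scheduler per micro batch (virtual engine)
        self.schedulers = [MicroBatchScheduler()
                           for _ in range(pipeline_parallel_size)]

    def add_request(self, request_id: str) -> int:
        # pick the micro batch with the fewest unfinished requests
        target = min(range(len(self.schedulers)),
                     key=lambda i: self.schedulers[i].num_unfinished())
        self.schedulers[target].add(request_id)
        return target


if __name__ == "__main__":
    frontend = PipelineFrontend(pipeline_parallel_size=2)
    for rid in ("r0", "r1", "r2", "r3"):
        print(rid, "-> micro batch", frontend.add_request(rid))
    # With two micro batches, requests alternate: 0, 1, 0, 1
```

If vLLM implements an equivalent policy somewhere (least-loaded or plain round-robin over per-virtual-engine schedulers), a pointer to that file or class would answer our question.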

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

Metadata


Labels: misc, stale (over 90 days of inactivity)
