Description
Anything you want to discuss about vllm.
Hi there, we've recently been testing some of vLLM's distributed features. While working with pipeline parallelism, we noticed something interesting: the system automatically and evenly distributes incoming requests across multiple micro-batches, which is a good way to avoid pipeline bubbles. However, we couldn't find the logic responsible for this in the source code. Can you explain where, or in which component, this actually happens? (A sketch of the behavior we mean follows below.)
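
For clarity, here is a minimal Python sketch of the behavior we are describing, assuming a simple round-robin assignment of requests to per-micro-batch queues. This is purely illustrative and not vLLM's actual code; the names `MicroBatchQueue`, `make_queues`, and `assign_round_robin` are ours, not vLLM's:

```python
from collections import deque
from dataclasses import dataclass, field
from itertools import cycle

# Illustrative sketch only -- not vLLM's implementation.
# With pipeline parallelism, one way to keep every stage busy is to keep
# roughly one queue per in-flight micro-batch and hand new requests to
# the queues round-robin, so the micro-batches stay evenly sized.

@dataclass
class MicroBatchQueue:
    """One queue of requests, roughly one per in-flight micro-batch."""
    batch_id: int
    requests: deque = field(default_factory=deque)

def make_queues(pipeline_parallel_size: int) -> list[MicroBatchQueue]:
    # Hypothetical assumption: the number of concurrent micro-batches
    # equals the number of pipeline stages, so no stage sits idle.
    return [MicroBatchQueue(i) for i in range(pipeline_parallel_size)]

def assign_round_robin(requests, queues):
    """Spread incoming requests evenly over the micro-batch queues."""
    rr = cycle(queues)
    for req in requests:
        next(rr).requests.append(req)

queues = make_queues(pipeline_parallel_size=4)
assign_round_robin([f"req-{i}" for i in range(10)], queues)
for q in queues:
    print(q.batch_id, list(q.requests))
```

Running this distributes the ten requests 3/3/2/2 across the four queues, which matches the even spread we observed. Where does the equivalent logic live in vLLM?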
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.