
What's up with Pipeline Parallelism? #3314

Open
duanzhaol opened this issue Mar 11, 2024 · 4 comments

Comments

@duanzhaol

duanzhaol commented Mar 11, 2024

Hey vllm team,

Hope you're all doing great! I'm focusing on pipeline-parallel inference, and I hope it can be supported in vLLM.

I noticed that pipeline parallelism was on the old roadmap (#244), but it's not on the new roadmap (#2681). Just curious: was there a specific reason you decided to skip it for now? Challenges with the implementation, or did it just not fit into the bigger picture at the moment?

Would love to get any insights or thoughts you have on this. I'm really looking forward to seeing where you take vllm next!

@simon-mo
Collaborator

Currently we observe that the performance of Tensor Parallelism is more desirable than that of Pipeline Parallelism. Due to a lack of bandwidth, we dropped it from the current roadmap. We still welcome contributions!

@duanzhaol
Author

> Currently we observe that the performance of Tensor Parallelism is more desirable than that of Pipeline Parallelism. Due to a lack of bandwidth, we dropped it from the current roadmap. We still welcome contributions!

Thanks! I believe Pipeline Parallelism may offer improved throughput compared to Tensor Parallelism, albeit with a trade-off in latency. In certain situations, this approach could indeed be more practical. Additionally, I am currently working on implementing an asynchronous version of Pipeline Parallelism, for which I can open a PR upon completion.
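To illustrate the throughput argument (this is not vLLM code, just a minimal, hypothetical sketch): with a model split across stages, each stage can process the next micro-batch while downstream stages handle earlier ones, so the stages' work overlaps instead of running serially.

```python
# Toy pipeline parallelism: each stage runs in its own thread and passes
# micro-batches downstream through queues, so stages work concurrently.
from queue import Queue
from threading import Thread

def make_stage(fn, inbox, outbox):
    def run():
        while True:
            item = inbox.get()
            if item is None:        # sentinel: shut down and propagate
                outbox.put(None)
                return
            outbox.put(fn(item))
    return Thread(target=run)

# Two stages standing in for the two halves of a model.
stage_fns = [lambda x: x + 1, lambda x: x * 2]

queues = [Queue() for _ in range(len(stage_fns) + 1)]
threads = [make_stage(fn, queues[i], queues[i + 1])
           for i, fn in enumerate(stage_fns)]
for t in threads:
    t.start()

# Feed micro-batches; stage 0 starts on batch 2 while stage 1 handles batch 1.
for micro_batch in [1, 2, 3, 4]:
    queues[0].put(micro_batch)
queues[0].put(None)

results = []
while (out := queues[-1].get()) is not None:
    results.append(out)

print(results)  # [4, 6, 8, 10]
```

The per-stage communication here is only the activations crossing a stage boundary, which is the intuition behind PP's lower communication cost compared to TP's all-reduces inside every layer.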

@rkooo567
Collaborator

Our internal work shows that PP actually helps improve the throughput of the prefill stage because of its low communication cost. I am excited to see the proposal!


This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!

@github-actions github-actions bot added the stale label Oct 30, 2024

4 participants