
What's up with Pipeline Parallelism? #3314

Open
duanzhaol opened this issue Mar 11, 2024 · 4 comments

Comments

@duanzhaol

duanzhaol commented Mar 11, 2024

Hey vllm team,

Hope you're all doing great! I'm focusing on pipeline-parallel inference, and I hope it can be supported in vLLM.

I noticed that pipeline parallelism was on the old roadmap (#244), but it's not on the new roadmap (#2681). Just curious: was there a specific reason you decided to skip it for now? Challenges with the implementation, or did it just not fit into the bigger picture at the moment?

Would love to get any insights or thoughts you have on this. I'm really looking forward to seeing where you take vllm next!

@simon-mo
Collaborator

Currently we observe that the performance of Tensor Parallelism is more desirable than that of Pipeline Parallelism. Due to a lack of bandwidth, we dropped it from the current roadmap. We still welcome contributions!

@duanzhaol
Author

> Currently we observe that the performance of Tensor Parallelism is more desirable than that of Pipeline Parallelism. Due to a lack of bandwidth, we dropped it from the current roadmap. We still welcome contributions!

Thanks! I believe Pipeline Parallelism may offer improved throughput compared to Tensor Parallelism, albeit with a trade-off in latency. In certain situations, this approach could indeed be more practical. Additionally, I am currently working on implementing an asynchronous version of Pipeline Parallelism, for which I can open a PR upon completion.
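To illustrate the throughput argument (this is not vLLM code, just a minimal, hypothetical sketch): with a model split across stages, each stage can process the next micro-batch while downstream stages handle earlier ones, so the stages' work overlaps instead of running serially.

```python
# Toy pipeline parallelism: each stage runs in its own thread and passes
# micro-batches downstream through queues, so stages work concurrently.
from queue import Queue
from threading import Thread

def make_stage(fn, inbox, outbox):
    def run():
        while True:
            item = inbox.get()
            if item is None:        # sentinel: shut down and propagate
                outbox.put(None)
                return
            outbox.put(fn(item))
    return Thread(target=run)

# Two stages standing in for the two halves of a model.
stage_fns = [lambda x: x + 1, lambda x: x * 2]

queues = [Queue() for _ in range(len(stage_fns) + 1)]
threads = [make_stage(fn, queues[i], queues[i + 1])
           for i, fn in enumerate(stage_fns)]
for t in threads:
    t.start()

# Feed micro-batches; stage 0 starts on batch 2 while stage 1 handles batch 1.
for micro_batch in [1, 2, 3, 4]:
    queues[0].put(micro_batch)
queues[0].put(None)

results = []
while (out := queues[-1].get()) is not None:
    results.append(out)

print(results)  # [4, 6, 8, 10]
```

The per-stage communication here is only the activations crossing a stage boundary, which is the intuition behind PP's lower communication cost compared to TP's all-reduces inside every layer.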

@rkooo567
Collaborator

Our internal work shows that PP actually helps improve the throughput of the prefill stage because of its low communication cost. I am excited to see the proposal!


This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!

@github-actions github-actions bot added the stale label Oct 30, 2024

4 participants