[Spec Decode] Make speculative decoding compatible with pipeline parallelism #15173
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only the fastcheck CI runs, covering a small, essential subset of tests to quickly catch errors. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge. 🚀
Force-pushed from 005e1a9 to ecd9f20.
Any plan to merge it? I am looking for a similar feature for R1 with PP + TP + MTP.
This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed from e138dd7 to 78f283f.
[Spec Decode] Make speculative decoding compatible with pipeline parallelism
Signed-off-by: Xin Yang <xyangx@amazon.com>
Force-pushed from c43f156 to 74137f8.
Great work on this PR. I would like to ask for clarification on a couple of important points.
Force-pushed from 0076f69 to 6ca6537.
Force-pushed from 4e4c85c to 061ee3d.
[Spec Decode] Make speculative decoding compatible with pipeline parallelism
Signed-off-by: Xin Yang <xyangx@amazon.com>
Thank you for the PR. Given that we have turned on V1 by default, we would like any PR to work with V1 when merged. However, I recognize that V1 doesn't support draft models yet. Leaving that to @WoosukKwon to decide, and to @LiuXiaoxuanPKU and @ruisearch42 to review.
Will you support the case where tp > 1? In some scenarios, the draft model will also be huge.
This pull request has merge conflicts that must be resolved before it can be merged.
@xyang16, thanks for the PR! Can you please share the parameters that you used to run DeepSeek on a two-node system? First of all, the build of the PR fails on my end. I rebased your branch and the build is fine now, but vLLM fails at startup.
This PR aims to make speculative decoding compatible with pipeline parallelism.
Sample commands:
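(The original commands did not survive extraction. As a minimal sketch only, assuming a hypothetical target/draft model pair — the model names, sizes, and flag values below are illustrative, not the ones used in this PR — a launch combining speculative decoding with pipeline parallelism might look like:)

```bash
# Illustrative only: serve a target model with pipeline parallelism across
# nodes while a small draft model drives speculative decoding.
# Model names and parallelism degrees are placeholders.
vllm serve meta-llama/Llama-3.1-70B-Instruct \
    --tensor-parallel-size 8 \
    --pipeline-parallel-size 2 \
    --speculative-model meta-llama/Llama-3.2-1B-Instruct \
    --num-speculative-tokens 5
```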
Benchmark
We benchmarked the deepseek-ai/DeepSeek-R1 model with a private EAGLE draft model on a 2-node cluster. With pipeline parallelism enabled (tp=8, pp=2), throughput (tokens/s) improves by more than 30% over the baseline (tp=16, pp=1).
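For context, a comparison like this is typically driven with vLLM's bundled benchmark_serving.py client run once against each server configuration; the sketch below is an assumption about the setup (the dataset and prompt count are placeholders), not the actual benchmark used here:

```bash
# Illustrative client run against an already-started server (see above).
# Repeat once per configuration (tp=16/pp=1 vs. tp=8/pp=2) and compare
# the reported output token throughput.
python benchmarks/benchmark_serving.py \
    --backend vllm \
    --model deepseek-ai/DeepSeek-R1 \
    --dataset-name sharegpt \
    --dataset-path ShareGPT_V3_unfiltered_cleaned_split.json \
    --num-prompts 200
```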
cc @LiuXiaoxuanPKU @comaniac Would appreciate your review. Thanks!