[Feature] Staggered batch scheduling for DP+EP #5558
This reverts commit 0528baf.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #5558 +/- ##
==========================================
Coverage ? 65.69%
==========================================
Files ? 330
Lines ? 41986
Branches ? 6451
==========================================
Hits ? 27581
Misses ? 12358
Partials ? 2047
Motivation
In an MoE inference system, requests arrive at the DP instances at different times, so some DP instances run an idle forward pass, which reduces resource utilization and increases TTFT.
Modifications
To mitigate this issue, we adopt a batching strategy on both the scheduler side and the engine side that keeps all DP instances as fully utilized as possible.
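As a rough illustration of the idea only, the scheduler-side part could look like the sketch below. It is not the code in this PR: the class name `StaggeredBatcher`, the `max_wait_s` parameter, and the round-robin assignment are all assumptions made for the example. Requests are held briefly until there is at least one per DP instance (or the oldest request has waited too long), then handed out together so that, where possible, no instance runs an idle forward.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List


@dataclass
class StaggeredBatcher:
    """Hypothetical scheduler-side buffer; not the class used in this PR."""

    dp_size: int        # number of data-parallel (DP) instances
    max_wait_s: float   # assumed cap on how long the oldest request is held
    _pending: Deque[str] = field(default_factory=deque)
    _oldest_ts: float = 0.0

    def add(self, request_id: str, now: float) -> None:
        # Remember when the oldest buffered request arrived.
        if not self._pending:
            self._oldest_ts = now
        self._pending.append(request_id)

    def poll(self, now: float) -> List[List[str]]:
        """Return one request list per DP rank, or [] while still waiting."""
        if not self._pending:
            return []
        enough = len(self._pending) >= self.dp_size
        timed_out = now - self._oldest_ts >= self.max_wait_s
        if not (enough or timed_out):
            return []
        # Spread buffered requests round-robin across all DP ranks.
        per_rank: List[List[str]] = [[] for _ in range(self.dp_size)]
        index = 0
        while self._pending:
            per_rank[index % self.dp_size].append(self._pending.popleft())
            index += 1
        return per_rank
```

With `dp_size=4`, two early requests are held back; once four are buffered, `poll` returns one request per rank. The wait cap bounds the extra latency added to the first request, which matters because the goal is to lower TTFT, not trade it away.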
Usage or Command
Accuracy Tests
Checklist
[FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
pre-commit before commit.
release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.