[DataFlow runtime] Phase B2: decouple the target engine from the sglang version #632
Merged
jiapingW merged 2 commits on Jul 3, 2026
Extract EVERY sglang internal + the duplicated extend/capture forward into one
version-pinned boundary, `sglang_backend/capture.py::SGLangCaptureBackend`, and
have the algorithm engines COMPOSE it instead of embedding it:
  SGLangCaptureBackend (the only place that imports sglang.srt.* for capture)
    · build(): ServerArgs / ModelConfig / SGLangRunner wiring (unified)
    · _forward_extend(): the single ScheduleBatch/ForwardBatch capture forward
    · _maybe_prepare_mlp_sync_batch(): ONE (0.5.9) prepare_mlp_sync signature
    · extend / extend_vlm / extend_dflash / get_rope_index / set_eagle3_capture_layers
SGLangEagle3TargetEngine / SGLangDFlashTargetEngine now hold a backend and do
only torch-side output shaping — they import ZERO sglang internals (verified by
tests/test_runtime/test_sglang_capture_backend.py, a pure-AST invariant).
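
A pure-AST invariant of this kind can be sketched with the standard library alone. The helper name `find_forbidden_imports` and the two sample sources are hypothetical illustrations, not the contents of the actual test file; the point is only that the check parses source text and never imports sglang itself.

```python
import ast

FORBIDDEN_ROOT = "sglang"  # top-level package the engine modules must not import


def find_forbidden_imports(source: str, forbidden_root: str = FORBIDDEN_ROOT) -> list[str]:
    """Return every absolutely imported module in `source` rooted at `forbidden_root`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 means an absolute import; relative imports are skipped.
            names = [node.module]
        else:
            continue
        hits.extend(
            name
            for name in names
            if name == forbidden_root or name.startswith(forbidden_root + ".")
        )
    return hits


# Hypothetical post-refactor engine source: composes the backend, no sglang imports.
clean = "import torch\nfrom .capture import SGLangCaptureBackend\n"
# Hypothetical pre-refactor engine source: reaches into sglang internals directly.
dirty = "from sglang.srt.server_args import ServerArgs\nimport sglang.srt.managers as m\n"

print(find_forbidden_imports(clean))
print(find_forbidden_imports(dirty))
```

Because the check walks the whole tree, it also catches imports nested inside functions, which matters when a module relies on lazy imports.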
Why: before this, both sglang engines imported ~20 sglang symbols and each carried
its own near-duplicate `_extend`; the two copies had drifted to DIFFERENT sglang
API versions (eagle3 = module-level prepare_mlp_sync_batch_raw(attn_cp_size=);
dflash = the removed Scheduler.prepare_mlp_sync_batch_raw(spec_algorithm=)). A
sglang bump touched every subclass and the copies could silently diverge. Now a
bump touches one file: the engines "put the pieces together" (capture backend +
shaping + adapter) instead of tangling the version into each algorithm.
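
The composition described above might look roughly like the following. This is a minimal sketch with plain Python lists standing in for tensors; `CaptureBackend`, `FakeBackend`, `TargetEngine`, and `generate_targets` are hypothetical names chosen for illustration and are not the real classes or methods in this PR.

```python
from dataclasses import dataclass
from typing import Protocol


class CaptureBackend(Protocol):
    """Hypothetical interface: the one object allowed to touch the pinned runtime."""

    def extend(self, input_ids: list[list[int]]) -> list[list[float]]: ...


@dataclass
class FakeBackend:
    """Stand-in for the version-pinned backend; returns one value per token."""

    scale: float = 0.5

    def extend(self, input_ids: list[list[int]]) -> list[list[float]]:
        return [[tok * self.scale for tok in seq] for seq in input_ids]


class TargetEngine:
    """Algorithm engine: holds a backend and only shapes what it returns."""

    def __init__(self, backend: CaptureBackend):
        self.backend = backend

    def generate_targets(self, input_ids: list[list[int]], pad_to: int) -> list[list[float]]:
        captured = self.backend.extend(input_ids)
        # Output shaping (here: right-padding) is the engine's only job.
        return [seq + [0.0] * (pad_to - len(seq)) for seq in captured]


engine = TargetEngine(FakeBackend())
print(engine.generate_targets([[2, 4], [6]], pad_to=3))
```

With this shape, a runtime upgrade changes only the backend class, and the engine can be unit-tested against a stub without the runtime installed.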
Behavior:
- Byte-identical on the test configs (TP=1/2, dp=1): require_mlp_sync is False so
the unified mlp-sync branch is skipped identically; construction, req building,
the forward, splitting/shard logic, and pool-clear ordering are transplanted
verbatim. `import specforge` stays sglang-optional via a lazy import in
from_pretrained, and the engine forward still runs under @torch.no_grad.
- Two deliberate, flagged changes: (1) DFlash's mlp-sync now uses the same 0.5.9
signature as eagle3 — its old Scheduler.* call was latent-broken for dp>1;
(2) dropped a stray debug print() in DFlash set_capture_layers.
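
The lazy-import point in the first bullet can be illustrated with a generic pattern. This is a sketch only: the class below is hypothetical, and the `backend_module` parameter exists purely so the example can be exercised with a standard-library module instead of sglang.

```python
import importlib


class TargetEngine:
    """Hypothetical engine whose optional dependency is imported only on construction."""

    def __init__(self, runtime_module):
        self.runtime = runtime_module

    @classmethod
    def from_pretrained(cls, backend_module: str = "sglang"):
        # Importing here, not at module top level, keeps the enclosing package
        # importable on machines where the optional dependency is absent.
        try:
            runtime = importlib.import_module(backend_module)
        except ImportError as exc:
            raise ImportError(
                f"The optional dependency '{backend_module}' is not installed; "
                "install it to use this engine."
            ) from exc
        return cls(runtime)


# "json" stands in for the optional runtime so the sketch runs anywhere.
engine = TargetEngine.from_pretrained("json")
print(engine.runtime.__name__)
```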
Also adds the `sglang_server` backend (SGLangServerEagle3TargetEngine): it is
selectable via get_eagle3_target_model(backend="sglang_server"), but construction
raises an actionable NotImplementedError until the live-capture depth is set by
the O1.3 spike (docs/roadmap/online-disaggregation.md §O1.3).
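
A "selectable but not yet constructible" backend can be sketched as below. All names here (`LocalEngine`, `ServerEngine`, `get_target_model`, the `"sglang"` key, and the error text) are hypothetical; the real factory is `get_eagle3_target_model`, whose internals are not shown in this PR description.

```python
class LocalEngine:
    """Hypothetical in-process engine that constructs normally."""


class ServerEngine:
    """Hypothetical server-backed engine: registered, but construction is deferred."""

    def __init__(self):
        # An actionable error names the backend and says what to do instead.
        raise NotImplementedError(
            "backend='sglang_server' is registered but not implemented yet; "
            "use an in-process backend until live capture is available."
        )


_BACKENDS = {"sglang": LocalEngine, "sglang_server": ServerEngine}


def get_target_model(backend: str):
    """Look up the engine class by backend name and construct it."""
    try:
        engine_cls = _BACKENDS[backend]
    except KeyError:
        raise ValueError(
            f"unknown backend {backend!r}; choose from {sorted(_BACKENDS)}"
        ) from None
    return engine_cls()


print(type(get_target_model("sglang")).__name__)
```

Registering the name early lets callers and tests exercise selection while the implementation is still pending, and the failure surfaces at construction time rather than deep inside a forward pass.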
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Base automatically changed from dataflow-up-24-target-engine to dataflow-up-16-zerocopy, July 3, 2026 02:03
jiapingW approved these changes on Jul 3, 2026
Phase B (domain abstractions), 2/3. Stacked on #631 (B1). This is the core decoupling.

Before this, both `SGLangEagle3TargetEngine` and `SGLangDFlashTargetEngine` imported ~20 sglang internals directly and each carried its own near-duplicate `_extend` forward. The two copies had even drifted to different sglang API versions (eagle3 = module-level `prepare_mlp_sync_batch_raw(attn_cp_size=)`; dflash = the removed `Scheduler.prepare_mlp_sync_batch_raw(spec_algorithm=)`). A sglang bump touched every subclass and the copies could silently diverge.

This extracts every sglang internal + the single capture forward into one version-pinned boundary, `sglang_backend/capture.py::SGLangCaptureBackend`; the algorithm engines now compose it and import zero sglang (enforced by a pure-AST test). A sglang bump now touches one file. Net −592 lines in the engine files.

Behavior:
- Byte-identical on the test configs (TP=1/2, dp=1): `require_mlp_sync` is False so the unified mlp-sync branch is skipped identically; construction, req building, the forward, split/shard logic, and pool-clear ordering are transplanted verbatim. `import specforge` stays sglang-optional (lazy import in `from_pretrained`).
- Two deliberate, flagged changes: (1) DFlash's mlp-sync now uses the same 0.5.9 signature as eagle3 (its old `Scheduler.*` call was latent-broken for dp>1); (2) dropped a stray debug `print()` in DFlash `set_capture_layers`.

Also adds the `sglang_server` backend (`SGLangServerEagle3TargetEngine`): it is selectable via the factory, but construction raises an actionable `NotImplementedError` until the live-capture depth is set by the O1.3 spike (`docs/roadmap/online-disaggregation.md` §O1.3).

New test: `tests/test_runtime/test_sglang_capture_backend.py` (AST decoupling invariant + `sglang_server` selectability).

Validation

Full `tests/test_runtime`: 214 OK (2 skip, 1 xfail) on 8×H200. The hf-vs-sglang-vs-custom capture parity test builds a real SGLang runner through `SGLangCaptureBackend` at TP=2 and matches the HF reference (2 OK). Adversarial review: 0 confirmed defects.

🤖 Generated with Claude Code