
[DataFlow runtime] Phase B2 — decouple the target engine from the sglang version#632

Merged: jiapingW merged 2 commits into dataflow-up-16-zerocopy from dataflow-up-25-sglang-capture-backend on Jul 3, 2026

@maocheng23

Collaborator

Phase B (domain abstractions) — 2/3. Stacked on #631 (B1). This is the core decoupling.

Before this, both SGLangEagle3TargetEngine and SGLangDFlashTargetEngine imported ~20 sglang internals directly, and each carried its own near-duplicate _extend forward. The two copies had even drifted to different sglang API versions: eagle3 called the module-level prepare_mlp_sync_batch_raw(attn_cp_size=), while dflash called the since-removed Scheduler.prepare_mlp_sync_batch_raw(spec_algorithm=). An sglang bump touched every subclass, and the copies could silently diverge.

This extracts every sglang internal, plus the single capture forward, into one version-pinned boundary, sglang_backend/capture.py::SGLangCaptureBackend. The algorithm engines now compose it and import no sglang internals (enforced by a pure-AST test), so an sglang bump now touches one file. Net −592 lines in the engine files.
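
The "engines compose a backend" shape can be pictured with a small, dependency-free sketch. Everything in it is a hypothetical stand-in: `CaptureBackend`, `FakeCaptureBackend`, `Eagle3Engine`, and `DFlashEngine` are illustrative names, plain lists stand in for tensors, and the real `SGLangCaptureBackend` method signatures are not given in this description.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


class CaptureBackend(Protocol):
    """Hypothetical interface: the only layer allowed to touch the runtime."""

    def extend(self, input_ids: list[list[int]]) -> list[list[float]]: ...


class FakeCaptureBackend:
    """Stand-in for the version-pinned boundary (a real one would import sglang)."""

    def extend(self, input_ids: list[list[int]]) -> list[list[float]]:
        # Pretend "hidden state" per token: the token id as a float.
        return [[float(tok) for tok in seq] for seq in input_ids]


@dataclass
class Eagle3Engine:
    """Holds a backend and only reshapes its output (assumed shaping: per sequence)."""

    backend: CaptureBackend

    def generate(self, input_ids: list[list[int]]) -> dict:
        return {"hidden_states": self.backend.extend(input_ids)}


@dataclass
class DFlashEngine:
    """Same backend, different shaping (assumed shaping: one flat list)."""

    backend: CaptureBackend

    def generate(self, input_ids: list[list[int]]) -> dict:
        hidden = self.backend.extend(input_ids)
        return {"hidden_states": [h for seq in hidden for h in seq]}
```

Both engines share one forward path, so a change to the runtime API is absorbed by the backend class alone; the engines differ only in how they shape what comes back.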

Behavior:

  • Byte-identical on the test configs (TP=1/2, dp=1): require_mlp_sync is False so the unified mlp-sync branch is skipped identically; construction, req building, the forward, split/shard logic, and pool-clear ordering are transplanted verbatim. import specforge stays sglang-optional (lazy import in from_pretrained).
  • Two deliberate, flagged changes: (1) DFlash mlp-sync is unified onto the eagle3 0.5.9 signature, since its old Scheduler.* call was latently broken for dp>1; (2) a stray debug print() in DFlash set_capture_layers is dropped.
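
The lazy-import point in the first bullet can be illustrated with a minimal sketch, assuming the optional dependency is resolved only inside `from_pretrained`. The class name `TargetEngine` and the `backend_module` parameter are hypothetical, and the stdlib `json` module stands in for the heavy optional package so the example runs anywhere.

```python
import importlib


class TargetEngine:
    """Hypothetical engine whose optional dependency is imported on demand."""

    def __init__(self, backend):
        self.backend = backend

    @classmethod
    def from_pretrained(cls, model_path: str, backend_module: str = "json"):
        # The import happens here, at call time, not at module import time,
        # so importing this module never requires the optional package.
        try:
            module = importlib.import_module(backend_module)
        except ImportError as exc:
            raise ImportError(
                f"backend module {backend_module!r} is required to load "
                f"{model_path!r}; install it to use this engine"
            ) from exc
        return cls(backend=module)
```

With this shape, importing the package succeeds on a machine without the optional dependency, and the failure surfaces only when an engine that needs it is actually constructed.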

Also adds the sglang_server backend (SGLangServerEagle3TargetEngine). It is selectable via the factory, but construction raises an actionable NotImplementedError until the live-capture depth is set by the O1.3 spike (docs/roadmap/online-disaggregation.md §O1.3).

New test: tests/test_runtime/test_sglang_capture_backend.py (AST decoupling invariant + sglang_server selectability).
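
A pure-AST decoupling check of this kind can be sketched as follows. This is not the test from the PR; the helper names `imported_roots` and `imports_sglang` are assumptions, and the sketch only demonstrates the technique of parsing source text without importing it.

```python
import ast


def imported_roots(source: str) -> set:
    """Return the top-level package names imported anywhere in `source`."""
    roots = set()
    # ast.walk visits nested nodes too, so function-level (lazy) imports count.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            roots.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 means an absolute import; relative imports are skipped.
            roots.add(node.module.split(".")[0])
    return roots


def imports_sglang(source: str) -> bool:
    """True if the source text imports the `sglang` package in any form."""
    return "sglang" in imported_roots(source)
```

Because the source is only parsed, never executed, the check runs without sglang or a GPU installed. A test would typically read each engine file's text and assert `not imports_sglang(text)`.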

Validation

Full tests/test_runtime: 214 OK (2 skipped, 1 xfail) on 8×H200. The hf-vs-sglang-vs-custom capture parity test builds a real SGLang runner through SGLangCaptureBackend at TP=2 and matches the HF reference (2 OK). Adversarial review: 0 confirmed defects.

🤖 Generated with Claude Code

…ang version

Extract EVERY sglang internal + the duplicated extend/capture forward into one
version-pinned boundary, `sglang_backend/capture.py::SGLangCaptureBackend`, and
have the algorithm engines COMPOSE it instead of embedding it:

  SGLangCaptureBackend  (the only place that imports sglang.srt.* for capture)
    · build()            ServerArgs / ModelConfig / SGLangRunner wiring (unified)
    · _forward_extend()  the single ScheduleBatch/ForwardBatch capture forward
    · _maybe_prepare_mlp_sync_batch()  ONE (0.5.9) prepare_mlp_sync signature
    · extend / extend_vlm / extend_dflash / get_rope_index / set_eagle3_capture_layers

  SGLangEagle3TargetEngine / SGLangDFlashTargetEngine  now hold a backend and do
  only torch-side output shaping — they import ZERO sglang internals (verified by
  tests/test_runtime/test_sglang_capture_backend.py, a pure-AST invariant).

Why: before this, both sglang engines imported ~20 sglang symbols and each carried
its own near-duplicate `_extend`; the two copies had drifted to DIFFERENT sglang
API versions (eagle3 = module-level prepare_mlp_sync_batch_raw(attn_cp_size=);
dflash = the removed Scheduler.prepare_mlp_sync_batch_raw(spec_algorithm=)). A
sglang bump touched every subclass and the copies could silently diverge. Now a
bump touches one file; "put the pieces together" (capture backend + shaping +
adapter) instead of tangling the version into each algorithm.

Behavior:
- Byte-identical on the test configs (TP=1/2, dp=1): require_mlp_sync is False so
  the unified mlp-sync branch is skipped identically; construction, req building,
  the forward, splitting/shard logic, and pool-clear ordering are transplanted
  verbatim (`import specforge` stays sglang-optional via lazy import in
  from_pretrained; the engine forward is still under @torch.no_grad).
- Two deliberate, flagged changes: (1) DFlash's mlp-sync now uses the same 0.5.9
  signature as eagle3 — its old Scheduler.* call was latent-broken for dp>1;
  (2) dropped a stray debug print() in DFlash set_capture_layers.

Also adds the `sglang_server` backend (SGLangServerEagle3TargetEngine): selectable
via get_eagle3_target_model(backend="sglang_server"), construction raises an
actionable NotImplementedError until the live-capture depth is set by the O1.3
spike (docs/roadmap/online-disaggregation.md §O1.3).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@maocheng23 maocheng23 marked this pull request as ready for review July 1, 2026 08:22
Base automatically changed from dataflow-up-24-target-engine to dataflow-up-16-zerocopy July 3, 2026 02:03
@jiapingW jiapingW self-requested a review July 3, 2026 02:12
@jiapingW jiapingW merged commit ac8f878 into dataflow-up-16-zerocopy Jul 3, 2026
1 check passed
@jiapingW jiapingW deleted the dataflow-up-25-sglang-capture-backend branch July 3, 2026 02:12