[DataFlow runtime 7/7] Integration: launcher + end-to-end equivalence gates by maocheng23 · Pull Request #600 · sgl-project/SpecForge

maocheng23 · 2026-06-24T21:36:31Z

DataFlow runtime — part 7/7 (integration). Stacked on #599 — true-stacked: this PR's base is the previous PR's branch, so the diff below shows only this layer.

Turns the runtime into a thin launcher and adds the end-to-end equivalence gates.

What

specforge/runtime/launch.py — build_offline_eagle3_runtime: assembles OfflineManifestReader → DataFlowController → LocalFeatureStore → FeatureDataLoader → Eagle3TrainStrategy → TrainerController/Core → FSDP.
scripts/train_eagle3_dataflow.py — thin offline launcher; reuses train_eagle3's model/data builders (no training logic in the script).
GPU equivalence gates (CPU-stub-importable, @skipUnless(cuda) for the GPU ones): test_equiv_offline_eagle3 (old run_forward vs new Eagle3TrainStrategy.forward_loss, bit-exact per-batch loss), test_equiv_online_eagle3, test_equiv_trainer_split, test_offline_launch_fsdp, test_checkpoint_resume, test_extraction_vs_hf_reference, plus _fixtures.py.
Two launcher robustness fixes surfaced by a real 7B run: derive args.target_batch_size in the dataflow launcher (it was read before being set → crash); harden destroy_distributed() against None/already-destroyed groups so a successful run does not exit non-zero on teardown.
Docs: runtime/README.md, runtime/ARCHITECTURE.md.
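
The teardown hardening mentioned above can be pictured roughly as follows. This is a minimal sketch, not the code in this PR: the helper name `destroy_distributed_safely` is hypothetical, it only handles the default process group, and it assumes the only cases to guard against are a missing PyTorch install, a group that was never initialized, and a group that has already been destroyed.

```python
try:
    import torch.distributed as dist  # optional, so the sketch still imports without PyTorch
except ImportError:
    dist = None


def destroy_distributed_safely() -> bool:
    """Tear down the default process group if one is live.

    Returns True if a group was destroyed, False if there was nothing to do.
    Hypothetical helper for illustration; not the function from this PR.
    """
    if dist is None or not dist.is_available():
        return False
    if not dist.is_initialized():
        # Never initialized, or already destroyed by an earlier call.
        return False
    dist.destroy_process_group()
    return True


if __name__ == "__main__":
    # Calling twice is safe: the second call finds no live group and is a no-op.
    print(destroy_distributed_safely(), destroy_distributed_safely())
```

Returning a flag instead of raising keeps a successful training run from exiting non-zero just because teardown was reached twice.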

How to run the full 7B old-vs-new offline comparison

# 0) Offline features (.ckpt) — either scripts/prepare_hidden_states.py (sglang),
#    or HF-only: run the target with output_hidden_states and save per prompt
#      input_ids:(seq,)  loss_mask:(seq,)
#      hidden_state:(1,seq,H)      = hidden_states[-1]                (lm-head input)
#      aux_hidden_state:(1,seq,3H) = cat(layers [1, L//2-1, L-4])     (default aux ids)
#    into <i>.ckpt (same format as prepare_hidden_states.DataPoint).
M="Qwen/Qwen2.5-7B-Instruct"; C="configs/qwen2.5-7b-eagle3.json"
ARGS="--target-model-path $M --draft-model-config $C --train-data-path prompts.jsonl \
      --train-hidden-states-path feats/ --target-model-backend hf --chat-template qwen \
      --max-num-steps 200 --batch-size 1 --seed 0"
# old path
torchrun --standalone --nproc_per_node 1 scripts/train_eagle3.py          $ARGS --output-dir out_old
# new path (identical args)
torchrun --standalone --nproc_per_node 1 scripts/train_eagle3_dataflow.py  $ARGS --output-dir out_new
# then diff per-step loss / acc / grad_norm from the two logs.
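
The final diff step can be scripted. The sketch below assumes a hypothetical log line shape, `step=<n> loss=<x> acc=<x> grad_norm=<x>`, which is not taken from the trainers' real output; adjust the regular expression to whatever the two runs actually print. The helper names `parse_log` and `diff_logs` are likewise illustrative.

```python
import re

# ASSUMPTION: hypothetical log format, e.g. "step=100 loss=4.37 acc=0.54 grad_norm=2.3".
LINE = re.compile(
    r"step=(\d+)\s+loss=([-+0-9.eE]+)\s+acc=([-+0-9.eE]+)\s+grad_norm=([-+0-9.eE]+)"
)


def parse_log(text):
    """Map step -> {"loss", "acc", "grad_norm"} for every matching line."""
    metrics = {}
    for m in LINE.finditer(text):
        metrics[int(m.group(1))] = {
            "loss": float(m.group(2)),
            "acc": float(m.group(3)),
            "grad_norm": float(m.group(4)),
        }
    return metrics


def diff_logs(old_text, new_text):
    """Return [(step, {metric: new - old})] for steps present in both logs."""
    old, new = parse_log(old_text), parse_log(new_text)
    rows = []
    for step in sorted(old.keys() & new.keys()):
        rows.append((step, {k: new[step][k] - old[step][k] for k in old[step]}))
    return rows


if __name__ == "__main__":
    # Sample values copied from the results table, wrapped in the assumed format.
    old_log = "step=1 loss=5.51 acc=0.00 grad_norm=13.7\nstep=200 loss=4.17 acc=0.77 grad_norm=5.1\n"
    new_log = "step=1 loss=5.11 acc=0.00 grad_norm=14.1\nstep=200 loss=4.14 acc=0.69 grad_norm=5.2\n"
    for step, delta in diff_logs(old_log, new_log):
        print(step, {k: round(v, 3) for k, v in delta.items()})
```

Only steps that appear in both logs are compared, so a run that stopped early does not raise a `KeyError`.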

Results — Qwen2.5-7B, 200 steps, HF backend, seed 0 (offline)

step	loss (old / new)	acc (old / new)	acceptance (old / new)	grad_norm (old / new)
1	5.51 / 5.11	0.00 / 0.00	0.11 / 0.11	13.7 / 14.1
100	4.37 / 4.28	0.54 / 0.55	0.21 / 0.19	2.3 / 3.0
200	4.17 / 4.14	0.77 / 0.69	0.24 / 0.23	5.1 / 5.2

By step 200 the old and new paths reach roughly the same point (loss ≈ 4.15, acc 0.69–0.77, acceptance ≈ 0.23, grad_norm ≈ 5). Per-step values are not bit-identical because the two paths iterate over samples in a different order and report loss slightly differently; test_equiv_offline_eagle3 isolates the per-batch math and checks that it is bit-exact.
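
The distinction between per-step and per-batch agreement can be illustrated with a toy example. The sketch below is pure Python with made-up logits and sample ids; it does not use the real training strategy or any SpecForge code. It shows that the same deterministic loss function applied to the same samples in a different order yields different per-step sequences, while the values match exactly once they are keyed by sample.

```python
import math


def cross_entropy(logits, target):
    """Numerically stable cross-entropy for a single example."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


# Made-up samples: id -> (logits, target index).
samples = {
    "a": ([2.0, 0.5, -1.0], 0),
    "b": ([0.1, 0.2, 3.0], 2),
    "c": ([1.0, 1.0, 1.0], 1),
}

# Two paths that visit the same samples in a different order.
old_order = ["a", "b", "c"]
new_order = ["c", "a", "b"]

old_steps = [cross_entropy(*samples[k]) for k in old_order]
new_steps = [cross_entropy(*samples[k]) for k in new_order]

# Per-step logs disagree, because step i holds a different sample in each path.
per_step_equal = old_steps == new_steps

# Keyed by sample id, the losses are exactly equal (same inputs, same arithmetic).
old_by_id = dict(zip(old_order, old_steps))
new_by_id = dict(zip(new_order, new_steps))
per_sample_equal = old_by_id == new_by_id

if __name__ == "__main__":
    print("per-step equal:", per_step_equal)
    print("per-sample equal:", per_sample_equal)
```

This is why a log-level comparison can only show that the two runs converge to a similar point, while a test that feeds both code paths the same batch can demand exact equality.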

Part of an 8-PR stack adding the DataFlow runtime (M1–M4 + integration). Verified on current main: imports + full tests/test_runtime pass.

maocheng23 requested review from FlamingoPg, FrankLeeeee, shuaills and sleepcoo as code owners June 24, 2026 21:36

maocheng23 mentioned this pull request Jun 24, 2026

[DataFlow runtime] Online EAGLE3 launcher (build_online_eagle3_runtime + RolloutWorker) #601

Merged

maocheng23 changed the base branch from main to dataflow-up-6-training June 25, 2026 00:15

maocheng23 force-pushed the dataflow-up-7-integration branch 2 times, most recently from ea463fc to d005a13 Compare June 25, 2026 00:57

runtime(7/7): integration — launcher + end-to-end equivalence gates

7a81ce5

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

maocheng23 force-pushed the dataflow-up-7-integration branch from d005a13 to 7a81ce5 Compare June 25, 2026 01:26

jiapingW self-requested a review June 25, 2026 08:49

jiapingW approved these changes Jun 25, 2026

jiapingW merged commit 8b4db4f into sgl-project:dataflow-up-6-training Jun 25, 2026
1 check passed

This was referenced Jun 26, 2026

runtime (7/7): integration — launcher + end-to-end equivalence gates maocheng23/SpecForge#8

Closed

[DataFlow runtime] Land integrated stack (M1–M4 + integration + online) into main #603

Merged
