[DataFlow runtime] DFlash end-to-end on the composable launch (offline + online) #628

Merged
jiapingW merged 1 commit into dataflow-up-16-zerocopy from dataflow-up-22-dflash
Jul 2, 2026
@maocheng23

Collaborator

What

DFlash trains end-to-end (offline + online) through the composable launch from the parent PR — via a StrategySpec entry + a DFlashAdapter, with ZERO launch.py changes.
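
To illustrate why a registry entry can add a strategy without touching the launcher, here is a minimal sketch of the spec-seam pattern. The field names, the `register` helper, and this `launch` function are hypothetical stand-ins and not the actual `registry.py` or `launch.py` code; only the names `StrategySpec`, `DFlashAdapter`, and `supports_online` come from this PR.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class StrategySpec:
    """Hypothetical spec: everything a generic launcher needs for one strategy."""
    name: str
    make_offline_reader: Callable[[], object]
    make_online_adapter: Optional[Callable[[], object]] = None
    supports_online: bool = False


REGISTRY: Dict[str, StrategySpec] = {}


def register(spec: StrategySpec) -> None:
    """Add a spec to the registry, rejecting duplicate names."""
    if spec.name in REGISTRY:
        raise ValueError(f"duplicate strategy: {spec.name}")
    REGISTRY[spec.name] = spec


def launch(strategy: str, mode: str) -> object:
    """Generic launcher: it looks specs up by name and never names a strategy."""
    spec = REGISTRY[strategy]
    if mode == "online":
        if not spec.supports_online or spec.make_online_adapter is None:
            raise ValueError(f"{strategy} has no online path")
        return spec.make_online_adapter()
    return spec.make_offline_reader()


# Adding dflash is a registry entry only; launch() above is unchanged.
# The string return values are placeholders for real reader/adapter objects.
register(StrategySpec(
    name="dflash",
    make_offline_reader=lambda: "OfflineManifestReader(dflash feature_keys)",
    make_online_adapter=lambda: "DFlashAdapter",
    supports_online=True,
))
```

The design point is that the launcher depends only on the spec's shape, so each new strategy is a data change in the registry, not a control-flow change in the launcher.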

Changes

  • registry.py: dflash spec — offline reader (OfflineManifestReader with dflash feature_keys, no aux/target swap), per-sample transform, padding collate; online via DFlashAdapter; supports_online=True.
  • specforge/runtime/inference/dflash_adapter.py (new): wraps generate_dflash_data, emits {input_ids, hidden_states, loss_mask}. verify_capture self-skips the eagle3 aux/target checks (different feature names + __aux_layer_ids__=None).
  • tests/_fixtures.py: write_offline_files_dflash + build_dflash (tiny Qwen3 target → DFlash draft + TargetEmbeddingsAndHead → OnlineDFlashModel).
  • tests/test_dflash_launch.py + test_dflash_online_launch.py (new, GPU): offline and online dflash train end-to-end through FSDP.
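
As a rough illustration of the "padding collate" step for samples shaped like `{input_ids, hidden_states, loss_mask}`, the sketch below pads every sample in a batch to the longest sequence. It uses plain Python lists instead of tensors so it runs without dependencies; the function name `pad_collate`, the `pad_id` parameter, and the choice to pad `loss_mask` with zeros are assumptions, not the collate function shipped in this PR.

```python
from typing import Dict, List

Sample = Dict[str, list]


def pad_collate(samples: List[Sample], pad_id: int = 0) -> Dict[str, list]:
    """Right-pad each sample to the batch's max length (hypothetical sketch)."""
    max_len = max(len(s["input_ids"]) for s in samples)
    hidden_dim = len(samples[0]["hidden_states"][0])
    batch: Dict[str, list] = {"input_ids": [], "hidden_states": [], "loss_mask": []}
    for s in samples:
        pad = max_len - len(s["input_ids"])
        batch["input_ids"].append(s["input_ids"] + [pad_id] * pad)
        # One zero vector per padded position keeps the hidden size uniform.
        zeros = [[0.0] * hidden_dim for _ in range(pad)]
        batch["hidden_states"].append(s["hidden_states"] + zeros)
        # Assumption: padded positions are masked out of the loss.
        batch["loss_mask"].append(s["loss_mask"] + [0] * pad)
    return batch
```

Masking the padded positions means padding cannot contribute to the training loss, whatever values fill the padded `input_ids` and `hidden_states`.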

Note

DFlash is online-only in production today (no offline dumper exists — prepare_hidden_states.py is eagle3-only), so the offline path is exercised with synthetic fixtures while online is its real workflow.

Testing

Covered by the full suite run at the stack tip: 197 tests OK (sci-h200 / H200).

Stacked on the composable-launch PR. Part 2/3.

🤖 Generated with Claude Code

@gemini-code-assist

Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

maocheng23 added a commit that referenced this pull request Jun 30, 2026
… W3′ naming

Review fixes (verified against the files):
- Status (confirmed): stop calling the in-review composable-launch stack (#627/#628/#629)
  "landed"/"DONE"/"done". Split the genuinely-merged spine from the in-review stack in §1; one
  consistent "in review" label in §1/Phase A/success table and across the roadmap (README, Phase A).
  Leave the spine's "landed" wording (it is merged).
- Module placement (confirmed): Evaluator/EvalCache are top-level domain managers
  (specforge/eval/), not specforge/runtime/eval/ — fix the eval-and-breadth.md outlier to match
  plan.md §2.3 and domain-refactor.md.
- W3′ naming (confirmed): SGLangServerEngine is ONE engine with two feature transports
  (capture-into-FeatureStore for W3/O1.3, inline-HTTP for the light W3′) — disambiguate in §2.2,
  the workload table and §G2 rather than overloading one name.
- O1.3 spike (reviewer's premise refuted — it is already an explicit 🔴 gate): added the valid
  narrow point instead — the spike scopes only the sglang_server slice of Phase B; the de-EAGLE3
  extraction and domain Trainer carry no engine risk.

Additional contradictions found by a completeness sweep and fixed:
- StrategySpec registry: plan.md said it "stays in runtime/training unchanged" but §6 + Phase E
  move it — clarify the per-step strategy seam stays, the registry converges into training/strategies/.
- TargetEngine source: extracted from modeling/target/*TargetModel (adapters wrap it), not
  "absorbs runtime/inference adapters".
- Draft package: models/drafts is the target layout; note today's modeling/draft/ + real filenames.
- Dependency graph: align domain-refactor (E depended on {C,D}) with README (D→E, C parallel).
- Drop the up-19/up-20 branch tags that only appeared in the online doc.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@maocheng23 maocheng23 marked this pull request as ready for review June 30, 2026 23:26
@maocheng23 maocheng23 requested a review from FrankLeeeee as a code owner June 30, 2026 23:26
@maocheng23

Collaborator Author

Code review

No high-confidence issues found. Checked for bugs in the DFlash adapter, offline transform/collate path, strategy registration, and online rollout-to-train wiring.

…e + online)

DFlash now trains through the runtime via a StrategySpec + a DFlashAdapter, with
ZERO launch.py changes (the spec seam from the previous commit carries it).

- registry.py: dflash spec — offline reader (OfflineManifestReader with dflash
  feature_keys, no aux/target swap), per-sample transform, padding collate; online
  via DFlashAdapter; supports_online=True.
- inference/dflash_adapter.py (new): wraps generate_dflash_data, emits
  {input_ids, hidden_states, loss_mask}; verify_capture self-skips the eagle3
  aux/target checks (different feature names + __aux_layer_ids__=None).
- tests/_fixtures.py: write_offline_files_dflash + build_dflash (tiny Qwen3 target
  -> DFlash draft + TargetEmbeddingsAndHead -> OnlineDFlashModel).
- tests/test_dflash_launch.py + test_dflash_online_launch.py (new, GPU): offline
  and online dflash train end-to-end through FSDP.
- tests/test_strategy_registry.py: dflash-fully-wired assertions.

DFlash is online-only in production (no offline dumper exists yet;
prepare_hidden_states.py is eagle3-only), so the offline path is exercised with
synthetic fixtures while online is its real workflow.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@maocheng23 maocheng23 force-pushed the dataflow-up-21-composable-launch branch from 457b20a to 8faf111 Compare July 1, 2026 00:29
@maocheng23 maocheng23 force-pushed the dataflow-up-22-dflash branch from 6945bfe to a6b8b7d Compare July 1, 2026 00:29
@maocheng23

Collaborator Author

Addressed review feedback (self-review pass).

  • dflash_adapter.py: dropped the redundant "__aux_layer_ids__": None emit (the RolloutWorker reads it via feats.pop(..., None), so an absent key is identical); made the loss_mask default lazy (only allocate the all-ones mask when a task omits it, not eagerly per sample).
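
Both points in the bullet above rest on plain dict semantics, sketched here. The helper names `read_aux_layer_ids` and `resolve_loss_mask` and the list-based mask are hypothetical; only the `feats.pop(..., None)` read and the `__aux_layer_ids__` / `loss_mask` keys come from this PR.

```python
from typing import List, Optional


def read_aux_layer_ids(feats: dict) -> Optional[list]:
    """Consumer-side read: a missing key and an explicit None look the same."""
    return feats.pop("__aux_layer_ids__", None)


def resolve_loss_mask(task: dict, seq_len: int) -> List[int]:
    """Lazy default: build the all-ones mask only when the task omits one."""
    mask = task.get("loss_mask")
    if mask is None:
        mask = [1] * seq_len  # allocated only on this branch
    return mask


# An eager variant would be task.get("loss_mask", [1] * seq_len): Python
# evaluates the default argument before the call, so the all-ones mask is
# built for every sample even when the task already supplies a mask.
```

With a `pop(..., None)` read on the consumer side, emitting `None` explicitly carries no extra information, which is why the emit could be dropped safely.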

Deferred: extracting the length-grouped batching shared with SGLangAdapter into a base helper — it touches the pre-existing SGLangAdapter / validated eagle3 online path, so noted for a follow-up.
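
For readers unfamiliar with the term, one common form of length-grouped batching is sketched below: sort sample indices by sequence length, then cut the sorted order into fixed-size batches so each batch holds similar lengths and wastes little padding. This is an assumption about the general technique; the function name and signature are hypothetical and do not reflect the SGLangAdapter or DFlashAdapter code.

```python
from typing import List, Sequence


def length_grouped_batches(lengths: Sequence[int], batch_size: int) -> List[List[int]]:
    """Return batches of sample indices grouped by ascending length."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # sorted() is stable, so equal-length samples keep their original order.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]
```

A shared base helper along these lines would take only lengths and a batch size, which is what would let two adapters reuse it without sharing any model-specific code.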

Validated: full tests/test_runtime = 200 OK (2 skipped, 1 xfail), zero failures, on a 2-node H200 pod. Lint clean (black 24.10.0 / isort 5.13.2 / autoflake).

Base automatically changed from dataflow-up-21-composable-launch to dataflow-up-16-zerocopy July 2, 2026 05:41
@jiapingW jiapingW merged commit 14a18ba into dataflow-up-16-zerocopy Jul 2, 2026
1 check passed
@jiapingW jiapingW deleted the dataflow-up-22-dflash branch July 2, 2026 05:47