[DataFlow runtime · M6 3/4] M5/M6 adversarial-review hardening #611

Merged
jiapingW merged 1 commit into dataflow-up-11-m5-recovery from dataflow-up-14-hardening
Jun 29, 2026

@maocheng23 maocheng23 commented Jun 28, 2026

Collaborator

M5/M6 adversarial-review hardening (LocalFeatureStore + control plane).

  • LocalFeatureStore: gc() counts freed_bytes on a successful release-pending retry; lease registered under the existence-check lock; release() cleanup.
  • SQLiteMetadataStore: synchronous=FULL; all_committed_ids ORDER BY rowid.
  • Controller: reconcile_on_restart gates release on optimizer_durable (not global_step is not None).
  • test_equiv_4rank: attention_backend='usp' + flash-attn skip guard.
  • Regression tests in test_feature_store / test_recovery.
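
As a rough illustration of the SQLiteMetadataStore bullet, here is a minimal sketch of what `synchronous=FULL` and `ORDER BY rowid` look like with Python's standard `sqlite3` module. The table name `committed`, the column `sample_id`, and the helpers `open_metadata_db` and `commit_id` are hypothetical stand-ins, not the actual schema or API in this PR; only `all_committed_ids` is a name taken from the description.

```python
import sqlite3
from typing import List


def open_metadata_db(path: str) -> sqlite3.Connection:
    """Open a metadata DB with full fsync durability (hypothetical helper)."""
    conn = sqlite3.connect(path)
    # FULL asks SQLite to sync to disk at every critical point of a commit,
    # trading some write latency for durability across power loss.
    conn.execute("PRAGMA synchronous=FULL")
    # Hypothetical schema: one row per committed sample id.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS committed (sample_id TEXT PRIMARY KEY)"
    )
    return conn


def commit_id(conn: sqlite3.Connection, sample_id: str) -> None:
    """Record a sample id as committed; re-committing is a no-op."""
    with conn:  # commits on success, rolls back on exception
        conn.execute(
            "INSERT OR IGNORE INTO committed (sample_id) VALUES (?)",
            (sample_id,),
        )


def all_committed_ids(conn: sqlite3.Connection) -> List[str]:
    """Return committed ids in insertion order.

    Without an explicit ORDER BY, SQLite is free to return rows in any
    order (for example, by scanning the primary-key index), so the order
    is pinned to rowid here.
    """
    rows = conn.execute("SELECT sample_id FROM committed ORDER BY rowid")
    return [row[0] for row in rows]
```

The explicit ordering matters for anything that replays committed ids on recovery: a query with no `ORDER BY` makes no ordering promise, so two runs over the same database could in principle see the ids in different orders.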

The SharedDirFeatureStore per-generation B5 rewrite that was previously bundled in this PR now lives in #609 alongside the store it hardens. (Hot-update / weight-version-control intentionally NOT included.)

Part of the DataFlow runtime M5/M6 stacked series (continues the M1–M4 work in #594#601 / #603). Stacked PRs — merge bottom-up (up-9 first). Lint (pre-commit) + runtime CPU test suite green.

🤖 Generated with Claude Code

@maocheng23 maocheng23 requested a review from FrankLeeeee as a code owner June 28, 2026 00:33
@gemini-code-assist

Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

…ility, reconcile gate, USP 4-rank test

- LocalFeatureStore: gc() counts freed_bytes on a successful release-pending
  retry; release() sid hoist + comment; get() lock-scope comment.
- SQLiteMetadataStore: synchronous=FULL; all_committed_ids ORDER BY rowid.
- controller.reconcile_on_restart gates release on optimizer_durable (not
  global_step is not None).
- test_equiv_4rank uses attention_backend='usp' + a flash-attn skip guard.
- regression tests in test_feature_store / test_recovery.
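
To make the `reconcile_on_restart` item concrete, here is a minimal sketch of the difference between the two gates. `CheckpointRecord` and `may_release_on_restart` are hypothetical names invented for illustration; only `optimizer_durable` and `global_step` come from the commit message, and the real controller logic may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckpointRecord:
    """Hypothetical view of what the controller knows after a restart."""
    global_step: Optional[int]  # last step number that was recorded, if any
    optimizer_durable: bool     # True once optimizer state is safely persisted


def may_release_on_restart(record: Optional[CheckpointRecord]) -> bool:
    """Decide whether consumed features can be released after a restart.

    Gating on `global_step is not None` would allow release as soon as a
    step number was recorded, even if the optimizer state for that step
    never reached durable storage. Gating on `optimizer_durable` only
    allows release once the training state can actually be restored.
    """
    return record is not None and record.optimizer_durable
```

Under this sketch, a record with `global_step=10` but `optimizer_durable=False` keeps its features, which is the case the old `global_step is not None` check would presumably have released too early.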

(The disaggregated.py per-generation rewrite that was previously bundled here now
lives in #609 alongside the SharedDirFeatureStore it hardens.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@maocheng23 maocheng23 force-pushed the dataflow-up-13-disagg-example branch from e7bb5e1 to 31b823f Compare June 29, 2026 03:57
@maocheng23 maocheng23 force-pushed the dataflow-up-14-hardening branch from c39507b to 3a2cc7f Compare June 29, 2026 03:57
Base automatically changed from dataflow-up-13-disagg-example to dataflow-up-11-m5-recovery June 29, 2026 16:03
@jiapingW jiapingW merged commit 3af3ec8 into dataflow-up-11-m5-recovery Jun 29, 2026
1 check passed
@jiapingW jiapingW deleted the dataflow-up-14-hardening branch June 29, 2026 16:05