Skip to content

Revert "fix: resolve some issues in cross region test"#186

Draft
chee-chyuan wants to merge 1 commit into
develop-v2.2from
develop-v2.2-without-cross-region
Draft

Revert "fix: resolve some issues in cross region test"#186
chee-chyuan wants to merge 1 commit into
develop-v2.2from
develop-v2.2-without-cross-region

Conversation

@chee-chyuan
Copy link
Copy Markdown

This reverts commit ba2f60c.

Description

add a description of your changes here...

Rationale

tell us why we need these changes...

Example

add an example CLI or API response...

Changes

Notable changes:

  • add each change in a bullet point here
  • ...

Potential Impacts

  • add potential impacts for other components here
  • ...

@github-actions
Copy link
Copy Markdown
Contributor

Your PR title doesn't follow the Conventional Commit guidelines.

Example of valid titles:

  • feat: add new user login
  • fix: correct button size
  • docs: update README

Usage:

  • feat: Introduces a new feature
  • fix: Patches a bug
  • chore: General maintenance tasks or updates
  • test: Adding new tests or modifying existing tests
  • bench: Adding new benchmarks or modifying existing benchmarks
  • perf: Performance improvements
  • refactor: Changes to improve code structure
  • docs: Documentation updates
  • ci: Changes to CI/CD configurations
  • revert: Reverts a previously merged PR
  • deps: Updates dependencies

Breaking Changes

Breaking changes are noted by using an exclamation mark. For example:

  • feat!: changed the API
  • chore(node)!: Removed unused public function

Help

For more information, follow the guidelines here: https://www.conventionalcommits.org/en/v1.0.0/

@hashdit-bot
Copy link
Copy Markdown

hashdit-bot Bot commented May 11, 2026

Pull Request Review

This PR is a broad revert of a previous “cross region test” fix, rolling back multiple reliability and consistency changes across engine persistence, startup alignment, peer error handling, and blobstore behavior. It reduces graceful shutdown timeout back to 5s, removes TrieDB/MDBX alignment safeguards and deferred flush ordering, reclassifies UselessPeer back to fatal in network error handling, and deletes a number of regression/characterization tests. It also restores blobstore startup behavior to delete all existing on-disk blobs on open.

Sensitive Content

No sensitive content detected.

Security Issues

🔴 [CRITICAL] Risk of state inconsistency/corruption due to reverted TrieDB↔MDBX ordering/alignment protections

File: crates/storage/provider/src/providers/database/provider.rs, crates/engine/tree/src/persistence.rs, crates/node/builder/src/launch/common.rs
The revert removes key protections that maintained the invariant between MDBX canonical state and TrieDB pathdb state (including startup alignment checks/unwind logic and deferred post-commit triedb flush handling). This can reintroduce crash-window inconsistencies where one backend advances without the other, creating unrecoverable or hard-to-recover chain-state divergence conditions after restart/reorg.
Recommendation: Reintroduce strict backend ordering invariants and startup reconciliation checks (including bounded unwind/root validation), and ensure flush/commit sequencing is crash-safe with explicit tests for restart + reorg scenarios.

🟠 [HIGH] Destructive blobstore initialization causes on-start data loss

File: crates/transaction-pool/src/blobstore/disk.rs
DiskFileBlobStore::open() now unconditionally calls inner.delete_all()?, wiping all persisted sidecars at startup. This can break protocol-serving expectations for retained blob sidecars and may degrade node behavior/availability after restart by discarding required recent data.
Recommendation: Respect configured open mode (Clear vs ReIndex) and avoid unconditional deletion; default to non-destructive startup with explicit operator opt-in for clearing.

🟠 [HIGH] Graceful shutdown regression may truncate persistence window

File: crates/cli/runner/src/lib.rs, crates/node/builder/src/launch/engine.rs
The graceful shutdown timeout was reduced from 60s to 5s and shutdown orchestration safeguards were simplified/reverted. Under production load or slower disks, this increases likelihood of incomplete flush/persist during termination, risking data loss or inconsistent on-disk state.
Recommendation: Restore longer default timeout and robust single-terminate/shutdown completion handling; keep/restore signal-path tests that verify persistence completion under realistic timing.


Generated by Hashdit Bot. This tool can absolutely NOT replace manual audits.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant