[Medium] Relayer state persistence lacks fsync — crash may silently discard last checkpoint #165

@AshinGau

Description

RelayerState::save uses fs::write(temp) + fs::rename(temp, final). This provides atomic replacement but no durability: neither the file contents nor the parent directory are fsynced. On power loss between write and rename, or after rename but before the parent directory entry reaches disk, the checkpoint can be silently lost.

Affected Files

  • crates/pipe-exec-layer-ext-v2/relayer/src/persistence.rs:77-97 (RelayerState::save)

Exploit Scenario

A node crashes after poll_uri returns updated=true and the relayer has already delivered oracle events downstream (nonce consumed by JWK oracle). On restart, on-disk relayer_state.json reverts to a pre-update version; StartupScenario::Restore replays the same events. The module's exactly-once claim (blockchain_source.rs:89) is violated under power-loss scenarios.

Impact

Lost durability under crash/power-loss; combines with the "in-memory advances despite save failure" issue to expand the replay window. Any consumer that trusts exactly-once from this relayer can be driven to double-process.

Fix

After fs::write(&temp_path, …), open the file and call sync_all(). After rename, open the parent directory and fsync it to ensure the rename itself is durable. Consider using reth_fs_util helpers for consistent error handling.
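A minimal sketch of that sequence, using only the standard library. The function name `save_durable` and the `.json.tmp` temp-file naming are hypothetical and are not taken from persistence.rs; the real RelayerState::save may differ in how it serializes state and builds the temp path. Directory fsync is gated on Unix here because opening a directory as a `File` is not portable.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper illustrating write + fsync + rename + directory fsync.
/// Not the actual `RelayerState::save`.
fn save_durable(final_path: &Path, bytes: &[u8]) -> io::Result<()> {
    // Assumed temp naming: "relayer_state.json" -> "relayer_state.json.tmp".
    let temp_path = final_path.with_extension("json.tmp");

    {
        let mut file = File::create(&temp_path)?;
        file.write_all(bytes)?;
        // Flush file contents and metadata to disk before the rename,
        // so the new name never points at unwritten data.
        file.sync_all()?;
    }

    // Atomic replacement of the previous checkpoint.
    fs::rename(&temp_path, final_path)?;

    // Make the rename itself durable by syncing the parent directory.
    #[cfg(unix)]
    {
        let parent = match final_path.parent() {
            Some(p) if !p.as_os_str().is_empty() => p,
            _ => Path::new("."),
        };
        File::open(parent)?.sync_all()?;
    }

    Ok(())
}
```

With this ordering, a crash before the rename leaves the previous checkpoint intact, and a crash after the directory sync leaves the new one. Callers should still treat an `Err` from any step as "checkpoint not persisted" and avoid advancing in-memory state past it.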
