Description
Description
In our quest against database corruption, I think we should start calling fsync
in more places. The problem is that this could cause serious performance degradation if we do it too liberally.
This PR for tree-states (#5144) adds some fsyncs to prevent issues during backfill. I suspect that fsync will be most useful when coordinating changes across multiple databases e.g. hot & freezer, or hot & blobs. On stable
we currently only have one synchronisation point, which is during the hot -> cold database migration:
lighthouse/beacon_node/store/src/hot_cold_store.rs
Lines 2487 to 2498 in c7e5dd1
This synchronisation was added by @adaszko, and I reckon it has probably saved us from many more instances of DB corruption.
Steps to resolve
- Add fsyncs during all database transactions that involve >1 database. Likely candidates for a re-work are
do_atomically_with_block_and_blobs_cache
and backfill (the tree-states PR). - Measure the impact of these
fsync
additions on performance. For backfill the metric is #blocks/sec speed, although this is a little hard to control when network factors also play a role (we could import Era files to remove this variability). - (Optional) Consider adding more
fsync
s after database operations involving a single database. The "early attester cache" in block processing should mean that it's OK for us to spend a little more time on the database write (it is off the hot path).