HDiff in the hot DB #39
base: unstable
```rust
state_root,
state_slot: Some(summary.slot),
// Delete the diff as descendants of this state will never be used.
prune_hot_diff: true,
```
We need to change this so it does not delete the diffs that are ancestors of the finalized state (use `closest_layer_slots` somewhere here).

We need to know the ancestor diffs of the new finalized state, so one approach would be:
- Reorder `migrate_database` (hot to freezer migration) prior to pruning heads.
- Keep hot states which are either after the split, or before the split and descended from it according to the diff hierarchy.
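As a sketch of the keep-ancestors idea, assuming the HDiff layers are defined by per-layer moduli and the closest diff at each layer sits at the greatest multiple of that modulus at or below the finalized slot (the function name mirrors `closest_layer_slots`, but this is an illustration, not the actual implementation):

```rust
/// Hypothetical sketch: compute the slots of the ancestor diffs that the new
/// finalized state depends on, one per diff layer. `layer_moduli` and the
/// round-down rule are assumptions about the HDiff layering.
fn closest_layer_slots(finalized_slot: u64, layer_moduli: &[u64]) -> Vec<u64> {
    layer_moduli
        .iter()
        .map(|modulus| (finalized_slot / modulus) * modulus)
        .collect()
}

fn main() {
    // With diff layers every 2048 and 256 slots, a state finalized at slot
    // 5000 depends on the diffs at 4096 (coarse layer) and 4864 (fine layer).
    // Those diffs must survive pruning even though they are ancestors.
    let keep = closest_layer_slots(5000, &[2048, 256]);
    assert_eq!(keep, vec![4096, 4864]);
    println!("{:?}", keep);
}
```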
We may want to rewrite the `prune_abandoned_forks` code, but if we don't want to do that now, we can take the approach above.

Regarding state summaries stored during state advance:
- Option 1 (current strategy) is to commit eagerly.
- Option 2 (old strategy from ages ago) is to commit on import.
- Option 3: don't store intermediate states at all.
```rust
.viable_heads::<T::EthSpec>(head_slot)
.iter()
.map(|node| (node.root, node.slot))
.collect()
```
Could use this to implement the DB downgrade (recreate the head tracker after it has been deleted).
```rust
// TODO(hdiff): Is it necessary to do this read tx now? Also why is it necessary to
// check that the summary exists at all? Are double writes common? Can this txn
// lock deadlock with the `do_atomically` call?
let txn_lock = chain.store.hot_db.begin_rw_transaction();
```
The reason for this lock is a race with the write in `BeaconChain::import_block`:

If we were to write a non-temporary state in `import_block` in between setting `state_already_exists` (`false`) and the write of the temporary state in this function, we could corrupt the DB:
- There is a state that is not temporary (required by some fully-imported block),
- But it is marked temporary due to the race condition here,
- And temporary states risk being deleted by pruning -> invalid DB due to deletion of a canonical state.

The lock prevents this case by preventing the interleaving of the read in this function with the write in the import function. However, this is a bad abstraction forced upon us by LevelDB, which lacks proper ACID transactions. If we eventually move away from LevelDB we can maybe have proper transactions, see:
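The hazard can be illustrated with a toy store (hypothetical types and names, nothing here is the actual Lighthouse store): a single mutex guards both the existence check and the temporary write, so the import path cannot slip a permanent write in between them.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Toy hot DB: maps a state "root" to an is_temporary flag.
struct HotDb {
    states: Mutex<HashMap<u64, bool>>,
}

impl HotDb {
    /// State-advance path: only mark the state temporary if nothing wrote it
    /// first. Check and insert happen under one lock, so no import can
    /// interleave between them.
    fn write_temporary_if_absent(&self, root: u64) -> bool {
        let mut guard = self.states.lock().unwrap();
        if guard.contains_key(&root) {
            return false; // a full import already stored it; leave it alone
        }
        guard.insert(root, true);
        true
    }

    /// Import path: a fully-imported block's state is never temporary.
    fn write_permanent(&self, root: u64) {
        self.states.lock().unwrap().insert(root, false);
    }

    fn is_temporary(&self, root: u64) -> Option<bool> {
        self.states.lock().unwrap().get(&root).copied()
    }
}

fn main() {
    let db = HotDb { states: Mutex::new(HashMap::new()) };
    // Import wins the race: the later temporary write is skipped, so the
    // canonical state cannot be mislabelled temporary and pruned.
    db.write_permanent(42);
    assert!(!db.write_temporary_if_absent(42));
    assert_eq!(db.is_temporary(42), Some(false));
}
```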
```diff
@@ -735,11 +639,14 @@ impl<E: EthSpec, Hot: ItemStore<E>, Cold: ItemStore<E>> BackgroundMigrator<E, Ho
store.pruning_checkpoint_store_op(new_finalized_checkpoint),
```
Let's delete the `pruning_checkpoint` while we're here.
Done in c8d82f0
```rust
StateSummariesDAG::new(state_summaries, parent_block_roots)
};

// From the DAG compute the list of roots that ascend from finalized root up to the split
```
Suggested change:
```diff
- // From the DAG compute the list of roots that ascend from finalized root up to the split
+ // From the DAG compute the list of roots that descend from finalized root up to the split
```
Fixed with 4fa32ccc8
```rust
};

// From the DAG compute the list of roots that ascend from finalized root up to the split
// slot. And run `migrate_database` with it
```
Suggested change:
```diff
- // slot. And run `migrate_database` with it
+ // slot.
```
Fixed with 4fa32cc
```rust
// keep those on the finalized canonical chain. Note that there may be lingering
// forks.
```
Suggested change:
```diff
- // keep those on the finalized canonical chain. Note that there may be lingering
- // forks.
+ // keep those on the finalized canonical chain. Checking the state root ensures we avoid lingering forks.
```
Fixed with 4fa32ccc8
```rust
// Abandoned forks will never be used, so we can safely delete the diffs
prune_hot_diff: true,
}]
```
Noting this now includes:
- Abandoned forks (not descended from the finalized state)
- Old states that weren't pruned previously but are getting pruned now (older than the previous finalization)
- Finalized states from the canonical chain which are not required by the HDiff mechanism
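A toy classification of these three cases (illustrative names, not Lighthouse's code) could read:

```rust
/// Why a hot state summary's diff gets pruned at finalization.
/// Variant names are invented for illustration.
enum SummaryKind {
    AbandonedFork,   // not descended from the new finalized state
    StaleUnpruned,   // older than the previous finalized checkpoint
    FinalizedCanonical { required_by_hdiff: bool },
    Viable,          // descends from the new finalized state: keep
}

fn prune_hot_diff(kind: &SummaryKind) -> bool {
    match kind {
        SummaryKind::AbandonedFork => true,
        SummaryKind::StaleUnpruned => true,
        // Canonical finalized diffs survive only if the HDiff layering needs them.
        SummaryKind::FinalizedCanonical { required_by_hdiff } => !*required_by_hdiff,
        SummaryKind::Viable => false,
    }
}

fn main() {
    assert!(prune_hot_diff(&SummaryKind::AbandonedFork));
    assert!(prune_hot_diff(&SummaryKind::StaleUnpruned));
    assert!(!prune_hot_diff(&SummaryKind::FinalizedCanonical { required_by_hdiff: true }));
    assert!(prune_hot_diff(&SummaryKind::FinalizedCanonical { required_by_hdiff: false }));
    assert!(!prune_hot_diff(&SummaryKind::Viable));
}
```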
Updated the comments
```diff
@@ -723,9 +629,7 @@ impl<E: EthSpec, Hot: ItemStore<E>, Cold: ItemStore<E>> BackgroundMigrator<E, Ho
let persisted_head = PersistedBeaconChain {
```
We could remove this write, and probably delete the `PersistedBeaconChain` altogether.
Fixed with 9ce4b33, but to fully remove `PersistedBeaconChain` we need a way to compute the genesis state root and genesis block root when booting from an existing DB. The issue is that while the state is in the DB, it is keyed by a root which we don't know.
```rust
pub fn hot_hdiff_start_slot(&self) -> Slot {
    // TODO(hdiff): read the start slot from somewhere
    todo!();
```
I was thinking maybe we could recycle the `anchor_slot` for use here, as it is fairly useless and not used for much. The only load-bearing use of the `anchor_slot` that I can find is in `try_prune_execution_payloads`, where it is used to halt the pruning process. However, this is not necessary, as we could either:
- Keep iterating back to Bellatrix, or
- Stop iterating back once we find a payload that is missing, or
- Use the `oldest_block_slot` to determine when to stop iterating back. This is my preferred option, as it is also compatible with storing execution payloads older than the `anchor_slot`, i.e. "Store execution payloads during backfill if `--prune-payloads false`" sigp/lighthouse#6510
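A minimal sketch of the preferred option, assuming payload pruning walks back from the split slot and stops at `oldest_block_slot` (the helper name is invented):

```rust
/// Hypothetical helper: the slots whose execution payloads pruning should
/// visit, walking backwards from the split and halting at the oldest block
/// we actually store, rather than at `anchor_slot`.
fn payload_slots_to_prune(split_slot: u64, oldest_block_slot: u64) -> Vec<u64> {
    (oldest_block_slot..split_slot).rev().collect()
}

fn main() {
    // Split at slot 10, oldest stored block at slot 7: visit 9, 8, 7 and stop.
    let slots = payload_slots_to_prune(10, 7);
    assert_eq!(slots, vec![9, 8, 7]);
}
```

Stopping at `oldest_block_slot` means backfilled payloads older than the anchor are still reachable by the pruning pass, which is what makes this compatible with sigp/lighthouse#6510.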
Conclusion: we can recycle the `anchor_slot` field and set it to the epoch-aligned slot of the snapshot state (not the slot of the checkpoint block, which might be older due to skipped slots).
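A small sketch of the epoch-alignment point, using mainnet's 32 slots per epoch (the function is illustrative, not a Lighthouse API):

```rust
const SLOTS_PER_EPOCH: u64 = 32; // mainnet value

/// The snapshot state of a checkpoint sync sits exactly at an epoch
/// boundary, while the checkpoint block may be earlier due to skipped slots.
fn epoch_aligned_slot(epoch: u64) -> u64 {
    epoch * SLOTS_PER_EPOCH
}

fn main() {
    // Checkpoint at epoch 100: the state slot is 3200 even if the checkpoint
    // block sits at, say, slot 3198 because the last slots of the epoch were
    // skipped. The recycled anchor_slot should be 3200, not 3198.
    let anchor_slot = epoch_aligned_slot(100);
    assert_eq!(anchor_slot, 3200);
}
```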
Fixed on 5ecf9db
Branch with commit history: https://github.com/dapplion/lighthouse/pull/new/tree-states-hot-backup, squashed and rebased on top of sigp/unstable.
I have done manual testing and this branch works on mainnet:

All of them were broken originally; there are a lot of commits at https://github.com/dapplion/lighthouse/pull/new/tree-states-hot-backup to make them work. I have also rebased to unstable; it was quite easy. For 3. I had to add a new column to store the new hot state summaries. Otherwise, when diffing, it will try to read a V22 summary and break.
@michaelsproul If we stop updating the `head_tracker` in the persisted head, how can we downgrade safely? If you run an old version of Lighthouse, it will read the persisted head, which contains an empty head tracker.
Partial implementation of https://hackmd.io/6WI3idBfR2q2AQyMUfYMyg