-
Notifications
You must be signed in to change notification settings - Fork 4.6k
disambiguate the matches then mismatches case for ancestor samples #31617
disambiguate the matches then mismatches case for ancestor samples #31617
Conversation
return DuplicateAncestorDecision::InvalidSample; | ||
} | ||
( | ||
Some((mismatch_i, DuplicateAncestorDecision::EarliestAncestorNotFrozen(_))), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just checks if we found an unfrozen mismatch earlier. Could disambiguate it even further if needed lmk
- Catch frozen hash matches, then random unfrozen block, then frozen hash matches again (shouldn't happen even in snapshot)
- Pass earliest block available in blockstore and ignore mismatches before then (could get tricky if block is getting dump & repaired by state machine on startup)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Catch frozen hash matches, then random unfrozen block, then frozen hash matches again (shouldn't happen even in snapshot)
I think this fix is fine for now. we could probably detect this by keeping track of the number of frozen hash matches and checking if it's > 0
when we detect a frozen hash after an unfrozne block
frozen in our blockstore. However for a later ancestor {ancestor_slot} we have | ||
agreement on {our_frozen_hash} as the bank hash. This should only be possible if | ||
we have just started from snapshot and immediately encountered a duplicate block, | ||
otherwise something is seriously wrong. Continuing with the repair", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Continuing with the repair" rn will still repair everything from the first missing block. However we could go even further and detect the snapshot root and only repair things after. Probably no point in repairing before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fine for now
Codecov Report
@@ Coverage Diff @@
## master #31617 +/- ##
=======================================
Coverage 81.3% 81.3%
=======================================
Files 733 733
Lines 206693 206713 +20
=======================================
+ Hits 168064 168096 +32
+ Misses 38629 38617 -12 |
@@ -669,6 +718,33 @@ pub mod tests { | |||
); | |||
} | |||
|
|||
#[test] | |||
fn test_add_multiple_responses_start_from_snapshot_missing_then_matches() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 for the test case
frozen in our blockstore. However for a later ancestor {ancestor_slot} we have | ||
agreement on {our_frozen_hash} as the bank hash. This should only be possible if | ||
we have just started from snapshot and immediately encountered a duplicate block, | ||
otherwise something is seriously wrong. Continuing with the repair", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fine for now
return DuplicateAncestorDecision::InvalidSample; | ||
} | ||
( | ||
Some((mismatch_i, DuplicateAncestorDecision::EarliestAncestorNotFrozen(_))), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Catch frozen hash matches, then random unfrozen block, then frozen hash matches again (shouldn't happen even in snapshot)
I think this fix is fine for now. we could probably detect this by keeping track of the number of frozen hash matches and checking if it's > 0
when we detect a frozen hash after an unfrozne block
Problem
If duplicate block is received immediately upon startup from snapshot, the ancestor sample will be discarded as it falls under mismatch then match case (earlier ancestors are missing from blockstore but snapshot bank matches).
Summary of Changes
Disambiguate mismatch then match case to allow this interaction.
Fixes #31583