Skip to content
This repository has been archived by the owner on Jan 22, 2025. It is now read-only.

disambiguate the matches then mismatches case for ancestor samples #31617

Merged

Conversation

AshwinSekar
Copy link
Contributor

Problem

If duplicate block is received immediately upon startup from snapshot, the ancestor sample will be discarded as it falls under mismatch then match case (earlier ancestors are missing from blockstore but snapshot bank matches).

Summary of Changes

Disambiguate mismatch then match case to allow this interaction.

Fixes #31583

@AshwinSekar AshwinSekar marked this pull request as ready for review May 12, 2023 20:30
@AshwinSekar AshwinSekar requested review from carllin and wen-coding May 12, 2023 20:30
return DuplicateAncestorDecision::InvalidSample;
}
(
Some((mismatch_i, DuplicateAncestorDecision::EarliestAncestorNotFrozen(_))),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checks if we found an unfrozen mismatch earlier. Could disambiguate it even further if needed lmk

  • Catch frozen hash matches, then random unfrozen block, then frozen hash matches again (shouldn't happen even in snapshot)
  • Pass earliest block available in blockstore and ignore mismatches before then (could get tricky if block is getting dump & repaired by state machine on startup)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Catch frozen hash matches, then random unfrozen block, then frozen hash matches again (shouldn't happen even in snapshot)

I think this fix is fine for now. we could probably detect this by keeping track of the number of frozen hash matches and checking if it's > 0 when we detect a frozen hash after an unfrozne block

frozen in our blockstore. However for a later ancestor {ancestor_slot} we have
agreement on {our_frozen_hash} as the bank hash. This should only be possible if
we have just started from snapshot and immediately encountered a duplicate block,
otherwise something is seriously wrong. Continuing with the repair",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Continuing with the repair" rn will still repair everything from the first missing block. However we could go even further and detect the snapshot root and only repair things after. Probably no point in repairing before.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine for now

@codecov
Copy link

codecov bot commented May 12, 2023

Codecov Report

Merging #31617 (9f42eb6) into master (8e5e66f) will increase coverage by 0.0%.
The diff coverage is 93.3%.

@@           Coverage Diff           @@
##           master   #31617   +/-   ##
=======================================
  Coverage    81.3%    81.3%           
=======================================
  Files         733      733           
  Lines      206693   206713   +20     
=======================================
+ Hits       168064   168096   +32     
+ Misses      38629    38617   -12     

@@ -669,6 +718,33 @@ pub mod tests {
);
}

#[test]
fn test_add_multiple_responses_start_from_snapshot_missing_then_matches() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 for the test case

frozen in our blockstore. However for a later ancestor {ancestor_slot} we have
agreement on {our_frozen_hash} as the bank hash. This should only be possible if
we have just started from snapshot and immediately encountered a duplicate block,
otherwise something is seriously wrong. Continuing with the repair",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine for now

return DuplicateAncestorDecision::InvalidSample;
}
(
Some((mismatch_i, DuplicateAncestorDecision::EarliestAncestorNotFrozen(_))),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Catch frozen hash matches, then random unfrozen block, then frozen hash matches again (shouldn't happen even in snapshot)

I think this fix is fine for now. we could probably detect this by keeping track of the number of frozen hash matches and checking if it's > 0 when we detect a frozen hash after an unfrozne block

@AshwinSekar AshwinSekar merged commit c85b057 into solana-labs:master May 13, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Specific ordering can cause ancestor_hashes_service to be stuck if the first block on startup is duplicate
2 participants