Skip to content
This repository was archived by the owner on Jan 22, 2025. It is now read-only.

Conversation

@carllin
Copy link
Contributor

@carllin carllin commented Oct 21, 2020

Problem

test_optimistic_confirmation_violation_without_tower is flaky with failure message: "Violation expected because of removed persisted tower!". Example of failure case:

[2020-10-20T22:12:25.389384948Z INFO  local_cluster] collected validator C's votes: {29, 30, 31, 32}
[2020-10-20T22:12:36.152101878Z INFO  local_cluster] Observed A's votes on: [
    26, 26, 26, 26, 26, 26, 26, 26, 26, 33, 33, 33, 33, 33, 33, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 44, 44, 44, 44, 44, 45, 45, 45, 45, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 52, 52, 52, 52, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53]

A's vote on 33 is a descendant of C's block 32, but the test doesn't recognize this.

Summary of Changes

Check blockstore for descendants check

Context #10718 #12350

@carllin carllin requested review from ryoqun and t-nelson October 21, 2020 02:05
@carllin carllin changed the title Fix local cluster test Fix test_optimistic_confirmation_violation_without_tower() Oct 21, 2020
t-nelson
t-nelson previously approved these changes Oct 21, 2020
Copy link
Contributor

@t-nelson t-nelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Comment on lines 1683 to 1685
// C can avoid NoPropagatedConfirmation erorrs and continue to generate blocks
// 2) Provide gosisp discovery for `A` when it restarts because `A` will restart
// at a different gossip port than the entrypoint saved in C's gosisp table
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// C can avoid NoPropagatedConfirmation erorrs and continue to generate blocks
// 2) Provide gosisp discovery for `A` when it restarts because `A` will restart
// at a different gossip port than the entrypoint saved in C's gosisp table
// C can avoid NoPropagatedConfirmation errors and continue to generate blocks
// 2) Provide gossip discovery for `A` when it restarts because `A` will restart
// at a different gossip port than the entrypoint saved in C's gossip table

You still blaming the keyboard? 😆

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that's just how I thought those were words were spelled, totally on purpose

@mergify mergify bot dismissed t-nelson’s stale review October 21, 2020 02:48

Pull request has been modified.

@codecov
Copy link

codecov bot commented Oct 21, 2020

Codecov Report

Merging #13043 into master will increase coverage by 0.0%.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master   #13043   +/-   ##
=======================================
  Coverage    82.1%    82.1%           
=======================================
  Files         366      366           
  Lines       86101    86101           
=======================================
+ Hits        70742    70749    +7     
+ Misses      15359    15352    -7     

// 1) Propagate A's votes for S2 to validator C after A shuts down so that
// C can avoid NoPropagatedConfirmation errors and continue to generate blocks
// 2) Provide gossip discovery for `A` when it restarts because `A` will restart
// at a different gossip port than the entrypoint saved in C's gossip table
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL

for a in ancestors {
if votes_on_c_fork.contains(&a) {
bad_vote_detected = true;
break;
Copy link
Contributor

@ryoqun ryoqun Oct 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, this only breaks from the newly-added inner for loop, maybe break 'outer or use if ancestors.any(|a| votes_on_c_fork.contains(a)) { bad_vote_detected = true; break}.

I'd prefer the latter. Nested loops are hard to manage as this bug illustrates. ;)

I think the outer loop will wait for the timeout needlessly currently with the new code.

@ryoqun
Copy link
Contributor

ryoqun commented Oct 21, 2020

@carllin Hooray! Thanks for debugging the propagation error and actually helping to fix the flaky test_optimistic_confirmation_violation_without_tower. I owe you.

@carllin carllin force-pushed the Debug2 branch 2 times, most recently from e1988d0 to 5e6bfc6 Compare October 21, 2020 20:56
@carllin carllin merged commit dd6ccca into solana-labs:master Oct 21, 2020
@ryoqun ryoqun added the v1.4 label Oct 26, 2020
mergify bot pushed a commit that referenced this pull request Oct 26, 2020
Co-authored-by: Carl Lin <carl@solana.com>
(cherry picked from commit dd6ccca)
@ryoqun
Copy link
Contributor

ryoqun commented Oct 26, 2020

@ryoqun ryoqun added the v1.4 label 14 seconds ago

Adding this, because this fluky test is currently affected on v1.4 branch as well

mergify bot added a commit that referenced this pull request Oct 26, 2020
…13145)

Co-authored-by: Carl Lin <carl@solana.com>
(cherry picked from commit dd6ccca)

Co-authored-by: carllin <wumu727@gmail.com>
CriesofCarrots pushed a commit to CriesofCarrots/solana that referenced this pull request Dec 4, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants