-
Notifications
You must be signed in to change notification settings - Fork 524
Fix partition recovery tests #2820
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -135,6 +135,10 @@ func runTestWithStaggeredStopStart(t *testing.T, fixture *fixtures.RestClientFix | |
| // Stop Node1 | ||
| nc1.FullStop() | ||
|
|
||
| status, err := fixture.LibGoalClient.Status() | ||
| a.NoError(err) | ||
| roundAfterStop := status.LastRound | ||
|
|
||
| time.Sleep(inducePartitionTime) | ||
|
|
||
| // Use the fixture to start the node again so it supplies the correct peer addresses | ||
|
|
@@ -152,10 +156,10 @@ func runTestWithStaggeredStopStart(t *testing.T, fixture *fixtures.RestClientFix | |
| a.NoError(err) | ||
|
|
||
| // Now wait for us to make progress again. | ||
| status, err := fixture.LibGoalClient.Status() | ||
| status, err = fixture.LibGoalClient.Status() | ||
| a.NoError(err) | ||
|
|
||
| a.Equal(waitForRound, status.LastRound, "We should not have made progress since stopping the first node") | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Was the bug that sometimes we would get to It seems like there is still a race condition between resuming node 1 and shutting down node 2. Could we move
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe that race condition is the point of the test? Not sure exactly what is intended by the name
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The intend of the test is to make sure that after stopping If the test proceeds in a timely manner, the node will be before round 3 when However, if the test thread is slow, and by the time By reading the round number after
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for the explanation. I think there is still a race condition, here is what the test does:
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. in theory yes, you are correct. However, this is unlikely to happen, because at this point the network is stalled, and it will take a long time for it to recover to make any progress. By the time that may happen, n2 is already stopped. However, in the previous situation, it was possible even in a fraction of a second to change the round.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'll defer whether or not this point is blocking for the PR. Seems like we may as well stop n2 before starting n1, but maybe it isn't a problem. |
||
| a.Equal(roundAfterStop, status.LastRound, "We should not have made progress since stopping the first node") | ||
|
|
||
| err = fixture.WaitForRound(status.LastRound+1, partitionRecoveryTime) | ||
| a.NoError(err) | ||
|
|
@@ -193,6 +197,10 @@ func TestBasicPartitionRecoveryPartOffline(t *testing.T) { | |
| // Stop Node1 | ||
| nc1.FullStop() | ||
|
|
||
| status, err := fixture.LibGoalClient.Status() | ||
| a.NoError(err) | ||
| roundAfterStop := status.LastRound | ||
|
|
||
| // Stop the 2nd node and give network a chance to stall | ||
| nc2, err := fixture.GetNodeController("Node2") | ||
| a.NoError(err) | ||
|
|
@@ -205,10 +213,10 @@ func TestBasicPartitionRecoveryPartOffline(t *testing.T) { | |
| a.NoError(err) | ||
|
|
||
| // Now wait for us to make progress again. | ||
| status, err := fixture.LibGoalClient.Status() | ||
| status, err = fixture.LibGoalClient.Status() | ||
| a.NoError(err) | ||
|
|
||
| a.Equal(waitForRound, status.LastRound, "We should not have made progress since stopping the first node") | ||
| a.Equal(roundAfterStop, status.LastRound, "We should not have made progress since stopping the first node") | ||
winder marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
|
||
| err = fixture.WaitForRound(status.LastRound+1, partitionRecoveryTime) | ||
| a.NoError(err) | ||
|
|
@@ -253,12 +261,16 @@ func TestPartitionHalfOffline(t *testing.T) { | |
| a.NoError(err) | ||
| nc3.FullStop() | ||
|
|
||
| status, err := client.Status() | ||
| a.NoError(err) | ||
| roundAfterStop := status.LastRound | ||
|
|
||
| time.Sleep(inducePartitionTime) | ||
|
|
||
| // Get main client to monitor | ||
| status, err := client.Status() | ||
| status, err = client.Status() | ||
| a.NoError(err) | ||
| a.Equal(waitForRound, status.LastRound, "We should not have made progress since stopping the nodes") | ||
| a.Equal(roundAfterStop, status.LastRound, "We should not have made progress since stopping the nodes") | ||
winder marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
|
||
| // Start 40 of 50% of the stake | ||
| _, err = fixture.StartNode(nc1.GetDataDir()) | ||
|
|
||
Uh oh!
There was an error while loading. Please reload this page.