@heifner heifner commented Apr 19, 2024

Integration test with 4 finalizers (A, B, C, and D).

  • The 4 nodes are cleanly shut down in the following state:
    • A has LIB N. A has a finalizer safety information file that locks on a block after N.
    • B, C, and D have LIB less than N. They have finalizer safety information files that lock on N.

All nodes but A lose their reversible blocks and restart from an earlier snapshot.

A is restarted from snapshot and replays up to block N. Block N is sent to the other
nodes B, C, and D after they are also started up again.

Verify that LIB advances and that A, B, C, and D are eventually voting strong on new blocks.

@heifner heifner linked an issue Apr 19, 2024 that may be closed by this pull request
@heifner heifner added the OCI Work exclusive to OCI team label Apr 19, 2024
@heifner heifner requested review from greg7mdp and linh2931 April 19, 2024 12:50
assert not node2.verifyAlive(), "Node2 did not shutdown"
assert not node3.verifyAlive(), "Node3 did not shutdown"

# node0 will have higher lib than 1,2,3 since it can incorporate QCs in blocks
Contributor

But yet we can't use waitForLibToAdvance() at line 87?

Contributor Author

Would need to capture the LIB above and wait for LIB+1, but there are inherent race conditions between the get_info calls and where we are in the test. Waiting for head to advance should be sufficient.
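
As a rough illustration of the "capture a baseline, then wait for it to be exceeded" pattern discussed here, the following sketch polls an injected reader function. `wait_for_advance` and the two lambda readers are hypothetical stand-ins and not part of the test harness; in a real test the reader would be a get_info-style query, and the baseline read is exactly where the race sits, since the value can change just before or just after it is sampled.

```python
import itertools
import time


def wait_for_advance(read_value, timeout=5.0, poll_interval=0.01):
    """Capture a baseline from read_value(), then poll until it is exceeded.

    Returns True if a larger value was observed before the timeout.
    """
    baseline = read_value()  # the value may already be stale when we return
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if read_value() > baseline:
            return True
        time.sleep(poll_interval)
    return False


# Hypothetical reader: head block number grows by one on every poll.
head_counter = itertools.count(100)
head_advanced = wait_for_advance(lambda: next(head_counter))

# Hypothetical reader: a stalled LIB never exceeds its baseline.
lib_advanced = wait_for_advance(lambda: 90, timeout=0.05)
```

Under these assumptions, waiting on head succeeds as soon as any new block is seen, while a wait on LIB depends on when the baseline happened to be sampled.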

Contributor

It would be nice if "Node::kill" would return the lib right before the node is killed, so we could verify the assertion that node0 has higher lib than 1,2,3 when it is killed.

Contributor Author

Would need to report in a log statement at exit or use leap util to look at the block log.

Contributor

@greg7mdp greg7mdp Apr 23, 2024

Can't we do this in Node.py?
currentLib = self.getIrreversibleBlockNum()

Looks like it does a get_info

Contributor Author

Yes, but LIB can change immediately after that call or right before that call.

Contributor

Yes, but still it is the best indication of what lib was right before the node is killed.
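
A minimal sketch of the idea in this thread, assuming a simplified node interface: read LIB immediately before killing the node and hand that value back to the test. `FakeNode` and `kill_and_report_lib` are hypothetical and only mimic the method names mentioned above (`getIrreversibleBlockNum`, `kill`, `verifyAlive`); the real methods may take different arguments. As noted in the discussion, the returned value is best effort, because LIB can still move between the read and the kill.

```python
class FakeNode:
    """Hypothetical stand-in for a harness node; not the real Node.py class."""

    def __init__(self, lib):
        self._lib = lib
        self._alive = True

    def getIrreversibleBlockNum(self):
        # The real harness would query the running node (e.g. via get_info).
        return self._lib

    def kill(self):
        self._alive = False

    def verifyAlive(self):
        return self._alive


def kill_and_report_lib(node):
    """Read LIB, then kill the node, and return the last LIB that was seen.

    Best effort only: LIB may advance between the read and the kill.
    """
    last_seen_lib = node.getIrreversibleBlockNum()
    node.kill()
    return last_seen_lib


# Assumed example values: node0 is ahead of nodes 1, 2, and 3.
node0 = FakeNode(lib=120)
others = [FakeNode(lib=115), FakeNode(lib=116), FakeNode(lib=114)]

other_libs = [kill_and_report_lib(n) for n in others]
node0_lib = kill_and_report_lib(node0)

assert not node0.verifyAlive(), "Node0 did not shutdown"
assert all(node0_lib > lib for lib in other_libs), "node0 LIB is not ahead"
```

With a helper like this, the test could check the "node0 has higher LIB" assumption directly instead of relying on a comment, at the cost of accepting a slightly stale reading.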

@heifner heifner merged commit dcda785 into savanna Apr 25, 2024
@heifner heifner deleted the GH-13-disaster-test branch April 25, 2024 11:20
ericpassmore commented Apr 30, 2024

Note:start
group: IF
category: TEST
summary: Disaster recovery test with four finalizers. Ensure block N on one node may be recovered after losing reversible blocks and starting from snapshot.
Note:end

greg7mdp added a commit that referenced this pull request Apr 28, 2025
Development

Successfully merging this pull request may close these issues.

Disaster recovery integration test