This repository has been archived by the owner on Jan 22, 2025. It is now read-only.

Add system test to measure recovery after partition #20902

Merged: 5 commits, Nov 8, 2021

AshwinSekar (Contributor) commented Oct 22, 2021

Problem

We wish to test how long the cluster takes to recover after a partition.

Summary of Changes

Sets up a Buildkite test that partitions the network for a specified duration. By tracking validator confirmation times before, during, and after the partition, we can measure how long the cluster takes to recover.

Here's an example run with 5 nodes and 2 partitions: the cluster warmed up for 120 seconds, was partitioned for 300 seconds, and took 42 seconds to recover once the partition was resolved.

Pre partition validator confirmation time: 1167 ms
Validator confirmation is 123229 ms immediately after the partition
42 seconds after resolving the partition, validator confirmation time fell to 779
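
As a rough illustration of how a recovery figure like the one above could be derived from confirmation-time samples, here is a minimal Python sketch. The function name `recovery_seconds`, the `slack` parameter, and the "at or below the pre-partition baseline" criterion are assumptions made for illustration; they are not the stopping condition used by the test in this PR.

```python
from typing import Iterable, Optional, Tuple


def recovery_seconds(
    samples: Iterable[Tuple[int, int]],
    baseline_ms: int,
    slack: float = 1.0,
) -> Optional[int]:
    """Return the first elapsed time (seconds after the partition is
    resolved) at which confirmation time is back at or below
    baseline_ms * slack, or None if that never happens.

    Hypothetical criterion, chosen only for this sketch.
    """
    threshold = baseline_ms * slack
    for elapsed_s, confirmation_ms in samples:
        if confirmation_ms <= threshold:
            return elapsed_s
    return None


# (seconds since the partition was resolved, confirmation time in ms),
# using the two post-partition readings from the example run above.
example = [(0, 123229), (42, 779)]

print(recovery_seconds(example, baseline_ms=1167))  # prints 42
```

A `slack` above 1.0 would count the cluster as recovered slightly before confirmation time fully returns to the pre-partition value, which may be useful if the baseline itself is noisy.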

Fixes #

AshwinSekar marked this pull request as ready for review October 25, 2021 23:45
carllin previously approved these changes Oct 25, 2021
mergify bot dismissed carllin’s stale review October 26, 2021 19:06

Pull request has been modified.

AshwinSekar force-pushed the partition-tests branch 9 times, most recently from d4ce8d2 to 06113f5, on November 4, 2021 21:53
AshwinSekar merged commit c56fb0f into solana-labs:master Nov 8, 2021
dankelleher pushed a commit to identity-com/solana that referenced this pull request Nov 24, 2021
* Add system test to measure recovery after partition

* shellcheck

* increase partition length until failure

* adjust parameters and output

* different stopping condition
frits-metalogix added a commit to identity-com/solana that referenced this pull request Nov 24, 2021