Skip to content

Comments

Fix flake replicas different ranks test#3164

Merged
enjoy-binbin merged 7 commits intovalkey-io:unstablefrom
hanxizh9910:unstable
Feb 5, 2026
Merged

Fix flake replicas different ranks test#3164
enjoy-binbin merged 7 commits intovalkey-io:unstablefrom
hanxizh9910:unstable

Conversation

@hanxizh9910
Copy link
Contributor

@hanxizh9910 hanxizh9910 commented Feb 5, 2026

Closes #2683

For the test, I found two problems.

1. Test Assumes Winner Has Rank #0

In my test run, in the failing case:

  • Replica -3 (rank 0) won epoch 10 first
  • Replica -6 (rank 1) won epoch 11 second
  • Replica -3 saw higher epoch and stepped down
  • Final result: rank 1 became master

Rank #0 doesn't guarantee being the final master.

2. Pattern Matching Bug

The pattern *Start of election*rank #0* incorrectly matches "primary rank #0":

Log: Start of election delayed for 350 milliseconds (rank #1, primary rank #0, offset 2172)

This line has rank #1, but the pattern matches because of "primary rank #0" at the end.

This is the test example. I added some print statements in the test

Solution

  • Fixed the format problem by checking if (rank #0 or (rank #1 so that we won't accidentally match the primary rank
  • Only check to make sure replica 3 and replica 6 have different ranks without assuming the replica with rank 0 will become the master.

Test

I ran it for 1000 times: https://github.com/hanxizh9910/valkey/actions/runs/21705632232

Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
@enjoy-binbin enjoy-binbin changed the title [Deflake] Make sure the replicas always get the different ranks Fix flake replicas different ranks test Feb 5, 2026
@enjoy-binbin
Copy link
Member

I'm thinking we might be able to avoid this problem within the cluster code, R6 shouldn't actually be able to initiate an election with a new epoch. The one of the purpose of the test is to ensure that the replica with rank 0 can win. Let me think about it some more.

@codecov
Copy link

codecov bot commented Feb 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.97%. Comparing base (9f374e1) to head (b85a348).
⚠️ Report is 3 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #3164      +/-   ##
============================================
+ Coverage     74.87%   74.97%   +0.09%     
============================================
  Files           129      129              
  Lines         71309    71318       +9     
============================================
+ Hits          53393    53468      +75     
+ Misses        17916    17850      -66     

see 20 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@enjoy-binbin
Copy link
Member

emm, look like we can't (or unable to find out in a short time). I am merging it, it is a good fix, thank you.

@enjoy-binbin enjoy-binbin merged commit 839d0c6 into valkey-io:unstable Feb 5, 2026
25 checks passed
@sarthakaggarwal97
Copy link
Contributor

Thanks @hanxizh9910!

sarthakaggarwal97 pushed a commit to sarthakaggarwal97/valkey that referenced this pull request Feb 6, 2026
Address valkey-io#2683 

For the test, I found two problems.

### 1. Test Assumes Winner Has Rank #0
In my test run, in the failing case:
- Replica -3 (rank 0) won epoch 10 first
- Replica -6 (rank 1) won epoch 11 second
- Replica -3 saw higher epoch and stepped down
- Final result: rank 1 became master

Rank #0 doesn't guarantee being the final master.

### 2. Pattern Matching Bug
The pattern `*Start of election*rank #0*` incorrectly matches "primary
rank #0":

Log: `Start of election delayed for 350 milliseconds (rank #1, primary
rank #0, offset 2172)`

This line has rank `#1`, but the pattern matches because of "primary
rank `#0`" at the end.

## Solution
- Fixed the format problem by checking if `(rank #0` or `(rank #1` so
that we won't accidentally match the primary rank
- Only check to make sure replica 3 and replica 6 have different ranks
without assuming the replica with rank 0 will become the master.

Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
harrylin98 pushed a commit to harrylin98/valkey_forked that referenced this pull request Feb 19, 2026
Address valkey-io#2683 

For the test, I found two problems.

### 1. Test Assumes Winner Has Rank #0
In my test run, in the failing case:
- Replica -3 (rank 0) won epoch 10 first
- Replica -6 (rank 1) won epoch 11 second
- Replica -3 saw higher epoch and stepped down
- Final result: rank 1 became master

Rank #0 doesn't guarantee being the final master.

### 2. Pattern Matching Bug
The pattern `*Start of election*rank #0*` incorrectly matches "primary
rank #0":

Log: `Start of election delayed for 350 milliseconds (rank #1, primary
rank #0, offset 2172)`

This line has rank `#1`, but the pattern matches because of "primary
rank `#0`" at the end.

## Solution
- Fixed the format problem by checking if `(rank #0` or `(rank #1` so
that we won't accidentally match the primary rank
- Only check to make sure replica 3 and replica 6 have different ranks
without assuming the replica with rank 0 will become the master.

Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[TEST-FAILURE] Make sure the replicas always get the different ranks

3 participants