-
Notifications
You must be signed in to change notification settings - Fork 613
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[v24.3.x] Fixed checking if batches are already present in follower log #25338
Merged
piyushredpanda
merged 7 commits into
redpanda-data:v24.3.x
from
mmaslankaprv:manual-backport-25018-v24.3.x-214
Mar 12, 2025
Merged
[v24.3.x] Fixed checking if batches are already present in follower log #25338
piyushredpanda
merged 7 commits into
redpanda-data:v24.3.x
from
mmaslankaprv:manual-backport-25018-v24.3.x-214
Mar 12, 2025
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
The current logic to wait on `enqueued` stage seems incorrect and is causing the test to flake. `eneuqued` only guarantees that the request is enqueued in the raft layer and subsequent replication may still fail with not_leader error. So the sequence of flakiness is something like this. 1. wait for enqueued futures to resolve 2. force a step down 3. wait for replicated entry to be truncated. If (2) happens before the entry is appended to the leader log, the replication future fails with a not leader resulting in a test retry. Effectively (1) is not a tight enough condition to wait on to force a step down. This commit rewrites the test to replicate with a leader_ack and wait for the future to be resolved (which guarantees a lleader append) and then the replication monitor waiter should be resolved with a truncation error after leadership change. Additionally the test seems to be missing some cleanup between retries, which is fixed Signed-off-by: Michał Maślanka <michal@redpanda.com>
When follower receives an append entries request that `prev_log_index` is smaller than its own `prev_log_index` it validates if the batches from the request matches (by checking a batch offset and corresponding term) its own log. It that is the case the batches are skipped to prevent truncation of valid batches and avoid the data loss. The validation of already matching batches was broken if they happened to be at the beginning of the log. In this case the `prev_log_index` is not initialized being negative. This case was not correctly handled by the logic calculating the next offset when checking matching batches. That lead to a situation in which a range of batches starting with 0 was never matching. Fixed the issue by correctly adjusting the `prev_log_index` if it is uninitialized. Signed-off-by: Michał Maślanka <michal@redpanda.com> (cherry picked from commit 0d029cf)
When follower receives an append entries request with the vector of records that are all present in its own log and their offsets and terms match it should reply with success and correct `last_dirty_log_index`. This way a leader instead of moving the follower `next_offset` backward can start recovery process and deliver batches which the follower is missing. Signed-off-by: Michał Maślanka <michal@redpanda.com> (cherry picked from commit edd55dc)
Signed-off-by: Michał Maślanka <michal@redpanda.com> (cherry picked from commit bd47cd9)
Assertion triggered in function body are not propagated to the test itself. Change the method to throw an exception in case of timeout instead of using an assertion. Signed-off-by: Michał Maślanka <michal@redpanda.com> (cherry picked from commit e2ec035)
The reply interceptor allows test creator to modify or drop the reply that is about to be processed by the RPC requester. This allow tests to take more control over the Raft protocol behavior and test some rare edge cases which might be hard to trigger otherwise. Signed-off-by: Michał Maślanka <michal@redpanda.com> (cherry picked from commit 9627077)
Signed-off-by: Michał Maślanka <michal@redpanda.com> (cherry picked from commit 59e43a7)
CI test resultstest results on build#62993
|
bharathv
approved these changes
Mar 12, 2025
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Backport of PR #25018
Includes #25035