Fixed checking if `append_entries_request` batches are already present in follower log #25018

mmaslankaprv · 2025-02-04T12:23:14Z

Background

When a follower receives an append entries request whose `prev_log_index` is smaller than its own `prev_log_index`, it checks whether the batches in the request match its own log (by comparing each batch offset and the corresponding term). If that is the case, the batches are skipped to prevent truncation of valid batches and avoid data loss.

Negative `append_entries_request::prev_log_index`

The validation of already matching batches was broken if they happened to be at the beginning of the log. In this case the `prev_log_index` is not initialised. The logic calculating the next offset when checking matching batches did not handle this case correctly.
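The check described above can be sketched roughly as follows. This is a simplified illustration and not the code in `consensus.cc`: the plain integer offsets, the `batch_header` struct, the map-based log model, and the helper names `next_expected_offset` and `count_matching_batches` are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <map>
#include <vector>

// Simplified stand-ins for illustration; not Redpanda's real types.
using offset_t = std::int64_t;
using term_t = std::int64_t;

// Assumption: an uninitialised offset is represented by a negative sentinel.
constexpr offset_t uninitialized_offset = std::numeric_limits<offset_t>::min();

struct batch_header {
    offset_t base_offset;
    offset_t last_offset;
    term_t term;
};

// First offset the request is expected to carry, given its prev_log_index.
// A negative (uninitialised) prev_log_index means "start of the log", so the
// next offset is 0 rather than prev_log_index + 1.
offset_t next_expected_offset(offset_t prev_log_index) {
    return prev_log_index < 0 ? 0 : prev_log_index + 1;
}

// Counts how many leading request batches are already in the follower log.
// The log is modelled as a map: last offset of a local batch -> its term.
std::size_t count_matching_batches(
  const std::map<offset_t, term_t>& local_log,
  offset_t prev_log_index,
  const std::vector<batch_header>& request) {
    offset_t expected = next_expected_offset(prev_log_index);
    std::size_t matching = 0;
    for (const auto& b : request) {
        assert(b.base_offset <= b.last_offset);
        auto it = local_log.find(b.last_offset);
        const bool matches = b.base_offset == expected
                             && it != local_log.end()
                             && it->second == b.term;
        if (!matches) {
            break; // first mismatch: the rest must go through the normal path
        }
        expected = b.last_offset + 1;
        ++matching;
    }
    return matching;
}
```

With a naive `prev_log_index + 1`, a request starting at offset 0 would never line up with the expected offset, which is the symptom the description reports.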

Replying with success when all request batches match

When a follower receives an append entries request whose records are all present in its own log, with matching offsets and terms, it should reply with success and the correct `last_dirty_log_index`.
This way the leader, instead of moving the follower's `next_offset` backward, can start the recovery process and deliver the batches the follower is missing.
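A minimal sketch of that decision, under stated assumptions: the `append_entries_reply` struct here is a reduced stand-in, `reply_if_all_batches_match` is a hypothetical helper, and returning an empty optional simply means "continue with the regular truncate/append handling", which is not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

enum class reply_status { success, failure };

// Reduced stand-in for the real reply type; only the fields discussed here.
struct append_entries_reply {
    reply_status status;
    std::int64_t last_dirty_log_index;
};

// If every batch in the request is already in the follower log, answer with
// success and the follower's own dirty offset so the leader can move forward.
// Otherwise return nothing and let the caller take the regular path.
std::optional<append_entries_reply> reply_if_all_batches_match(
  std::size_t matching_batches,
  std::size_t request_batches,
  std::int64_t follower_dirty_offset) {
    assert(matching_batches <= request_batches);
    // Assumption: empty requests are handled elsewhere.
    if (request_batches == 0 || matching_batches != request_batches) {
        return std::nullopt;
    }
    return append_entries_reply{reply_status::success, follower_dirty_offset};
}
```

Reporting the follower's own `last_dirty_log_index`, rather than the last offset in the request, is what lets the leader see how far the follower really is.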

Backports Required

Release Notes

Bug Fixes

Fixes a very rare situation in which a Raft leader can enter an infinite loop while trying to recover a follower.

src/v/raft/consensus.cc

vbotbuildovich · 2025-02-04T19:54:40Z

CI test results

test results on build#61563

test_id	test_kind	job_url	test_status	passed
rptest.tests.compaction_recovery_test.CompactionRecoveryTest.test_index_recovery	ducktape	https://buildkite.com/redpanda/redpanda/builds/61563#0194d20c-ff99-4b94-b2f7-a64d44ed7679	FLAKY	1/3
rptest.tests.datalake.compaction_test.CompactionGapsTest.test_translation_no_gaps.cloud_storage_type=CloudStorageType.S3.catalog_type=CatalogType.REST_JDBC	ducktape	https://buildkite.com/redpanda/redpanda/builds/61563#0194d20c-ff98-4ec0-9f56-38befe604032	FLAKY	1/2

test results on build#61682

test_id	test_kind	job_url	test_status	passed
rptest.tests.compaction_recovery_test.CompactionRecoveryTest.test_index_recovery	ducktape	https://buildkite.com/redpanda/redpanda/builds/61682#0194dcb6-b3e7-4275-b585-63769e3a91eb	FLAKY	1/3
rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade	ducktape	https://buildkite.com/redpanda/redpanda/builds/61682#0194dc9a-225b-4365-b41c-e42b927c3e92	FLAKY	1/2
rptest.tests.datalake.compaction_test.CompactionGapsTest.test_translation_no_gaps.cloud_storage_type=CloudStorageType.S3.catalog_type=CatalogType.REST_HADOOP	ducktape	https://buildkite.com/redpanda/redpanda/builds/61682#0194dcb6-b3e7-4275-b585-63769e3a91eb	FLAKY	1/2
rptest.tests.datalake.compaction_test.CompactionGapsTest.test_translation_no_gaps.cloud_storage_type=CloudStorageType.S3.catalog_type=CatalogType.REST_JDBC	ducktape	https://buildkite.com/redpanda/redpanda/builds/61682#0194dcb6-b3e4-449e-a254-c66f8797a6ea	FLAKY	1/2
rptest.tests.datalake.custom_partitioning_test.DatalakeCustomPartitioningTest.test_basic.cloud_storage_type=CloudStorageType.S3.catalog_type=CatalogType.REST_HADOOP	ducktape	https://buildkite.com/redpanda/redpanda/builds/61682#0194dcb6-b3e5-4007-9090-5e5e97766310	FLAKY	1/2
rptest.tests.partition_movement_test.PartitionMovementTest.test_availability_when_one_node_down	ducktape	https://buildkite.com/redpanda/redpanda/builds/61682#0194dc9a-225a-472c-827d-daaa26f07098	FLAKY	1/2
rptest.tests.scaling_up_test.ScalingUpTest.test_scaling_up_with_recovered_topic	ducktape	https://buildkite.com/redpanda/redpanda/builds/61682#0194dcb6-b3e6-4ddf-b17d-2a71ef0b0f40	FLAKY	1/2
rptest.tests.write_caching_fi_test.WriteCachingFailureInjectionTest.test_crash_all	ducktape	https://buildkite.com/redpanda/redpanda/builds/61682#0194dc9a-225b-4ae0-9014-9e69b7cda65e	FLAKY	1/2

test results on build#61910

test_id	test_kind	job_url	test_status	passed
kafka_server_rpfixture.kafka_server_rpfixture	unit	https://buildkite.com/redpanda/redpanda/builds/61910#019513ab-cd50-428f-9d81-5f8116eaf3f3	FLAKY	1/2
rptest.tests.compaction_recovery_test.CompactionRecoveryTest.test_index_recovery	ducktape	https://buildkite.com/redpanda/redpanda/builds/61910#019513f3-f79a-4ef5-b87b-40ddaa1e0373	FLAKY	1/2
rptest.tests.datalake.datalake_e2e_test.DatalakeE2ETests.test_topic_lifecycle.cloud_storage_type=CloudStorageType.S3.catalog_type=CatalogType.REST_HADOOP	ducktape	https://buildkite.com/redpanda/redpanda/builds/61910#01951406-cc75-4968-824a-19b89e830bdd	FLAKY	1/2
rptest.tests.datalake.mount_unmount_test.MountUnmountIcebergTest.test_simple_remount.cloud_storage_type=CloudStorageType.S3	ducktape	https://buildkite.com/redpanda/redpanda/builds/61910#01951406-cc74-4052-9b53-57304652913d	FLAKY	1/2
rptest.tests.log_compaction_test.LogCompactionTest.compaction_stress_test.cleanup_policy=compact.key_set_cardinality=1000.storage_compaction_key_map_memory_kb=10	ducktape	https://buildkite.com/redpanda/redpanda/builds/61910#019513f3-f798-4e04-a267-6ece0682e028	FLAKY	1/2
rptest.tests.log_compaction_test.LogCompactionTest.compaction_stress_test.cleanup_policy=compact.key_set_cardinality=1000.storage_compaction_key_map_memory_kb=3	ducktape	https://buildkite.com/redpanda/redpanda/builds/61910#019513f3-f79a-4ef5-b87b-40ddaa1e0373	FLAKY	1/2
rptest.tests.partition_movement_test.SIPartitionMovementTest.test_shadow_indexing.num_to_upgrade=0.cloud_storage_type=CloudStorageType.ABS	ducktape	https://buildkite.com/redpanda/redpanda/builds/61910#019513f3-f79a-4ef5-b87b-40ddaa1e0373	FLAKY	1/2

bashtanov · 2025-02-05T09:08:00Z

src/v/raft/tests/raft_fixture.h

 private:
    model::node_id _id;
    model::revision_id _revision;
    prefix_logger _logger;
    ss::sstring _base_directory;
    config::mock_property<size_t> _max_inflight_requests{16};
    config::mock_property<size_t> _max_queued_bytes{1_MiB};
+    config::mock_property<size_t> _default_recovery_read_size{32_KiB};


any reason we change it for existing tests?

no particular reason, i will make sure it is the same as before

src/v/raft/tests/basic_raft_fixture_test.cc

bashtanov · 2025-02-05T12:04:19Z

Assertions triggered in the function body are not propagated to the test itself

Why is that? Anything wrong with the macro? AFAIK it's meant to work with both gtest and boost.

bharathv

lgtm modulo one question, took me a bit to digest the change, had to dig up Alexey's change that added these checks. Would be nice to get a blessing from @ztlpn too.

src/v/raft/consensus.cc

src/v/raft/tests/raft_fixture.cc

src/v/raft/tests/basic_raft_fixture_test.cc

bashtanov

A few questions as I'm not sure I understand the test.

bashtanov · 2025-02-07T13:50:01Z

src/v/raft/tests/raft_fixture.cc

+    std::ranges::copy(
+      _nodes | std::views::keys
+        | std::views::filter(
+          [leader_id](model::node_id id) { return id != leader_id; }),
+      std::back_inserter(followers));


nit: use copy_if?

bashtanov · 2025-02-07T14:05:32Z

src/v/raft/tests/basic_raft_fixture_test.cc

+    /**
+     * Recover communication and wait for the intercept to trigger
+     */
+    new_leader_node.reset_dispatch_handlers();


This will enable the new leader to send vote requests to the old leader. I guess it won't anyway, as it has been elected already. Do we need this?

src/v/raft/tests/raft_fixture.h

bashtanov · 2025-02-07T14:38:30Z

src/v/raft/tests/basic_raft_fixture_test.cc

+     * Recover communication and wait for the intercept to trigger
+     */
+    new_leader_node.reset_dispatch_handlers();
+    co_await reply_intercepted.wait([&] { return intercept_count > 5; });


We don't produce anything after the second election. What are the 20+ messages that are replicated from the new leader to the old one?

there will be no new messages replicated, only recovery append entry requests finally leading to truncation

When a follower receives an append entries request whose `prev_log_index` is smaller than its own `prev_log_index`, it checks whether the batches in the request match its own log (by comparing each batch offset and the corresponding term). If that is the case, the batches are skipped to prevent truncation of valid batches and avoid data loss. The validation of already matching batches was broken if they happened to be at the beginning of the log. In this case the `prev_log_index` is not initialized and is therefore negative. This case was not correctly handled by the logic calculating the next offset when checking matching batches. That led to a situation in which a range of batches starting at offset 0 never matched. Fixed the issue by correctly adjusting the `prev_log_index` if it is uninitialized. Signed-off-by: Michał Maślanka <michal@redpanda.com>

When a follower receives an append entries request whose records are all present in its own log, with matching offsets and terms, it should reply with success and the correct `last_dirty_log_index`. This way the leader, instead of moving the follower's `next_offset` backward, can start the recovery process and deliver the batches the follower is missing. Signed-off-by: Michał Maślanka <michal@redpanda.com>

Signed-off-by: Michał Maślanka <michal@redpanda.com>

Assertions triggered in the function body are not propagated to the test itself. Change the method to throw an exception in case of a timeout instead of using an assertion. Signed-off-by: Michał Maślanka <michal@redpanda.com>
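The shape of that change can be illustrated with a small, synchronous sketch. The real fixture method is asynchronous and is not reproduced here; `wait_until`, its polling interval, and the use of `std::runtime_error` are assumptions chosen for the example.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <stdexcept>
#include <thread>

// Polls the predicate until it holds or the timeout expires. On timeout it
// throws, so the failure surfaces in whichever test called the helper.
void wait_until(
  const std::function<bool()>& predicate, std::chrono::milliseconds timeout) {
    assert(timeout.count() >= 0);
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (!predicate()) {
        if (std::chrono::steady_clock::now() >= deadline) {
            throw std::runtime_error("timed out waiting for condition");
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
}
```

A caller that wants a test failure rather than an exception can catch it at the test level and fail there.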

The reply interceptor allows the test author to modify or drop a reply that is about to be processed by the RPC requester. This allows tests to take more control over the Raft protocol behavior and to exercise rare edge cases which might be hard to trigger otherwise. Signed-off-by: Michał Maślanka <michal@redpanda.com>
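One way to picture such an interceptor is a callback that receives the reply and either returns a (possibly modified) reply or nothing at all. The sketch below is hypothetical throughout: `reply`, `reply_interceptor`, and `test_channel` are illustrative names and do not mirror the fixture's actual interface.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <utility>

// Reduced stand-in for an append entries reply.
struct reply {
    int source_node;
    bool success;
    std::int64_t last_dirty_log_index;
};

// Returns the reply to hand to the requester, or nothing to drop it.
using reply_interceptor = std::function<std::optional<reply>(reply)>;

class test_channel {
public:
    void set_reply_interceptor(reply_interceptor f) {
        _interceptor = std::move(f);
    }

    // Called just before the requester would process the reply.
    std::optional<reply> deliver(reply r) const {
        assert(r.source_node >= 0);
        if (_interceptor) {
            return _interceptor(std::move(r));
        }
        return r; // no interceptor installed: pass the reply through
    }

private:
    reply_interceptor _interceptor;
};
```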

Signed-off-by: Michał Maślanka <michal@redpanda.com>

vbotbuildovich · 2025-02-18T15:58:19Z

/backport v24.3.x

vbotbuildovich · 2025-02-18T15:58:20Z

/backport v24.2.x

vbotbuildovich · 2025-02-18T15:58:21Z

/backport v24.1.x

vbotbuildovich · 2025-02-18T15:59:30Z

Failed to create a backport PR to v24.3.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-25018-v24.3.x-754 remotes/upstream/v24.3.x
git cherry-pick -x 0d029cfd84 edd55dcf23 bd47cd93a2 e2ec0352df 9627077942 59e43a7c31

Workflow run logs.

vbotbuildovich · 2025-02-18T15:59:42Z

Failed to create a backport PR to v24.2.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-25018-v24.2.x-107 remotes/upstream/v24.2.x
git cherry-pick -x 0d029cfd84 edd55dcf23 bd47cd93a2 e2ec0352df 9627077942 59e43a7c31

Workflow run logs.

vbotbuildovich · 2025-02-18T15:59:42Z

Failed to create a backport PR to v24.1.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-25018-v24.1.x-534 remotes/upstream/v24.1.x
git cherry-pick -x 0d029cfd84 edd55dcf23 bd47cd93a2 e2ec0352df 9627077942 59e43a7c31

Workflow run logs.

mmaslankaprv requested a review from ztlpn February 4, 2025 12:23

github-actions bot added the area/redpanda label Feb 4, 2025

mmaslankaprv requested review from bharathv and bashtanov February 4, 2025 12:23

mmaslankaprv force-pushed the fix-matching-entries-check branch from d7a60fa to 153a9c2 Compare February 4, 2025 12:46

bashtanov reviewed Feb 4, 2025

src/v/raft/consensus.cc

mmaslankaprv marked this pull request as ready for review February 4, 2025 16:19

bashtanov reviewed Feb 5, 2025

src/v/raft/tests/basic_raft_fixture_test.cc

bashtanov reviewed Feb 5, 2025

src/v/raft/tests/basic_raft_fixture_test.cc

bharathv reviewed Feb 5, 2025

src/v/raft/consensus.cc

src/v/raft/tests/raft_fixture.cc

mmaslankaprv force-pushed the fix-matching-entries-check branch 2 times, most recently from 12b3024 to 5c4c17b Compare February 6, 2025 12:57

bashtanov reviewed Feb 6, 2025

src/v/raft/tests/basic_raft_fixture_test.cc

mmaslankaprv force-pushed the fix-matching-entries-check branch from 5c4c17b to ed45488 Compare February 6, 2025 17:35

mmaslankaprv requested review from bharathv and bashtanov February 7, 2025 13:32

bashtanov reviewed Feb 7, 2025

mmaslankaprv added 6 commits February 17, 2025 12:25

r/tests: made recovery read size configurable in tests

bd47cd9

Signed-off-by: Michał Maślanka <michal@redpanda.com>

r/tests: fixed waiting for offsets in tests

e2ec035

Assertions triggered in the function body are not propagated to the test itself. Change the method to throw an exception in case of a timeout instead of using an assertion. Signed-off-by: Michał Maślanka <michal@redpanda.com>

r/tests: added test validating processing all matching batches

59e43a7

Signed-off-by: Michał Maślanka <michal@redpanda.com>

mmaslankaprv force-pushed the fix-matching-entries-check branch from ed45488 to 59e43a7 Compare February 17, 2025 11:25

mmaslankaprv requested a review from bashtanov February 17, 2025 16:07

ztlpn approved these changes Feb 18, 2025

mmaslankaprv merged commit 86741ac into redpanda-data:dev Feb 18, 2025
17 checks passed

vbotbuildovich mentioned this pull request Feb 18, 2025

[v24.3.x] Fixed checking if append_entries_request batches are already present in follower log #25102

Closed

This was referenced Feb 18, 2025

[v24.2.x] Fixed checking if append_entries_request batches are already present in follower log #25103

Closed

[v24.1.x] Fixed checking if append_entries_request batches are already present in follower log #25104

Open

This was referenced Mar 12, 2025

[v24.3.x] Fixed checking if batches are already present in follower log #25338

Merged

[v24.2.x] Fixed checking if batches are already present in follower log #25377

Merged
