Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed checking if append_entries_request batches are already present in follower log #25018

Merged

Conversation

mmaslankaprv
Copy link
Member

@mmaslankaprv mmaslankaprv commented Feb 4, 2025

Background

When follower receives an append entries request that prev_log_index is smaller than its own prev_log_index it validates if the batches from the request matches (by checking a batch offset and corresponding term) its own log. It that is the case the batches are skipped to prevent truncation of valid batches and avoid the data loss.

Negative append_entries_request::prev_log_index

The validation of already matching batches was broken if they happened to be at the beginning of the log. In this case the prev_log_index is not initialised. This case was not correctly handled by the logic calculating the next offset when checking matching batches.

Replying with success when all request batches match

When follower receives an append entries request with the vector of records that are all present in its own log and their offsets and terms match it should reply with success and correct last_dirty_log_index.
This way a leader instead of moving the follower next_offset backward can start recovery process and deliver batches which the follower is missing.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.3.x
  • v24.2.x
  • v24.1.x

Release Notes

Bug Fixes

  • fixes a very rare situation in which Raft leader can enter into infinite loop trying to recover follower.

@mmaslankaprv mmaslankaprv marked this pull request as ready for review February 4, 2025 16:19
@vbotbuildovich
Copy link
Collaborator

vbotbuildovich commented Feb 4, 2025

CI test results

test results on build#61563
test_id test_kind job_url test_status passed
rptest.tests.compaction_recovery_test.CompactionRecoveryTest.test_index_recovery ducktape https://buildkite.com/redpanda/redpanda/builds/61563#0194d20c-ff99-4b94-b2f7-a64d44ed7679 FLAKY 1/3
rptest.tests.datalake.compaction_test.CompactionGapsTest.test_translation_no_gaps.cloud_storage_type=CloudStorageType.S3.catalog_type=CatalogType.REST_JDBC ducktape https://buildkite.com/redpanda/redpanda/builds/61563#0194d20c-ff98-4ec0-9f56-38befe604032 FLAKY 1/2
test results on build#61682
test_id test_kind job_url test_status passed
rptest.tests.compaction_recovery_test.CompactionRecoveryTest.test_index_recovery ducktape https://buildkite.com/redpanda/redpanda/builds/61682#0194dcb6-b3e7-4275-b585-63769e3a91eb FLAKY 1/3
rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade ducktape https://buildkite.com/redpanda/redpanda/builds/61682#0194dc9a-225b-4365-b41c-e42b927c3e92 FLAKY 1/2
rptest.tests.datalake.compaction_test.CompactionGapsTest.test_translation_no_gaps.cloud_storage_type=CloudStorageType.S3.catalog_type=CatalogType.REST_HADOOP ducktape https://buildkite.com/redpanda/redpanda/builds/61682#0194dcb6-b3e7-4275-b585-63769e3a91eb FLAKY 1/2
rptest.tests.datalake.compaction_test.CompactionGapsTest.test_translation_no_gaps.cloud_storage_type=CloudStorageType.S3.catalog_type=CatalogType.REST_JDBC ducktape https://buildkite.com/redpanda/redpanda/builds/61682#0194dcb6-b3e4-449e-a254-c66f8797a6ea FLAKY 1/2
rptest.tests.datalake.custom_partitioning_test.DatalakeCustomPartitioningTest.test_basic.cloud_storage_type=CloudStorageType.S3.catalog_type=CatalogType.REST_HADOOP ducktape https://buildkite.com/redpanda/redpanda/builds/61682#0194dcb6-b3e5-4007-9090-5e5e97766310 FLAKY 1/2
rptest.tests.partition_movement_test.PartitionMovementTest.test_availability_when_one_node_down ducktape https://buildkite.com/redpanda/redpanda/builds/61682#0194dc9a-225a-472c-827d-daaa26f07098 FLAKY 1/2
rptest.tests.scaling_up_test.ScalingUpTest.test_scaling_up_with_recovered_topic ducktape https://buildkite.com/redpanda/redpanda/builds/61682#0194dcb6-b3e6-4ddf-b17d-2a71ef0b0f40 FLAKY 1/2
rptest.tests.write_caching_fi_test.WriteCachingFailureInjectionTest.test_crash_all ducktape https://buildkite.com/redpanda/redpanda/builds/61682#0194dc9a-225b-4ae0-9014-9e69b7cda65e FLAKY 1/2
test results on build#61910
test_id test_kind job_url test_status passed
kafka_server_rpfixture.kafka_server_rpfixture unit https://buildkite.com/redpanda/redpanda/builds/61910#019513ab-cd50-428f-9d81-5f8116eaf3f3 FLAKY 1/2
rptest.tests.compaction_recovery_test.CompactionRecoveryTest.test_index_recovery ducktape https://buildkite.com/redpanda/redpanda/builds/61910#019513f3-f79a-4ef5-b87b-40ddaa1e0373 FLAKY 1/2
rptest.tests.datalake.datalake_e2e_test.DatalakeE2ETests.test_topic_lifecycle.cloud_storage_type=CloudStorageType.S3.catalog_type=CatalogType.REST_HADOOP ducktape https://buildkite.com/redpanda/redpanda/builds/61910#01951406-cc75-4968-824a-19b89e830bdd FLAKY 1/2
rptest.tests.datalake.mount_unmount_test.MountUnmountIcebergTest.test_simple_remount.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/61910#01951406-cc74-4052-9b53-57304652913d FLAKY 1/2
rptest.tests.log_compaction_test.LogCompactionTest.compaction_stress_test.cleanup_policy=compact.key_set_cardinality=1000.storage_compaction_key_map_memory_kb=10 ducktape https://buildkite.com/redpanda/redpanda/builds/61910#019513f3-f798-4e04-a267-6ece0682e028 FLAKY 1/2
rptest.tests.log_compaction_test.LogCompactionTest.compaction_stress_test.cleanup_policy=compact.key_set_cardinality=1000.storage_compaction_key_map_memory_kb=3 ducktape https://buildkite.com/redpanda/redpanda/builds/61910#019513f3-f79a-4ef5-b87b-40ddaa1e0373 FLAKY 1/2
rptest.tests.partition_movement_test.SIPartitionMovementTest.test_shadow_indexing.num_to_upgrade=0.cloud_storage_type=CloudStorageType.ABS ducktape https://buildkite.com/redpanda/redpanda/builds/61910#019513f3-f79a-4ef5-b87b-40ddaa1e0373 FLAKY 1/2

private:
model::node_id _id;
model::revision_id _revision;
prefix_logger _logger;
ss::sstring _base_directory;
config::mock_property<size_t> _max_inflight_requests{16};
config::mock_property<size_t> _max_queued_bytes{1_MiB};
config::mock_property<size_t> _default_recovery_read_size{32_KiB};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason we change it for existing tests?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no particular reason, i will make sure it is the same as before

@bashtanov
Copy link
Contributor

Assertion triggered in function body are not propagated to the test itself

Why is that? Anything wrong with the macro? AFAIK it's meant to work with both gtest and boost.

Copy link
Contributor

@bharathv bharathv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm modulo one question, took me a bit to digest the change, had to dig up Alexey's change that added these checks. Would be nice to get a blessing from @ztlpn too.

@mmaslankaprv mmaslankaprv force-pushed the fix-matching-entries-check branch 2 times, most recently from 12b3024 to 5c4c17b Compare February 6, 2025 12:57
@mmaslankaprv mmaslankaprv force-pushed the fix-matching-entries-check branch from 5c4c17b to ed45488 Compare February 6, 2025 17:35
Copy link
Contributor

@bashtanov bashtanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few questions as I'm not sure I understand the test.

Comment on lines 707 to 711
std::ranges::copy(
_nodes | std::views::keys
| std::views::filter(
[leader_id](model::node_id id) { return id != leader_id; }),
std::back_inserter(followers));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: use copy_if?

/**
* Recover communication and wait for the intercept to trigger
*/
new_leader_node.reset_dispatch_handlers();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will enable the new leader to send vote requests to the old leader. I guess it won't anyway, as it has been elected already. Do we need this?

* Recover communication and wait for the intercept to trigger
*/
new_leader_node.reset_dispatch_handlers();
co_await reply_intercepted.wait([&] { return intercept_count > 5; });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't produce anything after the second election. What are the 20+ messages that are replicated from the new leader to the old one?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there will be no new messages replicated, only recovery append entry requests finally leading to truncation

When follower receives an append entries request that `prev_log_index` is
smaller than its own `prev_log_index` it validates if the
batches from the request matches (by checking a batch offset and
corresponding term) its own log. It that is the case the batches are
skipped to prevent truncation of valid batches and avoid the data loss.

The validation of already matching batches was broken if they happened
to be at the beginning of the log. In this case the `prev_log_index` is
not initialized being negative. This case was not correctly handled by
the logic calculating the next offset when checking matching batches.

That lead to a situation in which a range of batches starting with 0 was
never matching.

Fixed the issue by correctly adjusting the `prev_log_index` if it is
uninitialized.

Signed-off-by: Michał Maślanka <michal@redpanda.com>
When follower receives an append entries request with the vector of
records that are all present in its own log and their offsets and terms
match it should reply with success and correct `last_dirty_log_index`.
This way a leader instead of moving the follower `next_offset` backward
can start recovery process and deliver batches which the follower is
missing.

Signed-off-by: Michał Maślanka <michal@redpanda.com>
Signed-off-by: Michał Maślanka <michal@redpanda.com>
Assertion triggered in function body are not propagated to the test
itself. Change the method to throw an exception in case of timeout
instead of using an assertion.

Signed-off-by: Michał Maślanka <michal@redpanda.com>
The reply interceptor allows test creator to modify or drop the reply
that is about to be processed by the RPC requester. This allow tests to
take more control over the Raft protocol behavior and test some rare
edge cases which might be hard to trigger otherwise.

Signed-off-by: Michał Maślanka <michal@redpanda.com>
Signed-off-by: Michał Maślanka <michal@redpanda.com>
@mmaslankaprv mmaslankaprv force-pushed the fix-matching-entries-check branch from ed45488 to 59e43a7 Compare February 17, 2025 11:25
@mmaslankaprv mmaslankaprv merged commit 86741ac into redpanda-data:dev Feb 18, 2025
17 checks passed
@vbotbuildovich
Copy link
Collaborator

/backport v24.3.x

@vbotbuildovich
Copy link
Collaborator

/backport v24.2.x

@vbotbuildovich
Copy link
Collaborator

/backport v24.1.x

@vbotbuildovich
Copy link
Collaborator

Failed to create a backport PR to v24.3.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-25018-v24.3.x-754 remotes/upstream/v24.3.x
git cherry-pick -x 0d029cfd84 edd55dcf23 bd47cd93a2 e2ec0352df 9627077942 59e43a7c31

Workflow run logs.

@vbotbuildovich
Copy link
Collaborator

Failed to create a backport PR to v24.2.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-25018-v24.2.x-107 remotes/upstream/v24.2.x
git cherry-pick -x 0d029cfd84 edd55dcf23 bd47cd93a2 e2ec0352df 9627077942 59e43a7c31

Workflow run logs.

@vbotbuildovich
Copy link
Collaborator

Failed to create a backport PR to v24.1.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-25018-v24.1.x-534 remotes/upstream/v24.1.x
git cherry-pick -x 0d029cfd84 edd55dcf23 bd47cd93a2 e2ec0352df 9627077942 59e43a7c31

Workflow run logs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants