raft/tests: deflake truncation_detection_test #25035

Merged
1 commit merged on Feb 5, 2025

Contributor

@bharathv bharathv commented Feb 5, 2025

The current logic, which waits on the enqueued stage, seems incorrect and is causing the test to flake. enqueued only guarantees that the request is enqueued in the raft layer; subsequent replication may still fail with a not_leader error. The flaky sequence looks something like this:

  1. wait for enqueued futures to resolve
  2. force a step down
  3. wait for replicated entry to be truncated.

If (2) happens before the entry is appended to the leader log, the replication future fails with a not_leader error, resulting in a test retry. Effectively, (1) is not a tight enough condition to wait on before forcing a step down.

This commit rewrites the test to replicate with leader_ack and wait for the future to resolve (which guarantees a leader append). The replication monitor waiter should then resolve with a truncation error after the leadership change.
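
The two orderings can be illustrated with a small toy model. This is only a sketch in Python, not the actual test or any Redpanda API: `ToyLeader`, `Outcome`, and `run` are hypothetical names, and the model deliberately picks the worst-case interleaving where the step down lands before the leader append.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Outcome(Enum):
    NOT_LEADER = auto()  # replication failed before the leader append
    TRUNCATED = auto()   # entry was appended, then truncated after the leadership change


@dataclass
class ToyLeader:
    """Hypothetical stand-in for a raft leader; not Redpanda's API."""

    is_leader: bool = True
    queue: list = field(default_factory=list)  # enqueued, not yet in the log
    log: list = field(default_factory=list)    # appended to the leader log

    def enqueue(self, entry):
        # "enqueued" stage: the request is accepted, nothing is in the log yet.
        self.queue.append(entry)

    def append_pending(self):
        # Move queued entries into the log. Only a leader may append;
        # otherwise the queued entries are rejected and returned.
        if not self.is_leader:
            rejected, self.queue = self.queue, []
            return rejected
        self.log.extend(self.queue)
        self.queue.clear()
        return []

    def step_down(self):
        self.is_leader = False

    def truncate_unreplicated(self):
        # After the leadership change, the unreplicated suffix is dropped.
        truncated, self.log = self.log, []
        return truncated


def run(wait_for_leader_ack):
    leader = ToyLeader()
    leader.enqueue("entry")          # (1) the enqueued stage resolves here
    if wait_for_leader_ack:
        leader.append_pending()      # leader_ack: the entry is now in the leader log
    leader.step_down()               # (2) force a step down
    if "entry" in leader.append_pending():
        return Outcome.NOT_LEADER    # old behaviour: future fails, test retries
    assert "entry" in leader.truncate_unreplicated()
    return Outcome.TRUNCATED         # (3) the waiter observes a truncation
```

In this model, `run(False)` corresponds to waiting only on the enqueued stage and ends in `NOT_LEADER`, while `run(True)` corresponds to waiting for the leader ack and ends in `TRUNCATED`.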

Additionally, the test seems to be missing some cleanup between retries, which is fixed.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.3.x
  • v24.2.x
  • v24.1.x

Release Notes

  • none

@vbotbuildovich
Collaborator

CI test results

test results on build#61589

| test_id | test_kind | job_url | test_status | passed |
|---|---|---|---|---|
| rptest.tests.compaction_recovery_test.CompactionRecoveryTest.test_index_recovery | ducktape | https://buildkite.com/redpanda/redpanda/builds/61589#0194d3ed-2527-4b17-8b0e-7097f50c9e05 | FLAKY | 1/2 |
| rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade | ducktape | https://buildkite.com/redpanda/redpanda/builds/61589#0194d3e9-7c1d-42b0-a9b8-607cce8b71cd | FLAKY | 1/2 |
| rptest.tests.datalake.compaction_test.CompactionGapsTest.test_translation_no_gaps.cloud_storage_type=CloudStorageType.S3.catalog_type=CatalogType.REST_JDBC | ducktape | https://buildkite.com/redpanda/redpanda/builds/61589#0194d3ed-2527-43eb-ab6b-93459b0b749e | FLAKY | 1/2 |
| rptest.tests.partition_movement_test.SIPartitionMovementTest.test_shadow_indexing.num_to_upgrade=0.cloud_storage_type=CloudStorageType.S3 | ducktape | https://buildkite.com/redpanda/redpanda/builds/61589#0194d3ed-2526-4c80-9655-bebcde80f77c | FLAKY | 1/2 |
| rptest.tests.scaling_up_test.ScalingUpTest.test_scaling_up_with_recovered_topic | ducktape | https://buildkite.com/redpanda/redpanda/builds/61589#0194d3ed-2525-4d46-b13c-f20bcf62309c | FLAKY | 1/2 |

@dotnwat
Member

dotnwat commented Feb 5, 2025

thanks for the great cover letter

Contributor

@nvartolomei nvartolomei left a comment

👍 🙇
