@kjnilsson kjnilsson commented Sep 23, 2025

A quorum queue client that sent a message during a network partition
that later caused a distribution disconnection would in some cases
never resend the lost message, even though it was kept in the pending
buffer. Subsequent sends would be accepted by the state machine but
would never be enqueued, as there would be a missing sequence number.

In the case of publishers that use pre-settled sends, the pending
messages would also have been incorrectly removed from the
pending map.

To fix this, we removed the timer-based resend approach and instead
have the leader send leader_change messages on node up to prompt any
queue clients to resend their pending buffer.
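
To illustrate the failure and the fix described above, here is a toy model in Python. It is only a sketch under stated assumptions, not the actual quorum queue client or state machine code: the names `ToyQueue`, `ToyClient`, and `on_leader_change` are hypothetical, the queue is assumed to hold out-of-order messages until the gap is filled, and confirmations and pre-settled sends are not modelled.

```python
class ToyQueue:
    """Toy stand-in for the state machine: enqueues strictly in sequence order."""

    def __init__(self):
        self.next_seq = 1        # next sequence number expected from the client
        self.enqueued = []       # messages that were actually enqueued
        self.out_of_order = {}   # seq -> msg, accepted but waiting for a gap to fill

    def enqueue(self, seq, msg):
        if seq < self.next_seq:
            return  # already enqueued earlier; a resend of it is ignored
        self.out_of_order[seq] = msg
        # Drain every message that is now contiguous with what was enqueued.
        while self.next_seq in self.out_of_order:
            self.enqueued.append(self.out_of_order.pop(self.next_seq))
            self.next_seq += 1


class ToyClient:
    """Toy stand-in for a queue client that keeps a pending buffer."""

    def __init__(self, queue):
        self.queue = queue
        self.next_seq = 1
        self.pending = {}         # seq -> msg, kept so messages can be resent
        self.partitioned = False  # when True, sends are silently lost

    def send(self, msg):
        seq = self.next_seq
        self.next_seq += 1
        self.pending[seq] = msg
        if not self.partitioned:
            self.queue.enqueue(seq, msg)
        return seq

    def on_leader_change(self):
        # Hypothetical handler: resend the whole pending buffer in order.
        for seq in sorted(self.pending):
            self.queue.enqueue(seq, self.pending[seq])


queue = ToyQueue()
client = ToyClient(queue)

client.send("a")
client.partitioned = True
client.send("b")             # lost during the partition
client.partitioned = False
client.send("c")             # accepted, but stuck behind the missing seq 2
print(queue.enqueued)        # ['a']

client.on_leader_change()    # leader prompts the client after node up
print(queue.enqueued)        # ['a', 'b', 'c']
```

In this model, every send after the lost message stays stuck until something triggers a resend of the pending buffer, which is why the prompt from the leader matters.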

@kjnilsson kjnilsson changed the title from "add some logging" to "QQ: fix resend issues after network partition." Sep 23, 2025
@michaelklishin michaelklishin added this to the 4.3.0 milestone Sep 23, 2025
@kjnilsson kjnilsson marked this pull request as ready for review September 24, 2025 08:02
@kjnilsson kjnilsson requested review from mkuratczyk and removed request for mkuratczyk September 24, 2025 08:02
@michaelklishin michaelklishin left a comment

The Selenium suite failures are unrelated to this change and affect other branches too (or affected them: I merged a Selenium-related PR earlier today).

@michaelklishin michaelklishin changed the title from "QQ: fix resend issues after network partition." to "Quorum queues: fix resend issues after network partition" Sep 24, 2025
@michaelklishin michaelklishin merged commit 200127c into main Sep 24, 2025
288 of 291 checks passed
@michaelklishin michaelklishin deleted the qq-resend-issue branch September 24, 2025 21:17
michaelklishin added a commit that referenced this pull request Sep 26, 2025
Quorum queues: fix resend issues after network partition (backport #14589)
michaelklishin added a commit that referenced this pull request Sep 26, 2025
Quorum queues: fix resend issues after network partition (backport #14589) (backport #14605)