[fix][client] PIP-475: make async producer survive regular-to-scalable migration #25882
Open
merlimat wants to merge 1 commit into
…e migration

The sync send path was hardened in apache#25878 to ride through the synthetic->real-DAG transition (retry the segment-gone failure that arises when the per-segment producer is (re)created on a just-terminated partition). The async path (producer.async()...send() -> dispatchSendAttempt / appendToDispatchChain) had the same gap and was missed:

* creation-time failures were not retried at all: appendToDispatchChain failed the user future immediately;
* the send-failure retry matched only the typed TopicTerminated / AlreadyClosed exceptions, not the message-wrapped form the creation path surfaces;
* it allowed only 3 attempts (~600ms), often too short for the layout push to arrive.

Route both send- and creation-failures through the shared isSegmentGoneError detection and the SEND_RETRY_MAX_ATTEMPTS / SEND_RETRY_MAX_BACKOFF_MS budget already used by the sync path, via a single handleAsyncSegmentFailure helper. appendToDispatchChain now hands creation failures to a callback instead of failing the user future directly.

Adds V5MigrationEndToEndTest.testV5AsyncProducerSurvivesMigration: async sends issued right after migration must all complete (none fail across the boundary) and every message stays consumable.
Motivation
Follow-up to the PIP-475 migration work. #25878 hardened the sync producer send path so a connected V5 producer rides through the synthetic→real-DAG transition during a regular-to-scalable migration: when a per-segment producer is (re)created on a partition that the migration just terminated, the send retries onto an active child once the new layout arrives instead of failing.
The async path (producer.async()…send() → dispatchSendAttempt / appendToDispatchChain) had the same gap and was missed, so an application using the async producer API and sending across a migration could still see sends fail.
Modifications
* appendToDispatchChain previously failed the user future immediately when the per-segment producer failed to create; it now hands the failure to a callback that applies the same retry decision as a send failure.
* The sendAsync failure and the producer-creation failure now both go through the shared isSegmentGoneError (type or message match), instead of the async path's previous type-only check that missed the message-wrapped exception the creation path surfaces.
* The retry budget is now SEND_RETRY_MAX_ATTEMPTS / SEND_RETRY_MAX_BACKOFF_MS (shared with the sync path) instead of the old 3-attempt (~600 ms) limit.
No wire/API/behavior change for the happy path; this only affects retry behavior on the segment-gone transition.
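For reviewers who want the shape of the change without opening the diff, the retry decision described above can be sketched roughly as follows. This is not the code in this PR: the class, the SegmentGoneException stand-in, the "terminated" message match, and the numeric limits are all hypothetical, chosen only so the example is self-contained. What it tries to show is the idea of a single helper that treats a creation failure (thrown synchronously) and a send failure (failed future) the same way, detects "segment gone" by type or by message anywhere in the cause chain, and retries with a capped backoff.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

final class AsyncSegmentRetrySketch {
    // Hypothetical limits; the PR text does not state the real values of
    // SEND_RETRY_MAX_ATTEMPTS / SEND_RETRY_MAX_BACKOFF_MS.
    static final int MAX_ATTEMPTS = 10;
    static final long INITIAL_BACKOFF_MS = 10;
    static final long MAX_BACKOFF_MS = 100;

    /** Hypothetical stand-in for a typed failure such as TopicTerminated / AlreadyClosed. */
    static final class SegmentGoneException extends RuntimeException {
        SegmentGoneException(String message) { super(message); }
    }

    /** Type OR message match over the cause chain. The "terminated" substring is an assumption. */
    static boolean isSegmentGone(Throwable error) {
        for (Throwable c = error; c != null; c = c.getCause()) {
            if (c instanceof SegmentGoneException) return true;
            String msg = c.getMessage();
            if (msg != null && msg.contains("terminated")) return true;
        }
        return false;
    }

    /** Runs the attempt, retrying segment-gone failures until MAX_ATTEMPTS is reached. */
    static <T> CompletableFuture<T> sendWithRetry(Supplier<CompletableFuture<T>> attempt,
                                                  ScheduledExecutorService timer) {
        CompletableFuture<T> userFuture = new CompletableFuture<>();
        tryOnce(attempt, timer, 1, INITIAL_BACKOFF_MS, userFuture);
        return userFuture;
    }

    private static <T> void tryOnce(Supplier<CompletableFuture<T>> attempt,
                                    ScheduledExecutorService timer,
                                    int attemptNo, long backoffMs,
                                    CompletableFuture<T> userFuture) {
        CompletableFuture<T> result;
        try {
            result = attempt.get();
        } catch (Throwable creationFailure) {
            // A synchronous (creation-time) failure takes the same path as a send failure.
            result = new CompletableFuture<>();
            result.completeExceptionally(creationFailure);
        }
        result.whenComplete((value, error) -> {
            if (error == null) {
                userFuture.complete(value);
            } else if (!isSegmentGone(error) || attemptNo >= MAX_ATTEMPTS) {
                // Not retryable, or budget exhausted: surface the failure to the caller.
                userFuture.completeExceptionally(error);
            } else {
                // Wait backoffMs, then retry with a doubled (capped) delay.
                long nextBackoffMs = Math.min(backoffMs * 2, MAX_BACKOFF_MS);
                timer.schedule(
                        () -> tryOnce(attempt, timer, attemptNo + 1, nextBackoffMs, userFuture),
                        backoffMs, TimeUnit.MILLISECONDS);
            }
        });
    }
}
```

In this sketch the user-visible future is only failed once the error is classified as non-retryable or the attempt budget runs out, which mirrors the intent stated above of no longer failing the user future directly on a creation error.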
Verification
* V5MigrationEndToEndTest.testV5AsyncProducerSurvivesMigration: async sends issued immediately after migration must all complete (none fail across the boundary) and every pre-/post-migration message stays consumable. Passes alongside the existing sync + v4-lockout cases (3/3).
* pulsar-client-v5 checkstyle/compile and pulsar-broker checkstyle clean.
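The "must all complete, none fail" check can be pictured with a small helper like the one below. It is an illustrative sketch only, not the body of testV5AsyncProducerSurvivesMigration: the class name, the sender function (standing in for an async send call), and the message naming are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

final class AsyncSendAllCompleteSketch {
    /**
     * Issues `count` async sends through the (hypothetical) sender, waits until every
     * future has completed, and returns how many of them failed.
     */
    static int countFailedSends(Function<String, CompletableFuture<String>> sender,
                                int count, long timeoutSeconds) throws Exception {
        List<CompletableFuture<String>> futures = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            futures.add(sender.apply("msg-" + i));
        }
        // handle() turns each outcome into a normal completion, so get() below only
        // throws on timeout; individual failures are counted afterwards.
        CompletableFuture<?>[] settled = futures.stream()
                .map(f -> f.handle((value, error) -> null))
                .toArray(CompletableFuture[]::new);
        CompletableFuture.allOf(settled).get(timeoutSeconds, TimeUnit.SECONDS);

        int failed = 0;
        for (CompletableFuture<String> f : futures) {
            if (f.isCompletedExceptionally()) {
                failed++;
            }
        }
        return failed;
    }
}
```

A test in this style would assert that the returned count is zero for sends issued across the migration boundary, and then separately consume the topic to confirm every message is readable.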