Skip to content

[fix][client] Fix producer synchronous retry handling in failPendingMessages method#25207

Merged
lhotari merged 11 commits intoapache:masterfrom
cognitree:producer-25201
Feb 5, 2026
Merged

[fix][client] Fix producer synchronous retry handling in failPendingMessages method#25207
lhotari merged 11 commits intoapache:masterfrom
cognitree:producer-25201

Conversation

@sandeep-mst
Copy link
Contributor

Fixes #25201

Motivation

Fix a re-entrancy bug in ProducerImpl.failPendingMessages. While executing on the timer, failPendingMessages invokes sendComplete(ex) on pending messages, which can synchronously trigger a retry from client code. The subsequent pendingMessages.clear() then removes the newly enqueued retry operation, leaving the retry’s CompletableFuture unresolved and the client in a limbo state.

Modifications

Updated failPendingMessages to first drain the pendingMessages queue into a local list and clear the queue before iterating. This prevents re-entrant retries triggered during sendComplete from being inadvertently cleared.

Verifying this change

  • Make sure that the change passes the CI checks.

This change added tests and can be verified as follows:

  • org.apache.pulsar.client.impl.ProducerImplTest#testFailPendingMessagesSyncRetry

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:
cognitree#29

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Feb 3, 2026
@lhotari
Copy link
Member

lhotari commented Feb 4, 2026

great work @sandeep-mst .

Regarding the tests, would it be possible to add a reproducer similar as described in #25201? (test without mocking the client, just using MockedPulsarServiceBaseTest base class)

Copy link
Member

@lhotari lhotari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, great work @sandeep-mst . Just a wish to add a test mentioned in the description.

@sandeep-mst
Copy link
Contributor Author

sandeep-mst commented Feb 4, 2026

LGTM, great work @sandeep-mst . Just a wish to add a test mentioned in the description.

Will add it if possible. That test will go to pulsar-broker module. Should I remove the currently added test afterwards?

@lhotari
Copy link
Member

lhotari commented Feb 4, 2026

LGTM, great work @sandeep-mst . Just a wish to add a test mentioned in the description.

Will add it if possible. That test will go to pulsar-broker module. Should I remove the currently added test afterwards?

I think the current unit tests are fine. The additional test would be a bit like an integration test. In Pulsar, the tests which use the MockedPulsarServiceBaseTest base class are in most cases integration tests although we call them "unit tests" in CI.

@sandeep-mst
Copy link
Contributor Author

@lhotari Added the new integration test in pulsar-broker module. Please review and let me know if anything else is required.

Copy link
Member

@lhotari lhotari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, great work @sandeep-mst

@sandeep-mst
Copy link
Contributor Author

/pulsarbot rerun-failure-checks

@lhotari lhotari merged commit 611efe4 into apache:master Feb 5, 2026
53 of 55 checks passed
lhotari pushed a commit that referenced this pull request Feb 6, 2026
lhotari pushed a commit that referenced this pull request Feb 6, 2026
lhotari pushed a commit that referenced this pull request Feb 6, 2026
hshankar31 pushed a commit to datastax/pulsar that referenced this pull request Feb 11, 2026
…essages method (apache#25207)

(cherry picked from commit 611efe4)
(cherry picked from commit 30ae8fb)
Rutuja-IBM pushed a commit to datastax/pulsar that referenced this pull request Feb 13, 2026
…essages method (apache#25207)

(cherry picked from commit 611efe4)
(cherry picked from commit bf92418)
hshankar31 pushed a commit to datastax/pulsar that referenced this pull request Feb 16, 2026
…essages method (apache#25207)

(cherry picked from commit 611efe4)
(cherry picked from commit 30ae8fb)
manas-ctds pushed a commit to datastax/pulsar that referenced this pull request Feb 18, 2026
…essages method (apache#25207)

(cherry picked from commit 611efe4)
(cherry picked from commit bf92418)
priyanshu-ctds pushed a commit to datastax/pulsar that referenced this pull request Feb 18, 2026
…datastax 4 0 ds 16 feb (#589)

* [improve][broker] Ensure metadata session state visibility and improve Unstable observability for ServiceUnitStateChannelImpl (apache#25132)

(cherry picked from commit 2a29be0)
(cherry picked from commit 85dc758)

* [improve][broker] Upgrade bookkeeper to 4.17.3 (apache#25166)

(cherry picked from commit 45def39)
(cherry picked from commit 333110a)

* fix license and pom file

* [fix][ml] Fix NoSuchElementException in EntryCountEstimator caused by a race condition (apache#25177)

(cherry picked from commit 9b70ba3)
(cherry picked from commit 9261869)

* [fix][test] Bump org.assertj:assertj-core from 3.27.5 to 3.27.7 (apache#25186)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
(cherry picked from commit ce4ebea)
(cherry picked from commit 2c3402e)

* [improve][misc] Upgrade snappy version to 1.1.10.8 (apache#25182)

(cherry picked from commit b15f53b)
(cherry picked from commit 304fea1)

* [fix][proxy] Close client connection immediately when credentials expire and forwardAuthorizationCredentials is disabled (apache#25179)

(cherry picked from commit 3348470)
(cherry picked from commit c06f8ba)

* [fix][client] ControlledClusterFailover avoid unnecessary reconnection. (apache#25178)

Co-authored-by: fengwenzhi <fengwenzhi.max@bigo.sg>
(cherry picked from commit f0ec07b)
(cherry picked from commit b41488d)

* [fix][sec] Bump org.apache.solr:solr-core from 9.8.0 to 9.10.1 in /pulsar-io/solr (apache#25175)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
(cherry picked from commit a2f888a)
(cherry picked from commit b532068)

* [improve][pip] PIP-453: Improve the metadata store threading model (apache#25173)

(cherry picked from commit c51346f)
(cherry picked from commit d81d6b3)

* [improve][client]Reduce unnecessary getPartitionedTopicMetadata requests when using retry and DLQ topics. (apache#25172)

(cherry picked from commit 52a4d5e)
(cherry picked from commit 71a3994)

* [fix][misc] Allow JWT tokens in OpenID auth without nbf claim (apache#25197)

(cherry picked from commit d630394)
(cherry picked from commit 2760ee9)

* [fix][sec] Exclude org.lz4:lz4-java and standardize on at.yawk.lz4-java to remediate CVE-2025-12183 and CVE-2025-66566 (apache#25198)

(cherry picked from commit c07f2ad)
(cherry picked from commit 2ac6d03)

* fix checkstyle failure and license issues

* [fix] [test] Upgrade docker-java to 3.7.0 (apache#25209)

(cherry picked from commit 4add84c)
(cherry picked from commit 92b5d55)

* [fix][client] Fix race condition between isDuplicate() and flushAsync() method in PersistentAcknowledgmentsGroupingTracker due to incorrect use Netty Recycler (apache#25208)

(cherry picked from commit 5aab2f0)
(cherry picked from commit 2206949)

* [improve][monitor] Upgrade OpenTelemetry to 1.56.0, Otel instrumentation to 2.21.0 and Otel semconv to 1.37.0 (apache#24994)

(cherry picked from commit 53162ff)
(cherry picked from commit a1d5b6c)

* [improve][proxy] Add regression tests for package upload with 'Expect: 100-continue' (apache#25211)

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
(cherry picked from commit e8fedb1)
(cherry picked from commit 0947639)

* fix license issues

* [fix][test]Fix flaky ExtensibleLoadManagerImplTest_testGetMetrics (apache#25216)

(cherry picked from commit 257d42f)
(cherry picked from commit a8eac91)

* [fix][broker] Fix ManagedCursorImpl.asyncDelete() method may lose previous async mark delete properties in race condition (apache#25165)

(cherry picked from commit bea6f8a)
(cherry picked from commit 4332a44)

* [fix][broker]Fix ledgerHandle failed to read by using new BK API (apache#25199)

(cherry picked from commit 6d51f88)
(cherry picked from commit 1631fed)

* [fix][client] Fix producer synchronous retry handling in failPendingMessages method (apache#25207)

(cherry picked from commit 611efe4)
(cherry picked from commit 30ae8fb)

* [fix][broker] Prevent missed topic changes in topic watchers and schedule periodic refresh with patternAutoDiscoveryPeriod interval (apache#25188)

(cherry picked from commit 2e06cc0)
(cherry picked from commit ba2a230)

* fix for complilation error

* [feat][io] implement pip-297 for jdbc sinks (apache#25195)

(cherry picked from commit 6f4ac21)
(cherry picked from commit 998a4b1)

* [fix][broker] Fix httpProxyTimeout config (apache#25223)

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
(cherry picked from commit 2d6ef6f)
(cherry picked from commit 3b39c7b)

* [improve][broker] Add strictAuthMethod to require explicit authentication method (apache#25185)

Co-authored-by: Ómar K. Yasin <oyasin@apple.com>
(cherry picked from commit bae9173)
(cherry picked from commit 27e34f6)

* [feat][client] oauth2 trustcerts file and timeouts (apache#24944)

(cherry picked from commit b789d82)
(cherry picked from commit f8827bd)

* [improve][client] Make authorization server metadata path configurable in AuthenticationOAuth2 (apache#25052)

Co-authored-by: hoguni <hoguni@lycorp.co.jp>
(cherry picked from commit 3cb7a7b)
(cherry picked from commit 705a99d)

* Revert "[improve][broker] Add strictAuthMethod to require explicit authentication method (apache#25185)"

This reverts commit 531eb91.

* [improve][broker] Add idle timeout support for http (apache#25224)

(cherry picked from commit 63220ea)
(cherry picked from commit 144e064)

* [fix][broker] Fix incomplete futures in topic property update/delete methods (apache#25228)

(cherry picked from commit c2ae180)
(cherry picked from commit ab05ca2)

* [fix][test] Fix Mockito stubbing race in TopicListServiceTest (apache#25227)

(cherry picked from commit c93dd7a)
(cherry picked from commit 38a126b)

* [improve][broker] Give the detail error msg when authenticate failed with AuthenticationException (apache#25221)

(cherry picked from commit 0a0ce6d)
(cherry picked from commit 2a46c70)

* [fix][client] Send all chunkMessageIds to broker for redelivery (apache#25229)

(cherry picked from commit 0a0ce6d)
(cherry picked from commit f49c7b2)

* [fix][broker] Fix transactionMetadataFuture completeExceptionally with null value (apache#25231)

Co-authored-by: 张浩 <zhanghao60@100.me>
(cherry picked from commit 0e5d424)
(cherry picked from commit 42283f4)

* uncomment distribution management in pom

* Reapply "[improve][meta] PIP-453: Improve the metadata store threading model (apache#25187)"

This reverts commit a6aab86.

(cherry picked from commit 4f9b2ca)

* [improve] Upgrade Netty to 4.1.131.Final (apache#25232)

(cherry picked from commit db91b93)
(cherry picked from commit a6c602a)

* [fix][test] fix testBatchMetadataStoreMetrics. (apache#25241)

(cherry picked from commit 9db31cc)
(cherry picked from commit abbd478)

* [fix][test] Fix ResourceQuotaCalculatorImplTest#testNeedToReportLocalUsage (apache#25247)

(cherry picked from commit 48774de)
(cherry picked from commit 9343837)

* [fix][meta] Metadata cache refresh might not take effect (apache#25246)

(cherry picked from commit 24eba10)
(cherry picked from commit 6d81292)

* fix pulsar-proxy unit test case failure

* fix safe delete URLRegexLookupProxyHandler which is not used

* Revert "fix safe delete URLRegexLookupProxyHandler which is not used"

This reverts commit 158fc14.

* Revert "fix pulsar-proxy unit test case failure"

This reverts commit 4efcf70.

* updated hardcoded newLookupProxyHandler in ProxyService for failing URLRegexLookupProxyHandlerTest

* Revert "[improve][monitor] Upgrade OpenTelemetry to 1.56.0, Otel instrumentation to 2.21.0 and Otel semconv to 1.37.0 (apache#24994)"

This reverts commit 5e5328e

* reverted lincense for opentelemetry upgrade changes

* Revert "updated hardcoded newLookupProxyHandler in ProxyService for failing URLRegexLookupProxyHandlerTest"

This reverts commit a4f07dc.

* reverted mismatch commits changes in ProxyConnection.java

* fix code-style issue

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Kai Wang <kwang@apache.org>
Co-authored-by: Yong Zhang <zhangyong1025.zy@gmail.com>
Co-authored-by: Lari Hotari <lhotari@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Zixuan Liu <nodeces@gmail.com>
Co-authored-by: Wenzhi Feng <thetumbled@apache.org>
Co-authored-by: fengwenzhi <fengwenzhi.max@bigo.sg>
Co-authored-by: Yunze Xu <xyzinfernity@163.com>
Co-authored-by: zhenJiangWang <zhenjiang427@gmail.com>
Co-authored-by: guptas6est <sanaya.gupta@est.tech>
Co-authored-by: Matteo Merli <mmerli@apache.org>
Co-authored-by: Oneby Wang <44369297+oneby-wang@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: fengyubiao <yubiao.feng@streamnative.io>
Co-authored-by: Malla Sandeep <sandeep.malla78@gmail.com>
Co-authored-by: Bäm <dev@sandchaschte.ch>
Co-authored-by: Omar Yasin <omarkj@icloud.com>
Co-authored-by: Ómar K. Yasin <oyasin@apple.com>
Co-authored-by: gulecroc <gu.lecroc@gmail.com>
Co-authored-by: Hideaki Oguni <22386882+izumo27@users.noreply.github.com>
Co-authored-by: hoguni <hoguni@lycorp.co.jp>
Co-authored-by: Cong Zhao <zhaocong@apache.org>
Co-authored-by: sinan liu <liusinan1998@gmail.com>
Co-authored-by: Jiwei Guo <technoboy@apache.org>
Co-authored-by: cai minjian <905767378@qq.com>
Co-authored-by: Hao Zhang <zhanghao1@cmss.chinamobile.com>
Co-authored-by: 张浩 <zhanghao60@100.me>
Co-authored-by: Lari Hotari <lhotari@apache.org>
Co-authored-by: zzb <48124861+zhaizhibo@users.noreply.github.com>
priyanshu-ctds pushed a commit to datastax/pulsar that referenced this pull request Feb 23, 2026
…essages method (apache#25207)

(cherry picked from commit 611efe4)
(cherry picked from commit 30ae8fb)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Producer synchronous retries can cause retry sendAsync future to never complete

5 participants