[fix][meta] Metadata cache refresh might not take effect #25246

Merged
lhotari merged 3 commits into apache:master from BewareMyPower:bewaremypower/fix-metadata-cache-not-updated
Feb 16, 2026
@BewareMyPower BewareMyPower commented Feb 13, 2026

Motivation

#25187 introduced a regression in MetadataCacheImpl#refresh. When a path is created or modified, refresh is called and updates objCache with the future returned by readValueFromStore(). However, there is a race:

  1. [current thread] Acquire the lock on the node for the path in objCache's ConcurrentMap, then call readValueFromStore, which calls store.get.
  2. [metadata store worker thread] In the callback of store.get, objCache.getIfPresent calls ConcurrentMap#get to fetch the cached future and checks whether it is the same as the future being inserted into the map.
  3. [current thread] Insert the future returned by readValueFromStore into objCache's ConcurrentMap and release the lock.

The future inserted in step 3 is not guaranteed to be visible in step 2, because ConcurrentMap#get is lock-free: it does not wait for the lock on the path to be released at the end of step 3.

        final var cachedFuture = objCache.getIfPresent(path);
        if (cachedFuture != null && cachedFuture != future) {
            if (log.isDebugEnabled()) {
                log.debug("A new read on key {} is in progress or completed, ignore this one", path);
            }
            return cachedFuture;
        }

The code above is correct only if cachedFuture is the future returned by readValueFromStore, not the previously cached future.
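
The visibility problem described above can be reproduced in isolation. The following is a minimal sketch, not the code from this PR: it assumes a plain java.util.concurrent.ConcurrentHashMap as a stand-in for the map behind objCache, and the class and method names are hypothetical. A reader thread calls get while compute is still running for the same key, and it observes the old value instead of blocking until the new one is published.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for the objCache map; not the Pulsar implementation.
class RefreshRaceSketch {

    /** Returns the value a concurrent reader sees while compute() is still running. */
    static String valueSeenDuringCompute() {
        ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
        cache.put("/path", "old");
        AtomicReference<String> seen = new AtomicReference<>();

        cache.compute("/path", (key, current) -> {
            // Plays the role of the store.get callback on the worker thread.
            Thread worker = new Thread(() -> seen.set(cache.get(key)));
            worker.start();
            try {
                // get() takes no lock, so the worker finishes although
                // compute() still holds the lock for this key.
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "new"; // published only after this function returns
        });
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println(valueSeenDuringCompute()); // prints "old"
    }
}
```

In the sketch the reader is forced to run inside the remapping function so that the outcome is deterministic. In the real code the ordering depends on thread scheduling, which is why the failure is intermittent.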

This logic was originally added because testCloneInReadModifyUpdateOrCreate failed. I thought the failure was caused by duplicated refresh calls, but it was not. The root cause is that, when the test failed, the create method implementation was wrong:

        return serialize(path, value).thenAcceptAsync(content -> store.put(path, content, Optional.of(-1L)))
                .thenApply(stat -> objCache.get(path))
                /* ... */
                .thenApply(__ -> null);

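It should be thenComposeAsync rather than thenAcceptAsync. thenAcceptAsync discards the future returned by store.put, so the next stage can run before the put completes and objCache.get might not see the updated value. This bug was accidentally fixed in my later commits.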
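
The difference between the two combinators can be shown with a small self-contained sketch. It is an illustration under stated assumptions rather than the actual create method: fakePut is a hypothetical stand-in for store.put, and the non-async variants thenAccept and thenCompose are used so that the outcome is deterministic.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Function;

// Hypothetical illustration; fakePut stands in for store.put.
class ComposeVsAcceptSketch {

    /** Returns whether the stage after the put observed the stored value. */
    static boolean nextStageSawPut(boolean useCompose) {
        AtomicBoolean stored = new AtomicBoolean(false);
        CompletableFuture<Void> putGate = new CompletableFuture<>();

        // The fake put finishes (and marks the value as stored) only once putGate completes.
        Function<String, CompletableFuture<Void>> fakePut =
                content -> putGate.thenRun(() -> stored.set(true));

        CompletableFuture<String> serialized = CompletableFuture.completedFuture("content");
        CompletableFuture<Boolean> result;
        if (useCompose) {
            // thenCompose waits for the future returned by fakePut.
            result = serialized.thenCompose(fakePut).thenApply(ignored -> stored.get());
        } else {
            // thenAccept drops the returned future, so the next stage runs immediately.
            result = serialized.thenAccept(content -> fakePut.apply(content))
                    .thenApply(ignored -> stored.get());
        }

        putGate.complete(null); // the put finishes only now
        return result.join();
    }

    public static void main(String[] args) {
        System.out.println("thenAccept saw put:  " + nextStageSawPut(false)); // false
        System.out.println("thenCompose saw put: " + nextStageSawPut(true));  // true
    }
}
```

With thenAccept the follow-up stage has already run by the time the put completes, which corresponds to objCache.get reading a stale value.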

Modifications

  • Revert the cachedFuture-related code. A careful design may be needed later to reduce unnecessary deserializations when refresh is called multiple times for one update (e.g. from the callback of MetadataCacheImpl#put and from the notification method AbstractMetadataStore#accept).
  • Add testRefreshRace to verify that this race no longer occurs.

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

@BewareMyPower BewareMyPower self-assigned this Feb 13, 2026
@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Feb 13, 2026
@BewareMyPower BewareMyPower added release/4.0.9 release/4.1.3 type/bug The PR fixed a bug or issue reported a bug area/metadata and removed doc-not-needed Your PR changes do not impact docs labels Feb 13, 2026
@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Feb 13, 2026
@lhotari lhotari left a comment

LGTM

lhotari commented Feb 13, 2026

/pulsarbot rerun-failure-checks

BewareMyPower commented

@lhotari Do you know why ResourceQuotaCalculatorImplTest now runs in Broker Group 2? I checked the previous workflows and found that it should run in the BROKER_FLAKY workflow. https://github.com/apache/pulsar/actions/runs/21906841628/job/63249299032

Anyway, I pushed a PR to fix this broken test (not flaky): #25247

codecov-commenter commented

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.59%. Comparing base (9db31cc) to head (b9b627c).
⚠️ Report is 1 commits behind head on master.

@@              Coverage Diff              @@
##             master   #25246       +/-   ##
=============================================
+ Coverage     37.96%   72.59%   +34.63%     
- Complexity    13510    34396    +20886     
=============================================
  Files          1902     1959       +57     
  Lines        151242   155401     +4159     
  Branches      17240    17724      +484     
=============================================
+ Hits          57413   112821    +55408     
+ Misses        86193    33576    -52617     
- Partials       7636     9004     +1368     
Flag        Coverage Δ
inttests    25.61% <ø> (-0.49%) ⬇️
systests    22.38% <ø> (-0.20%) ⬇️
unittests   73.57% <ø> (+38.96%) ⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines                                 Coverage Δ
.../pulsar/metadata/cache/impl/MetadataCacheImpl.java    85.29% <ø> (+7.57%) ⬆️

... and 1410 files with indirect coverage changes

@lhotari lhotari merged commit 24eba10 into apache:master Feb 16, 2026
53 checks passed
lhotari pushed a commit that referenced this pull request Feb 16, 2026
lhotari pushed a commit that referenced this pull request Feb 16, 2026
hshankar31 pushed a commit to datastax/pulsar that referenced this pull request Feb 17, 2026
(cherry picked from commit 24eba10)
(cherry picked from commit 6d81292)
priyanshu-ctds pushed a commit to datastax/pulsar that referenced this pull request Feb 18, 2026
…datastax 4 0 ds 16 feb (#589)

* [improve][broker] Ensure metadata session state visibility and improve Unstable observability for ServiceUnitStateChannelImpl (apache#25132)

(cherry picked from commit 2a29be0)
(cherry picked from commit 85dc758)

* [improve][broker] Upgrade bookkeeper to 4.17.3 (apache#25166)

(cherry picked from commit 45def39)
(cherry picked from commit 333110a)

* fix license and pom file

* [fix][ml] Fix NoSuchElementException in EntryCountEstimator caused by a race condition (apache#25177)

(cherry picked from commit 9b70ba3)
(cherry picked from commit 9261869)

* [fix][test] Bump org.assertj:assertj-core from 3.27.5 to 3.27.7 (apache#25186)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
(cherry picked from commit ce4ebea)
(cherry picked from commit 2c3402e)

* [improve][misc] Upgrade snappy version to 1.1.10.8 (apache#25182)

(cherry picked from commit b15f53b)
(cherry picked from commit 304fea1)

* [fix][proxy] Close client connection immediately when credentials expire and forwardAuthorizationCredentials is disabled (apache#25179)

(cherry picked from commit 3348470)
(cherry picked from commit c06f8ba)

* [fix][client] ControlledClusterFailover avoid unnecessary reconnection. (apache#25178)

Co-authored-by: fengwenzhi <fengwenzhi.max@bigo.sg>
(cherry picked from commit f0ec07b)
(cherry picked from commit b41488d)

* [fix][sec] Bump org.apache.solr:solr-core from 9.8.0 to 9.10.1 in /pulsar-io/solr (apache#25175)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
(cherry picked from commit a2f888a)
(cherry picked from commit b532068)

* [improve][pip] PIP-453: Improve the metadata store threading model (apache#25173)

(cherry picked from commit c51346f)
(cherry picked from commit d81d6b3)

* [improve][client]Reduce unnecessary getPartitionedTopicMetadata requests when using retry and DLQ topics. (apache#25172)

(cherry picked from commit 52a4d5e)
(cherry picked from commit 71a3994)

* [fix][misc] Allow JWT tokens in OpenID auth without nbf claim (apache#25197)

(cherry picked from commit d630394)
(cherry picked from commit 2760ee9)

* [fix][sec] Exclude org.lz4:lz4-java and standardize on at.yawk.lz4-java to remediate CVE-2025-12183 and CVE-2025-66566 (apache#25198)

(cherry picked from commit c07f2ad)
(cherry picked from commit 2ac6d03)

* fix checkstyle failure and license issues

* [fix] [test] Upgrade docker-java to 3.7.0 (apache#25209)

(cherry picked from commit 4add84c)
(cherry picked from commit 92b5d55)

* [fix][client] Fix race condition between isDuplicate() and flushAsync() method in PersistentAcknowledgmentsGroupingTracker due to incorrect use Netty Recycler (apache#25208)

(cherry picked from commit 5aab2f0)
(cherry picked from commit 2206949)

* [improve][monitor] Upgrade OpenTelemetry to 1.56.0, Otel instrumentation to 2.21.0 and Otel semconv to 1.37.0 (apache#24994)

(cherry picked from commit 53162ff)
(cherry picked from commit a1d5b6c)

* [improve][proxy] Add regression tests for package upload with 'Expect: 100-continue' (apache#25211)

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
(cherry picked from commit e8fedb1)
(cherry picked from commit 0947639)

* fix license issues

* [fix][test]Fix flaky ExtensibleLoadManagerImplTest_testGetMetrics (apache#25216)

(cherry picked from commit 257d42f)
(cherry picked from commit a8eac91)

* [fix][broker] Fix ManagedCursorImpl.asyncDelete() method may lose previous async mark delete properties in race condition (apache#25165)

(cherry picked from commit bea6f8a)
(cherry picked from commit 4332a44)

* [fix][broker]Fix ledgerHandle failed to read by using new BK API (apache#25199)

(cherry picked from commit 6d51f88)
(cherry picked from commit 1631fed)

* [fix][client] Fix producer synchronous retry handling in failPendingMessages method (apache#25207)

(cherry picked from commit 611efe4)
(cherry picked from commit 30ae8fb)

* [fix][broker] Prevent missed topic changes in topic watchers and schedule periodic refresh with patternAutoDiscoveryPeriod interval (apache#25188)

(cherry picked from commit 2e06cc0)
(cherry picked from commit ba2a230)

* fix for complilation error

* [feat][io] implement pip-297 for jdbc sinks (apache#25195)

(cherry picked from commit 6f4ac21)
(cherry picked from commit 998a4b1)

* [fix][broker] Fix httpProxyTimeout config (apache#25223)

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
(cherry picked from commit 2d6ef6f)
(cherry picked from commit 3b39c7b)

* [improve][broker] Add strictAuthMethod to require explicit authentication method (apache#25185)

Co-authored-by: Ómar K. Yasin <oyasin@apple.com>
(cherry picked from commit bae9173)
(cherry picked from commit 27e34f6)

* [feat][client] oauth2 trustcerts file and timeouts (apache#24944)

(cherry picked from commit b789d82)
(cherry picked from commit f8827bd)

* [improve][client] Make authorization server metadata path configurable in AuthenticationOAuth2 (apache#25052)

Co-authored-by: hoguni <hoguni@lycorp.co.jp>
(cherry picked from commit 3cb7a7b)
(cherry picked from commit 705a99d)

* Revert "[improve][broker] Add strictAuthMethod to require explicit authentication method (apache#25185)"

This reverts commit 531eb91.

* [improve][broker] Add idle timeout support for http (apache#25224)

(cherry picked from commit 63220ea)
(cherry picked from commit 144e064)

* [fix][broker] Fix incomplete futures in topic property update/delete methods (apache#25228)

(cherry picked from commit c2ae180)
(cherry picked from commit ab05ca2)

* [fix][test] Fix Mockito stubbing race in TopicListServiceTest (apache#25227)

(cherry picked from commit c93dd7a)
(cherry picked from commit 38a126b)

* [improve][broker] Give the detail error msg when authenticate failed with AuthenticationException (apache#25221)

(cherry picked from commit 0a0ce6d)
(cherry picked from commit 2a46c70)

* [fix][client] Send all chunkMessageIds to broker for redelivery (apache#25229)

(cherry picked from commit 0a0ce6d)
(cherry picked from commit f49c7b2)

* [fix][broker] Fix transactionMetadataFuture completeExceptionally with null value (apache#25231)

Co-authored-by: 张浩 <zhanghao60@100.me>
(cherry picked from commit 0e5d424)
(cherry picked from commit 42283f4)

* uncomment distribution management in pom

* Reapply "[improve][meta] PIP-453: Improve the metadata store threading model (apache#25187)"

This reverts commit a6aab86.

(cherry picked from commit 4f9b2ca)

* [improve] Upgrade Netty to 4.1.131.Final (apache#25232)

(cherry picked from commit db91b93)
(cherry picked from commit a6c602a)

* [fix][test] fix testBatchMetadataStoreMetrics. (apache#25241)

(cherry picked from commit 9db31cc)
(cherry picked from commit abbd478)

* [fix][test] Fix ResourceQuotaCalculatorImplTest#testNeedToReportLocalUsage (apache#25247)

(cherry picked from commit 48774de)
(cherry picked from commit 9343837)

* [fix][meta] Metadata cache refresh might not take effect (apache#25246)

(cherry picked from commit 24eba10)
(cherry picked from commit 6d81292)

* fix pulsar-proxy unit test case failure

* fix safe delete URLRegexLookupProxyHandler which is not used

* Revert "fix safe delete URLRegexLookupProxyHandler which is not used"

This reverts commit 158fc14.

* Revert "fix pulsar-proxy unit test case failure"

This reverts commit 4efcf70.

* updated hardcoded newLookupProxyHandler in ProxyService for failing URLRegexLookupProxyHandlerTest

* Revert "[improve][monitor] Upgrade OpenTelemetry to 1.56.0, Otel instrumentation to 2.21.0 and Otel semconv to 1.37.0 (apache#24994)"

This reverts commit 5e5328e

* reverted lincense for opentelemetry upgrade changes

* Revert "updated hardcoded newLookupProxyHandler in ProxyService for failing URLRegexLookupProxyHandlerTest"

This reverts commit a4f07dc.

* reverted mismatch commits changes in ProxyConnection.java

* fix code-style issue

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Kai Wang <kwang@apache.org>
Co-authored-by: Yong Zhang <zhangyong1025.zy@gmail.com>
Co-authored-by: Lari Hotari <lhotari@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Zixuan Liu <nodeces@gmail.com>
Co-authored-by: Wenzhi Feng <thetumbled@apache.org>
Co-authored-by: fengwenzhi <fengwenzhi.max@bigo.sg>
Co-authored-by: Yunze Xu <xyzinfernity@163.com>
Co-authored-by: zhenJiangWang <zhenjiang427@gmail.com>
Co-authored-by: guptas6est <sanaya.gupta@est.tech>
Co-authored-by: Matteo Merli <mmerli@apache.org>
Co-authored-by: Oneby Wang <44369297+oneby-wang@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: fengyubiao <yubiao.feng@streamnative.io>
Co-authored-by: Malla Sandeep <sandeep.malla78@gmail.com>
Co-authored-by: Bäm <dev@sandchaschte.ch>
Co-authored-by: Omar Yasin <omarkj@icloud.com>
Co-authored-by: Ómar K. Yasin <oyasin@apple.com>
Co-authored-by: gulecroc <gu.lecroc@gmail.com>
Co-authored-by: Hideaki Oguni <22386882+izumo27@users.noreply.github.com>
Co-authored-by: hoguni <hoguni@lycorp.co.jp>
Co-authored-by: Cong Zhao <zhaocong@apache.org>
Co-authored-by: sinan liu <liusinan1998@gmail.com>
Co-authored-by: Jiwei Guo <technoboy@apache.org>
Co-authored-by: cai minjian <905767378@qq.com>
Co-authored-by: Hao Zhang <zhanghao1@cmss.chinamobile.com>
Co-authored-by: 张浩 <zhanghao60@100.me>
Co-authored-by: Lari Hotari <lhotari@apache.org>
Co-authored-by: zzb <48124861+zhaizhibo@users.noreply.github.com>
priyanshu-ctds pushed a commit to datastax/pulsar that referenced this pull request Feb 23, 2026
(cherry picked from commit 24eba10)
(cherry picked from commit 6d81292)