Skip to content

Conversation

@AndreMouche
Copy link
Member

@AndreMouche AndreMouche commented Dec 19, 2025

fix #1823

Summary by CodeRabbit

  • Refactor

    • Centralized and made store metadata access thread-safe (addresses, types, labels), improving stability of store selection, region resolution, and logging; removed obsolete state paths.
  • Tests

    • Updated and expanded tests to validate metadata updates, restart scenarios, leader selection, and label-based behaviors.

✏️ Tip: You can customize this high-level summary in your review settings.

Signed-off-by: AndreMouche <AndreMouche@126.com>
Signed-off-by: AndreMouche <AndreMouche@126.com>
Signed-off-by: AndreMouche <AndreMouche@126.com>
Signed-off-by: AndreMouche <AndreMouche@126.com>
Signed-off-by: AndreMouche <AndreMouche@126.com>
Signed-off-by: AndreMouche <AndreMouche@126.com>
Signed-off-by: AndreMouche <AndreMouche@126.com>
@ti-chi-bot ti-chi-bot bot added dco-signoff: yes Indicates the PR's author has signed the dco. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Dec 19, 2025
Signed-off-by: AndreMouche <AndreMouche@126.com>
@AndreMouche AndreMouche force-pushed the store_cache_update_addr branch from b79e950 to 4c68ce5 Compare December 19, 2025 00:53
Copy link
Contributor

@zyguan zyguan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

The following micro-bench results show there is no significant performance regression.

workload threads baseline this fix diff
pointget 64 84676.63 84574.63 -0.12%
batchget 64 24640.70 24687.14 0.19%
tblscan 64 41237.60 40384.53 -2.07%
idxscan 64 39560.02 39342.72 -0.55%
idxlookup 64 16367.99 16306.71 -0.37%

@ti-chi-bot
Copy link

ti-chi-bot bot commented Dec 19, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: zyguan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot
Copy link

ti-chi-bot bot commented Dec 19, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-12-19 08:00:21.602995324 +0000 UTC m=+1805566.416772906: ☑️ agreed by zyguan.

@ti-chi-bot ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved labels Dec 19, 2025
@coderabbitai
Copy link

coderabbitai bot commented Jan 16, 2026

📝 Walkthrough

Walkthrough

Introduce mutex-guarded store metadata accessors (StoreType, GetAddr, GetLabels), add updateMetadataFrom and put to store cache to allow in-place metadata updates, remove Deleted resolve state, and update region cache, request logic, logging, and tests to use the new accessors.

Changes

Cohort / File(s) Summary
Store Cache Core
internal/locate/store_cache.go
Add metaMu, updateMetadataFrom(), and put(); add GetAddr(), GetPeerAddr(), GetLabels(), guarded StoreType(); perform in-place metadata updates instead of replacing store objects; remove Deleted resolve state; replace direct field reads/writes with accessors.
Region Cache Core
internal/locate/region_cache.go
Replace direct store.storeType, store.addr, store.labels usages with StoreType(), GetAddr(), GetLabels(); update proxy-store logic, store reconstruction, reResolve call sites, and logging to use getters.
Region Request Handling
internal/locate/region_request.go
Use GetAddr() for address metrics/logging and StoreType() for TiFlash-related checks; update metrics labels and liveness/logging to use accessors.
Tests: Region / Replica / Requests
internal/locate/region_cache_test.go, internal/locate/region_request3_test.go, internal/locate/replica_selector_test.go, internal/locate/region_request3_test.go
Replace direct field access with GetAddr()/GetLabels(); add slices import where needed; add TestStoreRestartWithNewLabels verifying label updates via accessors.
Minor Adjustments / Logging
internal/locate/region_cache_test.go, internal/locate/region_request.go, ...
Update RPCContext.String(), logs, and metrics to call GetAddr()/StoreType(); adjust store reResolve call signatures and remove obsolete code paths (e.g., changeToActiveStore).

Sequence Diagram(s)

mermaid
sequenceDiagram
autonumber
participant PD as PD / Meta
participant StoreCache as StoreCache
participant Store as Store (metaMu)
participant RegionCache as RegionCache
participant Health as HealthCheck
PD->>StoreCache: send store meta update (addr/labels/type)
StoreCache->>Store: call updateMetadataFrom(meta)
Store->>Store: acquire metaMu, update addr/peerAddr/labels/type
StoreCache->>RegionCache: retain existing Store pointer (no replacement)
RegionCache->>Store: call StoreType()/GetAddr()/GetLabels() when evaluating replicas
Health->>Store: call StoreType()/GetAddr() for health checks / reResolve

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰
I nibbled fields and stitched them tight,
A mutex hug to guard the night,
Getters hum where wild labels played,
Regions wake with changes made,
A joyful hop — no pointers frayed. 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 22.22% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title clearly and concisely describes the main objective: fixing inconsistent store pointers between region-cache and store-cache, which directly addresses the core issue.
Linked Issues check ✅ Passed The PR addresses issue #1823 by introducing accessor methods (GetAddr, GetLabels, StoreType) to ensure consistent access to store metadata and updating region-cache to use these accessors instead of direct field access.
Out of Scope Changes check ✅ Passed All changes focus on introducing accessor methods for store metadata and updating code to use them consistently across region-cache, store-cache, and related modules, which is directly aligned with fixing inconsistent store pointer handling.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@internal/locate/store_cache.go`:
- Around line 315-320: GetLabels currently returns the internal slice s.labels
directly under Store; change GetLabels in type Store to return a defensive copy
of s.labels while holding metaMu (use s.metaMu.RLock()/RUnlock()) so callers
cannot mutate the internal slice (e.g., via append) and break thread-safety—copy
elements into a new slice and return that copy instead of s.labels.
🧹 Nitpick comments (1)
internal/locate/region_cache_test.go (1)

3118-3120: Avoid mutating Store labels returned by GetLabels

GetLabels() exposes internal metadata; appending here can mutate the store’s labels and mask the update-path the test is trying to validate. Clone before append so the test remains isolated.

♻️ Suggested tweak
- labels := store1.GetLabels()
- labels = append(labels, &metapb.StoreLabel{Key: "host", Value: "0.0.0.0:20161"})
+ labels := append([]*metapb.StoreLabel(nil), store1.GetLabels()...)
+ labels = append(labels, &metapb.StoreLabel{Key: "host", Value: "0.0.0.0:20161"})
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e5f6398 and c0c4fab.

📒 Files selected for processing (6)
  • internal/locate/region_cache.go
  • internal/locate/region_cache_test.go
  • internal/locate/region_request.go
  • internal/locate/region_request3_test.go
  • internal/locate/replica_selector_test.go
  • internal/locate/store_cache.go
🧰 Additional context used
🧬 Code graph analysis (4)
internal/locate/region_request3_test.go (3)
tikvrpc/endpoint.go (1)
  • TiKV (44-44)
internal/locate/region_cache.go (1)
  • WithMatchLabels (859-863)
tikv/region.go (1)
  • WithMatchLabels (181-183)
internal/locate/region_request.go (6)
util/redact/redact.go (1)
  • String (34-39)
metrics/metrics.go (1)
  • TiKVStoreLimitErrorCounter (93-93)
internal/locate/store_cache.go (1)
  • Store (216-240)
tikv/region.go (1)
  • Store (142-142)
internal/mockstore/mocktikv/cluster.go (1)
  • Store (762-765)
tikvrpc/endpoint.go (1)
  • TiFlashCompute (47-47)
internal/locate/store_cache.go (3)
tikvrpc/endpoint.go (2)
  • TiFlash (45-45)
  • TiKV (44-44)
metrics/shortcuts.go (1)
  • RegionCacheCounterWithInvalidateStoreRegionsOK (125-125)
internal/logutil/log.go (1)
  • BgLogger (46-48)
internal/locate/region_cache.go (2)
internal/locate/store_cache.go (1)
  • Store (216-240)
tikvrpc/endpoint.go (2)
  • TiKV (44-44)
  • TiFlash (45-45)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: test
  • GitHub Check: integration-raw-tikv (v1ttl)
  • GitHub Check: race-test
  • GitHub Check: integration-local-race
  • GitHub Check: integration-tikv
  • GitHub Check: integration-raw-tikv (v2)
  • GitHub Check: integration-local
  • GitHub Check: integration-next-gen-tikv
🔇 Additional comments (15)
internal/locate/region_request.go (5)

781-788: Logging uses GetAddr—nice alignment with accessor-based metadata.

Keeps log output consistent with refreshed store metadata.


1699-1708: Metrics label now uses GetAddr—good consistency with store accessor API.


1781-1783: Store type check via StoreType() is consistent with encapsulation.


1791-1804: TiFlash-related backoff now uses StoreType() accessor—looks good.


2086-2094: ServerIsBusy TiFlash path uses StoreType() accessor—consistent update.

internal/locate/region_request3_test.go (6)

237-243: Test liveness hook now uses GetAddr—aligned with accessor API.


368-376: RPC context assertions use GetAddr for both target and proxy—nice consistency.


1069-1072: Leader address derived via GetAddr—keeps tests consistent with store accessors.


1427-1436: Follower label/addr assertions via GetLabels/GetAddr—consistent with new API.


1454-1460: PreferLeader test uses GetAddr for leader match—good accessor alignment.


1519-1593: Leader handoff test now reads/writes leader addr via GetAddr—clean update.

internal/locate/replica_selector_test.go (1)

69-72: Store label assertions now use GetLabels—consistent with accessor-based metadata.

internal/locate/region_cache_test.go (1)

367-367: Tests now use accessor-based assertions

Good alignment with the new encapsulation (GetAddr() / GetLabels()), keeping tests consistent with thread-safe metadata access.

Also applies to: 445-451, 2303-2310, 3141-3147

internal/locate/store_cache.go (1)

149-154: Metadata access is now well-encapsulated

metaMu + updateMetadataFrom and accessor usage keep store metadata consistent and support in-place updates, which helps with pointer consistency.

Also applies to: 217-273, 285-313, 337-365, 423-475, 478-525, 560-565, 628-632, 659-665, 689-732, 1119-1123

internal/locate/region_cache.go (1)

379-384: Accessor adoption across region cache paths looks consistent

Good to see StoreType() / GetAddr() / GetLabels() used everywhere, keeping region-cache logic aligned with the new store metadata encapsulation.

Also applies to: 802-804, 838-844, 939-940, 1008-1009, 1043-1044, 1055-1056, 1065-1066, 1791-1792, 2202-2210, 2706-2707, 2719-2720, 2818-2822, 2851-2852

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Signed-off-by: AndreMouche <AndreMouche@126.com>
@ti-chi-bot ti-chi-bot bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 16, 2026
Signed-off-by: AndreMouche <AndreMouche@126.com>
@ti-chi-bot ti-chi-bot bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jan 16, 2026
Signed-off-by: AndreMouche <AndreMouche@126.com>
@ti-chi-bot ti-chi-bot bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 16, 2026
@AndreMouche
Copy link
Member Author

@cfzjywxk @ekexium @MyonKeminta PTAL, thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved dco-signoff: yes Indicates the PR's author has signed the dco. needs-1-more-lgtm Indicates a PR needs 1 more LGTM. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Inconsistent store pointers between the region-cache and store-cache cause stale regions to become inaccessible.

2 participants