-
Notifications
You must be signed in to change notification settings - Fork 253
fix inconsistent store pointers between the region-cache and store-cache #1826
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Signed-off-by: AndreMouche <AndreMouche@126.com>
Signed-off-by: AndreMouche <AndreMouche@126.com>
Signed-off-by: AndreMouche <AndreMouche@126.com>
Signed-off-by: AndreMouche <AndreMouche@126.com>
Signed-off-by: AndreMouche <AndreMouche@126.com>
Signed-off-by: AndreMouche <AndreMouche@126.com>
b79e950 to
4c68ce5
Compare
zyguan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
The following micro-bench results show there is no significant performance regression.
| workload | threads | baseline | this fix | diff |
|---|---|---|---|---|
| pointget | 64 | 84676.63 | 84574.63 | -0.12% |
| batchget | 64 | 24640.70 | 24687.14 | 0.19% |
| tblscan | 64 | 41237.60 | 40384.53 | -2.07% |
| idxscan | 64 | 39560.02 | 39342.72 | -0.55% |
| idxlookup | 64 | 16367.99 | 16306.71 | -0.37% |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: zyguan The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
[LGTM Timeline notifier]Timeline:
|
📝 WalkthroughWalkthroughIntroduce mutex-guarded store metadata accessors (StoreType, GetAddr, GetLabels), add updateMetadataFrom and put to store cache to allow in-place metadata updates, remove Deleted resolve state, and update region cache, request logic, logging, and tests to use the new accessors. Changes
Sequence Diagram(s)mermaid Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@internal/locate/store_cache.go`:
- Around line 315-320: GetLabels currently returns the internal slice s.labels
directly under Store; change GetLabels in type Store to return a defensive copy
of s.labels while holding metaMu (use s.metaMu.RLock()/RUnlock()) so callers
cannot mutate the internal slice (e.g., via append) and break thread-safety—copy
elements into a new slice and return that copy instead of s.labels.
🧹 Nitpick comments (1)
internal/locate/region_cache_test.go (1)
3118-3120: Avoid mutating Store labels returned byGetLabels
GetLabels()exposes internal metadata; appending here can mutate the store’s labels and mask the update-path the test is trying to validate. Clone before append so the test remains isolated.♻️ Suggested tweak
- labels := store1.GetLabels() - labels = append(labels, &metapb.StoreLabel{Key: "host", Value: "0.0.0.0:20161"}) + labels := append([]*metapb.StoreLabel(nil), store1.GetLabels()...) + labels = append(labels, &metapb.StoreLabel{Key: "host", Value: "0.0.0.0:20161"})
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
internal/locate/region_cache.gointernal/locate/region_cache_test.gointernal/locate/region_request.gointernal/locate/region_request3_test.gointernal/locate/replica_selector_test.gointernal/locate/store_cache.go
🧰 Additional context used
🧬 Code graph analysis (4)
internal/locate/region_request3_test.go (3)
tikvrpc/endpoint.go (1)
TiKV(44-44)internal/locate/region_cache.go (1)
WithMatchLabels(859-863)tikv/region.go (1)
WithMatchLabels(181-183)
internal/locate/region_request.go (6)
util/redact/redact.go (1)
String(34-39)metrics/metrics.go (1)
TiKVStoreLimitErrorCounter(93-93)internal/locate/store_cache.go (1)
Store(216-240)tikv/region.go (1)
Store(142-142)internal/mockstore/mocktikv/cluster.go (1)
Store(762-765)tikvrpc/endpoint.go (1)
TiFlashCompute(47-47)
internal/locate/store_cache.go (3)
tikvrpc/endpoint.go (2)
TiFlash(45-45)TiKV(44-44)metrics/shortcuts.go (1)
RegionCacheCounterWithInvalidateStoreRegionsOK(125-125)internal/logutil/log.go (1)
BgLogger(46-48)
internal/locate/region_cache.go (2)
internal/locate/store_cache.go (1)
Store(216-240)tikvrpc/endpoint.go (2)
TiKV(44-44)TiFlash(45-45)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: test
- GitHub Check: integration-raw-tikv (v1ttl)
- GitHub Check: race-test
- GitHub Check: integration-local-race
- GitHub Check: integration-tikv
- GitHub Check: integration-raw-tikv (v2)
- GitHub Check: integration-local
- GitHub Check: integration-next-gen-tikv
🔇 Additional comments (15)
internal/locate/region_request.go (5)
781-788: Logging uses GetAddr—nice alignment with accessor-based metadata.Keeps log output consistent with refreshed store metadata.
1699-1708: Metrics label now uses GetAddr—good consistency with store accessor API.
1781-1783: Store type check via StoreType() is consistent with encapsulation.
1791-1804: TiFlash-related backoff now uses StoreType() accessor—looks good.
2086-2094: ServerIsBusy TiFlash path uses StoreType() accessor—consistent update.internal/locate/region_request3_test.go (6)
237-243: Test liveness hook now uses GetAddr—aligned with accessor API.
368-376: RPC context assertions use GetAddr for both target and proxy—nice consistency.
1069-1072: Leader address derived via GetAddr—keeps tests consistent with store accessors.
1427-1436: Follower label/addr assertions via GetLabels/GetAddr—consistent with new API.
1454-1460: PreferLeader test uses GetAddr for leader match—good accessor alignment.
1519-1593: Leader handoff test now reads/writes leader addr via GetAddr—clean update.internal/locate/replica_selector_test.go (1)
69-72: Store label assertions now use GetLabels—consistent with accessor-based metadata.internal/locate/region_cache_test.go (1)
367-367: Tests now use accessor-based assertionsGood alignment with the new encapsulation (
GetAddr()/GetLabels()), keeping tests consistent with thread-safe metadata access.Also applies to: 445-451, 2303-2310, 3141-3147
internal/locate/store_cache.go (1)
149-154: Metadata access is now well-encapsulated
metaMu+updateMetadataFromand accessor usage keep store metadata consistent and support in-place updates, which helps with pointer consistency.Also applies to: 217-273, 285-313, 337-365, 423-475, 478-525, 560-565, 628-632, 659-665, 689-732, 1119-1123
internal/locate/region_cache.go (1)
379-384: Accessor adoption across region cache paths looks consistentGood to see
StoreType()/GetAddr()/GetLabels()used everywhere, keeping region-cache logic aligned with the new store metadata encapsulation.Also applies to: 802-804, 838-844, 939-940, 1008-1009, 1043-1044, 1055-1056, 1065-1066, 1791-1792, 2202-2210, 2706-2707, 2719-2720, 2818-2822, 2851-2852
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
Signed-off-by: AndreMouche <AndreMouche@126.com>
Signed-off-by: AndreMouche <AndreMouche@126.com>
Signed-off-by: AndreMouche <AndreMouche@126.com>
|
@cfzjywxk @ekexium @MyonKeminta PTAL, thanks |
fix #1823
Summary by CodeRabbit
Refactor
Tests
✏️ Tip: You can customize this high-level summary in your review settings.