-
Notifications
You must be signed in to change notification settings - Fork 753
mcs: add affinity redirect and scheduling watcher #10042
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
Skipping CI for Draft Pull Request. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #10042 +/- ##
==========================================
+ Coverage 78.51% 78.65% +0.14%
==========================================
Files 513 517 +4
Lines 68893 69393 +500
==========================================
+ Hits 54091 54584 +493
+ Misses 10887 10862 -25
- Partials 3915 3947 +32
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
079ad98 to
4dc8ea5
Compare
aad6e2f to
198a8dc
Compare
|
/retest |
1 similar comment
|
/retest |
2779204 to
2c87924
Compare
Signed-off-by: lhy1024 <admin@liudos.us>
Signed-off-by: lhy1024 <admin@liudos.us>
2c87924 to
0720b5d
Compare
Signed-off-by: lhy1024 <admin@liudos.us>
0720b5d to
64d7903
Compare
Signed-off-by: lhy1024 <admin@liudos.us>
Signed-off-by: lhy1024 <admin@liudos.us>
Signed-off-by: lhy1024 <admin@liudos.us>
Signed-off-by: lhy1024 <admin@liudos.us>
Signed-off-by: lhy1024 <admin@liudos.us>
Signed-off-by: lhy1024 <admin@liudos.us>
| } | ||
|
|
||
| // AffinityMicroserviceRedirector only forwards affinity GET requests to the scheduling service. | ||
| func AffinityMicroserviceRedirector() gin.HandlerFunc { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not in server/api/server.go?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is a middleware for v2
Signed-off-by: lhy1024 <admin@liudos.us>
|
/retest |
pkg/schedule/affinity/manager.go
Outdated
| labelRule = m.labelRuleBuffer[group.ID] | ||
| } | ||
| // Once group created, the buffer must be deleted. | ||
| delete(m.labelRuleBuffer, group.ID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it better to delete after updateGroupLabelRuleLockedWithCount?
| return | ||
| } | ||
| // Attach only non-empty label rules to avoid marking the group as having ranges when it does not. | ||
| if len(gkr.KeyRanges) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If len(keyRanges) is a default limit, can we put it inside updateGroupLabelRuleLockedWithCount
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving the check inside would break the other call.
| // Store LabelRule information in the buffer when it is synchronized before the group. | ||
| m.labelRuleBuffer[groupID] = labelRule | ||
| return nil | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about returning direct when !exist && len(gkr.KeyRanges) ==0?
| _, err = client.Put(ctx, labelKey, string(labelValue)) | ||
| re.NoError(err) | ||
|
|
||
| time.Sleep(100 * time.Millisecond) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is better to use Eventually
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
📝 WalkthroughWalkthroughThis pull request introduces an etcd-based affinity group watcher for the scheduling service, exposes affinity groups via REST API endpoints, enhances the manager to buffer label rules arriving before their corresponding groups, adds a check-less variant for label rule parsing, and refactors microservice redirect middleware to use a unified matching function and new RedirectRule type. Changes
Sequence Diagram(s)sequenceDiagram
participant etcd as etcd
participant Watcher as Affinity Watcher
participant Manager as Affinity Manager
participant Labeler as Region Labeler
note over etcd,Watcher: etcd emits group / label-rule events
etcd->>Watcher: Group/Rule Change Event
activate Watcher
alt Affinity Group Update
Watcher->>Watcher: Unmarshal JSON → affinity.Group
Watcher->>Manager: SyncGroupFromEtcd(groupID, group)
activate Manager
Manager->>Manager: Check labelRuleBuffer
Manager->>Labeler: UpdateGroup (keyRanges)
deactivate Manager
else Affinity Group Delete
Watcher->>Manager: SyncGroupDeleteFromEtcd(groupID)
activate Manager
Manager->>Manager: Remove group, clear buffers
deactivate Manager
end
alt Label Rule Update
Watcher->>Watcher: Filter "affinity_group/" prefix
Watcher->>Watcher: Parse JSON → labeler.LabelRule
Watcher->>Manager: SyncKeyRangesFromEtcd(ruleID, rule)
activate Manager
alt Group exists
Manager->>Manager: Attach rule, update keyRanges
else Group missing
Manager->>Manager: Buffer rule in labelRuleBuffer
end
deactivate Manager
else Label Rule Delete
Watcher->>Manager: SyncKeyRangesDeleteFromEtcd(ruleID)
activate Manager
Manager->>Manager: Remove keyRanges, clear buffer
deactivate Manager
end
deactivate Watcher
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧹 Recent nitpick comments
📜 Recent review detailsConfiguration used: defaults Review profile: CHILL Plan: Pro 📒 Files selected for processing (2)
🧰 Additional context used🪛 golangci-lint (2.5.0)pkg/mcs/scheduling/server/affinity/watcher_test.go[error] 36-36: : # github.com/tikv/pd/pkg/dashboard/uiserver (typecheck) ⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
🔇 Additional comments (8)
✏️ Tip: You can disable this entire section by setting Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
pkg/utils/apiutil/serverapi/middleware_test.go (1)
57-78: Comprehensive edge-case coverage, consider adding a positive test.The test suite thoroughly covers failure scenarios:
- Forbidden header blocks redirect
- Keyspace group disabled blocks redirect
- Non-independent service blocks redirect
- Filter function blocks redirect
Consider adding a test for the successful redirect path where all conditions are met and a valid primary address is returned.
✅ Optional: Add positive test case
+func TestMatchMicroserviceRedirectSuccess(t *testing.T) { + re := require.New(t) + req, err := http.NewRequest(http.MethodGet, "/pd/api/v1/foo", http.NoBody) + re.NoError(err) + + matched, addr := MatchMicroserviceRedirect( + req, + []RedirectRule{{ + MatchPath: "/pd/api/v1/foo", + TargetPath: "/scheduling/api/v1/bar", + TargetServiceName: constant.SchedulingServiceName, + MatchMethods: []string{http.MethodGet}, + }}, + true, + func(string) bool { return true }, + func(_ context.Context, _ string) (string, bool) { return "http://scheduling:3379", true }, + ) + + re.True(matched) + re.Equal("http://scheduling:3379", addr) + re.Equal("/scheduling/api/v1/bar", req.URL.Path) +}Also applies to: 80-100, 102-122, 124-147
pkg/mcs/scheduling/server/affinity/watcher_test.go (1)
237-242: Consider usingfmt.Sprintffor clearer hex string generation.The current approach using
string(rune('0'+idx))works but is less readable. Usingfmt.Sprintfwould make the intent clearer.♻️ Suggested improvement
- labelRule := makeTestLabelRule(groups[idx].ID, - "7480000000000000ff"+string(rune('0'+idx))+"000000000000000f8", - "7480000000000000ff"+string(rune('0'+idx))+"100000000000000f8", - ) + labelRule := makeTestLabelRule(groups[idx].ID, + fmt.Sprintf("7480000000000000ff%d000000000000000f8", idx), + fmt.Sprintf("7480000000000000ff%d100000000000000f8", idx), + )pkg/mcs/scheduling/server/affinity/watcher.go (1)
97-103: Consider validating the extracted group ID in deleteFn.If
TrimPrefixdoesn't find the prefix, it returns the original key unchanged, which could lead to unexpected behavior. A validation check would be safer.♻️ Suggested improvement
deleteFn := func(kv *mvccpb.KeyValue) error { key := string(kv.Key) log.Info("delete affinity group", zap.String("key", key)) - groupID := strings.TrimPrefix(key, keypath.AffinityGroupsPrefix()) + prefix := keypath.AffinityGroupsPrefix() + if !strings.HasPrefix(key, prefix) { + log.Warn("unexpected key format in delete event", zap.String("key", key)) + return nil + } + groupID := strings.TrimPrefix(key, prefix) w.affinityManager.SyncGroupDeleteFromEtcd(groupID) return nil }
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (15)
pkg/cluster/cluster.gopkg/mcs/scheduling/server/affinity/watcher.gopkg/mcs/scheduling/server/affinity/watcher_test.gopkg/mcs/scheduling/server/apis/v1/api.gopkg/mcs/scheduling/server/rule/watcher.gopkg/mcs/scheduling/server/server.gopkg/schedule/affinity/manager.gopkg/schedule/affinity/manager_test.gopkg/schedule/labeler/labeler.gopkg/schedule/labeler/rules.gopkg/utils/apiutil/serverapi/middleware.gopkg/utils/apiutil/serverapi/middleware_test.goserver/apiv2/middlewares/microservice_redirector.goserver/apiv2/router.gotests/server/apiv2/handlers/affinity_test.go
💤 Files with no reviewable changes (2)
- tests/server/apiv2/handlers/affinity_test.go
- pkg/cluster/cluster.go
🧰 Additional context used
🧬 Code graph analysis (8)
pkg/schedule/labeler/labeler.go (1)
pkg/schedule/labeler/rules.go (1)
NewLabelRuleFromJSONWithoutCheck(103-109)
pkg/utils/apiutil/serverapi/middleware_test.go (2)
pkg/mcs/utils/constant/constant.go (2)
TSOServiceName(51-51)SchedulingServiceName(55-55)pkg/utils/apiutil/apiutil.go (1)
XForbiddenForwardToMicroserviceHeader(71-71)
pkg/mcs/scheduling/server/affinity/watcher.go (4)
pkg/utils/etcdutil/etcdutil.go (2)
LoopWatcher(346-387)NewLoopWatcher(390-419)pkg/utils/keypath/affinity.go (1)
AffinityGroupsPrefix(30-34)pkg/schedule/labeler/rules.go (1)
NewLabelRuleFromJSON(91-100)pkg/utils/keypath/rule.go (2)
RegionLabelPathPrefix(52-54)RegionLabelKeyPath(42-44)
server/apiv2/router.go (1)
server/apiv2/middlewares/microservice_redirector.go (1)
AffinityMicroserviceRedirector(67-77)
pkg/utils/apiutil/serverapi/middleware.go (2)
pkg/mcs/utils/constant/constant.go (1)
SchedulingServiceName(55-55)client/servicediscovery/tso_service_discovery.go (1)
Contains(772-779)
pkg/schedule/affinity/manager.go (3)
pkg/schedule/affinity/group.go (1)
GroupKeyRanges(308-311)pkg/schedule/labeler/rules.go (1)
LabelRule(50-57)pkg/utils/keyutil/keyrange.go (1)
KeyRanges(95-97)
pkg/mcs/scheduling/server/rule/watcher.go (1)
pkg/schedule/labeler/rules.go (1)
NewLabelRuleFromJSONWithoutCheck(103-109)
pkg/mcs/scheduling/server/affinity/watcher_test.go (4)
pkg/utils/keypath/rule.go (1)
RegionLabelKeyPath(42-44)pkg/utils/keypath/affinity.go (1)
AffinityGroupPath(23-25)pkg/mcs/scheduling/server/affinity/watcher.go (2)
Watcher(45-56)NewWatcher(59-82)pkg/schedule/affinity/txn.go (1)
LabelRuleIDPrefix(41-41)
🪛 golangci-lint (2.5.0)
pkg/utils/apiutil/serverapi/middleware_test.go
[error] 36-36: : # github.com/tikv/pd/pkg/dashboard/uiserver
pkg/dashboard/uiserver/embedded_assets_rewriter.go:36:26: undefined: assets
pkg/dashboard/uiserver/embedded_assets_rewriter.go:37:13: undefined: vfsgen۰FS
pkg/dashboard/uiserver/embedded_assets_rewriter.go:39:15: undefined: vfsgen۰CompressedFileInfo
pkg/dashboard/uiserver/embedded_assets_rewriter.go:47:9: undefined: assets
(typecheck)
pkg/mcs/scheduling/server/affinity/watcher.go
[error] 36-36: : # github.com/tikv/pd/pkg/dashboard/uiserver
pkg/dashboard/uiserver/embedded_assets_rewriter.go:36:26: undefined: assets
pkg/dashboard/uiserver/embedded_assets_rewriter.go:37:13: undefined: vfsgen۰FS
pkg/dashboard/uiserver/embedded_assets_rewriter.go:39:15: undefined: vfsgen۰CompressedFileInfo
pkg/dashboard/uiserver/embedded_assets_rewriter.go:47:9: undefined: assets
(typecheck)
server/apiv2/middlewares/microservice_redirector.go
[error] 36-36: : # github.com/tikv/pd/pkg/dashboard/uiserver
pkg/dashboard/uiserver/embedded_assets_rewriter.go:36:26: undefined: assets
pkg/dashboard/uiserver/embedded_assets_rewriter.go:37:13: undefined: vfsgen۰FS
pkg/dashboard/uiserver/embedded_assets_rewriter.go:39:15: undefined: vfsgen۰CompressedFileInfo
pkg/dashboard/uiserver/embedded_assets_rewriter.go:47:9: undefined: assets
(typecheck)
pkg/mcs/scheduling/server/affinity/watcher_test.go
[error] 36-36: : # github.com/tikv/pd/pkg/dashboard/uiserver
pkg/dashboard/uiserver/embedded_assets_rewriter.go:36:26: undefined: assets
pkg/dashboard/uiserver/embedded_assets_rewriter.go:37:13: undefined: vfsgen۰FS
pkg/dashboard/uiserver/embedded_assets_rewriter.go:39:15: undefined: vfsgen۰CompressedFileInfo
pkg/dashboard/uiserver/embedded_assets_rewriter.go:47:9: undefined: assets
(typecheck)
🔇 Additional comments (30)
server/apiv2/router.go (1)
62-62: LGTM! Affinity redirect middleware placement is correct.The middleware is appropriately positioned after the general
Redirector()and beforeRegisterAffinity(), ensuring affinity requests are intercepted and forwarded to the Scheduling service when needed.pkg/utils/apiutil/serverapi/middleware_test.go (2)
30-32: LGTM! Proper test setup with leak detection.The TestMain setup correctly configures goleak to detect goroutine leaks during tests.
34-55: LGTM! Correctly tests no-primary-address edge case.The test validates that when a rule matches but no primary address is available, the function returns
matched=truewith an empty address, allowing the caller to returnErrRedirectas expected.server/apiv2/middlewares/microservice_redirector.go (2)
32-64: LGTM! Solid middleware implementation with proper error handling.The middleware correctly:
- Validates redirect rules using
MatchMicroserviceRedirect- Returns
ErrRedirectwhen a rule matches but no primary address is available (line 50)- Handles URL parsing errors appropriately (line 57)
- Marks forwarded requests with the
XForwardedToMicroserviceHeader(line 60)- Uses reverse proxy for transparent forwarding (line 61)
- Aborts middleware chain after redirect (line 62)
66-77: LGTM! Clean affinity-specific wrapper.The function provides a focused redirect rule for affinity groups, forwarding GET requests from the PD API v2 path to the Scheduling service's affinity endpoint. The implementation correctly leverages the generic
MicroserviceRedirector.pkg/utils/apiutil/serverapi/middleware.go (3)
81-88: LGTM! Well-designed public API for redirect rules.The
RedirectRulestruct provides a clean, extensible interface with:
- Clear field names for path matching and rewriting
- Method filtering capability
- Optional
Filterfunction for custom logicThis design enables flexible redirect configuration across the codebase.
128-188: LGTM! Robust redirect matching with proper path handling.The function correctly implements:
- Early validation of keyspace group and rules (lines 137-139)
- Header-based opt-out mechanism (lines 140-142)
- Service independence checks (lines 148-150)
- Path and method matching (line 151)
- Optional filter invocation (lines 154-156)
- Primary address resolution with proper error handling (lines 159-164)
- URL path parameter extraction and rewriting (lines 166-182)
The path handling logic appropriately uses
RawPathwhen present to preserve URL encoding, and cleanly extracts parameters for path rewriting.
103-117: LGTM! Constructor updated to use public RedirectRule type.The
MicroserviceRedirectRuleoption builder correctly constructs the newRedirectRulestruct while maintaining backward compatibility with the existing API.pkg/schedule/labeler/labeler.go (1)
112-141: LGTM: Deferred validation pattern correctly implemented.The change to
NewLabelRuleFromJSONWithoutCheckfollowed by immediatecheckAndAdjust()maintains existing validation behavior while enabling other code paths to defer validation when needed.pkg/mcs/scheduling/server/rule/watcher.go (1)
219-226: LGTM: Validation properly deferred to SetLabelRuleLocked.The rule is parsed without validation, then validated and persisted via
SetLabelRuleLocked(which callscheckAndAdjustinternally). This ensures rules are validated before use.pkg/schedule/labeler/rules.go (1)
90-109: LGTM: Clean separation of parsing and validation.The refactored
NewLabelRuleFromJSONmaintains backward compatibility by immediately validating, while the newNewLabelRuleFromJSONWithoutCheckenables deferred validation for use cases like buffering rules before groups exist. Function names clearly communicate the difference.pkg/schedule/affinity/manager_test.go (1)
178-217: LGTM: Test correctly validates out-of-order synchronization.The test verifies that label rules arriving before their corresponding groups are properly buffered and applied after group creation. The JSON roundtrip (lines 205-208) correctly simulates the etcd watcher path to ensure type consistency.
pkg/mcs/scheduling/server/server.go (2)
530-534: LGTM: Affinity watcher properly initialized.The watcher is correctly initialized after cluster creation with proper dependency injection and error handling.
560-573: LGTM: Proper watcher cleanup with nil checks.The affinity watcher cleanup follows the same defensive pattern as other watchers, preventing nil pointer dereferences during shutdown.
pkg/mcs/scheduling/server/apis/v1/api.go (5)
89-92: LGTM - Response type is well-defined.The
AffinityGroupsResponsestruct properly encapsulates the affinity groups map with appropriate JSON tags. Usingmap[string]*affinity.GroupStatealigns with how the data is consumed by the handler.
267-274: LGTM - Affinity router follows established patterns.The router registration is consistent with other routers in the file, properly applying the service redirector middleware and using appropriate RESTful routes.
1617-1634: LGTM - Proper validation with appropriate error responses.The helper function correctly validates server state, cluster availability, and affinity manager presence before proceeding. The use of
AbortWithStatusJSONensures the request is terminated after sending the error response.
1642-1655: LGTM - Handler correctly builds the response.The handler properly retrieves all affinity group states and constructs a map keyed by group ID for the JSON response.
1665-1684: LGTM - Comprehensive error handling.The handler properly maps domain errors to appropriate HTTP status codes (400 for invalid ID, 404 for not found, 500 for internal errors). The error type checking pattern using
Equal()is consistent with the codebase conventions.pkg/mcs/scheduling/server/affinity/watcher_test.go (2)
308-328: LGTM - Test setup and cleanup are properly implemented.The
preparefunction correctly sets up an embedded etcd instance for testing, and the cleanup function properly releases resources in the correct order (cancel context, close etcd, close client, remove directory).
45-114: LGTM - Comprehensive lifecycle test.This test thoroughly covers the complete lifecycle of affinity groups and label rules, including creation, updates, and deletion. The use of
testutil.Eventuallyensures proper handling of asynchronous etcd operations.pkg/schedule/affinity/manager.go (5)
76-80: LGTM - Buffer fields are well-documented.The new
labelRuleBufferfield with its explanatory comment clearly describes its purpose for handling out-of-order synchronization between label rules and groups during etcd synchronization.
467-473: LGTM - Buffer fallback logic is correct.The buffer handling correctly falls back to
labelRuleBufferwhen the region labeler hasn't synchronized yet. Since thelabelRulevalue is read before deletion, the subsequent operations can proceed safely.
557-564: LGTM - Buffering logic handles out-of-order synchronization correctly.The implementation correctly buffers non-empty label rules that arrive before their corresponding group is created. When the group exists, the buffer is properly cleared to avoid stale data.
522-524: LGTM - Proper cleanup of both caches on group deletion.The deletion correctly cleans up both
keyRangesandlabelRuleBufferentries, ensuring no orphaned data remains after a group is removed.
566-571: LGTM - Cache synchronization handles both add and remove cases.The logic properly maintains the
keyRangescache by storing non-empty ranges and explicitly removing entries when ranges become empty, preventing stale cache entries.pkg/mcs/scheduling/server/affinity/watcher.go (4)
58-82: LGTM - Constructor with proper cleanup on initialization failure.The
NewWatcherconstructor correctly callsClose()to clean up resources if either watcher initialization fails, preventing resource leaks.
144-156: LGTM - Label watcher correctly filters to affinity rules.The watcher efficiently filters etcd events by using the
LabelRuleIDPrefixin the watch path, ensuring only affinity-related label rules are processed. This aligns with the test caseTestOnlyProcessAffinityGroupLabelRules.
158-162: LGTM - Clean shutdown implementation.The
Closemethod properly cancels the context first and then waits for all goroutines to finish. Bothcancel()andwg.Wait()are safe to call multiple times, making this method idempotent.
35-56: LGTM - Well-documented Watcher struct.The struct documentation clearly explains the two watched paths, their key formats, and value types. This makes the code self-documenting and easier to maintain.
Signed-off-by: lhy1024 <admin@liudos.us>
What problem does this PR solve?
Issue Number: Ref #9764
What is changed and how does it work?
Check List
Tests
Release note
Summary by CodeRabbit
New Features
Bug Fixes
Tests
✏️ Tip: You can customize this high-level summary in your review settings.