lazy cluster topology reload #3614
base: ndyakov/feature/CAE-1313-maint-cluster
Conversation
…E-1626-cluster-topology-update
Pull request overview
This PR introduces lazy cluster topology reload functionality to handle maintenance notifications, specifically addressing issue #3594. The implementation adds a queuing mechanism to prevent excessive cluster state reloads when multiple SMIGRATED notifications arrive in quick succession.
- Added new connection options: DialerRetries, DialerRetryTimeout, MaxConcurrentDials, and PushNotificationProcessor across all client types
- Implemented a lazy reload mechanism with a cooldown period and pending-request collapsing to optimize cluster topology reloads (see the sketch below)
- Integrated SMIGRATED notification handling to trigger cluster state reloads when slot migrations occur
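As a rough illustration of the collapsing behavior described above, here is a minimal, hypothetical Go sketch of a lazy reload with a cooldown and pending-request collapsing. The names (lazyReloader, reloadFn, cooldown) are illustrative only and are not the identifiers used in this PR, which wires the behavior into the cluster state holder's LazyReload.

```go
package main

import (
	"context"
	"sync"
	"time"
)

// lazyReloader is a hypothetical stand-in for the PR's queuing logic:
// concurrent reload requests collapse into one in-flight reload plus at
// most one pending follow-up, and consecutive reloads respect a cooldown.
type lazyReloader struct {
	mu         sync.Mutex
	reloading  bool
	pending    bool
	lastReload time.Time
	cooldown   time.Duration
	reloadFn   func(ctx context.Context) error
}

func (r *lazyReloader) LazyReload(ctx context.Context) {
	r.mu.Lock()
	if r.reloading {
		// A reload is already in flight; remember the request and collapse it.
		r.pending = true
		r.mu.Unlock()
		return
	}
	r.reloading = true
	r.mu.Unlock()

	go func() {
		for {
			// Respect the cooldown between consecutive reloads.
			r.mu.Lock()
			wait := r.cooldown - time.Since(r.lastReload)
			r.mu.Unlock()
			if wait > 0 {
				time.Sleep(wait)
			}

			_ = r.reloadFn(ctx) // the real implementation would log errors

			r.mu.Lock()
			r.lastReload = time.Now()
			if !r.pending {
				r.reloading = false
				r.mu.Unlock()
				return
			}
			// Requests arrived during the reload; run one more pass to cover them.
			r.pending = false
			r.mu.Unlock()
		}
	}()
}

func main() {
	r := &lazyReloader{
		cooldown: 200 * time.Millisecond,
		reloadFn: func(ctx context.Context) error {
			// Placeholder for the real topology refresh.
			return nil
		},
	}
	// Most of these calls collapse into an in-flight or pending reload
	// instead of each triggering its own refresh.
	for i := 0; i < 10; i++ {
		r.LazyReload(context.Background())
	}
	time.Sleep(time.Second)
}
```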
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| universal.go | Added new connection and notification handling options to UniversalOptions, propagated fields to Cluster(), Failover(), and Simple() methods |
| sentinel.go | Added new connection options to FailoverOptions with field propagation to clientOptions(), sentinelOptions(), and clusterOptions() methods, plus URL parameter parsing |
| osscluster.go | Added connection options and PushNotificationProcessor to ClusterOptions, implemented lazy reload with queuing/cooldown logic, and integrated SMIGRATED notification callback for cluster state management |
| osscluster_lazy_reload_test.go | Added comprehensive test coverage for LazyReload behavior including single reload, concurrent deduplication, pending reload queuing, error handling, and cascading notification scenarios |
| commands_test.go | Improved latency test reliability by increasing the threshold and sleep duration to reduce flakiness, and made assertions more robust |
```go
if opt.MaintNotificationsConfig != nil {
	c.nodes.OnNewNode(func(nodeClient *Client) {
		manager := nodeClient.GetMaintNotificationsManager()
		if manager != nil {
			manager.SetClusterStateReloadCallback(func(ctx context.Context, hostPort string, slotRanges []string) {
				// Log the migration details for now
				if internal.LogLevel.InfoOrAbove() {
					internal.Logger.Printf(ctx, "cluster: slots %v migrated to %s, reloading cluster state", slotRanges, hostPort)
				}
				// Currently we reload the entire cluster state
				// In the future, this could be optimized to reload only the specific slots
				c.state.LazyReload()
			})
		}
	})
}
```
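For context, a minimal usage sketch follows. It assumes the maintnotifications package referenced later in this review exposes a Config type whose zero value is acceptable; the import path and configuration fields are assumptions and are not shown in this diff.

```go
package main

import (
	"context"

	"github.com/redis/go-redis/v9"
	"github.com/redis/go-redis/v9/maintnotifications"
)

func main() {
	// Hypothetical setup: enabling maintenance notifications on a cluster client
	// so that SMIGRATED notifications trigger the lazy topology reload wired up above.
	// The maintnotifications.Config type and its fields are assumed, not taken from this diff.
	rdb := redis.NewClusterClient(&redis.ClusterOptions{
		Addrs:                    []string{"127.0.0.1:7000", "127.0.0.1:7001"},
		MaintNotificationsConfig: &maintnotifications.Config{},
	})
	defer rdb.Close()

	_ = rdb.Ping(context.Background()).Err()
}
```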
Copilot AI reviewed on Dec 2, 2025
The cluster state reload callback is only registered for new nodes via OnNewNode(). This means that any nodes that already exist when the ClusterClient is created will not have this callback registered, and SMIGRATED notifications from these existing nodes will not trigger cluster state reloads.
Consider also registering the callback for existing nodes, similar to how it's done in maintnotifications/e2e/notiftracker_test.go:
```go
if opt.MaintNotificationsConfig != nil {
	// Register callback for existing nodes
	ctx := context.Background()
	_ = c.ForEachShard(ctx, func(ctx context.Context, nodeClient *Client) error {
		manager := nodeClient.GetMaintNotificationsManager()
		if manager != nil {
			manager.SetClusterStateReloadCallback(func(ctx context.Context, hostPort string, slotRanges []string) {
				// ... callback implementation
			})
		}
		return nil
	})

	// Register callback for new nodes
	c.nodes.OnNewNode(func(nodeClient *Client) {
		// ... existing implementation
	})
}
```
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Introduces a reload of the cluster topology in response to cluster maintenance notifications.