Skip to content

Conversation

@ndyakov
Copy link
Member

@ndyakov ndyakov commented Nov 24, 2025

Introduces reload of topology in the cluster maint notifications

@ndyakov ndyakov marked this pull request as ready for review December 2, 2025 13:56
Copilot finished reviewing on behalf of ndyakov December 2, 2025 13:58
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces lazy cluster topology reload functionality to handle maintenance notifications, specifically addressing issue #3594. The implementation adds a queuing mechanism to prevent excessive cluster state reloads when multiple SMIGRATED notifications arrive in quick succession.

  • Added new connection options: DialerRetries, DialerRetryTimeout, MaxConcurrentDials, and PushNotificationProcessor across all client types
  • Implemented a lazy reload mechanism with cooldown period and pending request collapsing to optimize cluster topology reloads
  • Integrated SMIGRATED notification handling to trigger cluster state reloads when slot migrations occur

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
universal.go Added new connection and notification handling options to UniversalOptions, propagated fields to Cluster(), Failover(), and Simple() methods
sentinel.go Added new connection options to FailoverOptions with field propagation to clientOptions(), sentinelOptions(), and clusterOptions() methods, plus URL parameter parsing
osscluster.go Added connection options and PushNotificationProcessor to ClusterOptions, implemented lazy reload with queuing/cooldown logic, and integrated SMIGRATED notification callback for cluster state management
osscluster_lazy_reload_test.go Added comprehensive test coverage for LazyReload behavior including single reload, concurrent deduplication, pending reload queuing, error handling, and cascading notification scenarios
commands_test.go Improved latency test reliability by increasing threshold and sleep duration to reduce flakiness, changed assertions to be more robust

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1145 to +1160
if opt.MaintNotificationsConfig != nil {
c.nodes.OnNewNode(func(nodeClient *Client) {
manager := nodeClient.GetMaintNotificationsManager()
if manager != nil {
manager.SetClusterStateReloadCallback(func(ctx context.Context, hostPort string, slotRanges []string) {
// Log the migration details for now
if internal.LogLevel.InfoOrAbove() {
internal.Logger.Printf(ctx, "cluster: slots %v migrated to %s, reloading cluster state", slotRanges, hostPort)
}
// Currently we reload the entire cluster state
// In the future, this could be optimized to reload only the specific slots
c.state.LazyReload()
})
}
})
}
Copy link

Copilot AI Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cluster state reload callback is only registered for new nodes via OnNewNode(). This means that any nodes that already exist when the ClusterClient is created will not have this callback registered, and SMIGRATED notifications from these existing nodes will not trigger cluster state reloads.

Consider also registering the callback for existing nodes, similar to how it's done in maintnotifications/e2e/notiftracker_test.go:

if opt.MaintNotificationsConfig != nil {
	// Register callback for existing nodes
	ctx := context.Background()
	_ = c.ForEachShard(ctx, func(ctx context.Context, nodeClient *Client) error {
		manager := nodeClient.GetMaintNotificationsManager()
		if manager != nil {
			manager.SetClusterStateReloadCallback(func(ctx context.Context, hostPort string, slotRanges []string) {
				// ... callback implementation
			})
		}
		return nil
	})
	
	// Register callback for new nodes
	c.nodes.OnNewNode(func(nodeClient *Client) {
		// ... existing implementation
	})
}

Copilot uses AI. Check for mistakes.
ndyakov and others added 2 commits December 2, 2025 16:00
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants