
clusterresolver: Copy endpoints.Addresses slice from DNS updates to avoid data races #7991

Open · wants to merge 3 commits into master
Conversation


@arjan-bal arjan-bal commented Jan 8, 2025

Fixes: #7982

Background

clusterresolver uses the lastUpdate() method in the resourceResolver interface, which has implementations for both EDS and DNS.

// lastUpdate returns endpoint results from the most recent resolution.
//
// The type of the first return result is dependent on the resolver
// implementation.
//
// The second return result indicates whether the resolver was able to
// successfully resolve the resource name to endpoints. If set to false, the
// first return result is invalid and must not be used.
lastUpdate() (any, bool)

When the child balancer's state needs to be updated, clusterresolver calls lastUpdate on DNS and EDS. If nothing has changed since the previous call, the resolver returns the same update object. For EDS, the update contains a list of xDS endpoints; for DNS, since #7858, it contains a list of resolver.Endpoint grouped by locality.

For EDS, when the xDS endpoints are converted to resolver.Endpoints, new Endpoint objects are allocated that share no memory with the previous update sent to the child balancer.

cfg, endpoints, err := priorityLocalitiesToClusterImpl(priorityLocalities, pName, mechanism, drops, xdsLBPolicy)

For DNS, however, the endpoint objects are copied by iterating over the original slice returned by the DNS resolver. Since Endpoint.Addresses is a slice field, the original and the copy share the same backing array. This is the cause of the race.

func buildClusterImplConfigForDNS(g *nameGenerator, endpoints []resolver.Endpoint, mechanism DiscoveryMechanism) (string, *clusterimpl.LBConfig, []resolver.Endpoint) {
	// Endpoint picking policy for DNS is hardcoded to pick_first.
	const childPolicy = "pick_first"
	retEndpoints := make([]resolver.Endpoint, len(endpoints))
	pName := fmt.Sprintf("priority-%v", g.prefix)
	for i, e := range endpoints {
		retEndpoints[i] = hierarchy.SetInEndpoint(e, []string{pName})
	}
	return pName, &clusterimpl.LBConfig{
		Cluster:         mechanism.Cluster,
		TelemetryLabels: mechanism.TelemetryLabels,
		ChildPolicy:     &internalserviceconfig.BalancerConfig{Name: childPolicy},
	}, retEndpoints
}
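The aliasing described above can be reproduced with a minimal, self-contained sketch. The Endpoint type here is a hypothetical stand-in for resolver.Endpoint, not the real grpc-go type:

```go
package main

import "fmt"

// Endpoint is a hypothetical stand-in for resolver.Endpoint: a struct
// value that contains a slice field.
type Endpoint struct {
	Addresses []string
}

func main() {
	orig := Endpoint{Addresses: []string{"10.0.0.1:80", "10.0.0.2:80"}}

	// Assigning the struct copies only the slice header (pointer, length,
	// capacity), not the backing array, so both values alias the same memory.
	shallow := orig

	// Mutating through the copy is visible through the original.
	shallow.Addresses[0] = "10.0.0.3:80"
	fmt.Println(orig.Addresses[0]) // prints "10.0.0.3:80"
}
```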

If the child balancer reads Endpoint.Addresses while clusterresolver is building the endpoint list for the next update, a data race occurs.
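The fix amounts to giving each copied endpoint its own backing array before mutating it. A minimal sketch of the pattern, using hypothetical Address/Endpoint types rather than the grpc-go ones:

```go
package main

import "fmt"

// Hypothetical stand-ins for resolver.Address and resolver.Endpoint.
type Address struct{ Addr string }
type Endpoint struct{ Addresses []Address }

// copyEndpoint returns a copy of e whose Addresses field has its own
// backing array, so the caller can mutate the copy without racing with
// readers of the original.
func copyEndpoint(e Endpoint) Endpoint {
	out := e
	out.Addresses = make([]Address, len(e.Addresses))
	copy(out.Addresses, e.Addresses)
	return out
}

func main() {
	orig := Endpoint{Addresses: []Address{{Addr: "10.0.0.1:80"}}}
	cp := copyEndpoint(orig)

	// Mutating the copy no longer touches the original's memory.
	cp.Addresses[0].Addr = "10.0.0.2:80"
	fmt.Println(orig.Addresses[0].Addr) // prints "10.0.0.1:80"
}
```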

Tested

Reproduced the failure on forge, it failed within 1000 runs. Verified the test no longer races in 10000 runs with the fix.

RELEASE NOTES: N/A

@arjan-bal arjan-bal added this to the 1.70 Release milestone Jan 8, 2025
@arjan-bal arjan-bal requested a review from easwars January 8, 2025 20:37
Comment on lines 252 to 253
// Send a copy of the endpoints slice to the children as balancer may use
// the same endpoints slice in subsequent updates.
Member:

Can you explain more about this? It seems unexpected to me that endpoints would ever be mutated.

Contributor Author (arjan-bal):

In the loop that follows, BalancerAttributes are added to the addresses stored in each endpoint. There is a comment explaining that the locality information stored in BalancerAttributes is used for load reporting. The race in the test happens when the BalancerAttributes are being set while the leaf pick_first reads the address.
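Sequentially, the problematic pattern looks like the sketch below (hypothetical types again; in the real code the write happens concurrently with a read by the leaf pick_first, which is what the race detector flags):

```go
package main

import "fmt"

// Hypothetical stand-ins for resolver.Address and resolver.Endpoint.
type Address struct {
	Addr               string
	BalancerAttributes map[string]string
}
type Endpoint struct{ Addresses []Address }

func main() {
	update := Endpoint{Addresses: []Address{{Addr: "10.0.0.1:80"}}}

	// The child balancer holds a shallow copy of the update, sharing the
	// Addresses backing array with the parent.
	childView := update

	// The parent later decorates the addresses with locality attributes
	// for load reporting. Because the slice is shared, this write lands
	// in the same memory the child reads from.
	for i := range update.Addresses {
		update.Addresses[i].BalancerAttributes = map[string]string{"locality": "region-a"}
	}

	fmt.Println(childView.Addresses[0].BalancerAttributes["locality"]) // prints "region-a"
}
```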

Member:

Sorry, I'm still not understanding this fully. Why is the clusterresolver mutating the endpoints slice between updates instead of creating a new slice and replacing the old one?

That part seems wrong to me, but I'm probably missing something.

Contributor Author (arjan-bal):

As described in the Background section above, clusterresolver calls lastUpdate() on both the DNS and EDS resolvers. The EDS path allocates new Endpoint objects that share no memory with the previous update, but buildClusterImplConfigForDNS copies the DNS endpoints in a way that leaves each Endpoint.Addresses slice aliasing the original.

I've changed the PR to copy the Endpoint.Addresses field only for DNS, which is the cause of the race.

@dfawley dfawley assigned arjan-bal and unassigned easwars Jan 9, 2025
@arjan-bal arjan-bal assigned dfawley and unassigned arjan-bal Jan 9, 2025
@dfawley dfawley assigned arjan-bal and unassigned dfawley Jan 9, 2025

codecov bot commented Jan 10, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.18%. Comparing base (6f41085) to head (0e98fcb).
Report is 4 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #7991   +/-   ##
=======================================
  Coverage   82.17%   82.18%           
=======================================
  Files         381      381           
  Lines       38539    38542    +3     
=======================================
+ Hits        31668    31674    +6     
+ Misses       5564     5562    -2     
+ Partials     1307     1306    -1     
Files with missing lines Coverage Δ
...internal/balancer/clusterresolver/configbuilder.go 91.07% <100.00%> (+0.16%) ⬆️

... and 25 files with indirect coverage changes

@arjan-bal arjan-bal assigned dfawley and unassigned arjan-bal Jan 10, 2025
@arjan-bal arjan-bal changed the title clusterresolver: Send a copy of endpoints slice to children to avoid data races clusterresolver: Copy endpoints.Addresses slice from DNS updates to avoid data races Jan 10, 2025
@dfawley dfawley (Member) left a comment

Thanks! I think I like this fix better, but it's not clear to me which one to prefer. Since it's all internal code, it doesn't really matter.

@dfawley dfawley assigned arjan-bal and unassigned dfawley Jan 10, 2025
Development

Successfully merging this pull request may close these issues.

Flaky test: Test/AggregateCluster_BadEDSFromError_GoodToBadDNS