
clusterresolver: Copy endpoints.Addresses slice from DNS updates to avoid data races #7991

Open · wants to merge 3 commits into master
Conversation


@arjan-bal arjan-bal commented Jan 8, 2025

Fixes: #7982

Background

clusterresolver uses the lastUpdate() method in the resourceResolver interface, which has implementations for both EDS and DNS.

// lastUpdate returns endpoint results from the most recent resolution.
//
// The type of the first return result is dependent on the resolver
// implementation.
//
// The second return result indicates whether the resolver was able to
// successfully resolve the resource name to endpoints. If set to false, the
// first return result is invalid and must not be used.
lastUpdate() (any, bool)

When the child balancer's state needs to be updated, clusterresolver calls lastUpdate on DNS and EDS. If nothing has changed since the previous call, the resolver returns the same update object. For EDS, the update contains a list of xDS endpoints; for DNS, since #7858, it contains a list of resolver.Endpoint grouped by locality.

For EDS, when the xDS endpoints are converted to resolver.Endpoints, new Endpoint objects are allocated that share no memory with the previous update sent to the child balancer.

cfg, endpoints, err := priorityLocalitiesToClusterImpl(priorityLocalities, pName, mechanism, drops, xdsLBPolicy)

For DNS, however, the endpoint objects are copied by iterating over the original slice returned by the DNS resolver. Since Endpoint.Addresses is a slice field, the original and the copy share the same backing array. This is the cause of the race.

func buildClusterImplConfigForDNS(g *nameGenerator, endpoints []resolver.Endpoint, mechanism DiscoveryMechanism) (string, *clusterimpl.LBConfig, []resolver.Endpoint) {
	// Endpoint picking policy for DNS is hardcoded to pick_first.
	const childPolicy = "pick_first"
	retEndpoints := make([]resolver.Endpoint, len(endpoints))
	pName := fmt.Sprintf("priority-%v", g.prefix)
	for i, e := range endpoints {
		retEndpoints[i] = hierarchy.SetInEndpoint(e, []string{pName})
	}
	return pName, &clusterimpl.LBConfig{
		Cluster:         mechanism.Cluster,
		TelemetryLabels: mechanism.TelemetryLabels,
		ChildPolicy:     &internalserviceconfig.BalancerConfig{Name: childPolicy},
	}, retEndpoints
}
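The aliasing described above can be reproduced with a minimal, self-contained sketch. The Endpoint type here is a hypothetical stand-in for resolver.Endpoint, not the real grpc-go type:

```go
package main

import "fmt"

// Endpoint is a hypothetical stand-in for resolver.Endpoint: a struct
// value that contains a slice field.
type Endpoint struct {
	Addresses []string
}

func main() {
	orig := Endpoint{Addresses: []string{"10.0.0.1:80", "10.0.0.2:80"}}

	// Assigning the struct copies only the slice header (pointer, length,
	// capacity), not the backing array, so both values alias the same memory.
	shallow := orig

	// Mutating through the copy is visible through the original.
	shallow.Addresses[0] = "10.0.0.3:80"
	fmt.Println(orig.Addresses[0]) // prints "10.0.0.3:80"
}
```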

If the child balancer reads Endpoint.Addresses while clusterresolver is building the endpoint list for the next update, a data race occurs.
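The fix amounts to giving each copied endpoint its own backing array before mutating it. A minimal sketch of the pattern, using hypothetical Address/Endpoint types rather than the grpc-go ones:

```go
package main

import "fmt"

// Hypothetical stand-ins for resolver.Address and resolver.Endpoint.
type Address struct{ Addr string }
type Endpoint struct{ Addresses []Address }

// copyEndpoint returns a copy of e whose Addresses field has its own
// backing array, so the caller can mutate the copy without racing with
// readers of the original.
func copyEndpoint(e Endpoint) Endpoint {
	out := e
	out.Addresses = make([]Address, len(e.Addresses))
	copy(out.Addresses, e.Addresses)
	return out
}

func main() {
	orig := Endpoint{Addresses: []Address{{Addr: "10.0.0.1:80"}}}
	cp := copyEndpoint(orig)

	// Mutating the copy no longer touches the original's memory.
	cp.Addresses[0].Addr = "10.0.0.2:80"
	fmt.Println(orig.Addresses[0].Addr) // prints "10.0.0.1:80"
}
```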

Tested

Reproduced the failure on forge, it failed within 1000 runs. Verified the test no longer races in 10000 runs with the fix.

RELEASE NOTES: N/A

@arjan-bal arjan-bal added this to the 1.70 Release milestone Jan 8, 2025
@arjan-bal arjan-bal requested a review from easwars January 8, 2025 20:37
Comment on lines 252 to 253
// Send a copy of the endpoints slice to the children as balancer may use
// the same endpoints slice in subsequent updates.
Member:

Can you explain more about this? It seems unexpected to me that endpoints would ever be mutated.

Contributor Author (arjan-bal):

In the loop that follows, BalancerAttributes are added to the addresses stored in each endpoint. There is a comment explaining that the locality information stored in BalancerAttributes is used for load reporting. The race in the test happens when the BalancerAttributes are being set while the leaf pick_first reads the address.
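Sequentially, the problematic pattern looks like the sketch below (hypothetical types again; in the real code the write happens concurrently with a read by the leaf pick_first, which is what the race detector flags):

```go
package main

import "fmt"

// Hypothetical stand-ins for resolver.Address and resolver.Endpoint.
type Address struct {
	Addr               string
	BalancerAttributes map[string]string
}
type Endpoint struct{ Addresses []Address }

func main() {
	update := Endpoint{Addresses: []Address{{Addr: "10.0.0.1:80"}}}

	// The child balancer holds a shallow copy of the update, sharing the
	// Addresses backing array with the parent.
	childView := update

	// The parent later decorates the addresses with locality attributes
	// for load reporting. Because the slice is shared, this write lands
	// in the same memory the child reads from.
	for i := range update.Addresses {
		update.Addresses[i].BalancerAttributes = map[string]string{"locality": "region-a"}
	}

	fmt.Println(childView.Addresses[0].BalancerAttributes["locality"]) // prints "region-a"
}
```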

Member:

Sorry, I'm still not understanding this fully. Why is the clusterresolver mutating the endpoints slice between updates instead of creating a new slice and replacing the old one?

That part seems wrong to me, but I'm probably missing something.

Contributor Author (arjan-bal):

As described in the Background section above, clusterresolver calls lastUpdate() on both the DNS and EDS resolvers. The EDS path allocates new Endpoint objects that share no memory with the previous update, but buildClusterImplConfigForDNS copies the DNS endpoints in a way that leaves each Endpoint.Addresses slice aliasing the original.

I've changed the PR to copy the Endpoint.Addresses field only for DNS, which is the cause of the race.

@dfawley dfawley assigned arjan-bal and unassigned easwars Jan 9, 2025
@arjan-bal arjan-bal assigned dfawley and unassigned arjan-bal Jan 9, 2025
@dfawley dfawley assigned arjan-bal and unassigned dfawley Jan 9, 2025

codecov bot commented Jan 10, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.18%. Comparing base (6f41085) to head (0e98fcb).
Report is 4 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #7991   +/-   ##
=======================================
  Coverage   82.17%   82.18%           
=======================================
  Files         381      381           
  Lines       38539    38542    +3     
=======================================
+ Hits        31668    31674    +6     
+ Misses       5564     5562    -2     
+ Partials     1307     1306    -1     
Files with missing lines Coverage Δ
...internal/balancer/clusterresolver/configbuilder.go 91.07% <100.00%> (+0.16%) ⬆️

... and 25 files with indirect coverage changes

@arjan-bal arjan-bal assigned dfawley and unassigned arjan-bal Jan 10, 2025
@arjan-bal arjan-bal changed the title clusterresolver: Send a copy of endpoints slice to children to avoid data races clusterresolver: Copy endpoints.Addresses slice from DNS updates to avoid data races Jan 10, 2025
@dfawley dfawley (Member) left a comment

Thanks! I think I like this fix better, but it's not clear to me which one to prefer. Since it's all internal code, it doesn't really matter.

@dfawley dfawley assigned arjan-bal and unassigned dfawley Jan 10, 2025
Development

Successfully merging this pull request may close these issues.

Flaky test: Test/AggregateCluster_BadEDSFromError_GoodToBadDNS