Skip to content

Conversation

@eshitachandwani
Copy link
Member

This is part of A74 implementation which add CDS/EDS/DNS watchers to the dependency manager. It also adds a temporary flag that is disabled by default so that it is not used in the current RPC paths , but enabled in the dependency manager tests.

RELEASE NOTES: None

@eshitachandwani eshitachandwani added this to the 1.78 Release milestone Dec 4, 2025
@codecov
Copy link

codecov bot commented Dec 4, 2025

Codecov Report

❌ Patch coverage is 81.55620% with 64 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.35%. Comparing base (6ed8acb) to head (09842ae).

Files with missing lines Patch % Lines
internal/xds/xdsdepmgr/xds_dependency_manager.go 85.09% 31 Missing and 14 partials ⚠️
internal/xds/xdsdepmgr/watch_service.go 54.76% 16 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8744      +/-   ##
==========================================
- Coverage   83.39%   83.35%   -0.05%     
==========================================
  Files         419      419              
  Lines       32566    32861     +295     
==========================================
+ Hits        27159    27390     +231     
- Misses       4023     4067      +44     
- Partials     1384     1404      +20     
Files with missing lines Coverage Δ
internal/xds/resolver/xds_resolver.go 88.76% <100.00%> (+0.12%) ⬆️
internal/xds/xdsdepmgr/watch_service.go 54.76% <54.76%> (-30.96%) ⬇️
internal/xds/xdsdepmgr/xds_dependency_manager.go 84.15% <85.09%> (-0.75%) ⬇️

... and 12 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@easwars
Copy link
Contributor

easwars commented Dec 11, 2025

Can you also please check the codecov report here: #8744 (comment) to see if there are any valid lines that are missing test coverage. Thanks.

}

func (dr *dnsResolver) ParseServiceConfig(string) *serviceconfig.ParseResult {
return &serviceconfig.ParseResult{Err: fmt.Errorf("service config not supported")}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that this is existing behavior, but do you happen to know if this is what other languages do as well?


// dnsResolverState watches updates for the given DNS hostname. It implements
// resolver.ClientConn interface to work with the DNS resolver.
type dnsResolver struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make an interface that both the endpointsWatcher and the dnsWatcher can satisfy, so that the dependency manager can treat them as equals?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if that will add any value , because in the populateCusterConfig we will anyway have to treat is separately

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If they have a common interface, why would we have to treat it separately?

cmpopts.IgnoreFields(xdsresource.RouteConfigUpdate{}, "Raw"),
cmpopts.IgnoreFields(xdsresource.ClusterUpdate{}, "Raw", "LBPolicy", "TelemetryLabels"),
cmpopts.IgnoreFields(xdsresource.EndpointsUpdate{}, "Raw"),
cmp.Transformer("ErrorsToString", func(in error) string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this transformation meant for cluster errors and endpoint errors? If so, would it make more sense to have a function with the below contents, and have transformers for the specific fields that we are interested in, that invoke the function instead of blindly invoking it on all fields of type error?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, its meant for cluster errors and endpoint errors , and those are the only two error fields in XDSConfig struct. So I dont think it will apply to anything else. And even if we add any error fields later , we might need to do this transformation since we get wrapped errors from XDSConfig

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without having transformers attached to specific fields, it becomes the responsibility of the reader to figure out which fields are affected by the transformation. Either have a comment saying which fields or affected, or change the transformation to apply to specific fields.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment.

Comment on lines 117 to 126
// Safely access the first address string for comparison.
addrA := ""
if len(a.Addresses) > 0 {
addrA = a.Addresses[0].Addr
}

addrB := ""
if len(b.Addresses) > 0 {
addrB = b.Addresses[0].Addr
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about using a stable hash and computing a hash for every endpoint and determining the order based on that hash?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ping on this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 204 to 216
{ID: clients.Locality{
Region: "region-1",
Zone: "zone-1",
SubZone: "subzone-1",
},
Endpoints: []xdsresource.Endpoint{
{
Addresses: []string{addr},
HealthStatus: xdsresource.EndpointHealthStatusUnknown,
Weight: 1,
},
},
Weight: 1},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem to be in compliance with this style guidenline: go/go-style/decisions#literal-matching-braces

Please fix.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't seem to be fixed yet.

Comment on lines 1252 to 1258
// Drain the updates because management server keeps sending the error
// updates repeatedly causing the update from dependency manager to be
// blocked if we don't drain it.
select {
case <-ctx.Done():
case <-watcher.updateCh:
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this pattern is used in a bunch of tests, could we have a watcher type that can handle this instead of having the main test goroutine handle this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@easwars easwars assigned eshitachandwani and unassigned easwars and arjan-bal Dec 15, 2025
Comment on lines 266 to 267
clusterConfigs[clusterName] = &xdsresource.ClusterResult{Err: m.annotateErrorWithNodeID(fmt.Errorf("aggregate cluster graph exceeds max depth (%d)", aggregateClusterMaxDepth))}
return true, nil, m.annotateErrorWithNodeID(fmt.Errorf("aggregate cluster graph exceeds max depth (%d)", aggregateClusterMaxDepth))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be:

		err := m.annotateErrorWithNodeID(fmt.Errorf("aggregate cluster graph exceeds max depth (%d)", aggregateClusterMaxDepth))}
		clusterConfigs[clusterName] = &xdsresource.ClusterResult{Err: err}
		return true, nil, err

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 368 to 369
// When the cluster update is not one of the three types of cluster updates.
clusterConfigs[clusterName] = &xdsresource.ClusterResult{Err: m.annotateErrorWithNodeID(fmt.Errorf("cluster type %v of cluster %s not supported", update.ClusterType, clusterName))}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is better handled as the default case in the switch above, and once we do that we won't even need the return statement here as every branch of the switch will have a return statement of its own.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And when we do that, we won't need this comment either.

m.maybeSendUpdateLocked()
}

// RequestDNSReresolution calls all the the DNS resolvers ResolveNow.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/resolvers/resolver's/

Comment on lines 204 to 216
{ID: clients.Locality{
Region: "region-1",
Zone: "zone-1",
SubZone: "subzone-1",
},
Endpoints: []xdsresource.Endpoint{
{
Addresses: []string{addr},
HealthStatus: xdsresource.EndpointHealthStatusUnknown,
Weight: 1,
},
},
Weight: 1},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't seem to be fixed yet.

Comment on lines 117 to 126
// Safely access the first address string for comparison.
addrA := ""
if len(a.Addresses) > 0 {
addrA = a.Addresses[0].Addr
}

addrB := ""
if len(b.Addresses) > 0 {
addrB = b.Addresses[0].Addr
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ping on this.

cmpopts.IgnoreFields(xdsresource.RouteConfigUpdate{}, "Raw"),
cmpopts.IgnoreFields(xdsresource.ClusterUpdate{}, "Raw", "LBPolicy", "TelemetryLabels"),
cmpopts.IgnoreFields(xdsresource.EndpointsUpdate{}, "Raw"),
cmp.Transformer("ErrorsToString", func(in error) string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without having transformers attached to specific fields, it becomes the responsibility of the reader to figure out which fields are affected by the transformation. Either have a comment saying which fields or affected, or change the transformation to apply to specific fields.

Comment on lines 727 to 739
{ID: clients.Locality{
Region: "region-1",
Zone: "zone-1",
SubZone: "subzone-1",
},
Endpoints: []xdsresource.Endpoint{
{
Addresses: []string{"localhost:8081"},
HealthStatus: xdsresource.EndpointHealthStatusUnknown,
Weight: 1,
},
},
Weight: 1},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hasn't been fixed yet.

updateCh: make(chan *xdsresource.XDSConfig),
errorCh: make(chan error),
}
ctx, cancel := context.WithTimeout(context.Background(), 50*defaultTestTimeout)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you are increasing it for testing, please change the value of the constant (in one place). This is very easy to miss in a review.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it! CHanged.

@easwars
Copy link
Contributor

easwars commented Dec 17, 2025

I made a small refactoring commit to split the populateClusterConfigLocked method into smaller ones to handle updates for specific cluster types.

I also added the default case for the switch at the end. Please note that I changed it return true, nil, nil instead of false, nil, nil as it was doing earlier, because in this case, we have received the cluster resource. It just happens to be a type that we don't support. That shouldn't block us from sending an update if everything else in the tree is resolved (since we are anyways storing the error in the cluster config).

@easwars easwars assigned eshitachandwani and unassigned easwars Dec 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants