Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xdsclient: update watcher API as per gRFC A88 #7977

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

purnesh42H
Copy link
Contributor

@purnesh42H purnesh42H commented Jan 2, 2025

This is the first part of implementing gRFC A88 (grpc/proposal#466).

This introduces the new watcher API but does not change any of the existing behavior. This table summarizes the API changes and behavior for each case:

Case Old API New API Behavior
Resource timer fires OnResourceDoesNotExist() OnResourceChanged(NOT_FOUND) Fail data plane RPCs
LDS or CDS resource deletion OnResourceDoesNotExist() OnResourceChanged(NOT_FOUND) Drop resource and fail data plane RPCs
xDS channel reports TRANSIENT_FAILURE OnError() OnResourceChanged(status) if resource NOT already cached; OnAmbientError(status) otherwise Continue using cached resource, if any; otherwise, fail data plane RPCs
ADS stream terminates without receiving a response OnError() OnResourceChanged(status) if resource NOT already cached; OnAmbientError(status) otherwise Continue using cached resource, if any; otherwise, fail data plane RPCs
Invalid resource update (client NACK) OnError() OnResourceChanged(status) if resource NOT already cached; OnAmbientError(status) otherwise Continue using cached resource, if any; otherwise, fail data plane RPCs
Valid resource update OnUpdate(resource) OnResourceChanged(resource) use the new resource

RELEASE NOTES:

  • xds: TBD

@purnesh42H purnesh42H added Type: Feature New features or improvements in behavior Area: xDS Includes everything xDS related, including LB policies used with xDS. labels Jan 2, 2025
@purnesh42H purnesh42H added this to the 1.70 Release milestone Jan 2, 2025
@purnesh42H purnesh42H self-assigned this Jan 2, 2025
@purnesh42H purnesh42H force-pushed the a88-watcher-api branch 7 times, most recently from 8d198b7 to a9b45c0 Compare January 3, 2025 16:28
@purnesh42H purnesh42H requested a review from markdroth January 3, 2025 16:58
@purnesh42H purnesh42H assigned dfawley and markdroth and unassigned purnesh42H Jan 3, 2025
@purnesh42H purnesh42H requested a review from dfawley January 3, 2025 16:59
@purnesh42H purnesh42H force-pushed the a88-watcher-api branch 2 times, most recently from 4009e3e to 57dbf23 Compare January 6, 2025 12:03
Copy link

codecov bot commented Jan 6, 2025

Codecov Report

Attention: Patch coverage is 72.22222% with 40 lines in your changes missing coverage. Please review.

Project coverage is 82.14%. Comparing base (724f450) to head (25dce25).
Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
xds/internal/testutils/resource_watcher.go 14.28% 11 Missing and 1 partial ⚠️
xds/internal/resolver/xds_resolver.go 28.57% 10 Missing ⚠️
xds/internal/server/listener_wrapper.go 37.50% 10 Missing ⚠️
.../balancer/clusterresolver/resource_resolver_eds.go 60.00% 5 Missing and 1 partial ⚠️
xds/internal/balancer/cdsbalancer/cdsbalancer.go 85.71% 1 Missing ⚠️
xds/internal/xdsclient/clientimpl_watchers.go 50.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7977      +/-   ##
==========================================
- Coverage   82.28%   82.14%   -0.15%     
==========================================
  Files         381      381              
  Lines       38539    38558      +19     
==========================================
- Hits        31712    31672      -40     
- Misses       5535     5583      +48     
- Partials     1292     1303      +11     
Files with missing lines Coverage Δ
...s/internal/balancer/cdsbalancer/cluster_watcher.go 100.00% <100.00%> (ø)
xds/internal/resolver/watch_service.go 100.00% <100.00%> (+8.57%) ⬆️
xds/internal/server/rds_handler.go 90.69% <100.00%> (-0.61%) ⬇️
xds/internal/xdsclient/authority.go 77.07% <100.00%> (+0.19%) ⬆️
...nal/xdsclient/xdsresource/cluster_resource_type.go 78.37% <100.00%> (+1.23%) ⬆️
...l/xdsclient/xdsresource/endpoints_resource_type.go 76.47% <100.00%> (+1.47%) ⬆️
...al/xdsclient/xdsresource/listener_resource_type.go 86.44% <100.00%> (+0.47%) ⬆️
...ds/internal/xdsclient/xdsresource/resource_type.go 100.00% <ø> (ø)
...dsclient/xdsresource/route_config_resource_type.go 76.47% <100.00%> (+1.47%) ⬆️
xds/internal/balancer/cdsbalancer/cdsbalancer.go 81.70% <85.71%> (-0.05%) ⬇️
... and 5 more

... and 15 files with indirect coverage changes

@purnesh42H purnesh42H force-pushed the a88-watcher-api branch 4 times, most recently from 43c9adb to 89f475a Compare January 6, 2025 12:26
Comment on lines 61 to 63
// ResourceWatcher is an interface that can to be implemented to wrap the
// callbacks to be invoked for different events corresponding to the resource
// being watched.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel like this change to the comment adds much value. The fact that it is an interface is clear from the type definition and since it is an interface, it *needs *to be implemented. So, I would leave this unchanged or improve it in a different way.

// OnUpdate is invoked to report an update for the resource being watched.
// OnResourceChanged is invoked to notify the watcher of a new version of
// the resource received from the xDS server or an error indicating the
// reason why the resource cannot be obtained.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: s/cannot/could not/

Comment on lines 70 to 72
// type for the resource being watched. In case of error, the ResourceData
// is nil otherwise its not nil and error is nil but both will never be nil
// together.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is super confusing.

Should we consider implementing a naive version of the C++ StatusOr?

It could be something like

type ResourceDataOrError {
  data ResourceData
  err error
}

func (r *ResourceDataOrError) ResourceData {
  return r.data
}

func (r *ResourceDataOrError) Error {
  return r.err
}

func NewResourceDataOrErrorFromResourceData(data ResourceData) *ResourceDataOrError {
  return &ResourceDataOrError{data: data}
}

func NewResourceDataOrErrorFromError(err error) *ResourceDataOrError {
  return &ResourceDataOrError{err: err}
}

The fact that the fields are unexported means that the above struct can only be constructed with one of the above NewXxx, and that guarantees that only one of the two fields will be set at any given point in time.

The names of the NewXxx functions suck at this point. But maybe with some thought, we can come up with something better.

Or we could event come up with a generic version of this which takes inspiration from the Rust Result<T, E> type.

@dfawley : thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the Go-typical way to implement a sum type is to just have two fields, only one of which is set. And a comment saying as much.

// Whatever contains either Data or Err if an error occurred while trying to obtain Data.
type Whatever struct {
	Err error
	Data Something
}

That effectively mirrors:

func SomethingThatCouldFail() (Something, error) {}

Copy link
Contributor Author

@purnesh42H purnesh42H Jan 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to

// ResourceDataOrError is a struct that contains either ResourceData or error.
// It is used to represent the result of an xDS resource update. Exactly one of
// Data or Err will be non-nil.
type ResourceDataOrError struct {
	Data ResourceData
	Err  error
}

@easwars easwars assigned purnesh42H and unassigned easwars Jan 9, 2025
@purnesh42H purnesh42H requested a review from easwars January 10, 2025 16:35
@purnesh42H purnesh42H assigned easwars and unassigned purnesh42H Jan 10, 2025
// will never be nil together.
//
// Watcher is expected to use the most recent value passed to
// OnResourceChanged(), regardless of whether that's a resource or an error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: s/ whether that's a resource/whether that's a ResourceData/

Comment on lines +72 to +79
// OnResourceChanged is invoked to notify the watcher of a new version of
// the resource received from the xDS server or an error indicating the
// reason why the resource could not be obtained.
//
// The ResourceData of the ResourceDataOrError needs to be type asserted to
// the appropriate type for the resource being watched. In case of error,
// the ResourceData is nil otherwise its not nil and error is nil but both
// will never be nil together.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about replacing the first two paragraphs with the following:

	// OnResourceChanged is invoked to notify the watcher of a new version of
	// the resource received from the xDS server or an error indicating the
	// reason why the resource could not be obtained. 
	//
	// In the former case, this callback will be invoked with a non-nil ResourceData
	// in ResourceDataOrError. The ResourceData of the ResourceDataOrError needs
	// to be type asserted to the appropriate type for the resource being watched. 
	//
	// In the latter case, this callback will be invoked with a non-nil error value in ResourceDataOrError.

// - connection failure (if resource is not cached)
OnResourceChanged(*ResourceDataOrError, OnDoneFunc)

// If resource is already cached, it is invoked under different error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please start with the docstring with the name of the function, as is customary in Go.

// - connection failure (if resource is not cached)
OnResourceChanged(*ResourceDataOrError, OnDoneFunc)

// If resource is already cached, it is invoked under different error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please start with the docstring with the name of the function, as is customary in Go.

// - connection failure (if resource is not cached)
OnResourceChanged(*ResourceDataOrError, OnDoneFunc)

// If resource is already cached, it is invoked under different error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please start with the docstring with the name of the function, as is customary in Go.

Comment on lines +531 to +535
if xdsresource.ErrType(err) == xdsresource.ErrorTypeResourceNotFound {
r.logger.Infof("Received resource-not-found-error for Listener resource %q", r.ldsResourceName)
} else {
r.logger.Infof("Received on-resource-changed error for Listener resource %q: %v", r.ldsResourceName, err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about simply printing the error in the log instead of checking for the error type and printing different log statements. Printing the error value itself should give information about what type of error was recieved.

Comment on lines +576 to +580
if xdsresource.ErrType(err) == xdsresource.ErrorTypeResourceNotFound {
r.logger.Infof("Received resource-not-found-error for RouteConfiguration resource %q", name)
} else {
r.logger.Infof("Received on-resource-changed error for RouteConfiguration resource %q: %v", name, err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

Comment on lines +87 to +91
if xdsresource.ErrType(update.Err) == xdsresource.ErrorTypeResourceNotFound {
er.logger.Infof("EDS discovery mechanism for resource %q reported resource-does-not-exist error", er.nameToWatch)
} else {
er.logger.Infof("EDS discovery mechanism for resource %q reported on resource changed error: %v", er.nameToWatch, update.Err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. Please print the error to logs instead of having to check for the type of error.

@@ -525,15 +525,14 @@ func (b *cdsBalancer) onClusterError(name string, err error) {
// TRANSIENT_FAILURE.
//
// Only executed in the context of a serializer callback.
func (b *cdsBalancer) onClusterResourceNotFound(name string) {
err := xdsresource.NewErrorf(xdsresource.ErrorTypeResourceNotFound, "resource name %q of type Cluster not found in received response", name)
func (b *cdsBalancer) onClusterResourceChangedError(name string, err error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Log the error here?

if b.childLB != nil {
b.childLB.ResolverError(err)
} else {
// If child balancer was never created, fail the RPCs with errors.
b.ccw.UpdateState(balancer.State{
ConnectivityState: connectivity.TransientFailure,
Picker: base.NewErrPicker(err),
Picker: base.NewErrPicker(fmt.Errorf("%q: %v", name, err)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you verified that this error is as descriptive as what we had earlier?

@easwars easwars assigned purnesh42H and unassigned easwars, markdroth and dfawley Jan 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: xDS Includes everything xDS related, including LB policies used with xDS. Type: Feature New features or improvements in behavior
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants