Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xdsclient: fix flaky test TestServeAndCloseDoNotRace #7814

Merged
merged 3 commits into from
Nov 7, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions xds/internal/resolver/xds_resolver_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -71,6 +71,7 @@ import (
// build fails as well.
func (s) TestResolverBuilder_ClientCreationFails_NoBootstrap(t *testing.T) {
// Build an xDS resolver without specifying bootstrap env vars.
bootstrap.UnsetFallbackBootstrapConfigForTesting()
builder := resolver.Get(xdsresolver.Scheme)
if builder == nil {
t.Fatalf("Scheme %q is not registered", xdsresolver.Scheme)
Expand Down
11 changes: 8 additions & 3 deletions xds/internal/xdsclient/authority.go
Original file line number Diff line number Diff line change
Expand Up @@ -431,7 +431,7 @@ func (a *authority) watchResource(rType xdsresource.Type, resourceName string, w
cleanup := func() {}
done := make(chan struct{})

a.xdsClientSerializer.TrySchedule(func(context.Context) {
a.xdsClientSerializer.ScheduleOr(func(context.Context) {
defer close(done)

if a.logger.V(2) {
Expand Down Expand Up @@ -479,6 +479,11 @@ func (a *authority) watchResource(rType xdsresource.Type, resourceName string, w
a.watcherCallbackSerializer.TrySchedule(func(context.Context) { watcher.OnUpdate(resource, func() {}) })
}
cleanup = a.unwatchResource(rType, resourceName, watcher)
}, func() {
if a.logger.V(2) {
a.logger.Infof("Failed to schedule a watch for type %q, resource name %q, because the xDS client is closed", rType.TypeName(), resourceName)
}
close(done)
})
<-done
return cleanup
Expand Down Expand Up @@ -586,10 +591,10 @@ func (a *authority) dumpResources() []*v3statuspb.ClientConfig_GenericXdsConfig
var ret []*v3statuspb.ClientConfig_GenericXdsConfig
done := make(chan struct{})

a.xdsClientSerializer.TrySchedule(func(context.Context) {
a.xdsClientSerializer.ScheduleOr(func(context.Context) {
defer close(done)
ret = a.resourceConfig()
})
}, func() { close(done) })
<-done
return ret
}
Expand Down
8 changes: 6 additions & 2 deletions xds/internal/xdsclient/client_new.go
Original file line number Diff line number Diff line change
Expand Up @@ -128,6 +128,11 @@ type OptionsForTesting struct {

// NewForTesting returns an xDS client configured with the provided options.
//
// Sets the fallback bootstrap configuration to the contents in the
// opts.Contents field. This value persists for the life of the test binary. So,
// tests that want this value to be empty should call
// bootstrap.UnsetFallbackBootstrapConfigForTesting to ensure the same.
//
// The second return value represents a close function which the caller is
// expected to invoke once they are done using the client. It is safe for the
// caller to invoke this close function multiple times.
Expand All @@ -152,8 +157,7 @@ func NewForTesting(opts OptionsForTesting) (XDSClient, func(), error) {
if err := bootstrap.SetFallbackBootstrapConfig(opts.Contents); err != nil {
return nil, nil, err
}
client, cancel, err := newRefCounted(opts.Name, opts.WatchExpiryTimeout, opts.IdleChannelExpiryTimeout, opts.StreamBackoffAfterFailure)
return client, func() { bootstrap.UnsetFallbackBootstrapConfigForTesting(); cancel() }, err
return newRefCounted(opts.Name, opts.WatchExpiryTimeout, opts.IdleChannelExpiryTimeout, opts.StreamBackoffAfterFailure)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel that it's better to have tests clean up their own mutations/side-effects. This would prevent issues in which test fail only when run in a certain order. Since the BootstrapContentsForTesting ServerOption is public, adding a new param to it is also not preferable. I don't know the best way to do this 😕.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had thought about making callers of NewForTesting ensure that bootstrap configuration is set either by setting one of the associated env vars or by calling SetFallbackBootstrapConfig. If we did that, we can remove the Contents field from OptionsForTesting and have the tests setup the bootstrap configuration and do the cleanup themselves. But the number of callsites for NewForTesting is enormous. So, I didn't want to do that as part of this PR.

But I agree with your concern, and maybe I can do that as a follow-up PR so that we take care of this flake first, and then cleanup the remaining tests.

What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to agree with Arjan here. Ever since I joined the team 3.5 years ago, Doug has always stressed to not set any globals amongst tests that persist over test iterations that would couple tests in any way.

Any time I have done this, he has always made me rewrite the test in order to be hermetic and not coupled with other tests. I agree with this because of the ordering thing Arjan mentioned, I think go test guarantees things are run serially but I forgot what the strict ordering requirements are, and also it's weird to have one test state affect another test state.

But since test is blocking development flow I'm fine merging this now and following up on it to address this issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with doing the cleanup in a later PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack. Thanks.

}

// GetForTesting returns an xDS client created earlier using the given name.
Expand Down
19 changes: 15 additions & 4 deletions xds/server_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -176,9 +176,14 @@ func (s) TestNewServer_Failure(t *testing.T) {
wantErr string
}{
{
desc: "bootstrap env var not set",
serverOpts: []grpc.ServerOption{grpc.Creds(xdsCreds)},
wantErr: "failed to get xDS bootstrap config",
desc: "bootstrap env var not set",
serverOpts: func() []grpc.ServerOption {
// Ensure that any fallback bootstrap configuration setup by
// previous tests is cleared.
bootstrap.UnsetFallbackBootstrapConfigForTesting()
return []grpc.ServerOption{grpc.Creds(xdsCreds)}
}(),
wantErr: "failed to get xDS bootstrap config",
},
{
desc: "empty bootstrap config",
Expand Down Expand Up @@ -696,10 +701,16 @@ func (s) TestServeAndCloseDoNotRace(t *testing.T) {
t.Fatalf("testutils.LocalTCPListener() failed: %v", err)
}

// Generate bootstrap contents up front for all servers, and clear the
// fallback bootstrap configuration that gets set when a server is created
// with the BootstrapContentsForTesting() server option.
bootstrapContents := generateBootstrapContents(t, uuid.NewString(), nonExistentManagementServer)
defer bootstrap.UnsetFallbackBootstrapConfigForTesting()

wg := sync.WaitGroup{}
wg.Add(200)
for i := 0; i < 100; i++ {
server, err := NewGRPCServer(BootstrapContentsForTesting(generateBootstrapContents(t, uuid.NewString(), nonExistentManagementServer)))
server, err := NewGRPCServer(BootstrapContentsForTesting(bootstrapContents))
if err != nil {
t.Fatalf("Failed to create an xDS enabled gRPC server: %v", err)
}
Expand Down