-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
xdsclient: fix flaky test TestServeAndCloseDoNotRace #7814
Conversation
…e to schedule a callback to do the actual work
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #7814 +/- ##
==========================================
- Coverage 82.00% 81.77% -0.23%
==========================================
Files 373 373
Lines 37735 37848 +113
==========================================
+ Hits 30945 30951 +6
- Misses 5512 5587 +75
- Partials 1278 1310 +32
|
@@ -152,8 +157,7 @@ func NewForTesting(opts OptionsForTesting) (XDSClient, func(), error) { | |||
if err := bootstrap.SetFallbackBootstrapConfig(opts.Contents); err != nil { | |||
return nil, nil, err | |||
} | |||
client, cancel, err := newRefCounted(opts.Name, opts.WatchExpiryTimeout, opts.IdleChannelExpiryTimeout, opts.StreamBackoffAfterFailure) | |||
return client, func() { bootstrap.UnsetFallbackBootstrapConfigForTesting(); cancel() }, err | |||
return newRefCounted(opts.Name, opts.WatchExpiryTimeout, opts.IdleChannelExpiryTimeout, opts.StreamBackoffAfterFailure) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel that it's better to have tests clean up their own mutations/side-effects. This would prevent issues in which test fail only when run in a certain order. Since the BootstrapContentsForTesting
ServerOption is public, adding a new param to it is also not preferable. I don't know the best way to do this 😕.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had thought about making callers of NewForTesting
ensure that bootstrap configuration is set either by setting one of the associated env vars or by calling SetFallbackBootstrapConfig
. If we did that, we can remove the Contents
field from OptionsForTesting
and have the tests setup the bootstrap configuration and do the cleanup themselves. But the number of callsites for NewForTesting
is enormous. So, I didn't want to do that as part of this PR.
But I agree with your concern, and maybe I can do that as a follow-up PR so that we take care of this flake first, and then cleanup the remaining tests.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to agree with Arjan here. Ever since I joined the team 3.5 years ago, Doug has always stressed to not set any globals amongst tests that persist over test iterations that would couple tests in any way.
Any time I have done this, he has always made me rewrite the test in order to be hermetic and not coupled with other tests. I agree with this because of the ordering thing Arjan mentioned, I think go test guarantees things are run serially but I forgot what the strict ordering requirements are, and also it's weird to have one test state affect another test state.
But since test is blocking development flow I'm fine merging this now and following up on it to address this issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with doing the cleanup in a later PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack. Thanks.
This PR addresses two issues:
authority.watchResource
, the watch is handled by scheduling a callback on the serializer and waiting for the callback to be executed by blocking on adone
channel that is closed when the callback has completed its work.done
channel. This meant that theauthority.watchResource
call would block forever.autority.dumpResources
as well.xdsclient.NewForTesting
accepts a bootstrap configuration as one of its options and sets the fallback bootstrap configuration (i.e. the configuration that gets used when the bootstrap env vars are not set). As part of the cleanup function that is returned, the fallback bootstrap configuration is unset.TestServeAndCloseDoNotRace
, the following happens in a loop:BootstrapContentsForTesting
server option. This results in an xDS client being created (or refcount on an existing one being incremented) usingxdsclient.NewForTesting
with the provided bootstrap configuration.Serve
.Stop
. This results in the xDS client cleanup function being invoked.xdsclient.NewForTesting
, but instead leaving that up to the tests.Ran on forge for 100K times without a flake.
Fixes #7627
RELEASE NOTES: none