Use envoy xds server type in daily e2e tests #4100
Codecov Report
@@           Coverage Diff           @@
##             main    #4100   +/-   ##
=======================================
  Coverage   77.72%   77.72%
=======================================
  Files         112      112
  Lines        9995     9995
=======================================
  Hits         7769     7769
  Misses       2041     2041
  Partials      185      185
Force-pushed from ed87ba5 to d57d0b7
Looks like Envoy sometimes has a hard time connecting to the local Contour in the e2e tests in envoy gRPC server mode.
Bumping the bootstrap connect timeout seems to have helped. It's hardcoded at 5s for now; maybe it should be configurable.
Going back through git history, it seems the 5s connect timeout was chosen somewhat arbitrarily. That raises the question of why the go-control-plane version takes longer to connect to, but we could bump it or make it configurable.
Hm, even with the bumped connect timeout, fetches still seem to fail with the same Envoy log as above.
The default "initial fetch timeout" is 15s, which should be enough to fetch the initial clusters and listeners; even after increasing it, we still intermittently get stuck on the initial fetch.
When leader election is disabled, it looks like the first e2e test always fails: we're not responding to the initial Cluster/Listener request at all. Subsequent tests pass once the first Contour exits and Envoy reconnects to the instance that comes up for the next test.
Looks like the state of the world in how Contour sets up xDS etc. is as below, which explains why the e2e tests fail when you just swap over the xDS server type:
In general, Contour startup happens to work when leader election is enabled because leader election coincidentally tends to take longer than xDS server initialization.
This sounds like another reason we need to fix up the initialization process and have a way to ensure the ordering of some goroutines?
Marking this PR stale since there has been no activity for 14 days. It will be closed if there is no activity for another 30 days.
Force-pushed from d57d0b7 to 1ae0b77
Recent refactors have made #4100 (comment) obsolete: the event handler is now notified of leader election, whether it is enabled or disabled, through the leadership notifier runnable, which triggers it to run an update.
LGTM.
One other idea here would be to set up a nightly run of the full E2E suite using the "envoy" control plane, to get regular full coverage on it without blowing up our E2E matrix for PRs.
Yeah, I like that better actually. Going to update this to do that instead of the smoke tests.
Run e2e tests daily against envoy xds server type that uses go-control-plane. Once we are confident it is stable, we can flip the default to be the envoy go-control-plane variant and run the daily tests with the contour variant until we drop support for it.
Signed-off-by: Sunjay Bhatia <sunjayb@vmware.com>
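To illustrate how a daily job could select the server type without growing the PR matrix, here is a hypothetical helper the e2e suite might use. The helper and the environment variable name are assumptions, not part of this PR; only the `contour` and `envoy` values come from the discussion.

```go
package main

import (
	"fmt"
)

// xDS server types discussed in this PR.
const (
	xdsServerContour = "contour" // Contour's own xDS server (current default)
	xdsServerEnvoy   = "envoy"   // server built on go-control-plane
)

// xdsServerTypeFromEnv is a hypothetical helper: a daily job would set the
// variable named by key to "envoy", while PR runs leave it unset and get the
// current default. Flipping the default later means changing one return.
func xdsServerTypeFromEnv(lookup func(string) (string, bool), key string) (string, error) {
	v, ok := lookup(key)
	if !ok || v == "" {
		return xdsServerContour, nil
	}
	switch v {
	case xdsServerContour, xdsServerEnvoy:
		return v, nil
	default:
		return "", fmt.Errorf("%s: unsupported xDS server type %q", key, v)
	}
}
```

In a test binary this would be called as `xdsServerTypeFromEnv(os.LookupEnv, "CONTOUR_E2E_XDS_SERVER_TYPE")`; taking the lookup function as a parameter keeps the helper testable without touching the real environment.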
Force-pushed from 394d8c5 to 2e2b944
So we see a notification during the day
Signed-off-by: Sunjay Bhatia <sunjayb@vmware.com>
Nice, LGTM.
I'm now thinking maybe we should re-jigger #4126 to also run its tests in this daily run, instead of running a small subset of the E2E's against an Envoy deployment for every PR, for similar reasons as for this one. Can ponder/tackle that separately.
Signed-off-by: Sunjay Bhatia <sunjayb@vmware.com>