feat: connection draining on stop CLI command#4443
Open
kaiburjack wants to merge 1 commit intovarnishcache:masterfrom
Open
feat: connection draining on stop CLI command#4443kaiburjack wants to merge 1 commit intovarnishcache:masterfrom
kaiburjack wants to merge 1 commit intovarnishcache:masterfrom
Conversation
Contributor
Author
|
I think this ubuntu noble test failure is something else. This particular job fails on basically every recent MR, including (where it is the only failed job): |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
I've added "connection draining" support to the
stopCLI command, as proposed in this comment.The
stopCLI command now takes an optional parameter-twith an optional argument<duration>. When-tis specified it will start a "drain period" for at mostshutdown_timeoutduration during which no idle connections are closed but all requests are responded with an additionalConnection: closeheader to make the client leave the connection.When the argument to
-tis specified, then this will instead be the effective shutdown timeout (and not theshutdown_timeoutvarnishd parameter).The
stopCLI command will also not block anymore.Now, in order to do all this, we need an additional thread which monitors drain progress and an additional gauge to track how many active client/downstream connections we still have.
We also need to wake up idle workers (that are sitting in their 60s sleep) to make them release their VCL references quickly. This is also a pending issue since Varnish 8.0.0, that upon initiated shutdown, users get
Child (PID) said shutdown waiting for N references on bootmessages for at most a minute, if workers stay idle during shutdown when no new requests come in.I've tried to implement this as minimally as possible and without breaking existing tests (I've run the test suite locally (on Linux Debian/Mint x64)) and added additional tests for the drain feature.
There were also a few corner cases of assertions failing because the manager now does not immediately reap the client but the client reaps itself.
Now, for my specific use-case: Kubernetes: When deploying this as a varnishd container process in a pod, then in order to actually make use of this draining, we need a preStop exec hook which issues
varnishadm stop -t 30sand then repeatedly monitorsvarnishadm statusto see when the client exited such that varnishd itself can get SIGTERM from the kubelet and the container and pod can exit.