Skip to content

Conversation

deliahu
Copy link
Member

@deliahu deliahu commented Dec 1, 2021

No description provided.

@deliahu deliahu requested a review from RobertLucian December 1, 2021 18:46
@deliahu deliahu changed the title Wait for zero in-flight requests before terminating proxy Wait for zero in-flight requests before terminating realtime proxy Dec 1, 2021
Copy link
Collaborator

@miguelvr miguelvr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can achieve the same without adding an endpoint. You can simply wait for in-flight requests when the proxy gets the SIGTERM signal and before calling server.Shutdown()

@miguelvr
Copy link
Collaborator

miguelvr commented Dec 1, 2021

I think we can achieve the same without adding an endpoint. You can simply wait for in-flight requests when the proxy gets the SIGTERM signal and before calling server.Shutdown()

Ref to the code:

cortex/cmd/proxy/main.go

Lines 172 to 189 in 16ffd08

select {
case err = <-errCh:
exit(log, errors.Wrap(err, "failed to start proxy server"))
case <-sigint:
// We received an interrupt signal, shut down.
log.Info("Received TERM signal, handling a graceful shutdown...")
for name, server := range servers {
log.Infof("Shutting down %s server", name)
if err = server.Shutdown(context.Background()); err != nil {
// Error from closing listeners, or context timeout:
log.Warnw("HTTP server Shutdown Error", zap.Error(err))
telemetry.Error(errors.Wrap(err, "HTTP server Shutdown Error"))
}
}
log.Info("Shutdown complete, exiting...")
}
}

@deliahu
Copy link
Member Author

deliahu commented Dec 1, 2021

@miguelvr yeah, that would be cleaner!

I just tried adding it there, but it didn't work as we expected. So I dug into it, and realized that we weren't handling the TERM signal (just INT). After listening for TERM too, checking the breaker's in-flight count was no longer necessary, since server.Shutdown() seems to handle this automatically.

What are your thoughts on the diff now?

@tfriedel
Copy link

tfriedel commented Dec 1, 2021

Thanks for the PR! I built the images and tried my test (route which sleeps 1 sec). Contrary to the previous verion, deleting a pod did not result in 503s. Good job!

@deliahu
Copy link
Member Author

deliahu commented Dec 1, 2021

@tfriedel awesome, thanks for bringing this to our attention, and for confirming that the fix is working as intended!

I'm working on a separate PR now to support custom preStop commands.

@deliahu deliahu merged commit d0b29f3 into master Dec 2, 2021
@deliahu deliahu deleted the proxy-pre-stop branch December 2, 2021 19:17
deliahu added a commit that referenced this pull request Dec 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants