integrate warp-systemd for more reliability in production deployments #2027
@mpscholten Thanks. I think I don't fully understand how it works 😅
The cool thing about this change is that everyone using
will have this enabled by default. It's disabled by default in the Haskell part, but in the NixOS part we set

The local URL is correct. The app is basically pinging itself on localhost to make sure that it's still running. I dealt with a crash earlier today where the warp web server got stuck (I think it's the same issue described in https://blog.cachix.org/posts/2020-12-23-post-mortem-recent-downtime/). This would have been prevented by such a check. We cannot use the server's actual host/domain because, for example, the DNS records are sometimes not yet updated, and in that case we don't want the server to get into a restart loop. When the healthcheck fails, the app will be restarted.

systemd is now configured to expect a heartbeat every 60 seconds. The app delivers a heartbeat every 30 seconds. So if the app gets stuck, it will be restarted after 60 seconds.

There are also some other "hidden gems" in this change: we now use systemd's socket activation. Basically, systemd now listens on the port when the server boots. Only when the first request comes in will systemd start the IHP server and pass the socket to it. This also comes in handy when restarting the server: during the restart, any incoming HTTP requests are queued by systemd, and once the restart has finished, the IHP app picks up the queued requests. Previously, when the IHP app was restarted, there was a small window where the server was unreachable.
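To make the heartbeat mechanism above more concrete, here is a minimal Python sketch of the idea. It is not the IHP or warp-systemd implementation (that part is Haskell), only an illustration of the systemd notification protocol: the process checks a local URL and, if the check passes, writes `WATCHDOG=1` to the socket named in `NOTIFY_SOCKET`. The function names (`notify`, `health_ok`, `heartbeat_interval`, `run_heartbeat`) and the `max_checks` parameter are hypothetical.

```python
import http.client
import os
import socket
import time
import urllib.request


def notify(message: str) -> bool:
    """Send one datagram to systemd's notification socket, if one is set."""
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False  # not started by systemd with notifications enabled
    if path.startswith("@"):
        path = "\0" + path[1:]  # Linux abstract-namespace socket
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode(), path)
    return True


def health_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET request to the local URL gets a 2xx answer."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, http.client.HTTPException):
        return False  # refused, timed out, HTTP error, or malformed reply


def heartbeat_interval(default: float = 30.0) -> float:
    """Beat at half the watchdog timeout that systemd announces in WATCHDOG_USEC."""
    usec = os.environ.get("WATCHDOG_USEC")
    return int(usec) / 2_000_000 if usec else default


def run_heartbeat(url: str, max_checks=None) -> None:
    """Check the app on localhost and only send a heartbeat while it answers.

    max_checks is a hypothetical knob to make the loop finite; None loops forever.
    """
    notify("READY=1")
    checks = 0
    while max_checks is None or checks < max_checks:
        if health_ok(url):
            notify("WATCHDOG=1")
        # A failed check sends nothing, so systemd's timeout eventually fires.
        checks += 1
        time.sleep(heartbeat_interval())
```

With a 60 second watchdog timeout (in a unit file this would typically be `WatchdogSec=60`, which is an assumption here since the actual NixOS config is not shown in this thread), `WATCHDOG_USEC` is `60000000` and the sketch beats every 30 seconds, matching the numbers above. A call such as `run_heartbeat("http://127.0.0.1:8000/")` would run it; the port is a placeholder.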
Thanks, that's cool. I'll create a PR to add some info to the docs 😸 One thing I didn't understand:
What's the advantage here? Sounds like the first request will take more time to get a response, no?
Thanks 🙌 I think I was not fully correct in my previous statement. The lazy-start behavior is true for systemd in general, but the IHP service is configured to start at system boot anyway, so in our case there is no delay. The real advantage is the zero-downtime restarts.
http://0pointer.de/blog/projects/socket-activation.html is a good resource on the topic (some parts go a bit too deep into the details, but you get the gist).
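For readers who want the gist without the full article, the following is a rough Python sketch of the receiving side of socket activation. It is not how IHP or warp-systemd does it, only an illustration of the protocol: systemd passes already-listening sockets as file descriptors starting at 3 and announces them through the `LISTEN_PID` and `LISTEN_FDS` environment variables. The function names and the `environ` and `start_fd` parameters are hypothetical and exist mainly to keep the sketch testable.

```python
import os
import socket

SD_LISTEN_FDS_START = 3  # first file descriptor systemd hands over


def inherited_sockets(environ=None, start_fd=SD_LISTEN_FDS_START) -> list:
    """Return the listening sockets passed in by systemd, or [] if there are none."""
    environ = os.environ if environ is None else environ
    # LISTEN_PID guards against picking up variables meant for a parent process.
    if environ.get("LISTEN_PID") != str(os.getpid()):
        return []
    count = int(environ.get("LISTEN_FDS", "0"))
    # socket.socket(fileno=...) detects family and type from the descriptor.
    return [socket.socket(fileno=fd) for fd in range(start_fd, start_fd + count)]


def listening_socket(port: int, environ=None) -> socket.socket:
    """Prefer the socket from systemd; otherwise bind one ourselves."""
    inherited = inherited_sockets(environ)
    if inherited:
        return inherited[0]
    return socket.create_server(("127.0.0.1", port))
```

Because the listening socket belongs to systemd and outlives the app process, connections that arrive during a restart wait in the socket's backlog and are accepted once the new process calls `accept()`. That is the zero-downtime behavior described above.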
Found this package when looking at https://blog.cachix.org/posts/2020-12-23-post-mortem-recent-downtime/