gunicorn not compatible with systemd watchdog timeout #2726

huwcbjones · 2022-01-19T20:07:03Z

By default systemd WatchdogSec "Defaults to 0, which disables this feature."
When a service is configured with a watchdog, the service should notify systemd that it is still alive by sending WATCHDOG=1 (keepalive ping) to ensure systemd doesn't kill the process.
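
For reference, the keepalive ping itself is small. Below is a minimal sketch of a pure-Python notifier, assuming the service runs under systemd with notifications enabled so that `NOTIFY_SOCKET` is set in the environment. The function name `sd_notify` and its `notify_socket` override are illustrative only and are not part of gunicorn.

```python
import os
import socket


def sd_notify(message, notify_socket=None):
    """Send a state string such as "WATCHDOG=1" to systemd's notify socket.

    Returns False when no notify socket is configured, which is the
    normal case when the process is not started by systemd.
    """
    # Assumption: the caller may pass a path explicitly (handy for tests);
    # otherwise fall back to the NOTIFY_SOCKET variable set by systemd.
    path = notify_socket if notify_socket is not None else os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # A leading "@" denotes a Linux abstract-namespace socket,
        # which is addressed with a leading NUL byte instead.
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("utf-8"), path)
    return True
```

Calling `sd_notify("WATCHDOG=1")` more often than the watchdog timeout is what keeps systemd from aborting the service.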

We'd like to enable this so that systemd detects if gunicorn gets stuck. However, at the moment systemd aborts the main PID every time the watchdog timer fires, which took me a bit too long to realise! 😅

By default, I'd probably shove the keepalive in the main process, however I haven't looked too much into how gunicorn hangs together yet.
If you guys don't want to support the watchdog timer, it would be good to document that gunicorn is not compatible with it.

javabrett · 2022-01-20T04:36:50Z

Related SO thread: https://stackoverflow.com/questions/63945102/gunicorn-with-systemd-watchdog

How often would be too often to send WATCHDOG=1? Is every second too much? How expensive is it: just a socket write? Or does the value of WATCHDOG_USEC need to be considered (other than it being non-zero)?

huwcbjones · 2022-01-20T08:25:35Z

Thanks for the SO link, I didn't manage to find that and it clears up a few preconceptions 👍🏻

We tend to notify every WATCHDOG_USEC/4. We also have an app that is loop-based and notifies on completion of every loop.
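
As a rough sketch of the WATCHDOG_USEC/4 approach, the interval can be derived from the environment systemd passes to the service. The helper name `watchdog_interval` and its parameters are hypothetical. The `WATCHDOG_PID` check assumes the usual convention that, when this variable is set, only the named process is expected to send keepalives.

```python
import os


def watchdog_interval(environ=os.environ, divisor=4, pid=None):
    """Return the keepalive interval in seconds, or None if the watchdog is off.

    WATCHDOG_USEC is the timeout in microseconds; dividing by `divisor`
    leaves headroom so that a late ping does not trip the watchdog.
    """
    try:
        usec = int(environ.get("WATCHDOG_USEC", "0"))
    except ValueError:
        return None
    if usec <= 0:
        return None
    watchdog_pid = environ.get("WATCHDOG_PID")
    if watchdog_pid:
        try:
            expected = int(watchdog_pid)
        except ValueError:
            return None
        # The keepalive is expected from a different process; do nothing here.
        if expected != (pid if pid is not None else os.getpid()):
            return None
    return usec / 1_000_000 / divisor
```

With `WatchdogSec=30`, systemd sets `WATCHDOG_USEC=30000000`, so this helper would return 7.5 seconds.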

javabrett · 2022-01-20T22:43:48Z

At a glance, there would be a fair bit more work in taking e.g. WATCHDOG_USEC / 4 or WATCHDOG_USEC / 2 and only sending WATCHDOG=1 at that specific rate. Arbiter has a natural enough loop for this, but I think it is hard-coded to 1 second IIRC.

So in terms of measuring the effort and effectiveness of implementing watchdog notifications, it would be good to gather feedback on whether:

1. Opening the socket and sending a notification on each arbiter loop (roughly once per second) could ever be too much.
2. Ignoring the value, or perhaps warning about inappropriately low values for WatchdogSec, is OK, i.e. whether a fixed notification rate of one per second from the arbiter poll loop is going to be OK.
   This would effectively mean documenting that the WatchdogSec minimum should be 2 or 4 seconds, and that the actual value is ignored by Gunicorn's watchdog logic other than to activate sending roughly once per second.

Implementing WATCHDOG_USEC / n would require some stopwatch logic which might not be worth the trouble. But I'm not confident on the cost or risk of ongoing socket open/write/close to the systemd watchdog fd.
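
For a sense of scale, the stopwatch logic could be as small as the sketch below. It assumes a loop that wakes roughly once per second, as the arbiter's does, and calls `tick()` on each pass. `WatchdogTicker` and the `send` callable are hypothetical names, not existing gunicorn code; `send` stands in for whatever writes `WATCHDOG=1` to the notify socket.

```python
import time


class WatchdogTicker:
    """Rate-limit keepalive pings from a loop that runs more often than needed."""

    def __init__(self, interval, send, clock=time.monotonic):
        self.interval = interval      # seconds between pings, e.g. WATCHDOG_USEC / 4
        self.send = send              # callable taking the state string
        self.clock = clock            # monotonic, so wall-clock jumps are harmless
        self._next_due = clock()      # the first tick pings immediately

    def tick(self):
        """Call once per loop iteration; pings only when the interval has elapsed."""
        now = self.clock()
        if now < self._next_due:
            return False
        self.send("WATCHDOG=1")
        self._next_due = now + self.interval
        return True
```

This keeps the per-iteration cost to one clock read and one comparison, and only touches the socket once per interval.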

hmoffatt · 2024-07-16T06:58:00Z

There's a pure Python implementation of sd_notify here, fwiw: https://github.com/bb4242/sdnotify/blob/master/sdnotify/__init__.py

benoitc added Deploy To Review labels Oct 14, 2022
