gunicorn not compatible with systemd watchdog timeout #2726

Open
huwcbjones opened this issue Jan 19, 2022 · 4 comments

@huwcbjones

Per the systemd documentation, WatchdogSec "Defaults to 0, which disables this feature."
When a service is configured with a watchdog, the service should notify systemd that it is still alive by sending WATCHDOG=1 (keepalive ping) to ensure systemd doesn't kill the process.
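
For reference, the keepalive ping is a single datagram written to the UNIX socket named in the `NOTIFY_SOCKET` environment variable. A minimal sketch, assuming a Linux host; the function name `sd_notify` is only borrowed from the C API for illustration and is not something gunicorn provides:

```python
import os
import socket


def sd_notify(message: str) -> bool:
    """Send one notification datagram to systemd.

    Returns False when NOTIFY_SOCKET is unset, i.e. the process was not
    started by systemd or notifications are not enabled for the unit.
    """
    addr = os.environ.get("NOTIFY_SOCKET")
    if not addr:
        return False
    if addr.startswith("@"):
        # A leading "@" denotes a Linux abstract-namespace socket.
        addr = "\0" + addr[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("utf-8"), addr)
    return True
```

Calling `sd_notify("WATCHDOG=1")` more often than the configured timeout is what keeps systemd from aborting the service.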

We'd like to enable this so that systemd detects when gunicorn gets stuck. At the moment, however, systemd aborts the main PID every time the watchdog timer fires, which took me a bit too long to realise! 😅

My first instinct would be to put the keepalive in the main process, but I haven't looked too much into how gunicorn hangs together yet.
If you guys don't want to support the watchdog timer, it would be good to document that gunicorn is not compatible with it.

@javabrett
Collaborator

Related SO thread: https://stackoverflow.com/questions/63945102/gunicorn-with-systemd-watchdog

How often would be too often to send WATCHDOG=1? Is every second too much? How expensive is it: just a socket write? Or does the value of WATCHDOG_USEC need to be considered (other than it being non-zero)?

@huwcbjones
Author

Thanks for the SO link, I didn't manage to find that and it clears up a few preconceptions 👍🏻

We tend to notify every WATCHDOG_USEC/4; we also have a loop-based app that notifies on completion of every loop iteration.
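
A rough sketch of how that interval could be derived from the environment systemd sets up. The helper name `watchdog_interval` and the `divisor` parameter are made up for illustration; `WATCHDOG_USEC` and `WATCHDOG_PID` are the variables systemd exports when `WatchdogSec` is set:

```python
import os
from typing import Mapping, Optional


def watchdog_interval(
    environ: Mapping[str, str] = os.environ, divisor: int = 4
) -> Optional[float]:
    """Return seconds between keepalive pings, or None if the watchdog is off."""
    raw = environ.get("WATCHDOG_USEC")
    if not raw:
        return None
    try:
        usec = int(raw)
    except ValueError:
        return None
    if usec <= 0:
        return None
    # systemd may also set WATCHDOG_PID; if present, only that PID should ping.
    watchdog_pid = environ.get("WATCHDOG_PID")
    if watchdog_pid and watchdog_pid != str(os.getpid()):
        return None
    return usec / 1_000_000 / divisor
```

With `WatchdogSec=30`, systemd exports `WATCHDOG_USEC=30000000`, so the default divisor of 4 gives a ping every 7.5 seconds.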

@javabrett
Collaborator

At a glance, there would be a fair bit more work in taking e.g. WATCHDOG_USEC / 4 or WATCHDOG_USEC / 2 and only sending WATCHDOG=1 at that specific rate. The arbiter has a natural enough loop for this, but IIRC its interval is hard-coded to 1 second.

So in terms of measuring the effort and effectiveness of implementing watchdog notifications, it would be good to gather feedback on whether:

  • Opening the socket and sending a notification on each arbiter loop (roughly once per second) could ever be too much.
  • Ignoring the value of WatchdogSec, or perhaps warning about inappropriately low values, is OK, i.e. a fixed rate of one notification per second from the arbiter poll loop would be acceptable.
  • This would effectively mean documenting that the WatchdogSec minimum should be 2 or 4 seconds, and that Gunicorn's watchdog logic ignores the actual value other than using it to activate sending roughly once per second.
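
To make the "warn about low values" option concrete, here is a small sketch of such a check. Everything in it is hypothetical (the function name, the parameters, and the 2x ratio, which corresponds to the 2-second minimum floated above); none of it exists in Gunicorn:

```python
from typing import Optional


def watchdog_warning(
    watchdog_usec: int, ping_interval: float = 1.0, min_ratio: float = 2.0
) -> Optional[str]:
    """Return a warning if the timeout is too short for a fixed ping rate."""
    timeout = watchdog_usec / 1_000_000  # WATCHDOG_USEC is in microseconds
    minimum = ping_interval * min_ratio
    if timeout < minimum:
        return (
            f"WatchdogSec={timeout:g}s is below the suggested minimum of "
            f"{minimum:g}s for a keepalive sent every {ping_interval:g}s"
        )
    return None
```

The timeout value is only compared against the fixed ping rate here; it does not change how often the keepalive is sent.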

Implementing WATCHDOG_USEC / n would require some stopwatch logic, which might not be worth the trouble. But I'm not confident about the cost or risk of an ongoing socket open/write/close to the systemd watchdog fd.
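
For a sense of scale, the stopwatch logic could be as small as the sketch below. `KeepaliveTimer` is a hypothetical name, not existing Gunicorn code, and the injectable `clock` argument is there only to make it testable:

```python
import time
from typing import Callable


class KeepaliveTimer:
    """Tracks when the next keepalive is due, using a monotonic clock."""

    def __init__(self, interval: float, clock: Callable[[], float] = time.monotonic):
        self.interval = interval
        self._clock = clock
        self._next_due = clock()  # the first ping is due immediately

    def due(self) -> bool:
        """Return True at most once per interval; safe to call on every loop pass."""
        now = self._clock()
        if now >= self._next_due:
            self._next_due = now + self.interval
            return True
        return False
```

A loop that wakes roughly once per second would call `due()` on each pass and send WATCHDOG=1 only when it returns True, so the socket write happens every `interval` seconds rather than on every pass.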

@hmoffatt

There's a pure Python implementation of sd_notify here, fwiw: https://github.com/bb4242/sdnotify/blob/master/sdnotify/__init__.py
