Telegraf seems to send SIGKILL on script timeout instead of first SIGINT then SIGKILL #2526
Comments
@jayjayeos Thanks for opening an issue. Are you able to work on a PR?
We should also investigate whether to signal the whole process group or only the parent process.
I have this issue also:
On Unix systems we should send SIGTERM first, but is there a signal on Windows to request shutdown? Based on golang/go#6720 it doesn't seem possible.
Bug report
When a script executed by the inputs.exec plugin (and possibly other plugins) times out, Telegraf does not first send a catchable signal such as SIGTERM (or SIGHUP, SIGINT, SIGQUIT) and then SIGKILL; it sends SIGKILL directly.
While this kills the script, any cleanup the script needs to do never gets a chance to run.
This leads to orphaned processes when a script has started child processes.
Relevant telegraf.conf:
[[inputs.exec]]
  commands = [
    "/etc/telegraf/sqlscripts/oracle_metrics.sh"
  ]
  interval = "1m"
  timeout = "5s"
  data_format = "influx"
System info:
Oracle Linux Server release 6.3
Linux xxx.tld 2.6.39-400.17.1.el6uek.x86_64 #1 SMP Fri Feb 22 18:16:18 PST 2013 x86_64 x86_64 x86_64 GNU/Linux
Telegraf v1.2.0 (git: release-1.2 b2c1d98)
Steps to reproduce:
Expected behavior:
The script is sent SIGTERM (or SIGHUP, SIGINT, SIGQUIT), cleans up after itself, then exits.
Actual behavior:
The script is killed hard without a chance to run its signal handler.
Additional info:
Have a look at how SIGKILL works and at best practices for terminating processes.
TL;DR: don't indiscriminately use SIGKILL
http://stackoverflow.com/questions/395877/are-child-processes-created-with-fork-automatically-killed-when-the-parent-is
http://stackoverflow.com/questions/690415/in-what-order-should-i-send-signals-to-gracefully-shutdown-processes
ftp://ftp.gnu.org/old-gnu/Manuals/glibc-2.2.3/html_chapter/libc_24.html#SEC472
Proposal:
Send a catchable signal (such as SIGTERM) first, then after a grace period (of, say, 5 seconds) send SIGKILL.
Current behavior:
The script dies and init inherits its child processes.
Desired behavior:
The script cleans up after itself and exits on its own.
Use case: [Why is this important (helps with prioritizing requests)]
it fu**s up servers running Telegraf by racking up hundreds of orphaned processes.