Skip to content

Override exit code on process timeout #8937

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 28, 2021

Conversation

julianbrost
Copy link
Contributor

@julianbrost julianbrost commented Jul 27, 2021

As Icinga first sends a SIGTERM to a check plugin on timeout to allow it to terminate gracefully, this is not really part of the plugin API specification and we cannot assume that plugins will handle this correctly and still exit with an exit code that maps to UNKNOWN. Therefore, once Icinga decides to kill a process, force its exit code to 128 to be sure the state will be UNKNOWN
after a timeout.

Note on Windows: TerminateProcess(m_Process, 3); is used there and the second parameter sets the exit code, so this is fine. There's only a small race condition where Icinga may do this right when the process exits by itself. Then the call to TerminateProcess() may fail because it already exited but Icinga would still write "" to the output. The exit code is intact though.

Test

object CheckCommand "sigterm-exit-1" {
	command = ["sh", "-c", "trap 'printf TERM; exit 1' TERM; printf 'check start'; sleep inf & wait $$!"]
	timeout = 10
}

Before

20210727_09h42m11s_grim

After

20210727_09h52m58s_grim

fixes #8931

@julianbrost julianbrost added the area/checks Check execution and results label Jul 27, 2021
@julianbrost julianbrost added this to the 2.13.0 milestone Jul 27, 2021
@julianbrost julianbrost requested a review from N-o-X July 27, 2021 09:18
Log(LogNotice, "Process")
<< "PID " << m_PID << " (" << PrettyPrintArguments(m_Arguments) << ") terminated with exit code " << exitcode;
Log(LogNotice, "Process") << "PID " << m_PID << " (" << PrettyPrintArguments(m_Arguments)
<< ") terminated with exit code " << exitcode << " after sending SIGTERM";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This log message feels wrong because it didn't terminate with the here mentioned exit code. I'm not sure if logging the hard coded exit code is of any use here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. I pushed another version, now there's a slight discrepancy in logging as the real exit code isn't part of the ProcessResult, but that's already the case for processes terminated by a signal:

[2021-07-27 18:01:38 +0200] notice/Process: PID 150 ('sh' '-c' 'trap 'printf TERM; exit 1' TERM; printf 'check start'; sleep inf & wait $!') terminated with exit code 1 after sending SIGTERM
[2021-07-27 18:01:38 +0200] warning/PluginCheckTask: Check command for object 'agent-a-1' (PID: 150, arguments: 'sh' '-c' 'trap 'printf TERM; exit 1' TERM; printf 'check start'; sleep inf & wait $!') terminated with exit code 128, output: check start<Timeout exceeded.>TERM

As Icinga first sends a SIGTERM to a check plugin on timeout to allow it to
terminate gracefully, this is not really part of the plugin API specification
and we cannot assume that plugins will handle this correctly and still exit
with an exit code that maps to UNKNOWN. Therefore, once Icinga decides to kill
a process, force its exit code to 128 to be sure the state will be UNKNOWN
after a timeout.
@julianbrost julianbrost force-pushed the bugfix/timeout-always-unknown branch from 0cfccb9 to a55939e Compare July 27, 2021 16:03
@julianbrost julianbrost requested a review from N-o-X July 27, 2021 16:08
@julianbrost julianbrost merged commit cc8d3fb into master Jul 28, 2021
@icinga-probot icinga-probot bot deleted the bugfix/timeout-always-unknown branch July 28, 2021 09:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/checks Check execution and results
Projects
None yet
Development

Successfully merging this pull request may close these issues.

State should always be UNKNOWN after Icinga decides check timeout is reached
2 participants