Description
Hello,
I think I found another issue, but maybe I am doing something wrong.
I'm using a custom config based on the ntp example:
{
  "plugin": "custom",
  "pluginConfig": {
    "invoke_interval": "30s",
    "timeout": "5s",
    "max_output_length": 80,
    "concurrency": 3
  },
  "source": "ntp-custom-plugin-monitor",
  "conditions": [
    {
      "type": "CustomProblem",
      "reason": "CustomIsUp",
      "message": "Status of the custom service"
    }
  ],
  "rules": [
    {
      "type": "permanent",
      "condition": "CustomProblem",
      "reason": "CustomIsDown",
      "path": "/usr/bin/custom.sh",
      "timeout": "3s"
    }
  ]
}
The /usr/bin/custom.sh script is very simple: it exits with 0 or 1.
When the node problem detector starts, it sets the condition:
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
CustomProblem False Wed, 05 Sep 2018 14:40:26 +0200 Wed, 05 Sep 2018 14:40:25 +0200 CustomIsUp Status of the custom service
OutOfDisk False Wed, 05 Sep 2018 14:40:23 +0200 Thu, 30 Aug 2018 17:16:47 +0200 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Wed, 05 Sep 2018 14:40:23 +0200 Thu, 30 Aug 2018 17:16:47 +0200 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Wed, 05 Sep 2018 14:40:23 +0200 Wed, 05 Sep 2018 13:17:06 +0200 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Wed, 05 Sep 2018 14:40:23 +0200 Thu, 30 Aug 2018 17:16:47 +0200 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Wed, 05 Sep 2018 14:40:23 +0200 Thu, 30 Aug 2018 17:35:04 +0200 KubeletReady kubelet is posting ready status
After it runs the script (which returned 0 in this case), the Status stays False, but the Reason field changes to the value I set in the rules section:
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
CustomProblem False Wed, 05 Sep 2018 14:41:56 +0200 Wed, 05 Sep 2018 14:40:55 +0200 CustomIsDown Status of the custom service
OutOfDisk False Wed, 05 Sep 2018 14:42:23 +0200 Thu, 30 Aug 2018 17:16:47 +0200 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Wed, 05 Sep 2018 14:42:23 +0200 Thu, 30 Aug 2018 17:16:47 +0200 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Wed, 05 Sep 2018 14:42:23 +0200 Wed, 05 Sep 2018 13:17:06 +0200 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Wed, 05 Sep 2018 14:42:23 +0200 Thu, 30 Aug 2018 17:16:47 +0200 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Wed, 05 Sep 2018 14:42:23 +0200 Thu, 30 Aug 2018 17:35:04 +0200 KubeletReady kubelet is posting ready status
In the next run the script exited with 1. The Status is now True, and the Reason is still the same (the one I set in the rule):
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
CustomProblem True Wed, 05 Sep 2018 14:43:56 +0200 Wed, 05 Sep 2018 14:43:55 +0200 CustomIsDown Status of the custom service
OutOfDisk False Wed, 05 Sep 2018 14:43:53 +0200 Thu, 30 Aug 2018 17:16:47 +0200 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Wed, 05 Sep 2018 14:43:53 +0200 Thu, 30 Aug 2018 17:16:47 +0200 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Wed, 05 Sep 2018 14:43:53 +0200 Wed, 05 Sep 2018 13:17:06 +0200 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Wed, 05 Sep 2018 14:43:53 +0200 Thu, 30 Aug 2018 17:16:47 +0200 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Wed, 05 Sep 2018 14:43:53 +0200 Thu, 30 Aug 2018 17:35:04 +0200 KubeletReady kubelet is posting ready status
In the next round the script returned 0 again and the Status changed back to False, but the Reason didn't change:
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
CustomProblem False Wed, 05 Sep 2018 14:44:26 +0200 Wed, 05 Sep 2018 14:44:25 +0200 CustomIsDown Status of the custom service
OutOfDisk False Wed, 05 Sep 2018 14:44:23 +0200 Thu, 30 Aug 2018 17:16:47 +0200 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Wed, 05 Sep 2018 14:44:23 +0200 Thu, 30 Aug 2018 17:16:47 +0200 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Wed, 05 Sep 2018 14:44:23 +0200 Wed, 05 Sep 2018 13:17:06 +0200 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Wed, 05 Sep 2018 14:44:23 +0200 Thu, 30 Aug 2018 17:16:47 +0200 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Wed, 05 Sep 2018 14:44:23 +0200 Thu, 30 Aug 2018 17:35:04 +0200 KubeletReady kubelet is posting ready status
As far as I can see, you overwrite the condition's reason with the rule's reason, so the original reason (CustomIsUp) may be lost and the node problem detector can never set it again: https://github.com/kubernetes/node-problem-detector/blob/master/pkg/custompluginmonitor/custom_plugin_monitor.go#L140
But maybe I missed something.
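To make the sequence above concrete, here is a small Python model of what I observe versus what I expected. It is only a sketch: the names (Condition, apply_observed, apply_expected, run) are made up for illustration, the "expected" behaviour is my own assumption, and none of this is the actual node problem detector code.

```python
from dataclasses import dataclass

DEFAULT_REASON = "CustomIsUp"    # "reason" from the "conditions" section
RULE_REASON = "CustomIsDown"     # "reason" from the "rules" section


@dataclass
class Condition:
    type: str
    status: bool
    reason: str


def apply_observed(cond: Condition, exit_code: int) -> None:
    # What the tables above show: the rule's reason is written on every run,
    # regardless of the script's exit code.
    cond.status = exit_code != 0
    cond.reason = RULE_REASON


def apply_expected(cond: Condition, exit_code: int) -> None:
    # What I expected (an assumption on my side): fall back to the default
    # reason whenever the script reports success.
    cond.status = exit_code != 0
    cond.reason = RULE_REASON if cond.status else DEFAULT_REASON


def run(apply, exit_codes):
    # Start from the condition as configured, then replay the exit codes.
    cond = Condition("CustomProblem", False, DEFAULT_REASON)
    history = []
    for code in exit_codes:
        apply(cond, code)
        history.append((cond.status, cond.reason))
    return history


if __name__ == "__main__":
    # Same order as in my test: script exits with 0, then 1, then 0 again.
    for name, fn in (("observed", apply_observed), ("expected", apply_expected)):
        print(name, run(fn, [0, 1, 0]))
```

In the observed model the reason is CustomIsDown after every run, even when the Status is False, which matches the three tables above. In the expected model the reason goes back to CustomIsUp when the script exits with 0.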
Thank you!