
Reason changes once and then stays in the wrong state with custom plugins #202

Closed
@aga20

Description

Hello,

I think I found another problem, but maybe I followed the wrong approach.
I'm using a custom plugin config based on the NTP example:

{
    "plugin": "custom",
    "pluginConfig": {
        "invoke_interval": "30s",
        "timeout": "5s",
        "max_output_length": 80,
        "concurrency": 3
    },
    "source": "ntp-custom-plugin-monitor",
    "conditions": [
        {
            "type": "CustomProblem",
            "reason": "CustomIsUp",
            "message": "Status of the custom service"
        }
    ],
    "rules": [
        {
            "type": "permanent",
            "condition": "CustomProblem",
            "reason": "CustomIsDown",
            "path": "/usr/bin/custom.sh",
            "timeout": "3s"
        }
    ]
}
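The config declares two different reasons for the same condition: `CustomIsUp` under `conditions` (the starting reason) and `CustomIsDown` under the `permanent` rule (the reason for the problem state). The minimal Go sketch below decodes a trimmed copy of the config (without `pluginConfig`) to make that pairing explicit. The struct and function names are hypothetical and are not node-problem-detector's own types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical types mirroring the JSON config (illustration only; these
// are not node-problem-detector's own type definitions).
type Condition struct {
	Type    string `json:"type"`
	Reason  string `json:"reason"` // starting reason, expected while there is no problem
	Message string `json:"message"`
}

type Rule struct {
	Type      string `json:"type"`      // "permanent" rules update a node condition
	Condition string `json:"condition"` // must match a Condition.Type
	Reason    string `json:"reason"`    // reason expected while the problem is present
	Path      string `json:"path"`
	Timeout   string `json:"timeout"`
}

type MonitorConfig struct {
	Plugin     string      `json:"plugin"`
	Source     string      `json:"source"`
	Conditions []Condition `json:"conditions"`
	Rules      []Rule      `json:"rules"`
}

// Trimmed copy of the config from this issue ("pluginConfig" omitted).
const configJSON = `{
  "plugin": "custom",
  "source": "ntp-custom-plugin-monitor",
  "conditions": [
    {"type": "CustomProblem", "reason": "CustomIsUp", "message": "Status of the custom service"}
  ],
  "rules": [
    {"type": "permanent", "condition": "CustomProblem", "reason": "CustomIsDown",
     "path": "/usr/bin/custom.sh", "timeout": "3s"}
  ]
}`

// parseConfig decodes the JSON into the hypothetical MonitorConfig.
func parseConfig(s string) (MonitorConfig, error) {
	var cfg MonitorConfig
	err := json.Unmarshal([]byte(s), &cfg)
	return cfg, err
}

// defaultReasons maps each condition type to the reason declared under "conditions".
func defaultReasons(cfg MonitorConfig) map[string]string {
	m := make(map[string]string, len(cfg.Conditions))
	for _, c := range cfg.Conditions {
		m[c.Type] = c.Reason
	}
	return m
}

func main() {
	cfg, err := parseConfig(configJSON)
	if err != nil {
		panic(err)
	}
	defaults := defaultReasons(cfg)
	for _, r := range cfg.Rules {
		fmt.Printf("%s: problem reason %q, default reason %q\n",
			r.Condition, r.Reason, defaults[r.Condition])
	}
}
```

Running it prints `CustomProblem: problem reason "CustomIsDown", default reason "CustomIsUp"`, which makes clear that both reasons are present in the config. The monitor therefore has the information it needs to switch back to `CustomIsUp` when the problem clears.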

The /usr/bin/custom.sh script is very simple: it exits with 0 or 1.

When node-problem-detector starts, it sets the condition:

Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  CustomProblem    False   Wed, 05 Sep 2018 14:40:26 +0200   Wed, 05 Sep 2018 14:40:25 +0200   CustomIsUp                   Status of the custom service
  OutOfDisk        False   Wed, 05 Sep 2018 14:40:23 +0200   Thu, 30 Aug 2018 17:16:47 +0200   KubeletHasSufficientDisk     kubelet has sufficient disk space available
  MemoryPressure   False   Wed, 05 Sep 2018 14:40:23 +0200   Thu, 30 Aug 2018 17:16:47 +0200   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Wed, 05 Sep 2018 14:40:23 +0200   Wed, 05 Sep 2018 13:17:06 +0200   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Wed, 05 Sep 2018 14:40:23 +0200   Thu, 30 Aug 2018 17:16:47 +0200   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Wed, 05 Sep 2018 14:40:23 +0200   Thu, 30 Aug 2018 17:35:04 +0200   KubeletReady                 kubelet is posting ready status

After it runs the script (which returned 0 in this case), the Status stays False, but the Reason changes to the one I set in the rule section:

Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  CustomProblem    False   Wed, 05 Sep 2018 14:41:56 +0200   Wed, 05 Sep 2018 14:40:55 +0200   CustomIsDown                 Status of the custom service
  OutOfDisk        False   Wed, 05 Sep 2018 14:42:23 +0200   Thu, 30 Aug 2018 17:16:47 +0200   KubeletHasSufficientDisk     kubelet has sufficient disk space available
  MemoryPressure   False   Wed, 05 Sep 2018 14:42:23 +0200   Thu, 30 Aug 2018 17:16:47 +0200   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Wed, 05 Sep 2018 14:42:23 +0200   Wed, 05 Sep 2018 13:17:06 +0200   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Wed, 05 Sep 2018 14:42:23 +0200   Thu, 30 Aug 2018 17:16:47 +0200   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Wed, 05 Sep 2018 14:42:23 +0200   Thu, 30 Aug 2018 17:35:04 +0200   KubeletReady                 kubelet is posting ready status

In the next run the script exited with 1. The Status is now True, and the Reason is unchanged (it is still the one I set in the rule):

Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  CustomProblem    True    Wed, 05 Sep 2018 14:43:56 +0200   Wed, 05 Sep 2018 14:43:55 +0200   CustomIsDown                 Status of the custom service
  OutOfDisk        False   Wed, 05 Sep 2018 14:43:53 +0200   Thu, 30 Aug 2018 17:16:47 +0200   KubeletHasSufficientDisk     kubelet has sufficient disk space available
  MemoryPressure   False   Wed, 05 Sep 2018 14:43:53 +0200   Thu, 30 Aug 2018 17:16:47 +0200   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Wed, 05 Sep 2018 14:43:53 +0200   Wed, 05 Sep 2018 13:17:06 +0200   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Wed, 05 Sep 2018 14:43:53 +0200   Thu, 30 Aug 2018 17:16:47 +0200   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Wed, 05 Sep 2018 14:43:53 +0200   Thu, 30 Aug 2018 17:35:04 +0200   KubeletReady                 kubelet is posting ready status

In the next run the script returned 0 again and the Status changed back to False, but the Reason did not change back to CustomIsUp:

Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  CustomProblem    False   Wed, 05 Sep 2018 14:44:26 +0200   Wed, 05 Sep 2018 14:44:25 +0200   CustomIsDown                 Status of the custom service
  OutOfDisk        False   Wed, 05 Sep 2018 14:44:23 +0200   Thu, 30 Aug 2018 17:16:47 +0200   KubeletHasSufficientDisk     kubelet has sufficient disk space available
  MemoryPressure   False   Wed, 05 Sep 2018 14:44:23 +0200   Thu, 30 Aug 2018 17:16:47 +0200   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Wed, 05 Sep 2018 14:44:23 +0200   Wed, 05 Sep 2018 13:17:06 +0200   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Wed, 05 Sep 2018 14:44:23 +0200   Thu, 30 Aug 2018 17:16:47 +0200   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Wed, 05 Sep 2018 14:44:23 +0200   Thu, 30 Aug 2018 17:35:04 +0200   KubeletReady                 kubelet is posting ready status

As far as I can see, the condition's reason is overwritten with the rule's reason here, so the original reason (CustomIsUp) may be lost and node-problem-detector can never set it again: https://github.com/kubernetes/node-problem-detector/blob/master/pkg/custompluginmonitor/custom_plugin_monitor.go#L140
But maybe I missed something.
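Below is a simplified model of the behaviour in the tables. It is not node-problem-detector's actual code, and whether line 140 really works this way is the open question. `updateInPlace` copies the rule's reason onto the condition on every run, whatever the status. `updateWithDefault` is one possible fix: it uses the rule's reason only while the Status is True, and otherwise it restores the reason from the `conditions` section. All names here are hypothetical.

```go
package main

import "fmt"

// Condition is a hypothetical, simplified node condition.
type Condition struct {
	Type   string
	Status string // "True", "False" or "Unknown"
	Reason string
}

// updateInPlace models the observed behaviour: the rule's reason is written
// onto the condition on every run, even when the problem is gone.
func updateInPlace(c *Condition, status, ruleReason string) {
	c.Status = status
	c.Reason = ruleReason
}

// updateWithDefault sketches a possible fix: use the rule's reason only
// while the problem is present, otherwise restore the default reason.
func updateWithDefault(c *Condition, status, ruleReason, defaultReason string) {
	c.Status = status
	if status == "True" {
		c.Reason = ruleReason
	} else {
		c.Reason = defaultReason
	}
}

func main() {
	const defaultReason, ruleReason = "CustomIsUp", "CustomIsDown"
	start := Condition{Type: "CustomProblem", Status: "False", Reason: defaultReason}
	observed, expected := start, start
	// Statuses produced by the three script runs: exit 0, exit 1, exit 0.
	for _, status := range []string{"False", "True", "False"} {
		updateInPlace(&observed, status, ruleReason)
		updateWithDefault(&expected, status, ruleReason, defaultReason)
		fmt.Printf("status=%s in-place=%s with-default=%s\n",
			status, observed.Reason, expected.Reason)
	}
}
```

The `in-place` column stays at `CustomIsDown` after the first run, which matches the three tables above. The `with-default` column alternates between `CustomIsUp` and `CustomIsDown` together with the Status. Treating `Unknown` like `False` in the fix is a design choice made here for simplicity.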

Thank you!
