Kube Node Status NotReady detection #2345
Comments
I would be happy to submit a patch to support your use case. However, I noticed we already have Perhaps querying for something like https://github.com/kubernetes/kube-state-metrics/blob/main/docs/metrics/cluster/node-metrics.md
@ricardoapl this only provides the status at the specific point in time when you scraped it. What if you scrape every 30 seconds and, within that interval, the node becomes NotReady for 10 seconds? You would miss that status change. Conceptually, it could be comparable with the
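To make the sampling problem concrete, here is a minimal simulation of the scenario described above. It is only a sketch: the timeline (NotReady from second 40 to second 50) and the helper names `node_ready` and `scrape` are invented for illustration and are not part of kube-state-metrics or Prometheus.

```python
# Hypothetical timeline: the node is NotReady from t=40s to t=50s.
NOT_READY_START, NOT_READY_END = 40, 50
SCRAPE_INTERVAL = 30  # seconds, as in the comment above


def node_ready(t: int) -> bool:
    """Return the simulated Ready condition of the node at second t."""
    return not (NOT_READY_START <= t < NOT_READY_END)


def scrape(duration: int, interval: int) -> list[tuple[int, int]]:
    """Sample a 0/1 readiness gauge at fixed scrape times."""
    return [(t, int(node_ready(t))) for t in range(0, duration + 1, interval)]


samples = scrape(120, SCRAPE_INTERVAL)
# True when every scraped sample reports Ready, i.e. the outage is invisible.
missed = all(value == 1 for _, value in samples)

print(samples)
print("NotReady window missed:", missed)
```

With these assumed numbers the scrapes land at 0, 30, 60, 90, and 120 seconds, so the 10-second NotReady window falls between two samples and never shows up in the gauge.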
I don't think we can get that information from the NodeStatus today: https://github.com/kubernetes/api/blob/v0.30.1/core/v1/types.go#L5871-L5936 Also, if you miss the status, it most likely means that it auto-resolved in less than 30 seconds, so I am not sure how useful the information would be.
@dgrisonnet I faced an issue where some nodes switched to the NotReady state, which caused problems for some pods that I can no longer recall in detail. Unfortunately, the status change was not recorded by any metric. Because of that, I created an alert on log entries, which is what makes us aware of it nowadays. During KubeCon Paris I spoke with one of the maintainers, who was also of the opinion that this metric is missing. Unfortunately, I cannot recall his name. However, if the API does not provide any way to obtain this data, things will indeed become complicated.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
/assign @CatherineF-dev
What would you like to be added:
Currently the node state metrics lack a way to detect that a node has become NotReady for any reason. I would therefore like to request a new metric, for example node ready seconds or last status change, so that such situations can be detected.
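As a rough illustration of what such a metric could carry, the sketch below derives a "last status change" timestamp and a "seconds in current state" value from a Ready condition. Everything here is an assumption made for illustration: the `ReadyCondition` class and the helper functions are hypothetical, and the sketch presumes a last-transition time is available per condition (modelled on the `lastTransitionTime` field of a Kubernetes node condition), which the discussion in this thread does not confirm is sufficient.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class ReadyCondition:
    """Hypothetical stand-in for a node's Ready condition."""
    status: str  # "True", "False", or "Unknown"
    last_transition_time: datetime


def last_transition_timestamp(cond: ReadyCondition) -> float:
    """Value a hypothetical 'last status change' metric would expose."""
    return cond.last_transition_time.timestamp()


def seconds_in_current_state(cond: ReadyCondition, now: datetime) -> float:
    """Value a hypothetical 'node ready seconds' metric would expose."""
    return (now - cond.last_transition_time).total_seconds()


def transitioned_between(prev: ReadyCondition, curr: ReadyCondition) -> bool:
    """A changed timestamp means the status flipped between two scrapes."""
    return last_transition_timestamp(prev) != last_transition_timestamp(curr)


# Two scrapes 30 seconds apart; both observe status "True".
prev = ReadyCondition("True", datetime(2024, 6, 1, 11, 0, 0, tzinfo=timezone.utc))
# Assumed flap: NotReady at 12:00:40, Ready again at 12:00:50.
curr = ReadyCondition("True", datetime(2024, 6, 1, 12, 0, 50, tzinfo=timezone.utc))
scrape_time = datetime(2024, 6, 1, 12, 1, 0, tzinfo=timezone.utc)

print(transitioned_between(prev, curr))
print(seconds_in_current_state(curr, scrape_time))
```

Under these assumptions, both scrapes would report the node as Ready, but the changed timestamp still reveals that a transition happened in between. It does not reveal how long the node was NotReady, only when the condition last changed.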