A watchdog which actively looks out for disruption and recovery of critical services. If there is a disruption then it will prevent cascading failure by conservatively scaling down dependent configured resources and if a critical service has just recovered then it will expedite the recovery of dependent services/pods.
Avoiding cascading failure is handled by Prober component and expediting recovery of dependent services/pods is handled by Weeder component. These are separately deployed as individual pods.
Although in the current offering the Prober
is tailored to handle one such use case of kube-apiserver
connectivity, but the usage of prober can be extended to solve similar needs for other scenarios where the components involved might be different.
See our documentation in the /docs repository, please find the index here.
We always look forward to active community engagement.
Please report bugs or suggestions on how we can enhance dependency-watchdog
to address additional recovery scenarios on GitHub issues