Add a new command and/or a new section to cortex cluster info that aggregates the health of Cortex processes.
A user might have the perception that everything is okay with the cluster when in fact a specific component is failing silently. For example, Prometheus might not be deployed correctly, which would prevent the autoscaler and Grafana from working correctly.
Here are a few resources that can be scanned to determine overall cluster health.
API autoscaler crons can be rolled into their respective API statuses.
One potential design can be:
cortex cluster status
# operator: live
# prometheus: live
# grafana: live
# autoscaler: live
# (...)
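
To illustrate the aggregation idea, here is a minimal sketch, not the Cortex implementation. The probe functions, the `cluster_status` and `render` helpers, and the "down" state are all hypothetical; only the component names and the "live" state come from the design above. A real version would replace the stub probes with actual checks against the cluster.

```python
from typing import Callable, Dict

# Hypothetical probe type: each probe returns True when its component is healthy.
Probe = Callable[[], bool]


def cluster_status(probes: Dict[str, Probe]) -> Dict[str, str]:
    """Run every probe and map each component to "live" or "down"."""
    status = {}
    for name, probe in probes.items():
        try:
            status[name] = "live" if probe() else "down"
        except Exception:
            # A probe that raises is reported as a failing component
            # instead of aborting the whole status command.
            status[name] = "down"
    return status


def render(status: Dict[str, str]) -> str:
    """Format one "component: state" line per component."""
    return "\n".join(f"{name}: {state}" for name, state in status.items())


def failing_prometheus() -> bool:
    # Stub that simulates prometheus not being deployed correctly.
    raise ConnectionError("prometheus unreachable")


# Stub probes for illustration only (assumption: real probes would query the cluster).
EXAMPLE_PROBES: Dict[str, Probe] = {
    "operator": lambda: True,
    "prometheus": failing_prometheus,
    "grafana": lambda: True,
    "autoscaler": lambda: True,
}

if __name__ == "__main__":
    print(render(cluster_status(EXAMPLE_PROBES)))
```

Catching probe errors per component keeps one silent failure from hiding the state of the others, which is the scenario this issue describes.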