The Grafana dashboard returned a 503 error during today's outage across all the hubs. Since @felder uses Grafana hub metrics to measure health across all hubs, it is important that Grafana is isolated from such outages.
Note: Grafana came back once all the hubs were operating again, so there is no issue with Grafana as of now.
Creating this issue to explore different design choices to avoid this scenario in the future.
Environment & setup
Grafana + Prometheus combination
How to reproduce
See the above description!
One of the discussion items from a short meeting today was how to improve Grafana's resilience. @shaneknapp suggested creating a new monitoring node pool and moving our monitoring infrastructure, which includes Grafana and Prometheus, to it. With this approach, the monitoring infra would keep working when the hubs that are part of the core node pool go down.
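As a rough illustration of the node pool idea, here is a minimal sketch that builds a patch for a workload's pod template, pinning its pods to a dedicated pool with the standard Kubernetes `nodeSelector` and `tolerations` fields. The label/taint key `example.org/node-purpose`, the value `monitoring`, and the helper name `monitoring_pool_patch` are all hypothetical; the real pool would define its own labels and taints, and this is not necessarily how the deployment here is configured.

```python
import json

# Hypothetical label/taint key and value. The actual monitoring node pool
# would need to be created with a matching label and (optionally) taint.
POOL_LABEL_KEY = "example.org/node-purpose"
POOL_LABEL_VALUE = "monitoring"


def monitoring_pool_patch(key=POOL_LABEL_KEY, value=POOL_LABEL_VALUE):
    """Build a patch for a workload's pod template (e.g. a Deployment).

    nodeSelector restricts scheduling to nodes carrying the label, and the
    toleration lets the pods run on nodes tainted to keep other workloads
    (such as hub pods) off the pool.
    """
    return {
        "spec": {
            "template": {
                "spec": {
                    "nodeSelector": {key: value},
                    "tolerations": [
                        {
                            "key": key,
                            "operator": "Equal",
                            "value": value,
                            "effect": "NoSchedule",
                        }
                    ],
                }
            }
        }
    }


if __name__ == "__main__":
    # Print the patch as JSON so it can be inspected or saved to a file.
    print(json.dumps(monitoring_pool_patch(), indent=2))
```

The same two fields could instead be set through whatever chart or manifest manages Grafana and Prometheus. The point is only that both workloads would select the monitoring pool, while a taint on that pool keeps hub workloads off it.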