Description
Describe the feature:
Currently, the overview page shows monitors in one of two states: up or down. In Heartbeat 7.6+ we can now compute a third 'dead' state, where the selected time range includes a monitor, but the monitor itself has missed its next expected run.
Consider the timeline below, where the letter C indicates that a check has been sent, and the character | indicates the time at which the next check is expected.
Monitor A:------C-----|C-------|C-------|C>------|
Monitor B:------C-----|C-------|---------------------
------------------- time --------------------> (now)
Monitor A in this example is alive: its next expected check is due soon, and it hasn't missed its schedule in any significant way. Monitor B sent two checks, but never sent the third expected check. We can safely mark it as dead.
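To make the distinction concrete, here is a minimal sketch of how a monitor's state might be derived from its most recent check. The `LastCheck` shape, the `nextExpectedMs` field, and the `classifyMonitor` function are hypothetical names chosen for illustration; they are not Heartbeat's actual schema or the Uptime app's API. The 5-minute default mirrors the threshold proposed in this issue.

```typescript
// Hypothetical shape of a monitor's most recent check (illustrative only,
// not the real Heartbeat document schema).
interface LastCheck {
  monitorId: string;
  status: 'up' | 'down';
  // Epoch milliseconds at which the next check is expected to arrive.
  nextExpectedMs: number;
}

type MonitorState = 'up' | 'down' | 'dead';

// Assumed default grace period: 5 minutes past the expected check time.
const DEFAULT_GRACE_MS = 5 * 60 * 1000;

// A monitor is 'dead' once its next expected check is overdue by more than
// the grace period; otherwise it keeps the status of its last check.
function classifyMonitor(
  last: LastCheck,
  nowMs: number,
  graceMs: number = DEFAULT_GRACE_MS
): MonitorState {
  const overdueMs = nowMs - last.nextExpectedMs;
  return overdueMs > graceMs ? 'dead' : last.status;
}
```

In the timeline above, Monitor A would keep its last reported status because its next check is not yet overdue, while Monitor B would be classified as dead.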
This issue proposes that we
- Add a third category of 'dead' monitors (defined as monitors for which an expected check has not been received in > 5m, by default).
- By default, do not display dead monitors. A toggle switch on the overview page makes them optionally visible, alongside a count of the dead monitors.
- Do not show the last status of dead monitors (for performance reasons).
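The display behavior proposed in this list could be sketched roughly as follows. This assumes each monitor already carries a computed state; `MonitorSummary`, `OverviewList`, and `buildOverviewList` are hypothetical names for illustration, not existing Uptime app code.

```typescript
type MonitorState = 'up' | 'down' | 'dead';

// Hypothetical summary row for the overview page (illustrative only).
interface MonitorSummary {
  monitorId: string;
  state: MonitorState;
}

interface OverviewList {
  // Rows to render, given the current toggle setting.
  visible: MonitorSummary[];
  // Always reported, so the toggle can show how many monitors are hidden.
  deadCount: number;
}

// Hide dead monitors unless the toggle is on, but always count them.
function buildOverviewList(
  monitors: MonitorSummary[],
  showDead: boolean = false
): OverviewList {
  const deadCount = monitors.filter((m) => m.state === 'dead').length;
  const visible = showDead
    ? monitors
    : monitors.filter((m) => m.state !== 'dead');
  return { visible, deadCount };
}
```

Keeping the count separate from the visible rows means the toggle label can show the number of dead monitors without fetching their last status.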
Describe a specific use case for the feature:
Using the app today, if a user deletes a monitor, its ID still appears when viewing longer history ranges. This can create quite a bit of clutter and be confusing. Additionally, the performance optimization introduced in #52433 has some odd corner cases where the new optimal strategy doesn't return status data for dead monitors. Rather than 'fix' that behavior, we could more clearly indicate the actual state of these monitors.
@katrin-freihofner I would especially appreciate your feedback here.
@katrin-freihofner @drewpost @justinkambic @shahzad31