Description
opened on Apr 8, 2019
Is your feature request related to a problem? Please describe.
Our systems sometimes have to go down for maintenance, and our customer wants us to distinguish between expected and unexpected downtimes.
Describe the solution you'd like
Heartbeat would need a way to ingest whether a system is down for maintenance (planned downtime) or is actually experiencing unexpected issues. This could be added to the event as a separate field (maintenance:boolean) that could be reflected in the UI (e.g. differentiate between green for up, red for down, and blue for maintenance).
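To illustrate the idea, here is a minimal sketch of how the three states could be derived from a check result plus the proposed maintenance field. This is not Heartbeat's actual data model: the helper name display_status, the STATUS_COLORS table, and the rule that the maintenance flag takes precedence over the raw check result are all assumptions made for the example.

```ruby
# Hypothetical mapping, using the example colours from the request.
STATUS_COLORS = { up: "green", down: "red", maintenance: "blue" }.freeze

# Hypothetical helper: `up` is whether the check succeeded,
# `maintenance` is the proposed boolean field on the event.
def display_status(up:, maintenance:)
  # Assumption: planned downtime overrides the raw check result.
  return :maintenance if maintenance

  up ? :up : :down
end

state = display_status(up: false, maintenance: true)
puts "#{state} -> #{STATUS_COLORS[state]}"
```

Whether a system that is flagged for maintenance but still responding should show as up or as maintenance is a design choice; the sketch picks the latter only to keep the rule simple.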
Describe alternatives you've considered
I already set up a Logstash pipeline to monitor our application's HTTP endpoint for uptime/downtime/errors/...
My current Logstash filter (simplified):
ruby {
  # Flag the event as maintenance while a marker file /tmp/<name>_maint exists
  code => 'event.set("maintenance", File.exist?("/tmp/" + event.get("[@metadata][name]") + "_maint"))'
}
When I want to set a system to maintenance, e.g. our PROD environment, I just create a file /tmp/prod_maint (touch /tmp/prod_maint), and remove it afterwards (rm /tmp/prod_maint). This is easy to do with various tools, and gives me great flexibility.
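The marker-file check can also be tried outside Logstash. The following is a rough standalone sketch of the same logic in plain Ruby; the helper name maintenance? and its dir parameter are hypothetical and exist only so the example can run in a throwaway directory instead of /tmp.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper mirroring the filter above: a system counts as
# "in maintenance" while a file named <name>_maint exists in `dir`.
def maintenance?(name, dir: "/tmp")
  File.exist?(File.join(dir, "#{name}_maint"))
end

# Demonstration in a temporary directory (assumption: stands in for /tmp).
Dir.mktmpdir do |dir|
  marker = File.join(dir, "prod_maint")

  FileUtils.touch(marker)              # same effect as `touch /tmp/prod_maint`
  puts maintenance?("prod", dir: dir)  # marker present

  FileUtils.rm(marker)                 # same effect as `rm /tmp/prod_maint`
  puts maintenance?("prod", dir: dir)  # marker removed
end
```

Because the flag is just a file, anything that can create or delete a file (a deploy script, cron, a configuration management tool) can toggle maintenance mode.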
I don't think I can add this with a simple processor, as those don't allow checking for files.
Additional context
Initially asked in the forum, and Nicolas asked me to post it here.