
alert attached to the resolution trigger #206

Open
fabienmagagnosc opened this issue Sep 13, 2024 · 4 comments

Comments

@fabienmagagnosc

What did you do?

I use two templates to generate two fields that allow automatic alarm resolution:

  • a default one, which provides the status FAULT or OK:

{{- if .Alerts -}}
FAULT
{{ else -}}
OK
{{- end -}}

  • another one, which provides the alarm information. As in any alerting system (including, for example, the Prometheus Alertmanager), each alarm needs a unique "ID" so that a fault can be matched with its resolution:

{{ range $severity, $alerts := (groupAlertsByLabel .Alerts "severity") -}}
{{- range $index, $alert := $alerts }}
{{ $alert.Labels.severity }};{{ $alert.Labels.instance }};{{ $alert.Labels.job }};{{ $alert.Labels.alertname }};{{ $alert.Annotations.summary }};{{ $alert.Annotations.description }}
{{ end }}
{{ end }}

In my object, I get a CSV-style (semicolon-separated) string with the severity, instance, job, alertname, summary and description,
so the SNMP alarm system can use alertname + instance to uniquely identify the alarm.
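On the receiving side, that identification could look like the following sketch. The alarmFields type and the parseLine and Key functions are hypothetical names for illustration; the field order follows the template above (severity;instance;job;alertname;summary;description):

```go
package main

import (
	"fmt"
	"strings"
)

// alarmFields holds one line produced by the extra-field template.
type alarmFields struct {
	Severity, Instance, Job, AlertName, Summary, Description string
}

// parseLine splits a line into its six fields. SplitN keeps any extra ';'
// inside the last field (the description) instead of failing.
func parseLine(line string) (alarmFields, error) {
	parts := strings.SplitN(strings.TrimSpace(line), ";", 6)
	if len(parts) != 6 {
		return alarmFields{}, fmt.Errorf("expected 6 fields, got %d in %q", len(parts), line)
	}
	return alarmFields{parts[0], parts[1], parts[2], parts[3], parts[4], parts[5]}, nil
}

// Key returns the alertname+instance identifier used to pair a firing
// alarm with its resolution.
func (a alarmFields) Key() string {
	return a.AlertName + "|" + a.Instance
}

func main() {
	f, err := parseLine("warning;server01;node-exporter-job;HighCPU;CPU over 80%;CPU usage high; check load")
	if err != nil {
		panic(err)
	}
	fmt.Println(f.Key())
}
```

A semicolon inside the description is tolerated, but one inside the summary would still shift the fields, so a separator that never appears in annotations may be safer.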

What did you expect to see?

Firing and resolved alarms should be essentially identical, with only the description changing: FAULT or OK.
When an alarm fires, the extra field provides the description, summary and instance that document it, and the same information lets the firing and resolved alarms be matched automatically.

What did you see instead? Under which circumstances?

When alarms fire, there is no issue: everything is filled in.
When alarms resolve, the extra field is empty as

Environment

  • System information:

    The Docker image.

  • SNMP notifier version:

    maxwo/snmp-notifier:latest as of today, so presumably 1.5.

Note: I tested a modified version, built locally, with line 69 of alert_parser.go removed (and the syntax corrected),
and it then worked properly, which logically means every alarm is treated equally.

snmp_notifier, version 1.5.0 (branch: main, revision: 9344558)
build user: tecnotree@centos
build date: 20240913-16:08:54
go version: go1.22.5 (Red Hat 1.22.5-2.el9)
platform: linux/amd64
tags: netgo

  • Alertmanager version:

    prom/alertmanager:latest as of today, so it's:

Version Information
Branch: HEAD
BuildDate: 20240228-11:51:20
BuildUser: root@22cd11f671e9
GoVersion: go1.21.7
Revision: 0aa3c2aad14cff039931923ab16b26b7481783b5
Version: 0.27.0

  • Prometheus version:

Not applicable, as the alarms come from Grafana here.

  • Alertmanager command line:

  • SNMP notifier command line:

./snmp_notifier --snmp.trap-description-template=description-template.tpl --snmp.extra-field-template=4=object-template.tpl --snmp.version=V2c --snmp.destination=ss-vip:162 --snmp.community=tecnomen --snmp.timeout=5s --web.listen-address=:9465

  • Prometheus alert file:

  • Logs:

@maxwo
Owner

maxwo commented Sep 29, 2024

Thanks for your detailed message.

Since it worked as you expected after your modification of the parser, I suggest using the .DeclaredAlerts variable in your template, which includes all the alerts, firing or not.

maxwo closed this as completed Sep 29, 2024
@fabienmagagnosc
Author

Hi there,

I'm looking at .DeclaredAlerts, as your code matters more than mine, but I'm still not getting any result.
Is there a way to get all the information, whether the alert is firing or resolving?

Right now, your code is clear:

alert_parser.go:

// every alert is recorded in DeclaredAlerts
alertGroups[key].DeclaredAlerts = append(alertGroups[key].DeclaredAlerts, alert)
// only firing alerts are passed to addAlertToGroup
if alert.Status == "firing" {
	err = alertParser.addAlertToGroup(alertGroups[key], alert)
	if err != nil {
		return nil, err
	}
}

Only firing alerts get parsed and populated with their labels, which can then be passed into the SNMP traps (via new OIDs).
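A simplified model of the behavior in that excerpt, assuming a single group (the real parser groups alerts by key) and using hypothetical alert and alertGroup types rather than snmp_notifier's own. It illustrates why a notification containing only resolved alerts leaves .Alerts empty, so a template ranging over .Alerts produces an empty extra field:

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins; not the project's real definitions.
type alert struct {
	Status string
	Labels map[string]string
}

type alertGroup struct {
	Alerts         []alert // what a template sees as .Alerts
	DeclaredAlerts []alert // what a template sees as .DeclaredAlerts
}

// group mirrors the excerpt: every alert is declared, but only firing
// alerts are added to Alerts.
func group(alerts []alert) alertGroup {
	var g alertGroup
	for _, a := range alerts {
		g.DeclaredAlerts = append(g.DeclaredAlerts, a)
		if a.Status == "firing" {
			g.Alerts = append(g.Alerts, a)
		}
	}
	return g
}

func main() {
	g := group([]alert{
		{Status: "firing", Labels: map[string]string{"alertname": "HighCPU"}},
		{Status: "resolved", Labels: map[string]string{"alertname": "DiskFull"}},
	})
	fmt.Printf("%d/%d alerts are firing\n", len(g.Alerts), len(g.DeclaredAlerts))
}
```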

maxwo reopened this Oct 10, 2024
@maxwo
Owner

maxwo commented Oct 10, 2024

I'm going to do some checks, as the default template seems to work well:

{{ len .Alerts }}/{{ len .DeclaredAlerts }} alerts are firing:

And it always displays, for instance, "2/4 alerts are firing".

How about something like:

{{- range .DeclaredAlerts }}
{{- .Labels.severity }};{{ .Status }};{{ .Labels.instance }};{{ .Labels.job }};{{ .Labels.alertname }};{{ .Annotations.summary }};{{ .Annotations.description }}
{{ end }}

?
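A sketch of such a template executed with Go's text/template against a notification whose only alert has resolved. The alert and alertGroup types and the renderExtraField function are hypothetical stand-ins, and the template here puts a ';' after .Status so the status becomes its own semicolon-separated field:

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Hypothetical, simplified stand-ins for the data passed to templates.
type alert struct {
	Status      string
	Labels      map[string]string
	Annotations map[string]string
}

type alertGroup struct {
	Alerts         []alert
	DeclaredAlerts []alert
}

// One line per declared alert, firing or resolved.
const extraFieldTemplate = `{{- range .DeclaredAlerts }}
{{- .Labels.severity }};{{ .Status }};{{ .Labels.instance }};{{ .Labels.job }};{{ .Labels.alertname }};{{ .Annotations.summary }};{{ .Annotations.description }}
{{ end }}`

func renderExtraField(g alertGroup) (string, error) {
	tmpl, err := template.New("extra").Parse(extraFieldTemplate)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, g); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	// The only alert has resolved: .Alerts is empty, but .DeclaredAlerts
	// still carries its labels and annotations.
	g := alertGroup{DeclaredAlerts: []alert{{
		Status:      "resolved",
		Labels:      map[string]string{"severity": "warning", "instance": "server01", "job": "node-exporter-job", "alertname": "HighCPU"},
		Annotations: map[string]string{"summary": "CPU over 80%", "description": "CPU usage above 80% on server01"},
	}}}
	text, err := renderExtraField(g)
	if err != nil {
		panic(err)
	}
	fmt.Print(text)
}
```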

@fabienmagagnosc
Author

So sorry for the delay; I have been busy with other tasks.

Basically, I can speak for most SNMP systems, but not all of them.

You typically want two SNMP alarms:

  • one firing the alarm
  • one resolving the alarm

The matching is mostly based on distinct OIDs and/or fields,
in the same way the Prometheus Alertmanager matches alarms (nothing new).

So, when alarms are actually sent, the alarm format needs to stay constant so that the third-party SNMP system can recognize them.

An example:

  • OID: xxx
    status: firing
    severity: WARN
    server: server01
    alarm: CPU over 80% - server01
    job: node-exporter-job
  • OID: xxx
    status: resolved
    severity: WARN
    server: server01
    alarm: CPU over 80% - server01
    job: node-exporter-job

From these, the SNMP system can do the mapping and clear the alarm.
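The matching could be sketched as follows on the SNMP manager side. The trap type and the correlate function are hypothetical; the key uses every field except the status, since only the status differs between the firing and resolved traps in the example:

```go
package main

import "fmt"

// trap is a hypothetical, simplified view of one received notification.
type trap struct {
	Status, Severity, Server, Alarm, Job string
}

// key identifies an alarm independently of its status, so the firing and
// resolved traps map to the same entry.
func (t trap) key() string {
	return t.Severity + "|" + t.Server + "|" + t.Alarm + "|" + t.Job
}

// correlate applies traps in order and returns the alarms still active.
func correlate(traps []trap) map[string]trap {
	active := make(map[string]trap)
	for _, t := range traps {
		switch t.Status {
		case "firing":
			active[t.key()] = t
		case "resolved":
			delete(active, t.key())
		}
	}
	return active
}

func main() {
	firing := trap{Status: "firing", Severity: "WARN", Server: "server01", Alarm: "CPU over 80% - server01", Job: "node-exporter-job"}
	resolved := firing
	resolved.Status = "resolved"
	fmt.Println(len(correlate([]trap{firing})), len(correlate([]trap{firing, resolved})))
}
```

If any identifying field were missing from the resolved trap (as with an empty extra field), its key would differ and the alarm would stay active.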

I'm preparing more samples now and will send some as soon as possible.
