Description
What happened?
I have Grafana-managed rules enabled with unified alerting and an external Prometheus Alertmanager configured as a contact point.
Every alert notification opens a new connection to Alertmanager. After 3-4 days there are 10k+ connections to Alertmanager in the ESTABLISHED state, and the OS starts running out of open files.
The Grafana systemd service also has a limit on open files: LimitNOFILE=10000
Current state on my system:
# netstat -lntpa | grep 9093 | wc -l
**19979**
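Note that the pipeline above counts every netstat line that mentions 9093, which can include the listening socket and sockets in states other than ESTABLISHED. As a rough sketch of how the count could be broken down by TCP state, the following Go helper tallies states from netstat-style text. The function name `countByState` and the sample lines are made up for illustration and are not output from my system.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// countByState tallies TCP states for sockets whose local or foreign
// address ends in ":<port>". It assumes `netstat -nta`-style columns:
// Proto Recv-Q Send-Q Local-Address Foreign-Address State [PID/Program]
func countByState(netstatOutput, port string) map[string]int {
	counts := make(map[string]int)
	suffix := ":" + port
	sc := bufio.NewScanner(strings.NewReader(netstatOutput))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		// Skip headers and anything that is not a TCP socket line.
		if len(f) < 6 || !strings.HasPrefix(f[0], "tcp") {
			continue
		}
		if strings.HasSuffix(f[3], suffix) || strings.HasSuffix(f[4], suffix) {
			counts[f[5]]++
		}
	}
	return counts
}

// sample is hypothetical netstat output, used only to exercise the helper.
const sample = `Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:9093            0.0.0.0:*               LISTEN      1234/alertmanager
tcp        0      0 172.23.0.1:40112        172.23.0.1:9093         ESTABLISHED 999/grafana
tcp        0      0 172.23.0.1:40114        172.23.0.1:9093         ESTABLISHED 999/grafana
tcp        0      0 172.23.0.1:40090        172.23.0.1:9093         TIME_WAIT   -
tcp        0      0 172.23.0.1:51000        172.23.0.1:3000         ESTABLISHED 999/grafana
`

func main() {
	for state, n := range countByState(sample, "9093") {
		fmt.Printf("%-12s %d\n", state, n)
	}
}
```

On a real host the input would come from the netstat command itself instead of the `sample` constant.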
Once the number of open files (active connections) exceeded the limit, new connections to Alertmanager failed. Grafana made multiple attempts to send notifications and all of them failed. Eventually Grafana went down and Alertmanager fell behind.
Grafana logs:
logger=ngalert.notifier.prometheus-alertmanager t=2024-03-20T12:21:37.758679171-05:00 level=warn msg="failed to send to Alertmanager" error="Post \"http://admin:9093/api/v1/alerts\": dial tcp 172.23.0.1:9093: socket: too many open files" alertmanager=cp_1 url=http://admin:9093/api/v1/alerts
logger=ngalert.notifier.prometheus-alertmanager t=2024-03-20T12:21:37.758747299-05:00 level=warn msg="all attempts to send to Alertmanager failed" alertmanager=cp_1
logger=alertmanager org=1 t=2024-03-20T12:21:37.758796361-05:00 level=error component=alertmanager orgID=1 component=dispatcher msg="Notify for alerts failed" num_alerts=1 err="cp_1/prometheus-alertmanager[0]: notify retry canceled due to unrecoverable error after 1 attempts: failed to send alert to Alertmanager: Post \"http://admin:9093/api/v1/alerts\": dial tcp 172.23.0.1:9093: socket: too many open files"
logger=provisioning.dashboard type=file name=Dashboards t=2024-03-20T12:21:44.865037298-05:00 level=error msg="failed to search for dashboards" error="open /var/lib/grafana/dashboards/cray-EX: too many open files"
logger=provisioning.dashboard type=file name=Dashboards t=2024-03-20T12:21:54.86600488-05:00 level=error msg="failed to search for dashboards" error="open /var/lib/grafana/dashboards/cray-EX: too many open files"
What did you expect to happen?
Connections from Grafana to Alertmanager should be closed after a while, but Grafana 9.x never closes them. It opens a new one every time it sends an alert notification.
I believe Grafana 7.x, with the legacy alerting framework, creates one connection and reuses it for alert notifications.
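To illustrate the behavior I expect, here is a minimal Go sketch of connection reuse with net/http. It is not Grafana's code, and I have not confirmed the root cause. It only contrasts one shared http.Client against a client with a fresh Transport per notification, which is one common way for a Go program to accumulate connections. The names `sharedClient`, `postAlerts`, `countConnections`, `useShared` and `newPerCall` are hypothetical, and a local httptest server stands in for Alertmanager.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync/atomic"
	"time"
)

// sharedClient is created once, so its Transport can keep idle
// connections open and reuse them for later notifications.
var sharedClient = &http.Client{
	Timeout: 10 * time.Second,
	Transport: &http.Transport{
		MaxIdleConnsPerHost: 2,
		IdleConnTimeout:     30 * time.Second, // idle connections get closed
	},
}

func useShared() *http.Client { return sharedClient }

// newPerCall builds a client with its own Transport on every call.
// Each Transport has its own pool, so nothing is ever reused.
func newPerCall() *http.Client {
	return &http.Client{Timeout: 10 * time.Second, Transport: &http.Transport{}}
}

// postAlerts sends one payload, then drains and closes the response
// body so the connection can go back to the client's idle pool.
func postAlerts(c *http.Client, url, payload string) error {
	resp, err := c.Post(url, "application/json", strings.NewReader(payload))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	_, err = io.Copy(io.Discard, resp.Body)
	return err
}

// countConnections sends n notifications to a stand-in server and
// returns how many new TCP connections that server accepted.
func countConnections(n int, client func() *http.Client) (int64, error) {
	var opened int64
	srv := httptest.NewUnstartedServer(http.HandlerFunc(
		func(w http.ResponseWriter, r *http.Request) {
			io.Copy(io.Discard, r.Body)
			w.WriteHeader(http.StatusOK)
		}))
	srv.Config.ConnState = func(_ net.Conn, s http.ConnState) {
		if s == http.StateNew {
			atomic.AddInt64(&opened, 1)
		}
	}
	srv.Start()
	defer srv.Close()
	for i := 0; i < n; i++ {
		if err := postAlerts(client(), srv.URL+"/api/v1/alerts", "[]"); err != nil {
			return 0, err
		}
	}
	return atomic.LoadInt64(&opened), nil
}

func main() {
	shared, _ := countConnections(5, useShared)
	perCall, _ := countConnections(5, newPerCall)
	fmt.Println("connections with shared client:", shared)
	fmt.Println("connections with per-call client:", perCall)
}
```

With the shared client, sequential notifications should need far fewer connections than notifications sent, while the per-call variant opens one connection per notification, which matches what I see against Alertmanager.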
Did this work before?
I am not sure about this.
How do we reproduce it?
- Add an external Prometheus Alertmanager in datasource.yml or in the UI, with handleGrafanaManagedAlerts enabled:
  - access: proxy
    jsonData:
      handleGrafanaManagedAlerts: true
      implementation: prometheus
    name: Alertmanager
    type: alertmanager
    url: http://admin:9093
- Add an Alertmanager contact point.
- Create sample rules and let them fire to Alertmanager,
- or use test notifications to send a notification.
Every time we click send notification, Grafana opens a new connection to Alertmanager.
Is the bug inside a dashboard panel?
No
Environment (with versions)?
Grafana: 9.5.5
OS: Linux SLES15sp5
Browser: Chrome, Safari, Firefox (anything)
Grafana platform?
A package manager (APT, YUM, BREW, etc.)
Datasource(s)?
Prometheus Alertmanager