
Commit d5518bd

add reconciler liveness
1 parent 0a638ba commit d5518bd


1 file changed (+22, -4 lines)


content/examples/prometheus/console-alerts.yml

@@ -18,15 +18,15 @@ groups:
             4. Monitor resource utilization
 
       - alert: ConsoleDatabrokerReconcilerErrors
-        expr: increase(pomerium_console_databroker_reconciler_Reconcile_calls_failures[5m]) > 0
+        expr: increase(pomerium_console_databroker_reconciler_Reconcile_failures_total[5m]) > 0
         for: 2m
         labels:
           severity: warning
           component: pomerium-console
           service: syncer
         annotations:
-          summary: "Pomerium Console config reconciler failures for config syncer"
-          description: "{{ $value }} databroker reconciler failures for cluster {{ $labels.cluster }} in the last 5 minutes"
+          summary: "Pomerium Console config reconciler failures"
+          description: "{{ $value }} databroker reconciler failures for cluster {{ $labels.cluster }} {{ $labels.component }} in the last 5 minutes"
           runbook: |
             1. Check the root cause of the failure in the logs or traces by filtering for `pomerium_console_databroker_reconciler` calls
             2. If the failure is due to a specific entity, review configuration validation errors to understand which entity is causing the issue
@@ -45,10 +45,28 @@ groups:
             1. Check database performance and query times using OTEL Tracing to identify the root cause
             2. Review databroker performance and connectivity
 
+      - alert: ConsoleDatabrokerReconcilerMissing
+        expr: |
+          (
+            count by (cluster_id) (pomerium_console_databroker_reconciler_ReconcileLoop{component="config-syncer"}) +
+            count by (cluster_id) (pomerium_console_databroker_reconciler_ReconcileLoop{component="service-account-syncer"})
+          ) < 2
+        for: 2m
+        labels:
+          severity: critical
+          component: pomerium-console
+          service: syncer
+        annotations:
+          summary: "Some databroker reconciler components are not running"
+          description: "Only {{ $value }} out of 2 expected databroker reconciler components are running for cluster {{ $labels.cluster_id }}"
+          runbook: |
+            1. Check console logs for reconciler startup errors
+            2. Verify databroker connectivity for the affected cluster
+
   - name: pomerium-console-external-data-sources
     rules:
       - alert: ExternalDataSourceTaskFailures
-        expr: increase(pomerium_console_datasource_task_calls_failures[5m]) > 0
+        expr: increase(pomerium_console_datasource_task_calls_failures_total[5m]) > 0
         for: 1m
         labels:
           severity: warning
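The `ConsoleDatabrokerReconcilerMissing` expression in this commit adds two `count by (cluster_id)` results with `+`. In PromQL, arithmetic between two instant vectors only yields an element for label sets present on both sides. A cluster where one syncer has stopped reporting therefore drops out of the sum entirely. Where both syncers report, each `count` is at least 1, so the sum is at least 2. As written, the rule appears unable to fire. A minimal sketch of an alternative follows. It is not part of commit d5518bd, and it assumes `pomerium_console_databroker_reconciler_ReconcileLoop` carries `cluster_id` and `component` labels, as the committed selectors imply. Instead of summing, it counts distinct components per cluster:

```yaml
# Sketch only, not part of commit d5518bd: an alternative expr for the
# same alert. Labels and annotations are copied from the committed rule.
- alert: ConsoleDatabrokerReconcilerMissing
  expr: |
    # Inner count: one series per (cluster_id, component) pair that is reporting,
    # regardless of how many replicas emit it.
    # Outer count: number of distinct components reporting per cluster_id.
    count by (cluster_id) (
      count by (cluster_id, component) (
        pomerium_console_databroker_reconciler_ReconcileLoop{component=~"config-syncer|service-account-syncer"}
      )
    ) < 2
  for: 2m
  labels:
    severity: critical
    component: pomerium-console
    service: syncer
  annotations:
    summary: "Some databroker reconciler components are not running"
    description: "Only {{ $value }} out of 2 expected databroker reconciler components are running for cluster {{ $labels.cluster_id }}"
    runbook: |
      1. Check console logs for reconciler startup errors
      2. Verify databroker connectivity for the affected cluster
```

Because the inner aggregation collapses replicas first, `$value` is the number of distinct components reporting. This matches the "Only {{ $value }} out of 2" wording. A cluster where neither syncer reports produces no series at all, so this expression does not alert on it either. Covering that case would need a separate rule, for example one built on `absent()`.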
