@@ -18,15 +18,15 @@ groups:
18 18 4. Monitor resource utilization
19 19
20 20 - alert : ConsoleDatabrokerReconcilerErrors
21- expr : increase(pomerium_console_databroker_reconciler_Reconcile_calls_failures [5m]) > 0
21+ expr : increase(pomerium_console_databroker_reconciler_Reconcile_failures_total [5m]) > 0
22 22 for : 2m
23 23 labels :
24 24 severity : warning
25 25 component : pomerium-console
26 26 service : syncer
27 27 annotations :
28- summary : " Pomerium Console config reconciler failures for config syncer "
29- description : " {{ $value }} databroker reconciler failures for cluster {{ $labels.cluster }} in the last 5 minutes"
28+ summary : " Pomerium Console config reconciler failures"
29+ description : " {{ $value }} databroker reconciler failures for cluster {{ $labels.cluster }}, component {{ $labels.component }}, in the last 5 minutes"
30 30 runbook : |
31 31 1. Check the root cause of the failure in the logs or traces by filtering for `pomerium_console_databroker_reconciler` calls
32 32 2. If the failure is due to a specific entity, review configuration validation errors to understand which entity is causing the issue
@@ -45,10 +45,28 @@ groups:
45 45 1. Check database performance and query times using OTEL Tracing to identify the root cause
46 46 2. Review databroker performance and connectivity
47 47
48+ - alert : ConsoleDatabrokerReconcilerMissing
49+ expr : |
50+ count by (cluster_id) (
51+ count by (cluster_id, component) (
52+ pomerium_console_databroker_reconciler_ReconcileLoop{component=~"config-syncer|service-account-syncer"}
53+ )) < 2
54+ for : 2m
55+ labels :
56+ severity : critical
57+ component : pomerium-console
58+ service : syncer
59+ annotations :
60+ summary : " Some databroker reconciler components are not running"
61+ description : " Only {{ $value }} out of 2 expected databroker reconciler components are running for cluster {{ $labels.cluster_id }}"
62+ runbook : |
63+ 1. Check console logs for reconciler startup errors
64+ 2. Verify databroker connectivity for the affected cluster
65+
48 66 - name : pomerium-console-external-data-sources
49 67 rules :
50 68 - alert : ExternalDataSourceTaskFailures
51- expr : increase(pomerium_console_datasource_task_calls_failures [5m]) > 0
69+ expr : increase(pomerium_console_datasource_task_calls_failures_total [5m]) > 0
52 70 for : 1m
53 71 labels :
54 72 severity : warning
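
Review note on `ConsoleDatabrokerReconcilerMissing`: PromQL binary operators between two instant vectors only emit a sample for label sets present on both sides, so a sum of two `count by (cluster_id)` results yields nothing for a cluster where one component's series is absent. The promtool unit-test sketch below illustrates this with a single cluster. The cluster id `c1`, the sample values, and the alternative "count distinct components" expression are assumptions for illustration, not taken from the rules file.

```yaml
# Sketch of a promtool test file (run with: promtool test rules <this-file>).
# Cluster id "c1" and the series values are hypothetical.
evaluation_interval: 1m
tests:
  # Case 1: only config-syncer reports for cluster c1.
  - interval: 1m
    input_series:
      - series: 'pomerium_console_databroker_reconciler_ReconcileLoop{cluster_id="c1", component="config-syncer"}'
        values: '1+0x10'
    promql_expr_test:
      # Sum of two counts: the right-hand side is empty for c1,
      # so vector matching drops the result and nothing is returned.
      - expr: |
          (
            count by (cluster_id) (pomerium_console_databroker_reconciler_ReconcileLoop{component="config-syncer"}) +
            count by (cluster_id) (pomerium_console_databroker_reconciler_ReconcileLoop{component="service-account-syncer"})
          ) < 2
        eval_time: 5m
        exp_samples: []
      # Counting distinct components per cluster returns 1 for c1.
      - expr: |
          count by (cluster_id) (
            count by (cluster_id, component) (
              pomerium_console_databroker_reconciler_ReconcileLoop{component=~"config-syncer|service-account-syncer"}
            )
          ) < 2
        eval_time: 5m
        exp_samples:
          - labels: '{cluster_id="c1"}'
            value: 1
  # Case 2: both components report, so neither form returns a sample.
  - interval: 1m
    input_series:
      - series: 'pomerium_console_databroker_reconciler_ReconcileLoop{cluster_id="c1", component="config-syncer"}'
        values: '1+0x10'
      - series: 'pomerium_console_databroker_reconciler_ReconcileLoop{cluster_id="c1", component="service-account-syncer"}'
        values: '1+0x10'
    promql_expr_test:
      - expr: |
          count by (cluster_id) (
            count by (cluster_id, component) (
              pomerium_console_databroker_reconciler_ReconcileLoop{component=~"config-syncer|service-account-syncer"}
            )
          ) < 2
        eval_time: 5m
        exp_samples: []
```

Neither form returns a sample when both components are absent for a cluster, since there is then no series left to count; covering that case would need a separate signal.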