Add remote alertmanager for operators
Showing 26 changed files with 451 additions and 88 deletions.
1 change: 0 additions & 1 deletion
charts/gardener/charts/application/templates/secret-alerting-smtp.yaml
This file was deleted.
1 change: 1 addition & 0 deletions
charts/gardener/charts/application/templates/secret-alerting.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
{{- include "gardener.secret-alerting" . }} |
26 changes: 0 additions & 26 deletions
charts/gardener/charts/utils-common/templates/_secret-alerting-smtp.yaml
This file was deleted.
43 changes: 43 additions & 0 deletions
charts/gardener/charts/utils-common/templates/_secret-alerting.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,43 @@ | ||
{{- define "gardener.secret-alerting" -}} | ||
{{- if .Values.global.controller.enabled }} | ||
{{- range $key, $config := .Values.global.controller.alerting }} | ||
--- | ||
apiVersion: v1 | ||
kind: Secret | ||
metadata: | ||
  name: alerting-{{ $key }} | ||
  namespace: garden | ||
  labels: | ||
    app: gardener | ||
    chart: "{{ $.Chart.Name }}-{{ $.Chart.Version }}" | ||
    release: "{{ $.Release.Name }}" | ||
    heritage: "{{ $.Release.Service }}" | ||
    gardener.cloud/role: alerting | ||
type: Opaque | ||
data: | ||
  auth_type: {{ ( required ".controller.alerting[].auth_type is required" $config.auth_type ) | b64enc }} | ||
  {{- if eq $config.auth_type "smtp" }} | ||
  to: {{ ( required ".controller.alerting[].to is required" $config.to ) | b64enc }} | ||
  from: {{ ( required ".controller.alerting[].from is required" $config.from ) | b64enc }} | ||
  smarthost: {{ ( required ".controller.alerting[].smarthost is required" $config.smarthost ) | b64enc }} | ||
  auth_username: {{ ( required ".controller.alerting[].auth_username is required" $config.auth_username ) | b64enc }} | ||
  auth_identity: {{ ( required ".controller.alerting[].auth_identity is required" $config.auth_identity ) | b64enc }} | ||
  auth_password: {{ ( required ".controller.alerting[].auth_password is required" $config.auth_password ) | b64enc }} | ||
  {{- end }} | ||
  {{- if eq $config.auth_type "none" }} | ||
  url: {{ ( required ".controller.alerting[].url is required" $config.url ) | b64enc }} | ||
  {{- end }} | ||
  {{- if eq $config.auth_type "basic" }} | ||
  url: {{ ( required ".controller.alerting[].url is required" $config.url ) | b64enc }} | ||
  username: {{ ( required ".controller.alerting[].username is required" $config.username ) | b64enc }} | ||
  password: {{ ( required ".controller.alerting[].password is required" $config.password ) | b64enc }} | ||
  {{- end }} | ||
  {{- if eq $config.auth_type "certificate" }} | ||
  url: {{ ( required ".controller.alerting[].url is required" $config.url ) | b64enc }} | ||
  ca.crt: {{ ( required ".controller.alerting[].ca_crt is required" $config.ca_crt ) | b64enc }} | ||
  tls.crt: {{ ( required ".controller.alerting[].tls_crt is required" $config.tls_crt ) | b64enc }} | ||
  tls.key: {{ ( required ".controller.alerting[].tls_key is required" $config.tls_key ) | b64enc }} | ||
  {{- end }} | ||
{{- end }} | ||
{{- end }} | ||
{{- end -}} |
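A values fragment that would drive this template might look like the following. This is a minimal sketch, not taken from the commit: it assumes `global.controller.alerting` is a list (as the `alerting[]` error messages suggest), and the host name and credentials are placeholders.

```yaml
# Hypothetical values.yaml fragment; all hosts and credentials are placeholders.
global:
  controller:
    enabled: true
    alerting:
    # Rendered as secret "alerting-0" (the list index becomes $key).
    - auth_type: none
      url: external.alertmanager.foo
    # Rendered as secret "alerting-1".
    - auth_type: basic
      url: external.alertmanager.foo
      username: admin
      password: change-me
```

Because the template calls `required` on every field of the selected `auth_type`, leaving one out should make rendering fail with the corresponding message rather than produce an incomplete secret.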
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,7 +1,9 @@ | ||
{{ if .Values.alertmanager.enabled }} | ||
apiVersion: v1 | ||
kind: Secret | ||
metadata: | ||
  name: alertmanager-config | ||
  namespace: {{ .Release.Namespace }} | ||
data: | ||
  alertmanager.yaml: {{ include "config" .Values.alertmanager | b64enc }} | ||
{{- end }} |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -115,6 +115,7 @@ fluentd-es: | |
|
||
alertmanager: | ||
  emailConfigs: [] | ||
  enabled: true | ||
  storage: 1Gi | ||
|
||
hvpa: | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,137 @@ | ||
# Alerting | ||
|
||
Gardener uses [Prometheus](https://prometheus.io/) to gather metrics from each component. A Prometheus instance is deployed in each shoot control plane (on the seed) and is responsible for gathering control plane and cluster metrics. Prometheus can be configured to fire alerts based on these metrics and send them to an [alertmanager](https://prometheus.io/docs/alerting/alertmanager/). The alertmanager is responsible for sending the alerts to users and operators. This document describes how to set up alerting for: | ||
|
||
- [end-users/stakeholders/customers](#Alerting-for-Users) | ||
- [operators/administrators](#Alerting-for-Operators) | ||
|
||
# Alerting for Users | ||
|
||
To receive email alerts as a user, set the following values in the shoot spec: | ||
|
||
```yaml | ||
spec: | ||
  monitoring: | ||
    alerting: | ||
      emailReceivers: | ||
      - john.doe@example.com | ||
``` | ||
`emailReceivers` is a list of email addresses that receive alerts when something is wrong with the shoot cluster. A list of alerts for users can be found [here](user_alerts.md). | ||
|
||
# Alerting for Operators | ||
|
||
Currently, Gardener supports two options for alerting: | ||
|
||
- [Email Alerting](#Email-Alerting) | ||
- [Sending Alerts to an external alertmanager](#External-Alertmanager) | ||
|
||
A list of operator alerts can be found [here](operator_alerts.md). | ||
|
||
## Email Alerting | ||
|
||
Gardener provides the option to deploy an alertmanager into each seed. This alertmanager is responsible for sending out alerts to operators for each shoot cluster in the seed. The alertmanager managed by Gardener supports only email alerts. Email alerting is configured through the `alerting` values of the Gardener controller manager configuration; see [this document](../usage/configuration.md) for how to configure Gardener's SMTP secret. If the values are set, a secret with the label `gardener.cloud/role: alerting` is created in the garden namespace of the garden cluster. Each alertmanager in each seed uses this secret. | ||
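As an illustration, an SMTP entry in the `alerting` values could look like the sketch below. The field names are the ones the chart's `gardener.secret-alerting` template requires for `auth_type: smtp`; the surrounding `global.controller` path is taken from that template, and every address, host, and password is a placeholder.

```yaml
# Hypothetical values fragment for email alerting; all values are placeholders.
global:
  controller:
    enabled: true
    alerting:
    - auth_type: smtp
      to: operators@example.com
      from: alerts@example.com
      smarthost: smtp.example.com:587
      auth_username: alerts@example.com
      auth_identity: alerts@example.com
      auth_password: change-me
```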
|
||
## External Alertmanager | ||
|
||
The alertmanager supports different kinds of [alerting configurations](https://prometheus.io/docs/alerting/configuration/). The alertmanager provided by Gardener only supports email alerts. If email is not sufficient, alerts can be sent to an external alertmanager instead: Prometheus sends the alerts to a URL, and the external alertmanager handles them from there. The external alertmanager is operated and configured by the operator (i.e. Gardener does not configure or deploy this alertmanager). To configure sending alerts to an external alertmanager, create a secret in the garden namespace of the virtual garden cluster with the label `gardener.cloud/role: alerting`. This secret needs to contain the URL of the external alertmanager and the authentication information. Supported authentication types are: | ||
|
||
- No Authentication (none) | ||
- Basic Authentication (basic) | ||
- Mutual TLS (certificate) | ||
|
||
### Remote Alertmanager Examples | ||
|
||
Note: the `url` value must not be prefixed with `http` or `https`. | ||
|
||
```yaml | ||
# Example secret. Keep exactly one of the blocks under `data`, matching the chosen auth_type. | ||
apiVersion: v1 | ||
kind: Secret | ||
metadata: | ||
  labels: | ||
    gardener.cloud/role: alerting | ||
  name: alerting-auth | ||
  namespace: garden | ||
data: | ||
  # No Authentication | ||
  auth_type: base64(none) | ||
  url: base64(external.alertmanager.foo) | ||
  # Basic Auth | ||
  auth_type: base64(basic) | ||
  url: base64(external.alertmanager.foo) | ||
  username: base64(admin) | ||
  password: base64(password) | ||
  # Mutual TLS | ||
  auth_type: base64(certificate) | ||
  url: base64(external.alertmanager.foo) | ||
  ca.crt: base64(ca) | ||
  tls.crt: base64(certificate) | ||
  tls.key: base64(key) | ||
  # Email Alerts (internal alertmanager) | ||
  auth_type: base64(smtp) | ||
  auth_identity: base64(internal.alertmanager.auth_identity) | ||
  auth_password: base64(internal.alertmanager.auth_password) | ||
  auth_username: base64(internal.alertmanager.auth_username) | ||
  from: base64(internal.alertmanager.from) | ||
  smarthost: base64(internal.alertmanager.smarthost) | ||
  to: base64(internal.alertmanager.to) | ||
type: Opaque | ||
``` | ||
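The example above lists every option side by side; a real secret carries a single `auth_type`. As a minimal sketch for basic authentication, the standard Kubernetes `stringData` field can be used so that values are written in plain text and base64-encoded into `data` by the API server on write. The secret name, host, and credentials below are placeholders, and the sketch assumes Gardener only reads the resulting `data` entries.

```yaml
# Hypothetical secret for basic authentication; all values are placeholders.
apiVersion: v1
kind: Secret
metadata:
  name: alerting-auth
  namespace: garden
  labels:
    gardener.cloud/role: alerting
type: Opaque
# stringData accepts plain strings; the API server stores them base64-encoded under data.
stringData:
  auth_type: basic
  url: external.alertmanager.foo   # no http/https prefix, per the note above
  username: admin
  password: change-me
```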
|
||
### Configuring your External Alertmanager | ||
|
||
Please refer to the [alertmanager](https://prometheus.io/docs/alerting/alertmanager/) documentation on how to configure an alertmanager. | ||
|
||
We recommend you use at least the following inhibition rules in your alertmanager configuration to prevent excessive alerts: | ||
```yaml | ||
inhibit_rules: | ||
# Apply inhibition if the alert name is the same. | ||
- source_match: | ||
    severity: critical | ||
  target_match: | ||
    severity: warning | ||
  equal: ['alertname', 'service', 'cluster'] | ||
# Stop all alerts for type=shoot if there are VPN problems. | ||
- source_match: | ||
    service: vpn | ||
  target_match_re: | ||
    type: shoot | ||
  equal: ['type', 'cluster'] | ||
# Stop warning and critical alerts if there is a blocker - no worker nodes, no etcd main etc. | ||
- source_match: | ||
    severity: blocker | ||
  target_match_re: | ||
    severity: ^(critical|warning)$ | ||
  equal: ['cluster'] | ||
# If the API server is down inhibit no worker nodes alert. No worker nodes depends on kube-state-metrics which depends on the API server. | ||
- source_match: | ||
    service: kube-apiserver | ||
  target_match_re: | ||
    service: nodes | ||
  equal: ['cluster'] | ||
# If API server is down inhibit kube-state-metrics alerts. | ||
- source_match: | ||
    service: kube-apiserver | ||
  target_match_re: | ||
    severity: info | ||
  equal: ['cluster'] | ||
# No Worker nodes depends on kube-state-metrics. Inhibit no worker nodes if kube-state-metrics is down. | ||
- source_match: | ||
    service: kube-state-metrics-shoot | ||
  target_match_re: | ||
    service: nodes | ||
  equal: ['cluster'] | ||
``` | ||
Below is a graph visualizing the inhibition rules: | ||
|
||
![inhibitionGraph](../development/content/alertInhibitionGraph.png) | ||
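In an alertmanager configuration file, `inhibit_rules` is a top-level key next to `route` and `receivers`. The following is a minimal sketch of how the recommended rules could be embedded; the receiver name, the webhook URL, and the grouping labels are assumptions for illustration and are not prescribed by Gardener.

```yaml
# Hypothetical alertmanager.yaml for an external alertmanager.
route:
  receiver: operators                # placeholder receiver name
  group_by: ['alertname', 'cluster']
receivers:
- name: operators
  webhook_configs:
  - url: https://hooks.example.com/alerts   # placeholder endpoint
inhibit_rules:
# First of the recommended rules; append the remaining rules in the same way.
- source_match:
    severity: critical
  target_match:
    severity: warning
  equal: ['alertname', 'service', 'cluster']
```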
|
||
|
File renamed without changes.
File renamed without changes.