Commit

added alerts using prometheus section
brice committed Jul 17, 2018
1 parent 51501b5 commit 3fb8935
Showing 4 changed files with 45 additions and 2 deletions.
5 changes: 5 additions & 0 deletions day_two_guide/environment_health_checks.adoc
@@ -23,6 +23,11 @@ in this section to diagnose any problems.

include::day_two_guide/topics/complete_deployment_health_check.adoc[leveloffset=+2]

[[day-two-guide-creating-alerts-using-prometheus]]
== Creating alerts using Prometheus

include::day_two_guide/topics/alerts_using_prometheus.adoc[leveloffset=+2]

[[day-two-guide-host-health]]
== Host health

32 changes: 32 additions & 0 deletions day_two_guide/topics/alerts_using_prometheus.adoc
@@ -0,0 +1,32 @@
////
Creating alerts using Prometheus

Module included in the following assemblies:

* day_two_guide/environment_health_checks.adoc
////
You can integrate {product-title} with Prometheus to create visuals and alerts
that help you diagnose environment issues before they escalate. Such issues
include a node going down or a pod consuming too much CPU or memory.

See the
xref:../install_config/cluster_metrics.adoc#openshift-prometheus[Prometheus on
OpenShift Container Platform section in the Installation and configuration
guide] for more information.

[IMPORTANT]
====
Prometheus on {product-title} is a Technology Preview feature only.
ifdef::openshift-enterprise[]
Technology Preview features are not supported with Red Hat production service
level agreements (SLAs), might not be functionally complete, and Red Hat does
not recommend using them in production. These features provide early access to
upcoming product features, enabling customers to test functionality and provide
feedback during the development process.
For more information on Red Hat Technology Preview features support scope, see
https://access.redhat.com/support/offerings/techpreview/.
endif::[]
====
1 change: 1 addition & 0 deletions dev_guide/persistent_volumes.adoc
@@ -229,3 +229,4 @@ When a PV has its `claimRef` set to some PVC name and namespace, and is
reclaimed according to a `Retain` or `Recycle` reclaim policy, its `claimRef`
will remain set to the same PVC name and namespace even if the PVC or the whole
namespace no longer exists.

9 changes: 7 additions & 2 deletions install_config/cluster_metrics.adoc
@@ -969,8 +969,10 @@ additional rules variable:
openshift_prometheus_additional_rules_file: <PATH>
----

The file content should be in Prometheus Alert rules format. The following
example sets a rule to send an alert when one of the cluster nodes is down:
The file must follow
link:https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/[the
Prometheus Alert rules format]. The following example sets a rule to send an
alert when one of the cluster nodes is down:

----
groups:
@@ -979,11 +981,14 @@ groups:
  rules:
  - alert: Node Down
    expr: up{job="kubernetes-nodes"} == 0
    for: 10m <1>
    annotations:
      miqTarget: "ContainerNode"
      severity: "HIGH"
      message: "{{ '{{' }}{{ '$labels.instance' }}{{ '}}' }} is down"
----
<1> The optional `for` value specifies how long the alert condition must remain
true before Prometheus sends an alert. For example, if you set `10m`, Prometheus
sends the alert only after the condition has held for 10 minutes.
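
As a rough illustration, a complete rules file built around this rule might
look like the following sketch. The group name `node-alerts` is a hypothetical
placeholder, because the group header lines are not shown in this hunk. The
rule itself repeats the example, with YAML comments in place of the callout.

----
groups:
- name: node-alerts   # hypothetical group name, chosen only for this sketch
  rules:
  - alert: Node Down
    # `up` is 0 when the last scrape of a target failed
    expr: up{job="kubernetes-nodes"} == 0
    # optional: the condition must hold for 10 minutes before the alert is sent
    for: 10m
    annotations:
      miqTarget: "ContainerNode"
      severity: "HIGH"
      message: "{{ '{{' }}{{ '$labels.instance' }}{{ '}}' }} is down"
----

Omitting `for` in this sketch would mean that a single failed scrape is enough
to trigger the alert, so a short outage such as a node restart could produce
noise.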

*Prometheus Variables to Control Resource Limits*

