Added "Alerts using prometheus" #10199

Merged: 1 commit, merged on Jul 17, 2018
5 changes: 5 additions & 0 deletions day_two_guide/environment_health_checks.adoc
@@ -23,6 +23,11 @@ in this section to diagnose any problems.

include::day_two_guide/topics/complete_deployment_health_check.adoc[leveloffset=+2]

[[day-two-guide-creating-alerts-using-prometheus]]
== Creating alerts using Prometheus

include::day_two_guide/topics/alerts_using_prometheus.adoc[leveloffset=+2]

[[day-two-guide-host-health]]
== Host health

32 changes: 32 additions & 0 deletions day_two_guide/topics/alerts_using_prometheus.adoc
@@ -0,0 +1,32 @@
////
Creating alerts using Prometheus

Module included in the following assemblies:

* day_two_guide/environment_health_checks.adoc
////

You can integrate {product-title} with Prometheus to create visuals and alerts
that help you detect and diagnose environment issues early, for example when a
node goes down or a pod consumes too much CPU or memory.

See the
xref:../install_config/cluster_metrics.adoc#openshift-prometheus[Prometheus on
OpenShift Container Platform section in the Installation and configuration
guide] for more information.

[IMPORTANT]
====
Prometheus on {product-title} is a Technology Preview feature only.
ifdef::openshift-enterprise[]
Technology Preview features are not supported with Red Hat production service
level agreements (SLAs), might not be functionally complete, and Red Hat does
not recommend using them in production. These features provide early access to
upcoming product features, enabling customers to test functionality and provide
feedback during the development process.

For more information on Red Hat Technology Preview features support scope, see
https://access.redhat.com/support/offerings/techpreview/.
endif::[]
====
1 change: 1 addition & 0 deletions dev_guide/persistent_volumes.adoc
@@ -229,3 +229,4 @@ When a PV has its `claimRef` set to some PVC name and namespace, and is
reclaimed according to a `Retain` or `Recycle` reclaim policy, its `claimRef`
will remain set to the same PVC name and namespace even if the PVC or the whole
namespace no longer exists.

9 changes: 7 additions & 2 deletions install_config/cluster_metrics.adoc
@@ -969,8 +969,10 @@ additional rules variable:
openshift_prometheus_additional_rules_file: <PATH>
----

The file content should be in Prometheus Alert rules format. The following
example sets a rule to send an alert when one of the cluster nodes is down:
The file must follow
link:https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/[the
Prometheus Alert rules format]. The following example sets a rule to send an
alert when one of the cluster nodes is down:

----
groups:
@@ -979,11 +981,14 @@ groups:
  rules:
  - alert: Node Down
    expr: up{job="kubernetes-nodes"} == 0
    for: 10m <1>
    annotations:
      miqTarget: "ContainerNode"
      severity: "HIGH"
      message: "{{ '{{' }}{{ '$labels.instance' }}{{ '}}' }} is down"
----
<1> The optional `for` value specifies how long the alert condition must remain
true before Prometheus sends the alert. For example, with `10m`, Prometheus sends
the alert only after the node has been down for 10 consecutive minutes; if the
node recovers sooner, no alert is sent.
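To illustrate how `for` delays an alert, the following Python sketch replays a series of rule evaluations and reports the alert state after each one. It is a simplified model, not Prometheus code: it assumes a single alert series, evaluations at the listed timestamps (in seconds), and the three states Prometheus reports for an alert (`inactive`, `pending`, `firing`). The `evaluate` function and its parameters are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class AlertState:
    state: str = "inactive"             # "inactive", "pending", or "firing"
    active_since: Optional[int] = None  # timestamp when the condition first became true


def evaluate(samples: List[Tuple[int, bool]], for_seconds: int) -> List[Tuple[int, str]]:
    """Replay (timestamp, condition_is_true) evaluations; return the state after each.

    Simplified model of the `for` clause (hypothetical helper, not a Prometheus API).
    """
    alert = AlertState()
    history: List[Tuple[int, str]] = []
    for ts, condition in samples:
        if not condition:
            # Condition cleared: a pending alert is discarded and the timer resets.
            alert = AlertState()
        else:
            if alert.active_since is None:
                # First evaluation where the condition holds: start waiting.
                alert.active_since = ts
                alert.state = "pending"
            if ts - alert.active_since >= for_seconds:
                # Condition has held for the whole `for` period: fire.
                alert.state = "firing"
        history.append((ts, alert.state))
    return history


if __name__ == "__main__":
    # Node reported down at every 60-second evaluation from t=0 through t=900,
    # with `for: 10m` expressed as 600 seconds.
    node_down = [(t, True) for t in range(0, 901, 60)]
    for ts, state in evaluate(node_down, for_seconds=600):
        print(ts, state)
```

In this model the alert stays `pending` from t=0 until t=540 and becomes `firing` at t=600. If the condition clears at any evaluation during the waiting period, the timer starts over, which is why a node that flaps up and down in short bursts might never trigger the alert.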

*Prometheus Variables to Control Resource Limits*
