prometheus: record and alert rules (KusionStack#168)
howieyuen authored Oct 28, 2022
1 parent c51f9f4 commit f059972
Showing 9 changed files with 332 additions and 17 deletions.
@@ -2,11 +2,21 @@
sidebar_position: 1
---

# Recording and Alerting

The Prometheus Operator provides Kubernetes-native deployment and management of Prometheus and related monitoring components. The purpose of this project is to simplify and automate the configuration of a Prometheus-based monitoring stack for Kubernetes clusters.

The Prometheus Operator includes, but is not limited to, the following features:

- Kubernetes Custom Resources: Use Kubernetes custom resources to deploy and manage Prometheus, Alertmanager, and related components.
- Simplified Deployment Configuration: Configure the fundamentals of Prometheus like versions, persistence, retention policies, and replicas from a native Kubernetes resource.
- Prometheus Target Configuration: Automatically generate monitoring target configurations based on familiar Kubernetes label queries; no need to learn a Prometheus-specific configuration language.

The following is the architecture diagram of the Prometheus Operator:

![](/img/docs/user_docs/guides/prometheus/structure.png)

This guide shows you how to use the Prometheus Operator to set up an Alertmanager cluster integrated with a Prometheus instance, and how to use PrometheusRule resources to record metrics and send alerts.

## Prerequisites

@@ -37,20 +47,20 @@ kubectl create -f bundle.yaml

For more details, please check [Prometheus Operator Quickstart](https://github.com/prometheus-operator/prometheus-operator#quickstart).

## Setup

The `prometheus-install` project in the Konfig monorepo contains the full configuration for setting up Prometheus and Alertmanager:

- an Alertmanager cluster
- an AlertmanagerConfig object
- an Alertmanager Service
- a Prometheus cluster
- required RBAC
- a Prometheus Service

To try the one-click deployment right away, jump to the [One-click Deployment](#one-click-deployment) section.

### Set Up Alertmanager

By default, the Alertmanager instances start with a minimal configuration, which isn't useful because it doesn't send any notifications when alerts are received.

@@ -151,7 +161,7 @@ For complete configuration, please check source code file: [`prometheus-install/

This Alertmanager cluster is now fully functional and highly available, but no alerts are fired against it yet, because Prometheus has not been set up.

### Set Up Prometheus

Before you set up Prometheus, you must create the RBAC rules for the Prometheus service account.

@@ -263,10 +273,10 @@ Prometheus Admin API allows access to delete series for a certain time range, cl
More information about the admin API can be found in [Prometheus official documentation](https://prometheus.io/docs/prometheus/latest/querying/api/#tsdb-admin-apis).

:::tip
For the complete configuration, see the source file: [`prometheus-install/prod/main.k`](https://github.com/KusionStack/konfig/blob/main/base/examples/monitoring/prometheus-install/prod/main.k).
:::

### One-click Deployment

You can now deploy everything with one click. First, enter the stack directory of the `prometheus-install` project in the Konfig repo:

@@ -310,3 +320,150 @@ kubectl port-forward svc/prometheus-example 30900:9090
Now, you can open the Prometheus web interface, [http://127.0.0.1:30900](http://127.0.0.1:30900/), and go to the "Status > Runtime & Build Information" page and check that Prometheus has discovered 3 Alertmanager instances.

![](/img/docs/user_docs/guides/prometheus/alertmanager.jpg)
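The same check can be scripted against the Prometheus HTTP API, which lists the Alertmanagers that Prometheus has discovered at `/api/v1/alertmanagers`. The following is a minimal Python sketch using only the standard library. It assumes the port forward above is still running, and the helper names `fetch_json` and `active_alertmanager_urls` are hypothetical.

```python
import json
import urllib.request

# Assumes `kubectl port-forward svc/prometheus-example 30900:9090` is running.
PROMETHEUS_URL = "http://127.0.0.1:30900"


def fetch_json(url: str) -> dict:
    """GET a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def active_alertmanager_urls(payload: dict) -> list:
    """Extract the active Alertmanager URLs from an /api/v1/alertmanagers response."""
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus API error: {payload}")
    return [am["url"] for am in payload["data"]["activeAlertmanagers"]]
```

For the example cluster, `len(active_alertmanager_urls(fetch_json(PROMETHEUS_URL + "/api/v1/alertmanagers")))` should evaluate to `3`.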

## PrometheusRule

The PrometheusRule custom resource definition (CRD) declaratively defines the desired Prometheus rules, including alerting and recording rules, to be consumed by Prometheus instances. The Operator reconciles these rules, and Prometheus loads them dynamically without requiring a restart.
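Which Prometheus instance loads a given PrometheusRule is decided by matching the rule's labels against the `ruleSelector` (and `ruleNamespaceSelector`) of the Prometheus resource. The following Python sketch is a simplified model of the `matchLabels` part of that matching only, not the Operator's code. The selector values below are hypothetical; the example's actual selector is defined in `prometheus-install/prod/main.k`.

```python
from typing import Dict, Optional


def selects(match_labels: Optional[Dict[str, str]], labels: Dict[str, str]) -> bool:
    """Simplified matchLabels check, as used by a Prometheus `ruleSelector`.

    - None (a null selector) matches no PrometheusRule objects.
    - {} (an empty selector) matches every object.
    - Otherwise every key/value pair must be present in `labels`.
    matchExpressions are not modeled in this sketch.
    """
    if match_labels is None:
        return False
    return all(labels.get(key) == value for key, value in match_labels.items())


# Labels carried by the example alerting rule later in this guide.
rule_labels = {"prometheus": "example", "role": "alert-rules"}
```

With a hypothetical selector `{"role": "alert-rules"}`, `selects({"role": "alert-rules"}, rule_labels)` returns `True`, while `{"role": "record-rules"}` would not select the rule.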

### Recording Rules

Recording rules allow you to precompute frequently needed or computationally expensive expressions and save their result as a new set of time series. Querying the precomputed result will then often be much faster than executing the original expression every time it is needed. This is especially useful for dashboards, which need to query the same expression repeatedly every time they refresh.

The following code snippet shows example recording rule expressions based on node metrics:

```py
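# Total available memory per cluster, in bytes. The `or` falls back to
# Buffers + Cached + MemFree + Slab for nodes without MemAvailable.
_sum_of_node_memory = """\
sum(
  node_memory_MemAvailable_bytes{job="node-exporter"} or
  (
    node_memory_Buffers_bytes{job="node-exporter"} +
    node_memory_Cached_bytes{job="node-exporter"} +
    node_memory_MemFree_bytes{job="node-exporter"} +
    node_memory_Slab_bytes{job="node-exporter"}
  )
) by (cluster)
"""

# Average CPU utilization over 5 minutes: busy CPU seconds per second
# (all modes except idle, iowait and steal) divided by the number of CPUs.
_node_cpu = """\
sum(rate(node_cpu_seconds_total{job="node-exporter",mode!="idle",mode!="iowait",mode!="steal"}[5m])) /
count(sum(node_cpu_seconds_total{job="node-exporter"}) by (cluster, instance, cpu))
"""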
```

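`_sum_of_node_memory` computes the total available node memory in bytes for each cluster. For nodes that do not expose `node_memory_MemAvailable_bytes`, the `or` operator falls back to the sum of buffers, cached, free, and slab memory.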
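To make the `or` fallback concrete, the following Python sketch evaluates the same logic over hand-written samples. It assumes every series carries `cluster` and `instance` labels, represented here as a `(cluster, instance)` tuple; the function name and sample values are illustrative only.

```python
from collections import defaultdict


def available_memory_by_cluster(mem_available, buffers, cached, mem_free, slab):
    """Emulate sum(MemAvailable or (Buffers + Cached + MemFree + Slab)) by (cluster).

    Each argument maps a (cluster, instance) label set to a sample value in bytes.
    """
    # Vector `+` only produces results for label sets present on both sides.
    common = set(buffers) & set(cached) & set(mem_free) & set(slab)
    fallback = {k: buffers[k] + cached[k] + mem_free[k] + slab[k] for k in common}

    # `or` keeps every left-hand sample and adds right-hand samples whose
    # label set has no match on the left.
    combined = dict(mem_available)
    for key, value in fallback.items():
        combined.setdefault(key, value)

    # `sum ... by (cluster)` drops every label except `cluster`.
    totals = defaultdict(int)
    for (cluster, _instance), value in combined.items():
        totals[cluster] += value
    return dict(totals)


# node-a exposes MemAvailable (its fallback parts are ignored); node-b does not.
print(available_memory_by_cluster(
    mem_available={("prod", "node-a"): 4_000_000_000},
    buffers={("prod", "node-a"): 1, ("prod", "node-b"): 100_000_000},
    cached={("prod", "node-a"): 1, ("prod", "node-b"): 500_000_000},
    mem_free={("prod", "node-a"): 1, ("prod", "node-b"): 300_000_000},
    slab={("prod", "node-a"): 1, ("prod", "node-b"): 100_000_000},
))  # {'prod': 5000000000}
```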

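`_node_cpu` computes the average CPU utilization: the per-second CPU time spent in non-idle modes (excluding `idle`, `iowait`, and `steal`) over a 5-minute window, divided by the total number of CPUs.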
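The CPU expression can be checked by hand in the same way. The sketch below assumes two `node_cpu_seconds_total` snapshots taken exactly 300 seconds apart and uses a plain difference quotient; Prometheus's `rate()` additionally handles counter resets and extrapolates to the window boundaries, so real values can differ slightly.

```python
def node_cpu_ratio(start, end, window_seconds=300):
    """Approximate sum(rate(busy[5m])) / count(cpus) from two counter snapshots.

    `start` and `end` map (instance, cpu, mode) to node_cpu_seconds_total values
    observed `window_seconds` apart. Counter resets are not handled.
    """
    excluded_modes = {"idle", "iowait", "steal"}
    # Numerator: per-second busy CPU time summed over all CPUs and busy modes.
    busy_per_second = sum(
        (end[key] - start[key]) / window_seconds
        for key in end
        if key in start and key[2] not in excluded_modes
    )
    # Denominator: number of distinct (instance, cpu) pairs.
    cpu_count = len({(instance, cpu) for instance, cpu, _mode in end})
    return busy_per_second / cpu_count


# CPU 0 is busy half the time, CPU 1 a tenth of the time.
start = {("node-a", "0", "user"): 100.0, ("node-a", "0", "idle"): 1000.0,
         ("node-a", "1", "user"): 50.0, ("node-a", "1", "idle"): 1200.0}
end = {("node-a", "0", "user"): 250.0, ("node-a", "0", "idle"): 1150.0,
       ("node-a", "1", "user"): 80.0, ("node-a", "1", "idle"): 1470.0}
print(round(node_cpu_ratio(start, end), 3))  # 0.3
```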

:::tip
For the complete configuration, see the source file: [`prometheus-rules/record/main.k`](https://github.com/KusionStack/konfig/blob/main/base/examples/monitoring/prometheus-rules/record/main.k).
:::

Now, create the recording rules above:

1. Enter the `record` directory of the `prometheus-rules` project:

```bash
cd konfig/base/examples/monitoring/prometheus-rules/record
```

2. Apply these rules:

```bash
kusion apply --yes
```

3. Check that the Prometheus instance has loaded these rules:

```bash
kubectl port-forward svc/prometheus-example 30900:9090
```

Now, you can open the Prometheus web interface, [http://127.0.0.1:30900](http://127.0.0.1:30900/), and go to the "Status > Rules" page and check that Prometheus has loaded `node.rules`:

![](/img/docs/user_docs/guides/prometheus/node-rules.jpg)
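To verify from a script instead of the web UI, you can query the Prometheus rules API at `/api/v1/rules`, which returns the loaded rule groups. Below is a minimal Python sketch that assumes the same port forward; `loaded_rule_groups` and `fetch_rules` are hypothetical helper names.

```python
import json
import urllib.request


def loaded_rule_groups(payload: dict) -> dict:
    """Map each rule group name to its rules' (type, name) pairs.

    `payload` is the decoded JSON body of GET /api/v1/rules.
    """
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus API error: {payload}")
    return {
        group["name"]: [(rule["type"], rule["name"]) for rule in group["rules"]]
        for group in payload["data"]["groups"]
    }


def fetch_rules(base_url: str = "http://127.0.0.1:30900") -> dict:
    """Fetch /api/v1/rules through the port forward set up above."""
    with urllib.request.urlopen(f"{base_url}/api/v1/rules", timeout=5) as resp:
        return json.load(resp)
```

Once the recording rules are applied, `"node.rules" in loaded_rule_groups(fetch_rules())` should be `True`; the rules in that group are reported with type `"recording"`.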

#### Further Reading

To see graphs of the series produced by the [Recording Rules](#recording-rules) above, you need to deploy a `node-exporter` server in the `default` namespace.

:::info
To install node-exporter, see [`node-exporter.yaml`](https://github.com/KusionStack/examples/blob/main/prometheus/node-exporter.yaml).
:::

You will then see the sum of available node memory in bytes:

![](/img/docs/user_docs/guides/prometheus/node-memory.jpg)

and the average node CPU utilization over a 5-minute window:

![](/img/docs/user_docs/guides/prometheus/node-cpu.jpg)

### Alerting Rules

Alerting rules allow you to define alert conditions based on Prometheus expression language expressions and to send notifications about firing alerts to an external service. Whenever the alert expression results in one or more vector elements at a given point in time, the alert counts as active for these elements' label sets.
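The activation rule above can be modeled in a few lines. The Python sketch below is a simplified model that ignores the optional `for` duration and annotations: it treats the expression result as a list of `(labels, value)` pairs and reports one active alert per element, adding the `alertname` label that Prometheus attaches to alerts. The alert name `HighLoad` is hypothetical.

```python
def active_alerts(alert_name, expr_result):
    """Return the label sets of the alerts that are active at one evaluation.

    `expr_result` is the instant vector produced by the alerting expression,
    given as a list of (labels, value) pairs. An empty result means no alert.
    """
    return [{**labels, "alertname": alert_name} for labels, _value in expr_result]


# Two elements in the result, so the rule is active for both label sets.
print(active_alerts("HighLoad", [({"instance": "node-a"}, 3.2), ({"instance": "node-b"}, 4.1)]))
# An expression such as vector(1) always yields one element with no labels.
print(active_alerts("ExampleAlert", [({}, 1.0)]))
```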

The following code snippet is an example of alerting rules:

```py
_alerts: monitoringv1.PrometheusRule {
    metadata = {
        name = "example-alert"
        namespace = "default"
        labels: {
            "prometheus": "example",
            "role": "alert-rules",
        }
    }
    spec = {
        groups = [
            {
                name = "alert.rules"
                rules = [
                    {
                        alert: "ExampleAlert"
                        # vector(s scalar) returns the scalar s as a vector with no labels.
                        expr: "vector(1)"
                    }
                ]
            }
        ]
    }
}
```

The built-in function `vector(1)` always returns a one-element vector with the value 1, so this alert is always active and fires continuously.

:::tip
For the complete configuration, see the source file: [`prometheus-rules/alert/main.k`](https://github.com/KusionStack/konfig/blob/main/base/examples/monitoring/prometheus-rules/alert/main.k).
:::

Now, you can apply the alerting rules:

1. Enter the `alert` stack of the `prometheus-rules` project:

```bash
cd konfig/base/examples/monitoring/prometheus-rules/alert
```

2. Apply these rules:

```bash
kusion apply --yes
```

3. Check that the Prometheus instance has loaded these rules:

Since the port forward from the recording rules step is still running, just refresh the "Status > Rules" page and check that Prometheus has loaded `alert.rules`:

![](/img/docs/user_docs/guides/prometheus/alert-rules.jpg)

4. Check that Alertmanager has received the alert:

```bash
kubectl port-forward svc/alertmanager-example 30903:9093
```

Now, open the Alertmanager web interface at [http://127.0.0.1:30903](http://127.0.0.1:30903/) to see the example alert:

![](/img/docs/user_docs/guides/prometheus/alert.jpg)
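Alertmanager exposes the same information through its v2 API at `/api/v2/alerts`, which returns a JSON array of alerts. Below is a minimal Python sketch that assumes the port forward to `127.0.0.1:30903` above is running; `alert_names` and `fetch_alerts` are hypothetical helper names.

```python
import json
import urllib.request


def alert_names(alerts: list, state: str = "active") -> set:
    """Return the `alertname` labels of alerts in the given state.

    `alerts` is the decoded JSON array returned by GET /api/v2/alerts;
    each alert carries `labels` and a `status.state` such as "active".
    """
    return {
        alert["labels"].get("alertname", "")
        for alert in alerts
        if alert.get("status", {}).get("state") == state
    }


def fetch_alerts(base_url: str = "http://127.0.0.1:30903") -> list:
    """Fetch /api/v2/alerts through the Alertmanager port forward."""
    with urllib.request.urlopen(f"{base_url}/api/v2/alerts", timeout=5) as resp:
        return json.load(resp)
```

While the example rule is loaded, `"ExampleAlert" in alert_names(fetch_alerts())` should be `True`.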
