add job restarts limit to values (#54)
Zedive authored Sep 16, 2021
1 parent 8359558 commit c378f19
Showing 4 changed files with 10 additions and 6 deletions.
2 changes: 1 addition & 1 deletion charts/flink-job/Chart.yaml
@@ -2,7 +2,7 @@ apiVersion: v2
appVersion: "1.0"
description: Flink job cluster on k8s
name: flink-job
-version: 0.0.4
+version: 0.0.5
maintainers:
- name: Zedive
email: albert@nextdoor.com
7 changes: 4 additions & 3 deletions charts/flink-job/README.md
@@ -2,7 +2,7 @@

Flink job cluster on k8s

-![Version: 0.0.4](https://img.shields.io/badge/Version-0.0.4-informational?style=flat-square) ![AppVersion: 1.0](https://img.shields.io/badge/AppVersion-1.0-informational?style=flat-square)
+![Version: 0.0.5](https://img.shields.io/badge/Version-0.0.5-informational?style=flat-square) ![AppVersion: 1.0](https://img.shields.io/badge/AppVersion-1.0-informational?style=flat-square)

This chart deploys a flink job cluster and runs a simple word counting flink app as an example.
This chart includes some production ready set-ups such as
@@ -19,8 +19,9 @@ See metrics reporter in the flink properties for more details.

| Key | Type | Default | Description |
|-----|------|---------|-------------|
-| alerts.enabled | bool | `true` | (Boolean) whether to create the PrometheusRule for this flink cluster |
-| alerts.severity | string | `"info"` | |
+| alerts.enabled | bool | `true` | (Boolean) Specifies whether to create the PrometheusRule for this flink cluster |
+| alerts.restartsLimit | int | `2` | (`int`) The number of job restarts before alerting |
+| alerts.severity | string | `"info"` | (String) Severity of the alerts |
| defaults.runbookUrl | string | `"https://github.com/Nextdoor/k8s-charts/blob/main/charts/flink-job/runbook.md"` | (String) Runbook URL for the Prometheus alerts |
| envVars | list | `[{"name":"HADOOP_CLASSPATH","value":"/opt/flink/opt/flink-metrics-prometheus-1.9.3.jar"}]` | Environment variables shared by all containers |
| flinkProperties | object | `{"execution.checkpointing.interval":"10min","execution.checkpointing.mode":"EXACTLY_ONCE","high-availability":"org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory","high-availability.storageDir":"file:/savepoint/","kubernetes.cluster-id":"{{ .Values.fullnameOverride }}","kubernetes.namespace":"{{ .Release.Namespace }}","metrics.reporter.prom.class":"org.apache.flink.metrics.prometheus.PrometheusReporter","metrics.reporters":"prom","restart-strategy":"exponential-delay","restart-strategy.exponential-delay.backoff-multiplier":"2.0","state.checkpoints.dir":"file:/savepoint/","taskmanager.numberOfTaskSlots":"1"}` | (`Map`) Flink properties which are appened to flink-conf.yaml |
2 changes: 1 addition & 1 deletion charts/flink-job/templates/prometheusrule.yaml
@@ -47,7 +47,7 @@ spec:
changes(flink_jobmanager_job_numRestarts{
cluster="{{ $cluster }}",
namespace="{{ $namespace }}"
-}[30m]) > 2
+}[30m]) > {{ .Values.alerts.restartsLimit }}
for: 10m
labels:
severity: {{ .Values.alerts.severity }}
5 changes: 4 additions & 1 deletion charts/flink-job/values.yaml
@@ -199,9 +199,12 @@ savepoints:
enabled: true

alerts:
-# -- (Boolean) whether to create the PrometheusRule for this flink cluster
+# -- (Boolean) Specifies whether to create the PrometheusRule for this flink cluster
enabled: true
+# -- (String) Severity of the alerts
severity: info
+# -- (`int`) The number of job restarts before alerting
+restartsLimit: 2

defaults:
# -- (String) Runbook URL for the Prometheus alerts
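
As an illustration of how the new `alerts.restartsLimit` value could be overridden per deployment, here is a minimal sketch. The file name `my-values.yaml` and the threshold `5` are assumptions chosen for the example and are not part of this commit.

```yaml
# my-values.yaml (hypothetical override file; the threshold 5 is illustrative)
alerts:
  enabled: true
  severity: info
  # Raise the restart threshold from the chart default of 2.
  restartsLimit: 5
```

With such an override, the comparison in `templates/prometheusrule.yaml` would render as `}[30m]) > 5`. Because the rule uses a strict `>`, the alert condition holds only once the restart counter has changed more than `restartsLimit` times within the 30-minute window, and the existing `for: 10m` clause still applies.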
