Tickscripts for Telegraf #780

Merged
merged 1 commit into from
Aug 8, 2016
Add tickscipts for telegraf
jackzampolin committed Aug 8, 2016
commit 1b4d0b261d7ab07fcc8a93c2ce843637ead28222
4 changes: 2 additions & 2 deletions README.md
@@ -28,7 +28,7 @@ kapacitord config
# Getting Started

This README gives you a high-level overview of what Kapacitor is and what it's like to use it, as well as some details of how it works.
-To get started using Kapacitor see [this guide](https://docs.influxdata.com/kapacitor/latest/introduction/getting_started/).
+To get started using Kapacitor see [this guide](https://docs.influxdata.com/kapacitor/latest/introduction/getting_started/). After you finish the getting started exercise you can check out the [TICKscripts](https://github.com/influxdata/kapacitor/tree/master/examples/telegraf) for different Telegraf plugins.

# Basic Example

@@ -79,4 +79,4 @@ kapacitor define \
kapacitor enable cpu_alert
```

-For more complete examples see the [documentation](https://docs.influxdata.com/kapacitor/latest/examples/)
+For more complete examples see the [documentation](https://docs.influxdata.com/kapacitor/latest/examples/).
127 changes: 127 additions & 0 deletions examples/telegraf/README.md
@@ -0,0 +1,127 @@
# `telegraf`

`telegraf` contains tickscripts for individual [telegraf](https://github.com/influxdata/telegraf) plugins. Each subdirectory is named for the telegraf plugin it references. The tickscripts share a common structure so that they are consistent and easy for beginners to follow. Each script is broken up into a few sections:

* Comments
  - Name of the alert
  - Name of the field to alert on and the other fields available from the telegraf plugin
  - Full telegraf configuration for the plugin the tickscript references, including comments
  - Commands to `define` and `enable` the script, run from the `examples/telegraf` directory
* Parameters
  - A list of variables to make customization easy
* Dataframe
  - A definition of the data to alert on
* Thresholds
  - The thresholds or expressions to alert on
* Alert
  - Where to send the alert. All scripts `.log()` by default.
  - The tickscripts are written to make it easy to swap in whatever alert output you need. Just change the `.log('/tmp/{alert_name}_log.txt')` line in each tickscript to your desired alert output. A full listing of outputs with code samples is available in the [kapacitor documentation](https://docs.influxdata.com/kapacitor/v0.13/nodes/alert_node/).
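
As a rough sketch of swapping the output, the final `alert` block of a script could be extended as shown here. The `.slack()` and `.post()` handlers are properties of the kapacitor alert node, but the URL is a placeholder invented for illustration, and `.slack()` assumes that a `[slack]` section is already configured and enabled in `kapacitor.conf`:

```javascript
// Alert
alert
    // Keep the default log output while testing.
    .log('/tmp/{alert_name}_log.txt')
    // Assumption: Slack is configured and enabled in kapacitor.conf.
    .slack()
    // Hypothetical endpoint; replace it with your own HTTP receiver.
    .post('http://localhost:8080/alerts')
```

Several handlers can be attached to the same alert node, so the log line can stay in place until the new output is confirmed to work.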

> **On Alert Volume:** These alerts may be very noisy or quiet depending on your environment. They are meant to be starting points for alerts with all the knobs easily adjustable from the Parameters. Many users will also want to eliminate the `.info()` level of logging. It is included here for completeness.

> **On Verbosity:** These scripts are meant as templates for users who are new to writing tickscripts. All of the examples here can be written as one large stream. See the [documentation](https://docs.influxdata.com/kapacitor/v0.13/) for examples and full tick syntax.
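
As one illustration of the verbosity note, the CPU stream alert could be collapsed into a single pipeline with no intermediate variables. This is only a sketch assembled from the properties used in `cpu/cpu_alert_stream.tick`; the inlined thresholds and the omission of the `sigma` checks and the `.info()` level are simplifications made here for brevity:

```javascript
// Compact form: one chained pipeline instead of separate
// Parameters / Dataframe / Thresholds / Alert sections.
stream
    |from()
        .database('telegraf')
        .retentionPolicy('autogen')
        .measurement('cpu')
        .groupBy('host')
        .where(lambda: "cpu" == 'cpu-total')
    |window()
        .period(10s)
        .every(10s)
    |mean('usage_user')
        .as('stat')
    |alert()
        .id('{{ index .Tags "host"}}/cpu_used')
        .message('{{ .ID }}:{{ index .Fields "stat" }}')
        .warn(lambda: "stat" > 80)
        .crit(lambda: "stat" > 90)
        .log('/tmp/cpu_alert_log.txt')
```

The sectioned layout used by the scripts in this directory trades this brevity for named parameters that are easier to adjust.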

### Batch script example

```javascript
// {alert_name}

// metric: {alert_metric}
// available_fields: [[other_telegraf_fields]]

// TELEGRAF CONFIGURATION
// [[inputs.{plugin}]]
// # Full configuration

// DEFINE: kapacitor define {alert_name} -type batch -tick {plugin}/{alert_name}.tick -dbrp telegraf.autogen
// ENABLE: kapacitor enable {alert_name}

// Parameters
var info = {info_level}
var warn = {warn_level}
var crit = {crit_level}
var infoSig = 2.5
var warnSig = 3
var critSig = 3.5
var period = 10s
var every = 10s

// Dataframe
var data = batch
    |query('''{InfluxQL_Query}''')
        .period(period)
        .every(every)
        .groupBy('host')

// Thresholds
var alert = data
    |eval(lambda: sigma("stat"))
        .as('sigma')
        .keep()
    |alert()
        .id('{{ index .Tags "host"}}/{alert_metric}')
        .message('{{ .ID }}:{{ index .Fields "stat" }}')
        .info(lambda: "stat" > info OR "sigma" > infoSig)
        .warn(lambda: "stat" > warn OR "sigma" > warnSig)
        .crit(lambda: "stat" > crit OR "sigma" > critSig)

// Alert
alert
    .log('/tmp/{alert_name}_log.txt')

```

### Stream script example

```javascript
// {alert_name}

// metric: {alert_metric}
// available_fields: [[other_telegraf_fields]]

// TELEGRAF CONFIGURATION
// [[inputs.{plugin}]]
// # full configuration

// DEFINE: kapacitor define {alert_name} -type stream -tick {plugin}/{alert_name}.tick -dbrp telegraf.autogen
// ENABLE: kapacitor enable {alert_name}

// Parameters
var info = {info_level}
var warn = {warn_level}
var crit = {crit_level}
var infoSig = 2.5
var warnSig = 3
var critSig = 3.5
var period = 10s
var every = 10s

// Dataframe
var data = stream
    |from()
        .database('telegraf')
        .retentionPolicy('autogen')
        // Measurement and field names are string literals, so they take single quotes.
        .measurement('{plugin}')
        .groupBy('host')
    |window()
        .period(period)
        .every(every)
    |mean('{alert_metric}')
        .as('stat')

// Thresholds
var alert = data
    |eval(lambda: sigma("stat"))
        .as('sigma')
        .keep()
    |alert()
        .id('{{ index .Tags "host"}}/{alert_metric}')
        .message('{{ .ID }}:{{ index .Fields "stat" }}')
        .info(lambda: "stat" > info OR "sigma" > infoSig)
        .warn(lambda: "stat" > warn OR "sigma" > warnSig)
        .crit(lambda: "stat" > crit OR "sigma" > critSig)

// Alert
alert
    .log('/tmp/{alert_name}_log.txt')
```
46 changes: 46 additions & 0 deletions examples/telegraf/cpu/cpu_alert_batch.tick
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
// cpu_alert_batch

// metric: usage_user
// available_fields: "usage_guest","usage_guest_nice","usage_idle","usage_iowait", "usage_irq","usage_nice","usage_softirq","usage_steal","usage_system"

// TELEGRAF CONFIGURATION
// [[inputs.cpu]]
// percpu = true
// totalcpu = true
// fielddrop = ["time_*"]

// DEFINE: kapacitor define cpu_alert_batch -type batch -tick cpu/cpu_alert_batch.tick -dbrp telegraf.autogen
// ENABLE: kapacitor enable cpu_alert_batch

// Parameters
var info = 70
var warn = 80
var crit = 90
var infoSig = 2.5
var warnSig = 3
var critSig = 3.5
var period = 10s
var every = 10s

// Dataframe
var data = batch
    |query('''SELECT mean(usage_user) AS stat FROM "telegraf"."autogen"."cpu" WHERE cpu = 'cpu-total' ''')
        .period(period)
        .every(every)
        .groupBy('host')

// Thresholds
var alert = data
    |eval(lambda: sigma("stat"))
        .as('sigma')
        .keep()
    |alert()
        .id('{{ index .Tags "host"}}/cpu_used')
        .message('{{ .ID }}:{{ index .Fields "stat" }}')
        .info(lambda: "stat" > info OR "sigma" > infoSig)
        .warn(lambda: "stat" > warn OR "sigma" > warnSig)
        .crit(lambda: "stat" > crit OR "sigma" > critSig)

// Alert
alert
    .log('/tmp/cpu_alert_log.txt')
53 changes: 53 additions & 0 deletions examples/telegraf/cpu/cpu_alert_stream.tick
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
// cpu_alert_stream

// metric: usage_user
// available_fields: "usage_guest","usage_guest_nice","usage_idle","usage_iowait", "usage_irq","usage_nice","usage_softirq","usage_steal","usage_system"

// TELEGRAF CONFIGURATION
// [[inputs.cpu]]
// percpu = true
// totalcpu = true
// fielddrop = ["time_*"]

// DEFINE: kapacitor define cpu_alert_stream -type stream -tick cpu/cpu_alert_stream.tick -dbrp telegraf.autogen
// ENABLE: kapacitor enable cpu_alert_stream

// Parameters
var info = 70
var warn = 80
var crit = 90
var infoSig = 2.5
var warnSig = 3
var critSig = 3.5
var period = 10s
var every = 10s

// Dataframe
var data = stream
    |from()
        .database('telegraf')
        .retentionPolicy('autogen')
        .measurement('cpu')
        .groupBy('host')
        .where(lambda: "cpu" == 'cpu-total')
    |window()
        .period(period)
        .every(every)
    |mean('usage_user')
        .as('stat')

// Thresholds
var alert = data
    |eval(lambda: sigma("stat"))
        .as('sigma')
        .keep()
    |alert()
        .id('{{ index .Tags "host"}}/cpu_used')
        .message('{{ .ID }}:{{ index .Fields "stat" }}')
        .info(lambda: "stat" > info OR "sigma" > infoSig)
        .warn(lambda: "stat" > warn OR "sigma" > warnSig)
        .crit(lambda: "stat" > crit OR "sigma" > critSig)

// Alert
alert
    .log('/tmp/cpu_alert_log.txt')
38 changes: 38 additions & 0 deletions examples/telegraf/disk/disk_alert_batch.tick
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@
// disk_alert_batch

// metric: used_percent
// available_fields: "free","inodes_free","inodes_total","inodes_used","total","used"

// TELEGRAF CONFIGURATION
// [[inputs.disk]]
// ignore_fs = ["tmpfs", "devtmpfs"]

// DEFINE: kapacitor define disk_alert_batch -type batch -tick disk/disk_alert_batch.tick -dbrp telegraf.autogen
// ENABLE: kapacitor enable disk_alert_batch

// Parameters
var info = 75
var warn = 85
var crit = 92
var period = 10s
var every = 10s

// Dataframe
var data = batch
    |query('''SELECT mean(used_percent) AS stat FROM "telegraf"."autogen"."disk" WHERE path = '/' ''')
        .period(period)
        .every(every)
        .groupBy('host')

// Thresholds
var alert = data
    |alert()
        .id('{{ index .Tags "host"}}/disk_used')
        .message('{{ .ID }}:{{ index .Fields "stat" }}')
        .info(lambda: "stat" > info)
        .warn(lambda: "stat" > warn)
        .crit(lambda: "stat" > crit)

// Alert
alert
    .log('/tmp/disk_alert_log.txt')
44 changes: 44 additions & 0 deletions examples/telegraf/disk/disk_alert_stream.tick
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@
// disk_alert_stream

// metric: used_percent
// available_fields: "free","inodes_free","inodes_total","inodes_used","total","used"

// TELEGRAF CONFIGURATION
// [[inputs.disk]]
// ignore_fs = ["tmpfs", "devtmpfs"]

// DEFINE: kapacitor define disk_alert_stream -type stream -tick disk/disk_alert_stream.tick -dbrp telegraf.autogen
// ENABLE: kapacitor enable disk_alert_stream

// Parameters
var info = 75
var warn = 85
var crit = 92
var period = 10s
var every = 10s

// Dataframe
var data = stream
    |from()
        .database('telegraf')
        .retentionPolicy('autogen')
        // Read from the disk measurement (the original pointed at 'mem').
        .measurement('disk')
        .groupBy('host')
        // Match the batch script, which only considers the root filesystem.
        .where(lambda: "path" == '/')
    |window()
        .period(period)
        .every(every)
    |mean('used_percent')
        .as('stat')

// Thresholds
var alert = data
    |alert()
        .id('{{ index .Tags "host"}}/disk_used')
        .message('{{ .ID }}:{{ index .Fields "stat" }}')
        .info(lambda: "stat" > info)
        .warn(lambda: "stat" > warn)
        .crit(lambda: "stat" > crit)

// Alert
alert
    .log('/tmp/disk_alert_log.txt')
44 changes: 44 additions & 0 deletions examples/telegraf/generic_batch_example.tick
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@
// {alert_name}

// metric: {alert_metric}
// available_fields: [[other_telegraf_fields]]

// TELEGRAF CONFIGURATION
// [[inputs.{plugin}]]
// # Full configuration

// DEFINE: kapacitor define {alert_name} -type batch -tick {plugin}/{alert_name}.tick -dbrp telegraf.autogen
// ENABLE: kapacitor enable {alert_name}

// Parameters
var info = {info_level}
var warn = {warn_level}
var crit = {crit_level}
var infoSig = 2.5
var warnSig = 3
var critSig = 3.5
var period = 10s
var every = 10s

// Dataframe
var data = batch
    |query('''{InfluxQL_Query}''')
        .period(period)
        .every(every)
        .groupBy('host')

// Thresholds
var alert = data
    |eval(lambda: sigma("stat"))
        .as('sigma')
        .keep()
    |alert()
        .id('{{ index .Tags "host"}}/{alert_metric}')
        .message('{{ .ID }}:{{ index .Fields "stat" }}')
        .info(lambda: "stat" > info OR "sigma" > infoSig)
        .warn(lambda: "stat" > warn OR "sigma" > warnSig)
        .crit(lambda: "stat" > crit OR "sigma" > critSig)

// Alert
alert
    .log('/tmp/{alert_name}_log.txt')