This repository was archived by the owner on Feb 12, 2023. It is now read-only.

bonitoo-io/influxdb-vector-demo


InfluxDB + Vector 👌

InfluxDB

InfluxDB is an open source time series database, purpose-built by InfluxData for monitoring metrics and events. It provides real-time visibility into stacks, sensors, and systems. Use InfluxDB to capture, analyze, and store millions of points per second and much more.

Vector

Vector is a highly reliable observability data router built for demanding production environments. On top of this basic functionality, Vector adds a few important enhancements:

  1. A richer data model, supporting not only logs but also aggregated metrics and fully structured events
  2. Programmable transforms written in Lua (or eventually WASM) that let you parse, filter, aggregate, and otherwise manipulate your data in arbitrary ways
  3. Uncompromising performance and efficiency that enables a huge variety of deployment strategies

Data Model

The individual pieces of data flowing through Vector are known as events. Events are arbitrarily wide and deep structured pieces of data. There are two types of events: log and metric.

Log

A log event is a structured representation of a point-in-time event. It contains an arbitrary set of fields (key/value pairs) that describe the event.

{
  "host": "my.host.com",
  "message": "<13>Feb 13 20:07:26 74794bfb6795 root[8539]: i am foobar",
  "timestamp": "2019-11-01T21:15:47+00:00"
}

Metric

A metric event represents a numerical operation on a time series. The operations offered are heavily inspired by the StatsD and Prometheus models, and they determine the schema of the metric structure within Vector.

{
  "name": "login.count",
  "timestamp": "2019-11-01T21:15:47+00:00",
  "kind": "absolute",
  "tags": {
    "host": "my.host.com"
  },
  "counter": {
    "value": 24.2
  }
}

Why Not Just Events?

Existing services usually emit metrics, traces, and logs of varying quality. By designing Vector to meet services where they are (their current state), Vector serves as a bridge to newer standards. This is why Vector places "events" at the top of its data model, from which logs and metrics are derived (traces coming soon).

InfluxDB Logs Sink

official docs

The Vector influxdb_logs sink batches log events to InfluxDB using the v1 or v2 HTTP API.

Mapping Log Event into Line Protocol

InfluxDB uses line protocol to write data points. It is a text-based format that provides the measurement, tag set, field set, and timestamp of a data point.

A log event contains an arbitrary set of fields (key/value pairs) that describe the event.

The following matrix outlines how Log Event fields are mapped into InfluxDB Line Protocol:

Field           Line Protocol
host            tag
message         field
source_type     tag
timestamp       timestamp
[custom-key]    field

The default behaviour can be overridden by the tags configuration option.

Mapping example

The following example shows how a log event is mapped into Line Protocol:

Log Event
{
  "host": "my.host.com",
  "message": "<13>Feb 13 20:07:26 74794bfb6795 root[8539]: i am foobar",
  "timestamp": "2019-11-01T21:15:47+00:00",
  "custom_field": "custom_value"
}
Line Protocol
ns.vector,host=my.host.com,metric_type=logs custom_field="custom_value",message="<13>Feb 13 20:07:26 74794bfb6795 root[8539]: i am foobar" 1572642947000000000
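The mapping above can be sketched in a few lines of Python. This is a simplified, hypothetical illustration, not Vector's actual implementation: it ignores Line Protocol escaping rules and hard-codes the `vector` measurement suffix and `metric_type=logs` tag shown above.

```python
from datetime import datetime

def log_event_to_line_protocol(event, namespace="ns", tag_keys=("host", "source_type")):
    # Hypothetical sketch: host/source_type become tags, the remaining keys
    # become quoted string fields, and the timestamp becomes a nanosecond
    # epoch. Line Protocol escaping of special characters is omitted.
    tags = {k: event[k] for k in tag_keys if k in event}
    tags["metric_type"] = "logs"
    fields = {k: v for k, v in event.items() if k not in tag_keys and k != "timestamp"}
    ts_ns = int(datetime.fromisoformat(event["timestamp"]).timestamp()) * 10**9
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f'{k}="{v}"' for k, v in sorted(fields.items()))
    return f"{namespace}.vector,{tag_str} {field_str} {ts_ns}"

event = {
    "host": "my.host.com",
    "message": "<13>Feb 13 20:07:26 74794bfb6795 root[8539]: i am foobar",
    "timestamp": "2019-11-01T21:15:47+00:00",
    "custom_field": "custom_value",
}
print(log_event_to_line_protocol(event))
```

Running this reproduces the Line Protocol line shown above, including the 1572642947000000000 nanosecond timestamp.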

Configuration example

InfluxDB v1

[sinks.my_sink_id]
  type = "influxdb_logs"
  namespace = "service"
  endpoint = "https://us-west-2-1.aws.cloud1.influxdata.com"
  database = "vector-database"
  consistency = "one"
  retention_policy_name = "one_day_only"
  username = "vector-source"
  password = "${INFLUXDB_PASSWORD_ENV_VAR}"

InfluxDB v2

[sinks.my_sink_id]
  type = "influxdb_logs"
  namespace = "service"
  endpoint = "https://us-west-2-1.aws.cloud2.influxdata.com"
  org = "my-org"
  bucket = "my-bucket"
  token = "${INFLUXDB_TOKEN_ENV_VAR}"

InfluxDB Metrics Sink

official docs

The Vector influxdb_metrics sink batches metric events to InfluxDB using the v1 or v2 HTTP API.

Vector Metric Types

InfluxDB uses line protocol to write data points. It is a text-based format that provides the measurement, tag set, field set, and timestamp of a data point.

The following matrix outlines how Vector metric types are mapped into InfluxDB Line Protocol fields.

Counter (fields: value):
  ns.total,metric_type=counter value=1.5 1542182950000000011

Gauge (fields: value):
  ns.meter,metric_type=gauge,normal_tag=value,true_tag=true value=-1.5 1542182950000000011

Set (fields: value):
  ns.users,metric_type=set,normal_tag=value,true_tag=true value=2 154218295000000001

Histogram (fields: buckets, count, sum):
  ns.requests,metric_type=histogram,normal_tag=value,true_tag=true bucket_1=1i,bucket_2.1=2i,bucket_3=3i,count=6i,sum=12.5 1542182950000000011

Summary (fields: quantiles, count, sum):
  ns.requests_sum,metric_type=summary,normal_tag=value,true_tag=true count=6i,quantile_0.01=1.5,quantile_0.5=2,quantile_0.99=3,sum=12 1542182950000000011

Distribution (fields: min, max, median, avg, sum, count, quantile 0.95):
  ns.sparse_stats,metric_type=distribution avg=3,count=10,max=4,median=3,min=1,quantile_0.95=4,sum=30 1542182950000000011
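As a minimal illustration of the Counter case, a counter metric collapses to a single value field. This is a hypothetical sketch, not Vector's code; escaping and the other metric types are omitted.

```python
def counter_to_line_protocol(metric, namespace="ns"):
    # Hypothetical sketch: a counter maps to one "value" field; any tags on
    # the event are merged next to the metric_type tag.
    tags = {"metric_type": "counter", **metric.get("tags", {})}
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return (f'{namespace}.{metric["name"]},{tag_str} '
            f'value={metric["counter"]["value"]} {metric["timestamp_ns"]}')

print(counter_to_line_protocol(
    {"name": "total", "counter": {"value": 1.5}, "timestamp_ns": 1542182950000000011}
))
# ns.total,metric_type=counter value=1.5 1542182950000000011
```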

Configuration example

InfluxDB v1

[sinks.my_sink_id]
  type = "influxdb_metrics"
  namespace = "service"
  endpoint = "https://us-west-2-1.aws.cloud1.influxdata.com"
  database = "vector-database"
  consistency = "one"
  retention_policy_name = "one_day_only"
  username = "vector-source"
  password = "${INFLUXDB_PASSWORD_ENV_VAR}"

InfluxDB v2

[sinks.my_sink_id]
  type = "influxdb_metrics"
  namespace = "service"
  endpoint = "https://us-west-2-1.aws.cloud2.influxdata.com"
  org = "my-org"
  bucket = "my-bucket"
  token = "${INFLUXDB_TOKEN_ENV_VAR}"

Links

Monitoring Logs with Vector and InfluxDB

Vector is a highly reliable data router that lets you take control of your observability data. It can collect, transform, and route data in a declarative way within one tool. Vector is designed around the principles of high reliability, operator safety, and being a single tool. Engineering teams use Vector to tame their observability pipelines.

In this article I will describe how to monitor logs from the Apache HTTP Server. Our observability pipeline will use a Syslog source to ingest data through the Syslog protocol.

Architecture diagram from Vector docs: Data Model

This tutorial assumes that you have an account on the InfluxDB Cloud free tier.

Dockerized environment

To simplify setting up our observability pipeline we will create a Dockerized environment. The Apache HTTP Server and Vector will each run as a separate Docker container and communicate through a Docker bridge network.

So let's create a Docker bridge network:

docker network create -d bridge influx_network \
       --subnet 192.168.0.0/24 \
       --gateway 192.168.0.1

Route Apache logs to Syslog

Docker has multiple logging mechanisms to help get logs from running services to the correct destination. We configure our Dockerized Apache to use the syslog logging driver:

docker run \
       --detach \
       --name web \
       --network influx_network \
       --publish 8080:80 \
       --log-driver=syslog \
       --log-opt syslog-address=udp://localhost:5140 \
       httpd

Now we are ready to check the connection to our new Apache HTTP Server instance: http://localhost:8080/.

Vector routing

Vector can ingest many types of sources into a pipeline: file, journald, kafka, and more. Sources can both receive and pull in data. We want to use a source that receives data over the network via Syslog.

The whole Vector pipeline is defined in a vector.toml configuration file.

Syslog Source

The Vector syslog source ingests data through the Syslog protocol and outputs log events.

Define a Syslog source that listens on port 5140:

[sources.syslog]
  type = "syslog"
  mode = "udp"
  address = "0.0.0.0:5140"
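Once Vector is running, you can sanity-check the source by pushing a single syslog-formatted datagram at it. This is a hypothetical smoke test in Python; because UDP is fire-and-forget, a successful send does not prove Vector received the message, so watch Vector's output to confirm.

```python
import socket

# Send one syslog-style message to the UDP port the Vector source listens on.
# A successful sendto only means the datagram was handed to the kernel.
msg = b"<13>Feb 13 20:07:26 myhost root[1]: hello from the smoke test"
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sent = sock.sendto(msg, ("127.0.0.1", 5140))
sock.close()
print(sent)  # number of bytes handed to the kernel
```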

Now it's time to extract useful information from the Apache log into events.

Transform logs

An Apache log line looks like:

192.168.0.1 - - [10/Feb/2000:12:00:00 +0900] "GET / HTTP/1.1" 200 777

We use a regex_parser to extract the field values for host, user, timestamp, method, path, status, and bytes_out.

[transforms.regex_parser]
  inputs = ["syslog"]
  type = "regex_parser"
  regex = '^(?P<host>[\w\.]+) - (?P<user>[\w-]+) \[(?P<timestamp>.*)\] "(?P<method>[\w]+) (?P<path>.*)" (?P<status>[\d]+) (?P<bytes_out>[\d]+)$'
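You can preview what this pattern extracts by running the same regex against the sample line with Python's re module (note that the greedy path group also swallows the protocol, e.g. "/ HTTP/1.1"):

```python
import re

# The same pattern as in the transform, checked against the sample log line.
PATTERN = re.compile(
    r'^(?P<host>[\w\.]+) - (?P<user>[\w-]+) \[(?P<timestamp>.*)\] '
    r'"(?P<method>[\w]+) (?P<path>.*)" (?P<status>[\d]+) (?P<bytes_out>[\d]+)$'
)

line = '192.168.0.1 - - [10/Feb/2000:12:00:00 +0900] "GET / HTTP/1.1" 200 777'
fields = PATTERN.match(line).groupdict()
print(fields["host"], fields["method"], fields["status"], fields["bytes_out"])
# 192.168.0.1 GET 200 777
```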

The next step is to calculate useful metrics. For that we need to transform log events into metric events with log_to_metric. Let's calculate the sum of outgoing bytes tagged by method and status:

[transforms.log_to_metric]
  inputs = ["regex_parser"]
  type = "log_to_metric" 

[[transforms.log_to_metric.metrics]]
  type = "counter"
  increment_by_value = true
  field = "bytes_out"
  tags = {method = "{{method}}", status = "{{status}}"}
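Conceptually, increment_by_value turns each parsed log line into a counter increment keyed by its tag set. The following is a rough Python model of that aggregation under the assumptions above; it is an illustration, not Vector's implementation.

```python
from collections import defaultdict

# Model: each log event increments a counter by its bytes_out value,
# keyed by the (method, status) tag pair.
counters = defaultdict(float)

events = [
    {"method": "GET", "status": "200", "bytes_out": "777"},
    {"method": "GET", "status": "200", "bytes_out": "223"},
    {"method": "GET", "status": "404", "bytes_out": "196"},
]
for e in events:
    counters[(e["method"], e["status"])] += float(e["bytes_out"])

print(counters[("GET", "200")])  # 1000.0
```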

And what is the final step? Push the data into InfluxDB!

InfluxDB Sinks

Let's configure InfluxDB sinks to push data into the InfluxDB 2 Cloud free tier.

Logs

The influxdb_logs sink batches log events to InfluxDB using the v1 or v2 HTTP API.

The regex_parser produces log events with extracted fields, so we will configure regex_parser as the input to our sink, and the fields method and path as tags for the outgoing Line Protocol.

[sinks.influxdb_2_logs]
  type = "influxdb_logs"
  inputs = ["regex_parser"]
  namespace = "vector-logs"
  tags = ["method", "path"]
  endpoint = "https://us-west-2-1.aws.cloud2.influxdata.com"
  org = "My Company"
  bucket = "vector"
  token = "jSc6rmToXkx6y8vOv1ruac4ZCvYNpGtGzHkrJsF84bi0q9olFjpV6h6yv1f5xNs26_cHVURarPIpd6Bklvfe-w=="

Metrics

The influxdb_metrics sink batches metric events to InfluxDB using the v1 or v2 HTTP API.

As the input for influxdb_metrics we will use the metrics aggregated by the log_to_metric transform.

[sinks.influxdb_2_metrics]
  type = "influxdb_metrics"
  inputs = ["log_to_metric"]
  namespace = "vector-metrics"
  endpoint = "https://us-west-2-1.aws.cloud2.influxdata.com"
  org = "My Company"
  bucket = "vector"
  token = "jSc6rmToXkx6y8vOv1ruac4ZCvYNpGtGzHkrJsF84bi0q9olFjpV6h6yv1f5xNs26_cHVURarPIpd6Bklvfe-w=="

Full configuration

#                                    __   __  __
#                                    \ \ / / / /
#                                     \ V / / /
#                                      \_/  \/
#
#                                    V E C T O R
#                                   Configuration
#
# ------------------------------------------------------------------------------
# Website: https://vector.dev
# Docs: https://vector.dev/docs/
# ------------------------------------------------------------------------------
  
#
# Incoming Syslog source
#
[sources.syslog]
  type = "syslog"
  mode = "udp"
  address = "0.0.0.0:5140"

#
# Transform logs into metrics
#
[transforms.regex_parser]
  inputs = ["syslog"]
  type = "regex_parser"
  patterns = ['^(?P<host>[\w\.]+) - (?P<user>[\w-]+) \[(?P<timestamp>.*)\] "(?P<method>[\w]+) (?P<path>.*)" (?P<status>[\d]+) (?P<bytes_out>[\d]+)$']

[transforms.log_to_metric]
  inputs = ["regex_parser"]
  type = "log_to_metric"

[[transforms.log_to_metric.metrics]]
  type = "counter"
  increment_by_value = true
  field = "bytes_out"
  tags = {method = "{{method}}", status = "{{status}}"}    

#
# Output Logs into InfluxDB 2
#
[sinks.influxdb_2_logs]
  type = "influxdb_logs"
  inputs = ["regex_parser"]
  namespace = "vector-logs"
  tags = ["appname", "method", "path"]
  endpoint = "https://us-west-2-1.aws.cloud2.influxdata.com"
  org = "My Company"
  bucket = "vector"
  token = "jSc6rmToXkx6y8vOv1ruac4ZCvYNpGtGzHkrJsF84bi0q9olFjpV6h6yv1f5xNs26_cHVURarPIpd6Bklvfe-w=="

#
# Output Metrics into InfluxDB 2
#
[sinks.influxdb_2_metrics]
  type = "influxdb_metrics"
  inputs = ["log_to_metric"]
  namespace = "vector-metrics"
  endpoint = "https://us-west-2-1.aws.cloud2.influxdata.com"
  org = "My Company"
  bucket = "vector"
  token = "jSc6rmToXkx6y8vOv1ruac4ZCvYNpGtGzHkrJsF84bi0q9olFjpV6h6yv1f5xNs26_cHVURarPIpd6Bklvfe-w=="

Bring Vector's pipeline to life

Vector supports a wide range of platforms. You can run Vector on Windows, Linux, macOS, and ARM, and packages are provided for popular package managers such as DPKG, Homebrew, and RPM. We will use the Docker image available on Docker Hub.

Run the Vector Docker image with our vector.toml:

docker run \
       --name vector \
       --network influx_network \
       --publish 5140:5140/udp \
       --volume "${PWD}"/vector.toml:/etc/vector/vector.toml:ro \
       timberio/vector:nightly-2020-06-02-alpine

Visualize metrics in InfluxDB

Now it’s time to create some charts. To do this, log in to InfluxDB and create dashboard cells with the following Flux queries:

from(bucket: "vector")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r._measurement == "vector-metrics.bytes_out")
  |> filter(fn: (r) => r._field == "value")

from(bucket: "vector")
  |> range(start: 0)
  |> filter(fn: (r) => r._measurement == "vector-metrics.bytes_out")
  |> filter(fn: (r) => r._field == "value") 
  |> toInt()
  |> sum(column: "_value")
  |> map(fn: (r) => ({ r with _value: r._value / 1024 }))

from(bucket: "vector")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "vector-logs.vector")
  |> filter(fn: (r) => r["_field"] == "status")
  |> drop(columns: ["appname", "metric_type", "source_type", "_field", "_measurement", "host"])

from(bucket: "vector")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "vector-logs.vector")
  |> filter(fn: (r) => r._field == "status")
  |> drop(columns: ["host", "appname", "path"])
  |> count()

and the result should look like:

Conclusion

Vector together with InfluxDB is a powerful tool set for handling observability pipelines. In this example we covered only a small piece of what Vector can do. What about scripting in your pipelines? Yes, just use the lua transform. Geolocation of your logs? Yes, just use the geoip transform...

As always, if you run into hurdles, please share them on our community site or Slack channel. We’d love to get your feedback and help you with any problems you run into.

The script that runs everything together can be found here, and the exported InfluxDB template here.
