InfluxDB is an open source time series database purpose-built by InfluxData for monitoring metrics and events. It provides real-time visibility into stacks, sensors, and systems, and lets you capture, analyze, and store millions of points per second.
Vector is a highly reliable observability data router built for demanding production environments. On top of this basic functionality, Vector adds a few important enhancements:
- A richer data model, supporting not only logs but also aggregated metrics, fully structured events, etc.
- Programmable transforms written in Lua (or eventually WASM) that let you parse, filter, aggregate, and otherwise manipulate your data in arbitrary ways
- Uncompromising performance and efficiency that enables a huge variety of deployment strategies
The individual pieces of data flowing through Vector are known as events. Events are arbitrarily wide and deep, structured pieces of data. There are two types of events: log and metric.
A log event is a structured representation of a point-in-time event. It contains an arbitrary set of fields (key/value pairs) that describe the event.
{
"host": "my.host.com",
"message": "<13>Feb 13 20:07:26 74794bfb6795 root[8539]: i am foobar",
"timestamp": "2019-11-01T21:15:47+00:00"
}
A metric event represents a numerical operation on a time series. The operations offered are heavily inspired by the StatsD and Prometheus models and determine the schema of the metric structure within Vector.
{
"name": "login.count",
"timestamp": "2019-11-01T21:15:47+00:00",
"kind": "absolute",
"tags": {
"host": "my.host.com"
},
"counter": {
"value": 24.2
}
}
Existing services usually emit metrics, traces, and logs of varying quality. By designing Vector to meet services where they are (their current state), Vector serves as a bridge to newer standards. This is why Vector places "events" at the top of its data model, with logs and metrics derived from them (traces coming soon).
The Vector influxdb_logs sink batches log events to InfluxDB using the v1 or v2 HTTP API.
InfluxDB uses line protocol to write data points. It is a text-based format that provides the measurement, tag set, field set, and timestamp of a data point.
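For example, a single data point in Line Protocol lists those four parts in order; the measurement name, tag, field, and timestamp below are purely illustrative:
my_measurement,host=my.host.com message="hello world" 1572642947000000000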
A Log Event contains an arbitrary set of fields (key/value pairs) that describe the event.
The following matrix outlines how Log Event fields are mapped into InfluxDB Line Protocol:
Field | Line Protocol |
---|---|
host | tag |
message | field |
source_type | tag |
timestamp | timestamp |
[custom-key] | field |
The default behavior can be overridden with the tags configuration option.
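As a minimal sketch (the sink name and the chosen field names are illustrative), listing field names in tags writes those fields as Line Protocol tags instead of fields:
[sinks.my_sink_id]
type = "influxdb_logs"
# fields listed here are emitted as tags instead of fields
tags = ["host", "source_type", "custom_field"]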
The following example shows how a Log Event is mapped into Line Protocol:
{
"host": "my.host.com",
"message": "<13>Feb 13 20:07:26 74794bfb6795 root[8539]: i am foobar",
"timestamp": "2019-11-01T21:15:47+00:00",
"custom_field": "custom_value"
}
ns.vector,host=my.host.com,metric_type=logs custom_field="custom_value",message="<13>Feb 13 20:07:26 74794bfb6795 root[8539]: i am foobar" 1572642947000000000
# InfluxDB v1 configuration example
[sinks.my_sink_id]
type = "influxdb_logs"
namespace = "service"
endpoint = "https://us-west-2-1.aws.cloud1.influxdata.com"
database = "vector-database"
consistency = "one"
retention_policy_name = "one_day_only"
username = "vector-source"
password = "${INFLUXDB_PASSWORD_ENV_VAR}"
# InfluxDB v2 configuration example
[sinks.my_sink_id]
type = "influxdb_logs"
namespace = "service"
endpoint = "https://us-west-2-1.aws.cloud2.influxdata.com"
org = "my-org"
bucket = "my-bucket"
token = "${INFLUXDB_TOKEN_ENV_VAR}"
The Vector influxdb_metrics sink batches metric events to InfluxDB using the v1 or v2 HTTP API.
InfluxDB uses line protocol to write data points. It is a text-based format that provides the measurement, tag set, field set, and timestamp of a data point.
The following matrix outlines how Vector metric types are mapped into InfluxDB Line Protocol fields.
Vector Metrics | Line Protocol Fields | Example |
---|---|---|
Counter | value | ns.total,metric_type=counter value=1.5 1542182950000000011 |
Gauge | value | ns.meter,metric_type=gauge,normal_tag=value,true_tag=true value=-1.5 1542182950000000011 |
Set | value | ns.users,metric_type=set,normal_tag=value,true_tag=true value=2 154218295000000001 |
Histogram | buckets, count, sum | ns.requests,metric_type=histogram,normal_tag=value,true_tag=true bucket_1=1i,bucket_2.1=2i,bucket_3=3i,count=6i,sum=12.5 1542182950000000011 |
Summary | quantiles, count, sum | ns.requests_sum,metric_type=summary,normal_tag=value,true_tag=true count=6i,quantile_0.01=1.5,quantile_0.5=2,quantile_0.99=3,sum=12 1542182950000000011 |
Distribution | min, max, median, avg, sum, count, quantile 0.95 | ns.sparse_stats,metric_type=distribution avg=3,count=10,max=4,median=3,min=1,quantile_0.95=4,sum=30 1542182950000000011 |
# InfluxDB v1 configuration example
[sinks.my_sink_id]
type = "influxdb_metrics"
namespace = "service"
endpoint = "https://us-west-2-1.aws.cloud1.influxdata.com"
database = "vector-database"
consistency = "one"
retention_policy_name = "one_day_only"
username = "vector-source"
password = "${INFLUXDB_PASSWORD_ENV_VAR}"
# InfluxDB v2 configuration example
[sinks.my_sink_id]
type = "influxdb_metrics"
namespace = "service"
endpoint = "https://us-west-2-1.aws.cloud2.influxdata.com"
org = "my-org"
bucket = "my-bucket"
token = "${INFLUXDB_TOKEN_ENV_VAR}"
Vector is a highly reliable data router that lets you take control of your observability data. It can collect, transform, and route data in a declarative way, all with one tool.
Vector is designed to follow the high reliability, operator safety, and one tool principles.
Engineering teams use Vector to tame their observability pipelines.
In this article I will describe how to monitor logs from the Apache HTTP Server. Our observability pipeline will use a Syslog source to ingest data through the Syslog protocol.
Architecture diagram from Vector docs: Data Model
This tutorial assumes that you have an account on the InfluxDB Cloud free tier.
To keep the setup of our observability pipeline simple, we will create a Dockerized environment. The Apache HTTP Server and Vector will each run as a separate Docker container and communicate through a Docker bridge network.
So let's create a Docker bridge network:
docker network create -d bridge influx_network \
--subnet 192.168.0.0/24 \
--gateway 192.168.0.1
Docker has multiple logging mechanisms to help get logs from running services to the correct destination. We configure our Dockerized Apache to use the Syslog logging driver:
docker run \
--detach \
--name web \
--network influx_network \
--publish 8080:80 \
--log-driver=syslog \
--log-opt syslog-address=udp://localhost:5140 \
httpd
Now we are ready to check the connection to our new Apache HTTP Server instance: http://localhost:8080/.
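To verify the connection from the command line (and to generate a few access log entries for the pipeline later), a simple request such as the following will do:
curl -i http://localhost:8080/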
Vector can ingest many types of sources into a pipeline: file, journald, kafka, and more. Sources can both receive and pull in data. We want to use a source that receives data over the network via syslog.
The whole Vector pipeline is defined in the vector.toml configuration file.
The Vector syslog source ingests data through the Syslog protocol and outputs log events. Define a Syslog source that listens on port 5140:
[sources.syslog]
type = "syslog"
mode = "udp"
address = "0.0.0.0:5140"
Now it's time to extract useful information from the Apache log into events.
The Apache log looks like:
192.168.0.1 - - [10/Feb/2000:12:00:00 +0900] "GET / HTTP/1.1" 200 777
We use a regex_parser transform to extract values for the fields host, user, timestamp, method, path, status, and bytes_out.
[transforms.regex_parser]
inputs = ["syslog"]
type = "regex_parser"
regex = '^(?P<host>[\w\.]+) - (?P<user>[\w-]+) \[(?P<timestamp>.*)\] "(?P<method>[\w]+) (?P<path>.*)" (?P<status>[\d]+) (?P<bytes_out>[\d]+)$'
The next step is to calculate useful metrics. For that we need to transform the log events into metric events with the log_to_metric transform. Let's calculate the sum of outgoing bytes, tagged by method and status:
[transforms.log_to_metric]
inputs = ["regex_parser"]
type = "log_to_metric"
[[transforms.log_to_metric.metrics]]
type = "counter"
increment_by_value = true
field = "bytes_out"
tags = {method = "{{method}}", status = "{{status}}"}
And what is the final step? Push the data into InfluxDB!
Let's configure InfluxDB sinks to push data into the InfluxDB 2 Cloud free tier.
The influxdb_logs sink batches log events to InfluxDB using the v1 or v2 HTTP API.
The regex_parser transform produces log events with extracted fields. So we will configure the regex_parser as the input to our sink, and the method and path fields as tags for the outgoing Line Protocol.
[sinks.influxdb_2_logs]
type = "influxdb_logs"
inputs = ["regex_parser"]
namespace = "vector-logs"
tags = ["method", "path"]
endpoint = "https://us-west-2-1.aws.cloud2.influxdata.com"
org = "My Company"
bucket = "vector"
token = "jSc6rmToXkx6y8vOv1ruac4ZCvYNpGtGzHkrJsF84bi0q9olFjpV6h6yv1f5xNs26_cHVURarPIpd6Bklvfe-w=="
The influxdb_metrics sink batches metric events to InfluxDB using the v1 or v2 HTTP API.
As an input for influxdb_metrics we will use the metrics aggregated by the log_to_metric transform.
[sinks.influxdb_2_metrics]
type = "influxdb_metrics"
inputs = ["log_to_metric"]
namespace = "vector-metrics"
endpoint = "https://us-west-2-1.aws.cloud2.influxdata.com"
org = "My Company"
bucket = "vector"
token = "jSc6rmToXkx6y8vOv1ruac4ZCvYNpGtGzHkrJsF84bi0q9olFjpV6h6yv1f5xNs26_cHVURarPIpd6Bklvfe-w=="
# __ __ __
# \ \ / / / /
# \ V / / /
# \_/ \/
#
# V E C T O R
# Configuration
#
# ------------------------------------------------------------------------------
# Website: https://vector.dev
# Docs: https://vector.dev/docs/
# ------------------------------------------------------------------------------
#
# Incoming Syslog source
#
[sources.syslog]
type = "syslog"
mode = "udp"
address = "0.0.0.0:5140"
#
# Transform logs into metrics
#
[transforms.regex_parser]
inputs = ["syslog"]
type = "regex_parser"
patterns = ['^(?P<host>[\w\.]+) - (?P<user>[\w-]+) \[(?P<timestamp>.*)\] "(?P<method>[\w]+) (?P<path>.*)" (?P<status>[\d]+) (?P<bytes_out>[\d]+)$']
[transforms.log_to_metric]
inputs = ["regex_parser"]
type = "log_to_metric"
[[transforms.log_to_metric.metrics]]
type = "counter"
increment_by_value = true
field = "bytes_out"
tags = {method = "{{method}}", status = "{{status}}"}
#
# Output Logs into InfluxDB 2
#
[sinks.influxdb_2_logs]
type = "influxdb_logs"
inputs = ["regex_parser"]
namespace = "vector-logs"
tags = ["appname", "method", "path"]
endpoint = "https://us-west-2-1.aws.cloud2.influxdata.com"
org = "My Company"
bucket = "vector"
token = "jSc6rmToXkx6y8vOv1ruac4ZCvYNpGtGzHkrJsF84bi0q9olFjpV6h6yv1f5xNs26_cHVURarPIpd6Bklvfe-w=="
#
# Output Metrics into InfluxDB 2
#
[sinks.influxdb_2_metrics]
type = "influxdb_metrics"
inputs = ["log_to_metric"]
namespace = "vector-metrics"
endpoint = "https://us-west-2-1.aws.cloud2.influxdata.com"
org = "My Company"
bucket = "vector"
token = "jSc6rmToXkx6y8vOv1ruac4ZCvYNpGtGzHkrJsF84bi0q9olFjpV6h6yv1f5xNs26_cHVURarPIpd6Bklvfe-w=="
Vector supports a wide range of platforms. You can run Vector on Windows, Linux, macOS, and ARM, and packages are provided for popular package managers such as DPKG, Homebrew, and RPM. We will use the Docker image available on Docker Hub.
Run the Vector Docker image with our vector.toml:
docker run \
--name vector \
--network influx_network \
--publish 5140:5140/udp \
--volume "${PWD}"/vector.toml:/etc/vector/vector.toml:ro \
timberio/vector:nightly-2020-06-02-alpine
Now it’s time to create some charts. To do this, log in to InfluxDB and create dashboard cells with the following Flux queries:
// Outgoing bytes (bytes_out counter) over the selected time range
from(bucket: "vector")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r._measurement == "vector-metrics.bytes_out")
|> filter(fn: (r) => r._field == "value")
// Total outgoing traffic in KiB since the beginning
from(bucket: "vector")
|> range(start: 0)
|> filter(fn: (r) => r._measurement == "vector-metrics.bytes_out")
|> filter(fn: (r) => r._field == "value")
|> toInt()
|> sum(column: "_value")
|> map(fn: (r) => ({ r with _value: r._value / 1024 }))
// HTTP status of each logged request
from(bucket: "vector")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "vector-logs.vector")
|> filter(fn: (r) => r["_field"] == "status")
|> drop(columns: ["appname", "metric_type", "source_type", "_field", "_measurement", "host"])
// Count of logged requests
from(bucket: "vector")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "vector-logs.vector")
|> filter(fn: (r) => r._field == "status")
|> drop(columns: ["host", "appname", "path"])
|> count()
and the result should look like this:
Vector together with InfluxDB is a powerful tool set for handling observability pipelines. In this example we covered only a small piece of what Vector can do. What about scripting in your pipelines? Yes, just use the lua transform. Geolocation of your logs? Yes, just use the geoip transform...
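As a taste, a minimal sketch of a lua transform could look like the following (the transform name and the added field are illustrative, and the exact configuration options depend on your Vector version):
[transforms.lua_example]
type = "lua"
inputs = ["regex_parser"]
# add a static field to every event passing through (illustrative)
source = """
event["environment"] = "production"
"""
The geoip transform can be plugged into the pipeline in the same way.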
As always, if you run into hurdles, please share them on our community site or Slack channel. We’d love to get your feedback and help you with any problems you run into.
The script that runs everything together can be found here, and the exported InfluxDB template here.