This plugin implements the Zipkin HTTP server to gather trace and timing data needed to troubleshoot latency problems in microservice architectures.
Please Note: This plugin is experimental; its data schema may be subject to change based on its main use cases and the evolution of the OpenTracing standard.
This plugin is a service input. Normal plugins gather metrics determined by the interval setting. Service plugins start a service that listens and waits for metrics or events to occur. Service plugins have two key differences from normal plugins:
- The global or plugin-specific interval setting may not apply
- The CLI options of --test, --test-wait, and --once may not produce output for this plugin
In addition to the plugin-specific configuration settings, plugins support additional global and plugin configuration settings. These settings are used to modify metrics, tags, and fields, or to create aliases and configure ordering, etc. See CONFIGURATION.md for more details.
# This plugin implements the Zipkin http server to gather trace and timing data needed to troubleshoot latency problems in microservice architectures.
[[inputs.zipkin]]
## URL path for span data
# path = "/api/v1/spans"
## Port on which Telegraf listens
# port = 9411
## Maximum duration before timing out read of the request
# read_timeout = "10s"
## Maximum duration before timing out write of the response
# write_timeout = "10s"
The plugin accepts spans in JSON or thrift if the Content-Type is application/json or application/x-thrift, respectively. If Content-Type is not set, then the plugin assumes it is JSON format.
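As a rough illustration of the Content-Type handling described above, the following Python sketch builds a POST request that carries spans as JSON. The helper name build_span_request and the host localhost are assumptions made for illustration; the port and path are the defaults from the sample configuration.

```python
import json
import urllib.request


def build_span_request(spans, host="localhost", port=9411, path="/api/v1/spans"):
    """Build (but do not send) a POST request carrying a list of spans as JSON.

    host is an assumption; port and path mirror the sample configuration.
    """
    body = json.dumps(spans).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}{path}",
        data=body,
        # Mark the payload as JSON explicitly. A thrift payload would use
        # application/x-thrift instead.
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned request to urllib.request.urlopen sends it, which requires a running Telegraf instance with this plugin enabled.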
This plugin uses Annotations tags and fields to track data from spans:
- TRACE: a set of spans that share a single root span. Traces are built by collecting all spans that share a traceId.
- SPAN: a set of Annotations and BinaryAnnotations that correspond to a particular RPC.
- Annotations: record an occurrence in time at the beginning and end of a request. A metric is output for each annotation and binary annotation of a span.
Annotations may have the following values:
- CS (client start): beginning of the span; the request is made.
- SR (server receive): the server receives the request and starts processing it. Network latency and clock jitter separate it from CS.
- SS (server send): the server is done processing and sends the response back to the client. The time it took to process the request separates it from SR.
- CR (client receive): end of the span; the client receives the response from the server. The RPC is considered complete with this annotation.
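The relationship between the four annotation values can be sketched as simple timestamp arithmetic. This is only an illustration, not the plugin's code: the function name span_timings and the result keys are hypothetical, and timestamps are assumed to be integers in a single unit (epoch microseconds in the sample span at the end of this document).

```python
def span_timings(annotations):
    """Derive timing gaps from CS/SR/SS/CR annotations.

    Each annotation is assumed to be a dict with "value" and "timestamp"
    keys. Results are in the same unit as the input timestamps.
    """
    # Map each lower-cased annotation value to its timestamp.
    ts = {a["value"].lower(): a["timestamp"] for a in annotations}
    timings = {}
    if "cs" in ts and "cr" in ts:
        # Whole RPC as seen by the client.
        timings["client_total"] = ts["cr"] - ts["cs"]
    if "sr" in ts and "ss" in ts:
        # Time the server spent processing the request.
        timings["server_processing"] = ts["ss"] - ts["sr"]
    if "cs" in ts and "sr" in ts:
        # Network latency on the way in; also absorbs clock differences.
        timings["request_transit"] = ts["sr"] - ts["cs"]
    if "ss" in ts and "cr" in ts:
        # Network latency on the way back; also absorbs clock differences.
        timings["response_transit"] = ts["cr"] - ts["ss"]
    return timings
```

Only the gaps whose two endpoints are both present are reported, so a span with just CS and CR yields a single client_total value.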
- "duration_ns": The time in nanoseconds between the end and beginning of a span.
- "id": The 64-bit ID of the span.
- "parent_id": The ID of the span's parent. If a span has no parent, the parent ID is set to the span's own ID.
- "trace_id": The 64 or 128-bit ID of a particular trace. Every span in a trace shares this ID. It is the concatenation of the high and low parts, converted to hexadecimal.
- "name": Defines a span
- "service_name": Defines a service
- "annotation": The value of an annotation
- "endpoint_host": The endpoint's IPv4 address concatenated with its listening port; if no port is present, only the address is used
- "service_name": Defines a service
- "annotation": The value of an annotation
- "endpoint_host": The endpoint's IPv4 address concatenated with its listening port; if no port is present, only the address is used
- "annotation_key": Label describing the annotation
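The derived values in the lists above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the plugin's actual code: the function names are hypothetical, zero-padding the low half of a 128-bit trace ID to 16 hex digits is an assumption, the colon separator in endpoint_host is an assumption, and the span duration is assumed to arrive in microseconds.

```python
MASK_64 = (1 << 64) - 1


def format_trace_id(high, low):
    """Concatenate the high and low 64-bit halves of a trace ID as hex."""
    high &= MASK_64  # treat signed inputs as unsigned 64-bit values
    low &= MASK_64
    if high == 0:
        # 64-bit trace ID: only the low half is present.
        return format(low, "x")
    # 128-bit trace ID: pad the low half so both halves stay aligned
    # (padding is an assumption of this sketch).
    return format(high, "x") + format(low, "016x")


def endpoint_host(ipv4, port=0):
    """Join address and port; omit the port when it is not set."""
    return f"{ipv4}:{port}" if port else ipv4


def duration_ns(duration_us):
    """Convert a span duration from microseconds (assumed) to nanoseconds."""
    return duration_us * 1000
```

For example, format_trace_id(0, 0xbd7a977555f6b982) gives the 16-character trace ID used in the sample span at the end of this document.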
Get All Span Names for Service my_web_server
SHOW TAG VALUES FROM "zipkin" with key="name" WHERE "service_name" = 'my_web_server'
- Description: returns a list containing the names of the spans which have annotations with the given service_name of my_web_server.
Get All Service Names
SHOW TAG VALUES FROM "zipkin" WITH KEY = "service_name"
- Description: returns a list of all distinct endpoint service names.
Find spans with the longest duration
SELECT max("duration_ns") FROM "zipkin" WHERE "service_name" = 'my_service' AND "name" = 'my_span_name' AND time > now() - 20m GROUP BY "trace_id",time(30s) LIMIT 5
- Description: In the last 20 minutes, find the top 5 longest span durations for service my_service and span name my_span_name.
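The queries above can also be issued over the InfluxDB 1.x HTTP API. The sketch below only builds the request URL; the helper name influx_query_url and the base URL http://localhost:8086 are assumptions for illustration, and the database name telegraf is assumed to match the one used in the retention policy statement further down.

```python
import urllib.parse


def influx_query_url(query, db="telegraf", base="http://localhost:8086"):
    """Build a URL for the InfluxDB 1.x /query endpoint.

    base and db are assumptions; adjust them to your own setup.
    """
    # urlencode escapes quotes and spaces in the InfluxQL statement.
    params = urllib.parse.urlencode({"db": db, "q": query})
    return f"{base}/query?{params}"
```

Fetching the resulting URL with any HTTP client returns the query result as JSON, provided an InfluxDB instance is reachable at the assumed address.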
This test will create high-cardinality data, so we recommend using the tsi InfluxDB engine.
- Update InfluxDB to >= 1.3 in order to use the new tsi engine.
- Generate a config file with the following command:
influxd config > /path/for/config/file
- Add the following to your config file, under the [data] section:
[data]
index-version = "tsi1"
- Start influxd with your new config file:
influxd -config=/path/to/your/config/file
- Update your retention policy:
ALTER RETENTION POLICY "autogen" ON "telegraf" DURATION 1d SHARD DURATION 30m
{
"traceId": "bd7a977555f6b982",
"name": "query",
"id": "be2d01e33cc78d97",
"parentId": "ebf33e1a81dc6f71",
"timestamp": 1458702548786000,
"duration": 13000,
"annotations": [
{
"endpoint": {
"serviceName": "zipkin-query",
"ipv4": "192.168.1.2",
"port": 9411
},
"timestamp": 1458702548786000,
"value": "cs"
},
{
"endpoint": {
"serviceName": "zipkin-query",
"ipv4": "192.168.1.2",
"port": 9411
},
"timestamp": 1458702548799000,
"value": "cr"
}
],
"binaryAnnotations": [
{
"key": "jdbc.query",
"value": "select distinct `zipkin_spans`.`trace_id` from `zipkin_spans` join `zipkin_annotations` on (`zipkin_spans`.`trace_id` = `zipkin_annotations`.`trace_id` and `zipkin_spans`.`id` = `zipkin_annotations`.`span_id`) where (`zipkin_annotations`.`endpoint_service_name` = ? and `zipkin_spans`.`start_ts` between ? and ?) order by `zipkin_spans`.`start_ts` desc limit ?",
"endpoint": {
"serviceName": "zipkin-query",
"ipv4": "192.168.1.2",
"port": 9411
}
},
{
"key": "sa",
"value": true,
"endpoint": {
"serviceName": "spanstore-jdbc",
"ipv4": "127.0.0.1",
"port": 3306
}
}
]
}