defmodule TelemetryMetricsStatsd do
@moduledoc """
`Telemetry.Metrics` reporter for StatsD-compatible metric servers.

To use it, start the reporter with the `start_link/1` function, providing it a list of
`Telemetry.Metrics` metric definitions:

    import Telemetry.Metrics

    TelemetryMetricsStatsd.start_link(
      metrics: [
        counter("http.request.count"),
        sum("http.request.payload_size"),
        last_value("vm.memory.total")
      ]
    )

> Note that in a real project the reporter should be started under a supervisor, e.g. the main
> supervisor of your application.

By default the reporter sends metrics to 127.0.0.1:8125. Both the hostname and the port number can be
configured using the `:host` and `:port` options.

Note that the reporter doesn't aggregate metrics in-process. It sends a metric update to StatsD
whenever a relevant Telemetry event is emitted.

## Translation between Telemetry.Metrics and StatsD
In this section we walk through how the Telemetry.Metrics metric definitions are mapped to StatsD
metrics and their types at runtime.
Telemetry.Metrics metric names are translated as follows:

  * if the metric name was provided as a string, e.g. `"http.request.count"`, it is sent to
    the StatsD server as-is
  * if the metric name was provided as a list of atoms, e.g. `[:http, :request, :count]`, it is
    first converted to a string by joining the segments with dots. In this example, the StatsD
    metric name would be `"http.request.count"` as well
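
The two naming rules above can be sketched as follows. `NameSketch` and `to_statsd_name/1` are
hypothetical names used only for illustration; they are not part of this library's API.

```elixir
defmodule NameSketch do
  # Hypothetical helper, not the reporter's actual implementation.
  # A string name passes through unchanged.
  def to_statsd_name(name) when is_binary(name), do: name

  # A list of atoms is joined with dots.
  def to_statsd_name(segments) when is_list(segments), do: Enum.join(segments, ".")
end

NameSketch.to_statsd_name("http.request.count")
# => "http.request.count"
NameSketch.to_statsd_name([:http, :request, :count])
# => "http.request.count"
```
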

Since there are multiple implementations of StatsD and each of them provides a slightly different
set of features, other aspects of metric translation are controlled by formatters.
The formatter can be selected using the `:formatter` option. Currently only two formatters are
supported: `:standard` and `:datadog`.

The following table shows how `Telemetry.Metrics` metrics map to StatsD metrics:
| Telemetry.Metrics | StatsD |
|-------------------|--------|
| `last_value` | `gauge`, always set to an absolute value |
| `counter` | `counter`, always increased by 1 |
| `sum` | `gauge`, increased and decreased by the provided value |
| `summary` | `timer` recording individual measurement |
| `distribution` | `timer` with the standard formatter, `histogram` with the DataDog formatter |
### The standard StatsD formatter
The `:standard` formatter is compatible with the
[Etsy implementation](https://github.com/statsd/statsd/blob/master/docs/metric_types.md) of StatsD.
Since this particular implementation doesn't support explicit tags, tag values are appended as
consecutive segments of the metric name. For example, given the definition

    counter("db.query.count", tags: [:table, :operation])

and the event

    :telemetry.execute([:db, :query], %{}, %{table: "users", operation: "select"})

the StatsD metric name would be `"db.query.count.users.select"`. Note that the tag values are
appended to the base metric name in the order they were declared in the metric definition.
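
As a rough illustration of this naming rule, the sketch below appends tag values in declaration
order. `StandardTagsSketch` and `name_with_tags/3` are hypothetical names, and the sketch assumes
every declared tag is present in the event metadata.

```elixir
defmodule StandardTagsSketch do
  # Hypothetical helper, not the formatter's actual implementation.
  # Looks up each declared tag in the metadata and appends its value
  # to the base name, keeping the order of `tag_keys`.
  def name_with_tags(base_name, tag_keys, metadata) do
    tag_values = Enum.map(tag_keys, fn key -> to_string(Map.fetch!(metadata, key)) end)
    Enum.join([base_name | tag_values], ".")
  end
end

metadata = %{table: "users", operation: "select"}

StandardTagsSketch.name_with_tags("db.query.count", [:table, :operation], metadata)
# => "db.query.count.users.select"
StandardTagsSketch.name_with_tags("db.query.count", [:operation, :table], metadata)
# => "db.query.count.select.users"
```
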
Another important aspect of the standard formatter is that all measurements are converted to
integers, i.e. no floats are ever sent to the StatsD daemon.
Now to the metric types!
#### Counter
A Telemetry.Metrics counter is represented as a StatsD counter. Each event the metric is
based on increments the counter by 1. To be more concrete, given the metric definition

    counter("http.request.count")

and the event

    :telemetry.execute([:http, :request], %{duration: 120})

the following line would be sent to StatsD

    "http.request.count:1|c"

Note that the counter was bumped by 1, regardless of the measurements included in the event
(a careful reader will notice that the `:count` measurement we chose for the metric wasn't present
in the map of measurements at all!). Such behaviour conforms to the specification of a counter as
defined by the `Telemetry.Metrics` package: a counter should be incremented by 1 every time a given
event is dispatched.

#### Last value
A last value metric is represented as a StatsD gauge, whose value is always set to the value
of the measurement from the most recent event. With the following metric definition

    last_value("vm.memory.total")

and the event

    :telemetry.execute([:vm, :memory], %{total: 1024})

the following metric update would be sent to StatsD

    "vm.memory.total:1024|g"

#### Sum
A sum metric is also represented as a gauge. The difference is that it always changes relatively
and is never set to an absolute value. Given the metric definition below

    sum("http.request.payload_size")

and the event

    :telemetry.execute([:http, :request], %{payload_size: 1076})

the following line would be sent to StatsD

    "http.request.payload_size:+1076|g"

When the measurement is negative, the StatsD gauge is decreased accordingly.

#### Summary
The summary is represented as a StatsD timer, since it should generate statistics about the
gathered measurements. Given the metric definition below

    summary("http.request.duration")

and the event

    :telemetry.execute([:http, :request], %{duration: 120})

the following line would be sent to StatsD

    "http.request.duration:120|ms"

#### Distribution
There is no metric in the original StatsD implementation equivalent to a Telemetry.Metrics distribution.
However, histograms can be enabled for selected timer metrics in the
[StatsD daemon configuration](https://github.com/statsd/statsd/blob/master/docs/metric_types.md#timing).
Because of that, the distribution is also reported as a timer. For example, given the following metric
definition

    distribution("http.request.duration", buckets: [0])

and the event

    :telemetry.execute([:http, :request], %{duration: 120})

the following line would be sent to StatsD

    "http.request.duration:120|ms"

Since histograms are configured on the StatsD server side, the `:buckets` option has no effect
when used with this reporter.
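
The mappings described in this section can be condensed into one small sketch.
`StandardLinesSketch` and `line/3` are hypothetical names, not the formatter's real API. The
sketch assumes that conversion to integers is done by rounding, and it leaves out tags and
prefixes.

```elixir
defmodule StandardLinesSketch do
  # Hypothetical helper, not the formatter's actual implementation.

  # A counter always reports an increment of 1, whatever the measurement is.
  def line(:counter, name, _value), do: name <> ":1|c"

  # A last value sets the gauge to an absolute value.
  def line(:last_value, name, value), do: name <> ":" <> int(value) <> "|g"

  # A sum changes the gauge relatively, so the value always carries a sign.
  def line(:sum, name, value) do
    rounded = round(value)
    sign = if rounded < 0, do: "-", else: "+"
    name <> ":" <> sign <> Integer.to_string(abs(rounded)) <> "|g"
  end

  # Summaries and distributions are both reported as timers.
  def line(kind, name, value) when kind in [:summary, :distribution] do
    name <> ":" <> int(value) <> "|ms"
  end

  # Assumption: floats are rounded, since no floats are sent by the standard formatter.
  defp int(value), do: value |> round() |> Integer.to_string()
end

StandardLinesSketch.line(:counter, "http.request.count", 120)
# => "http.request.count:1|c"
StandardLinesSketch.line(:sum, "http.request.payload_size", 1076)
# => "http.request.payload_size:+1076|g"
```
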
### The DataDog formatter
The DataDog formatter is compatible with [DogStatsD](https://docs.datadoghq.com/developers/dogstatsd/),
the DataDog StatsD service bundled with its agent.
#### Tags
The main difference from the standard formatter is that DataDog supports explicit tagging in its
protocol. Using the same example as with the standard formatter, given the following definition

    counter("db.query.count", tags: [:table, :operation])

and the event

    :telemetry.execute([:db, :query], %{}, %{table: "users", operation: "select"})

the metric update packet sent to StatsD would be `db.query.count:1|c|#table:users,operation:select`.
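
A minimal sketch of how such a tag suffix could be built is shown below. `DatadogTagsSketch` and
`with_tags/3` are hypothetical names, and the sketch assumes every declared tag is present in the
event metadata.

```elixir
defmodule DatadogTagsSketch do
  # Hypothetical helper, not the formatter's actual implementation.
  # Appends a comma-separated list of key:value pairs to a metric line.
  def with_tags(line, tag_keys, metadata) do
    tags =
      Enum.map_join(tag_keys, ",", fn key ->
        Atom.to_string(key) <> ":" <> to_string(Map.fetch!(metadata, key))
      end)

    # Without any tags the line is left untouched.
    if tags == "", do: line, else: line <> "|#" <> tags
  end
end

metadata = %{table: "users", operation: "select"}

DatadogTagsSketch.with_tags("db.query.count:1|c", [:table, :operation], metadata)
# => "db.query.count:1|c|#table:users,operation:select"
```
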
#### Metric types
The only difference between DataDog and standard StatsD metric types is that DataDog provides
a dedicated histogram metric. That's why a Telemetry.Metrics distribution is translated to a DataDog
histogram.
Also note that DataDog allows measurements to be floats, which is why no rounding is performed when
formatting the metric.

## Global tags
The library provides an option to specify a set of global tag values, which are available to all
metrics running under the reporter.
For example, if you're running your application in multiple deployment environments (staging, production,
etc.), you might set the environment as a global tag:

    TelemetryMetricsStatsd.start_link(
      metrics: [
        counter("http.request.count", tags: [:env])
      ],
      global_tags: [env: "prod"]
    )

Note that if the global tag is to be sent with the metric, the metric needs to have it listed under the
`:tags` option, just like any other tag.
Also, if the same key is configured as a global tag and emitted as a part of event metadata or returned
by the `:tag_values` function, the metadata/`:tag_values` take precedence and override the global tag
value.
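
The precedence rule can be pictured as a map merge in which event values win, followed by a
filter on the tags the metric declares. `GlobalTagsSketch` and `resolve/3` are hypothetical
names; this is a sketch of the rule, not the reporter's code.

```elixir
defmodule GlobalTagsSketch do
  # Hypothetical helper, not the reporter's actual implementation.
  # `global_tags` is the keyword list given at startup, `event_tag_values`
  # is the map taken from event metadata (or returned by `:tag_values`),
  # and `tag_keys` is the metric's `:tags` option.
  def resolve(global_tags, event_tag_values, tag_keys) do
    global_tags
    |> Map.new()
    # Values coming from the event override global values with the same key.
    |> Map.merge(event_tag_values)
    # Only tags listed in the metric definition are sent.
    |> Map.take(tag_keys)
  end
end

GlobalTagsSketch.resolve([env: "prod"], %{table: "users"}, [:env])
# => %{env: "prod"}
GlobalTagsSketch.resolve([env: "prod"], %{env: "staging"}, [:env])
# => %{env: "staging"}
```
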
## Prefixing metric names
Sometimes it's convenient to prefix all metric names with a particular value, to group them by the
name of the service, the host, or something else. You can use the `:prefix` option to provide a prefix
which will be prepended to all metrics published by the reporter (regardless of the formatter used).

## Maximum datagram size
Metrics are sent to StatsD over UDP, so it's important that the size of the datagram does not
exceed the Maximum Transmission Unit, or MTU, of the link, so that no data is lost on the way.
By default the reporter will break up the datagrams at 512 bytes, but this is configurable via
the `:mtu` option.
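
To illustrate the idea, the sketch below groups already formatted metric lines into packets that
stay within a size limit. `PacketSketch` and `build_packets/2` are hypothetical names, and the
sketch assumes one separator byte (a newline) between lines in a packet; the reporter's actual
packing logic may differ.

```elixir
defmodule PacketSketch do
  # Hypothetical helper, not the reporter's actual implementation.
  # Returns a list of packets, each a list of lines whose combined size,
  # counting one separator byte between lines, does not exceed `mtu`.
  # A single line longer than `mtu` still gets a packet of its own.
  def build_packets(lines, mtu) do
    {packets, current, _size} =
      Enum.reduce(lines, {[], [], 0}, fn line, {packets, current, size} ->
        line_size = byte_size(line)

        cond do
          # First line of a new packet.
          current == [] ->
            {packets, [line], line_size}

          # The line fits into the current packet.
          size + 1 + line_size <= mtu ->
            {packets, [line | current], size + 1 + line_size}

          # The line does not fit, so close the packet and start a new one.
          true ->
            {[Enum.reverse(current) | packets], [line], line_size}
        end
      end)

    packets = if current == [], do: packets, else: [Enum.reverse(current) | packets]
    Enum.reverse(packets)
  end
end

PacketSketch.build_packets(["aaaa", "bbbb", "cc"], 9)
# => [["aaaa", "bbbb"], ["cc"]]
```
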
"""

  use GenServer

  require Logger

  alias Telemetry.Metrics
  alias TelemetryMetricsStatsd.{EventHandler, UDP}

  @type prefix :: String.t() | nil
  @type host :: String.t() | :inet.ip_address()
  @type option ::
          {:port, :inet.port_number()}
          | {:host, host()}
          | {:metrics, [Metrics.t()]}
          | {:mtu, non_neg_integer()}
          | {:prefix, prefix()}
          | {:formatter, :standard | :datadog}
          | {:global_tags, Keyword.t()}
  @type options :: [option]

  @default_port 8125
  @default_mtu 512
  @default_formatter :standard

  @doc """
  Reporter's child spec.

  This function allows you to start the reporter under a supervisor like this:

      children = [
        {TelemetryMetricsStatsd, options}
      ]

  See `start_link/1` for a list of available options.
  """
  @spec child_spec(options) :: Supervisor.child_spec()
  def child_spec(options) do
    %{id: __MODULE__, start: {__MODULE__, :start_link, [options]}}
  end

@doc """
Starts a reporter and links it to the calling process.
The available options are:
  * `:metrics` - a list of Telemetry.Metrics metric definitions which will be published by the
    reporter
  * `:host` - hostname or IP address of the StatsD server. Defaults to `{127, 0, 0, 1}`. Keep
    in mind that Erlang's UDP implementation looks up the hostname each time it sends a packet.
    Furthermore, telemetry handlers are blocking. For latency-critical applications, it is best
    to use an IP address here (or resolve the hostname on startup).
  * `:port` - port number of the StatsD server. Defaults to `8125`.
  * `:formatter` - determines the format of the metrics sent to the target server. Can be either
    `:standard` or `:datadog`. Defaults to `:standard`.
  * `:prefix` - a prefix prepended to the name of each metric published by the reporter. Defaults
    to `nil`.
  * `:mtu` - Maximum Transmission Unit of the link between your application and the StatsD server in
    bytes. This value should not be greater than the actual MTU, since this could lead to data loss
    when the metrics are published. Defaults to `512`.
  * `:global_tags` - additional default tag values to be sent along with every published metric. These
    can be overridden by tags sent via the `:telemetry.execute` call.

You can read more about all the options in the `TelemetryMetricsStatsd` module documentation.
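
The advice on the `:host` option, resolving the hostname once on startup, could look roughly like
the sketch below. `HostSketch` and `resolve/1` are hypothetical names; `:inet.getaddr/2` is the
Erlang function used for the lookup, and the sketch only handles IPv4.

```elixir
defmodule HostSketch do
  # Hypothetical helper, not part of this library.
  # Resolves a hostname (or an IP address given as a string) to an IPv4 tuple.
  def resolve(host) when is_binary(host) do
    :inet.getaddr(String.to_charlist(host), :inet)
  end
end

case HostSketch.resolve("localhost") do
  {:ok, ip} ->
    # `ip` is a tuple such as {127, 0, 0, 1} and can be passed as `host: ip`.
    IO.inspect(ip, label: "resolved StatsD host")

  {:error, reason} ->
    IO.inspect(reason, label: "could not resolve StatsD host")
end
```

The trade-off is that an address resolved once is never refreshed, so a later DNS change for the
StatsD host is not picked up until the reporter is restarted.
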
## Example

    import Telemetry.Metrics

    TelemetryMetricsStatsd.start_link(
      metrics: [
        counter("http.request.count"),
        sum("http.request.payload_size"),
        last_value("vm.memory.total")
      ],
      prefix: "my-service"
    )
"""
  @spec start_link(options) :: GenServer.on_start()
  def start_link(options) do
    config =
      options
      |> Enum.into(%{})
      |> Map.put_new(:host, {127, 0, 0, 1})
      |> Map.update!(:host, fn host ->
        if(is_binary(host), do: to_charlist(host), else: host)
      end)
      |> Map.put_new(:port, @default_port)
      |> Map.put_new(:mtu, @default_mtu)
      |> Map.put_new(:prefix, nil)
      |> Map.put_new(:formatter, @default_formatter)
      |> Map.update!(:formatter, &validate_and_translate_formatter/1)
      |> Map.put_new(:global_tags, Keyword.new())

    GenServer.start_link(__MODULE__, config)
  end

  @doc false
  @spec get_udp(pid()) :: UDP.t()
  def get_udp(reporter) do
    GenServer.call(reporter, :get_udp)
  end

  @doc false
  @spec udp_error(pid(), UDP.t(), reason :: term) :: :ok
  def udp_error(reporter, udp, reason) do
    GenServer.cast(reporter, {:udp_error, udp, reason})
  end

  @impl true
  def init(config) do
    metrics = Map.fetch!(config, :metrics)

    case UDP.open(config.host, config.port) do
      {:ok, udp} ->
        Process.flag(:trap_exit, true)

        handler_ids =
          EventHandler.attach(
            metrics,
            self(),
            config.mtu,
            config.prefix,
            config.formatter,
            config.global_tags
          )

        {:ok, %{udp: udp, handler_ids: handler_ids, host: config.host, port: config.port}}

      {:error, reason} ->
        {:error, {:udp_open_failed, reason}}
    end
  end

  @impl true
  def handle_call(:get_udp, _from, state) do
    {:reply, state.udp, state}
  end

  @impl true
  # Matches only when the failing socket is the one currently held in the state
  # (`udp` appears twice in the pattern), and tries to reopen it.
  def handle_cast({:udp_error, udp, reason}, %{udp: udp} = state) do
    Logger.error("Failed to publish metrics over UDP: #{inspect(reason)}")

    case UDP.open(state.host, state.port) do
      {:ok, udp} ->
        {:noreply, %{state | udp: udp}}

      {:error, reason} ->
        Logger.error("Failed to reopen UDP socket: #{inspect(reason)}")
        {:stop, {:udp_open_failed, reason}, state}
    end
  end

  # An error reported for a socket that is no longer in the state is ignored.
  def handle_cast({:udp_error, _, _}, state) do
    {:noreply, state}
  end

  @impl true
  def handle_info({:EXIT, _pid, reason}, state) do
    {:stop, reason, state}
  end

  @impl true
  def terminate(_reason, state) do
    EventHandler.detach(state.handler_ids)

    :ok
  end

  defp validate_and_translate_formatter(:standard), do: TelemetryMetricsStatsd.Formatter.Standard
  defp validate_and_translate_formatter(:datadog), do: TelemetryMetricsStatsd.Formatter.Datadog

  defp validate_and_translate_formatter(_),
    do: raise(ArgumentError, ":formatter needs to be either :standard or :datadog")
end