# ClickHouse Exporter
<!-- status autogenerated section -->
| Status | |
| ------------- |-----------|
| Stability | [alpha]: traces, metrics, logs |
| Distributions | [contrib] |
| Issues | [![Open issues](https://img.shields.io/github/issues-search/open-telemetry/opentelemetry-collector-contrib?query=is%3Aissue%20is%3Aopen%20label%3Aexporter%2Fclickhouse%20&label=open&color=orange&logo=opentelemetry)](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues?q=is%3Aopen+is%3Aissue+label%3Aexporter%2Fclickhouse) [![Closed issues](https://img.shields.io/github/issues-search/open-telemetry/opentelemetry-collector-contrib?query=is%3Aissue%20is%3Aclosed%20label%3Aexporter%2Fclickhouse%20&label=closed&color=blue&logo=opentelemetry)](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues?q=is%3Aclosed+is%3Aissue+label%3Aexporter%2Fclickhouse) |
| [Code Owners](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/CONTRIBUTING.md#becoming-a-code-owner) | [@hanjm](https://www.github.com/hanjm), [@dmitryax](https://www.github.com/dmitryax), [@Frapschen](https://www.github.com/Frapschen) |
[alpha]: https://github.com/open-telemetry/opentelemetry-collector#alpha
[contrib]: https://github.com/open-telemetry/opentelemetry-collector-releases/tree/main/distributions/otelcol-contrib
<!-- end autogenerated section -->
This exporter supports sending OpenTelemetry data to [ClickHouse](https://clickhouse.com/).
> ClickHouse is an open-source, high performance columnar OLAP database management system for real-time analytics using
> SQL.
> Throughput can be measured in rows per second or megabytes per second.
> If the data is placed in the page cache, a query that is not too complex is processed on modern hardware at a speed of
> approximately 2-10 GB/s of uncompressed data on a single server.
> If 10 bytes of columns are extracted, the speed is expected to be around 100-200 million rows per second.
Note:
Always add the [batch processor](https://github.com/open-telemetry/opentelemetry-collector/tree/main/processor/batchprocessor) to
the collector pipeline, as the
[ClickHouse documentation recommends](https://clickhouse.com/docs/en/introduction/performance/#performance-when-inserting-data):
> We recommend inserting data in packets of at least 1000 rows, or no more than a single request per second. When
> inserting to a MergeTree table from a tab-separated dump, the insertion speed can be from 50 to 200 MB/s.
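To illustrate the recommendation, here is a minimal Python sketch of grouping rows so that each insert carries at least 1000 rows. The `RowBatcher` class and its `flush` callback are hypothetical names used only for illustration; in a real pipeline the batch processor does this work, and its implementation may differ.

```python
from typing import Callable, List


class RowBatcher:
    """Buffers rows and flushes them in groups of `batch_size` (hypothetical helper)."""

    def __init__(self, batch_size: int, flush: Callable[[List[dict]], None]) -> None:
        self.batch_size = batch_size
        self.flush = flush
        self.buffer: List[dict] = []

    def add(self, row: dict) -> None:
        """Buffer one row and flush as soon as a full batch is available."""
        self.buffer.append(row)
        if len(self.buffer) >= self.batch_size:
            self.drain()

    def drain(self) -> None:
        """Send whatever is buffered, for example on a timer or at shutdown."""
        if self.buffer:
            self.flush(self.buffer)
            self.buffer = []


# Record the size of every simulated insert instead of talking to a database.
inserts: List[int] = []
batcher = RowBatcher(batch_size=1000, flush=lambda rows: inserts.append(len(rows)))
for i in range(2500):
    batcher.add({"Body": f"log line {i}"})
batcher.drain()
print(inserts)  # [1000, 1000, 500]
```

Here 2500 rows reach the database as three inserts instead of 2500 single-row requests.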
## Use Cases
1. Use the [Grafana ClickHouse datasource](https://grafana.com/grafana/plugins/grafana-clickhouse-datasource/) or
[vertamedia-clickhouse-datasource](https://grafana.com/grafana/plugins/vertamedia-clickhouse-datasource/) to build
dashboards.
They support time-series graphs, tables, and logs.
2. Analyze logs with ClickHouse SQL.
### Logs
- Get a time series of log counts by severity.
```clickhouse
SELECT toDateTime(toStartOfInterval(Timestamp, INTERVAL 60 second)) as time, SeverityText, count() as count
FROM otel_logs
WHERE time >= NOW() - INTERVAL 1 HOUR
GROUP BY SeverityText, time
ORDER BY time;
```
- Find any log.
```clickhouse
SELECT Timestamp as log_time, Body
FROM otel_logs
WHERE Timestamp >= NOW() - INTERVAL 1 HOUR
Limit 100;
```
- Find logs from a specific service.
```clickhouse
SELECT Timestamp as log_time, Body
FROM otel_logs
WHERE ServiceName = 'clickhouse-exporter'
AND Timestamp >= NOW() - INTERVAL 1 HOUR
Limit 100;
```
- Find logs with a specific attribute.
```clickhouse
SELECT Timestamp as log_time, Body
FROM otel_logs
WHERE LogAttributes['container_name'] = '/example_flog_1'
AND Timestamp >= NOW() - INTERVAL 1 HOUR
Limit 100;
```
- Find logs whose body contains a string token.
```clickhouse
SELECT Timestamp as log_time, Body
FROM otel_logs
WHERE hasToken(Body, 'http')
AND Timestamp >= NOW() - INTERVAL 1 HOUR
Limit 100;
```
- Find logs whose body contains a string.
```clickhouse
SELECT Timestamp as log_time, Body
FROM otel_logs
WHERE Body like '%http%'
AND Timestamp >= NOW() - INTERVAL 1 HOUR
Limit 100;
```
- Find logs whose body matches a regular expression.
```clickhouse
SELECT Timestamp as log_time, Body
FROM otel_logs
WHERE match(Body, 'http')
AND Timestamp >= NOW() - INTERVAL 1 HOUR
Limit 100;
```
- Find logs by a value extracted from a JSON body.
```clickhouse
SELECT Timestamp as log_time, Body
FROM otel_logs
WHERE JSONExtractFloat(Body, 'bytes') > 1000
AND Timestamp >= NOW() - INTERVAL 1 HOUR
Limit 100;
```
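The token, substring, and regular-expression queries above can return different rows for the same search term. The Python sketch below approximates the three predicates to show the difference. It assumes ASCII log bodies and treats a token as a maximal run of alphanumeric characters; the helper names are hypothetical, and Python's `re` module only approximates the regular-expression syntax that ClickHouse uses.

```python
import re


def has_token(body: str, token: str) -> bool:
    """Rough stand-in for hasToken(Body, token): the whole token must match."""
    tokens = re.split(r"[^0-9A-Za-z]+", body)
    return token in tokens


def like_contains(body: str, needle: str) -> bool:
    """Rough stand-in for Body LIKE '%needle%': plain substring search."""
    return needle in body


def regex_match(body: str, pattern: str) -> bool:
    """Rough stand-in for match(Body, pattern): the pattern may match anywhere."""
    return re.search(pattern, body) is not None


body = "GET https://example.com/index.html 200"
print(has_token(body, "http"))      # False: the token is "https", not "http"
print(like_contains(body, "http"))  # True: "http" is a substring of "https"
print(regex_match(body, "http"))    # True: the pattern matches inside "https"
```

In this approximation the token search is the strictest of the three, which is why it can skip rows that the other two return.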
### Traces
- Find spans with a specific attribute.
```clickhouse
SELECT Timestamp as log_time,
TraceId,
SpanId,
ParentSpanId,
SpanName,
SpanKind,
ServiceName,
Duration,
StatusCode,
StatusMessage,
toString(SpanAttributes),
toString(ResourceAttributes),
toString(Events.Name),
toString(Links.TraceId)
FROM otel_traces
WHERE ServiceName = 'clickhouse-exporter'
AND SpanAttributes['peer.service'] = 'tracegen-server'
AND Timestamp >= NOW() - INTERVAL 1 HOUR
Limit 100;
```
- Find a trace by trace ID (using the time primary index and the TraceId skip index).
```clickhouse
WITH
'391dae938234560b16bb63f51501cb6f' as trace_id,
(SELECT min(Start) FROM otel_traces_trace_id_ts WHERE TraceId = trace_id) as start,
(SELECT max(End) + 1 FROM otel_traces_trace_id_ts WHERE TraceId = trace_id) as end
SELECT Timestamp as log_time,
TraceId,
SpanId,
ParentSpanId,
SpanName,
SpanKind,
ServiceName,
Duration,
StatusCode,
StatusMessage,
toString(SpanAttributes),
toString(ResourceAttributes),
toString(Events.Name),
toString(Links.TraceId)
FROM otel_traces
WHERE TraceId = trace_id
AND Timestamp >= start
AND Timestamp <= end
Limit 100;
```
- Find spans with an error status.
```clickhouse
SELECT Timestamp as log_time,
TraceId,
SpanId,
ParentSpanId,
SpanName,
SpanKind,
ServiceName,
Duration,
StatusCode,
StatusMessage,
toString(SpanAttributes),
toString(ResourceAttributes),
toString(Events.Name),
toString(Links.TraceId)
FROM otel_traces
WHERE ServiceName = 'clickhouse-exporter'
AND StatusCode = 'STATUS_CODE_ERROR'
AND Timestamp >= NOW() - INTERVAL 1 HOUR
Limit 100;
```
- Find slow spans.
```clickhouse
SELECT Timestamp as log_time,
TraceId,
SpanId,
ParentSpanId,
SpanName,
SpanKind,
ServiceName,
Duration,
StatusCode,
StatusMessage,
toString(SpanAttributes),
toString(ResourceAttributes),
toString(Events.Name),
toString(Links.TraceId)
FROM otel_traces
WHERE ServiceName = 'clickhouse-exporter'
AND Duration > 1 * 1e9
AND Timestamp >= NOW() - INTERVAL 1 HOUR
Limit 100;
```
### Metrics
Metrics data is stored in different ClickHouse tables depending on the metric type. Each table name carries a suffix
that identifies the type of metrics it stores.
| Metrics Type | Metrics Table |
| --------------------- | ---------------------- |
| sum | _sum |
| gauge | _gauge |
| histogram | _histogram |
| exponential histogram | _exponential_histogram |
| summary | _summary |
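As a small illustration of the naming scheme in the table above, the following Python sketch derives the full table name for a metric type. The `metrics_table` helper is hypothetical, and the default base name `otel_metrics` is taken from the `metrics_table_name` default in this README.

```python
# Suffixes copied from the table above.
METRIC_TABLE_SUFFIXES = {
    "sum": "_sum",
    "gauge": "_gauge",
    "histogram": "_histogram",
    "exponential histogram": "_exponential_histogram",
    "summary": "_summary",
}


def metrics_table(metric_type: str, base: str = "otel_metrics") -> str:
    """Return the table that holds metrics of the given type (hypothetical helper)."""
    return base + METRIC_TABLE_SUFFIXES[metric_type]


print(metrics_table("sum"))                    # otel_metrics_sum
print(metrics_table("exponential histogram"))  # otel_metrics_exponential_histogram
```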
Before querying metrics, you need to know the type of the metric you want to query. If your metrics come from
Prometheus (or another source that uses the OpenMetrics protocol), you also need to understand the
[compatibility](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/compatibility/prometheus_and_openmetrics.md#prometheus-and-openmetrics-compatibility)
between Prometheus (OpenMetrics) and OTLP Metrics.
- Find a sum metric by name.
```clickhouse
select TimeUnix,MetricName,Attributes,Value from otel_metrics_sum
where MetricName='calls_total' limit 100
```
- Find a sum metric by name and attribute.
```clickhouse
select TimeUnix,MetricName,Attributes,Value from otel_metrics_sum
where MetricName='calls_total' and Attributes['service_name']='featureflagservice'
limit 100
```
OTLP Metrics [defines two value types for a data point](https://github.com/open-telemetry/opentelemetry-proto/blob/main/opentelemetry/proto/metrics/v1/metrics.proto#L358),
a double and an integer. The ClickHouse tables store both as a single Float64 value.
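One consequence worth keeping in mind: a 64-bit float represents integers exactly only up to 2^53, so very large integer data points can be rounded when stored as a float. The sketch below shows the effect using Python's `float`, which is an IEEE 754 double. It assumes integer values are simply converted to a double before storage, as the text above describes; `to_float64` is a hypothetical helper.

```python
def to_float64(value: int) -> float:
    """Convert an integer data point to a 64-bit float (hypothetical helper)."""
    return float(value)


exact_limit = 2**53  # largest range in which every integer is representable

print(to_float64(42))                                            # 42.0
print(int(to_float64(exact_limit - 1)) == exact_limit - 1)       # True: still exact
print(to_float64(exact_limit + 1) == to_float64(exact_limit))    # True: precision lost
```

Typical counters stay far below this limit, so the rounding only matters for unusually large integer values.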
## Performance Guide
A single ClickHouse instance with 32 CPU cores and 128 GB RAM can handle around 20 TB (20 billion) of logs per day.
With a data compression ratio of 7 to 11, the compressed data occupies 1.8 TB to 2.85 TB on disk.
Adding more ClickHouse nodes to the cluster increases capacity linearly.
The otel-collector with `otlp receiver/batch processor/clickhouse tcp exporter` can process
around 40k log entries per second per CPU core; adding more collector nodes increases throughput linearly.
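The figures above can be turned into a rough sizing estimate. The Python sketch below reproduces the arithmetic; the helper names are hypothetical, and the result is only as reliable as the round numbers quoted in this section.

```python
import math

SECONDS_PER_DAY = 86_400


def compressed_tb(raw_tb: float, ratio: float) -> float:
    """Disk usage in TB after compression at the given ratio."""
    return raw_tb / ratio


def collector_cores(logs_per_day: float, logs_per_core_per_second: float = 40_000) -> int:
    """CPU cores needed by the collector, assuming perfectly even load."""
    logs_per_second = logs_per_day / SECONDS_PER_DAY
    return math.ceil(logs_per_second / logs_per_core_per_second)


# 20 TB of raw logs per day at compression ratios of 11 and 7.
print(round(compressed_tb(20, 11), 2))  # 1.82
print(round(compressed_tb(20, 7), 2))   # 2.86

# 20 billion log entries per day is about 231k entries per second.
print(collector_cores(20e9))  # 6
```

Real traffic is rarely evenly spread over the day, so peak load rather than the daily average should drive the final core count.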
## Configuration options
The following settings are required:
- `endpoint` (no default): The ClickHouse server address. Multiple hosts with ports are supported, for example:
  - tcp protocol `tcp://addr1:port,tcp://addr2:port` or TLS `tcp://addr1:port,addr2:port?secure=true`
  - http protocol `http://addr1:port,addr2:port` or https `https://addr1:port,addr2:port`
  - clickhouse protocol `clickhouse://addr1:port,addr2:port` or TLS `clickhouse://addr1:port,addr2:port?secure=true`
Many other ClickHouse-specific options can be configured through query parameters, e.g. `addr?dial_timeout=5s&compress=lz4`. For a full list of options, see the [ClickHouse driver documentation](https://github.com/ClickHouse/clickhouse-go/blob/b2f9409ba1c7bb239a4f6553a6da347f3f5f1330/clickhouse_options.go#L174).
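To make the endpoint format concrete, the Python sketch below assembles and takes apart an endpoint string of the `scheme://host1:port,host2:port?key=value` form shown above. The `build_endpoint` and `split_endpoint` helpers are hypothetical and use only the standard library; they do not validate option names against the ClickHouse driver.

```python
from typing import Dict, List, Tuple
from urllib.parse import parse_qs, urlencode, urlsplit


def build_endpoint(scheme: str, hosts: List[str], **params: str) -> str:
    """Join hosts with commas and append driver options as query parameters."""
    endpoint = f"{scheme}://{','.join(hosts)}"
    query = urlencode(params)
    return f"{endpoint}?{query}" if query else endpoint


def split_endpoint(endpoint: str) -> Tuple[str, List[str], Dict[str, str]]:
    """Return the scheme, the list of host:port pairs, and the query options."""
    parts = urlsplit(endpoint)
    options = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return parts.scheme, parts.netloc.split(","), options


endpoint = build_endpoint("tcp", ["addr1:9000", "addr2:9000"], secure="true", dial_timeout="5s")
print(endpoint)  # tcp://addr1:9000,addr2:9000?secure=true&dial_timeout=5s
print(split_endpoint(endpoint))
```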
Connection options:
- `username` (default = ): The authentication username.
- `password` (default = ): The authentication password.
- `ttl_days` (default = 0): The data time-to-live in days, 0 means no ttl.
- `database` (default = otel): The database name.
- `connection_params` (default = {}): Extra connection parameters, given as a map.
ClickHouse tables:
- `logs_table_name` (default = otel_logs): The table name for logs.
- `traces_table_name` (default = otel_traces): The table name for traces.
- `metrics_table_name` (default = otel_metrics): The table name for metrics.
Processing:
- `timeout` (default = 5s): The timeout for every attempt to send data to the backend.
- `sending_queue`
  - `queue_size` (default = 1000): Maximum number of batches kept in memory before dropping data.
- `retry_on_failure`
  - `enabled` (default = true)
  - `initial_interval` (default = 5s): The time to wait after the first failure before retrying; ignored if `enabled`
    is `false`
  - `max_interval` (default = 30s): The upper bound on backoff; ignored if `enabled` is `false`
  - `max_elapsed_time` (default = 300s): The maximum amount of time spent trying to send a batch; ignored if `enabled`
    is `false`
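To get a feel for how the three `retry_on_failure` intervals interact, the Python sketch below computes a retry schedule with exponential backoff using the defaults listed above. The growth factor of 1.5 and the absence of random jitter are assumptions made for illustration, not values stated in this README, so the collector's real schedule may differ.

```python
from typing import List


def retry_schedule(
    initial_interval: float = 5.0,
    max_interval: float = 30.0,
    max_elapsed_time: float = 300.0,
    multiplier: float = 1.5,  # assumed growth factor, not taken from this README
) -> List[float]:
    """Return the waits (in seconds) before each retry until the time budget is spent."""
    waits: List[float] = []
    elapsed = 0.0
    interval = initial_interval
    while elapsed + interval <= max_elapsed_time:
        waits.append(interval)
        elapsed += interval
        # Grow the wait, but never beyond the configured upper bound.
        interval = min(interval * multiplier, max_interval)
    return waits


waits = retry_schedule()
print(waits[:6])   # [5.0, 7.5, 11.25, 16.875, 25.3125, 30.0]
print(len(waits))  # 12
print(sum(waits))  # 275.9375
```

Under these assumptions a failing batch is retried 12 times over roughly 276 seconds before the exporter gives up on it.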
## TLS
The exporter supports TLS. To enable TLS, you need to specify the `secure=true` query parameter in the `endpoint` URL or
use the `https` scheme.
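The rule above can be expressed as a small check. This Python sketch mirrors the sentence in this section, not the exporter's own code; `endpoint_uses_tls` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlsplit


def endpoint_uses_tls(endpoint: str) -> bool:
    """Return True if the endpoint uses the https scheme or sets secure=true."""
    parts = urlsplit(endpoint)
    if parts.scheme == "https":
        return True
    secure = parse_qs(parts.query).get("secure", ["false"])[0]
    return secure == "true"


print(endpoint_uses_tls("tcp://127.0.0.1:9000?secure=true"))                    # True
print(endpoint_uses_tls("https://addr1:port,addr2:port"))                       # True
print(endpoint_uses_tls("tcp://127.0.0.1:9000?dial_timeout=10s&compress=lz4"))  # False
```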
## Example
This example shows how to configure the exporter to send data to a ClickHouse server.
It uses the native protocol without TLS. The exporter will create the database and tables if they don't exist.
The data is stored for 3 days.
```yaml
receivers:
  examplereceiver:
processors:
  batch:
    timeout: 5s
    send_batch_size: 100000
exporters:
  clickhouse:
    endpoint: tcp://127.0.0.1:9000?dial_timeout=10s&compress=lz4
    database: otel
    ttl_days: 3
    logs_table_name: otel_logs
    traces_table_name: otel_traces
    metrics_table_name: otel_metrics
    timeout: 5s
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s
service:
  pipelines:
    logs:
      receivers: [ examplereceiver ]
      processors: [ batch ]
      exporters: [ clickhouse ]
```