@@ -234,38 +234,75 @@ The process details of a node are written by avalanchego to
  process, the URI of the node's API, and the address other nodes can
  use to bootstrap themselves (aka staking address).

- ## Metrics
-
- ### Prometheus configuration
-
- When nodes are started, prometheus configuration for each node is
- written to `~/.tmpnet/prometheus/file_sd_configs/` with a filename of
- `[network uuid]-[node id].json`. Prometheus can be configured to
- scrape the nodes as per the following example:
-
- ```yaml
- scrape_configs:
-   - job_name: "avalanchego"
-     metrics_path: "/ext/metrics"
-     file_sd_configs:
-       - files:
-           - '/home/me/.tmpnet/prometheus/file_sd_configs/*.yaml'
+ ## Monitoring
+
+ Monitoring is an essential part of understanding the workings of a
+ distributed system such as avalanchego. The tmpnet fixture enables
+ collection of logs and metrics from temporary networks to a monitoring
+ stack (prometheus+loki+grafana) so that results can be analyzed and
+ shared.
+
+ ### Example usage
+
+ ```bash
+ # Start prometheus to collect metrics
+ PROMETHEUS_ID=<id> PROMETHEUS_PASSWORD=<password> ./scripts/run_prometheus.sh
+
+ # Start promtail to collect logs
+ LOKI_ID=<id> LOKI_PASSWORD=<password> ./scripts/run_promtail.sh
+
+ # Network start emits link to grafana displaying collected logs and metrics
+ ./build/tmpnetctl start-network
  ```

- ### Viewing metrics
+ ### Metrics collection
+
+ When a node is started, configuration enabling collection of metrics
+ from the node is written to
+ `~/.tmpnet/prometheus/file_sd_configs/[network uuid]-[node id].json`.
+
+ The `scripts/run_prometheus.sh` script starts prometheus in agent mode
+ configured to scrape metrics from configured nodes and forward the
+ metrics to a persistent prometheus instance. The script requires that
+ the `PROMETHEUS_ID` and `PROMETHEUS_PASSWORD` env vars be set. By
+ default the prometheus instance at
+ https://prometheus-experimental.avax-dev.network will be targeted and
+ this can be overridden via the `PROMETHEUS_URL` env var.
+
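The file naming pattern and the `PROMETHEUS_URL` override described in this section can be sketched in shell as follows. This is a minimal illustration under stated assumptions, not the contents of `scripts/run_prometheus.sh`: the helper names `sd_config_path` and `resolve_prometheus_url` are hypothetical, and the UUID and node ID passed at the end are placeholders.

```bash
# Hypothetical helper: build the path of the service discovery config
# written for a node, following the documented pattern
# ~/.tmpnet/prometheus/file_sd_configs/[network uuid]-[node id].json
# An optional third argument replaces the ~/.tmpnet root.
sd_config_path() {
  local network_uuid="$1" node_id="$2"
  local root="${3:-${HOME}/.tmpnet}"
  printf '%s/prometheus/file_sd_configs/%s-%s.json\n' \
    "${root}" "${network_uuid}" "${node_id}"
}

# Hypothetical helper: use PROMETHEUS_URL when it is set and non-empty,
# otherwise fall back to the documented default instance.
resolve_prometheus_url() {
  printf '%s\n' "${PROMETHEUS_URL:-https://prometheus-experimental.avax-dev.network}"
}

# Placeholder identifiers, for illustration only.
sd_config_path "example-uuid" "NodeID-example"
resolve_prometheus_url
```

The `${VAR:-default}` expansion is one common way to express "default unless overridden"; whether the script uses the same idiom is an assumption.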
+ ### Log collection

- When a network is started with `tmpnet`, a grafana link for the
- network's metrics will be emitted.
+ Node logs are stored at `~/.tmpnet/networks/[network id]/[node
+ id]/logs` by default, and can optionally be forwarded to loki with
+ promtail.

- The metrics emitted by temporary networks configured with tmpnet will
- have the following labels applied:
+ When a node is started, promtail configuration enabling
+ collection of logs for the node is written to
+ `~/.tmpnet/promtail/file_sd_configs/[network
+ uuid]-[node id].json`.
+
+ The `scripts/run_promtail.sh` script starts promtail configured to
+ collect logs from configured nodes and forward the results to loki. The
+ script requires that the `LOKI_ID` and `LOKI_PASSWORD` env vars be
+ set. By default the loki instance at
+ https://loki-experimental.avax-dev.network will be targeted and this
+ can be overridden via the `LOKI_URL` env var.
+
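To inspect logs locally without loki, the documented directory layout can be walked directly. The sketch below assumes only the `networks/[network id]/[node id]/logs` layout stated in this section; the helper name `list_node_log_dirs` is hypothetical and is not part of tmpnet.

```bash
# Hypothetical helper: print the log directory of every node found under
# a tmpnet root, following the documented layout
# [root]/networks/[network id]/[node id]/logs
# The root defaults to ~/.tmpnet and can be passed as the first argument.
list_node_log_dirs() {
  local root="${1:-${HOME}/.tmpnet}"
  local dir
  for dir in "${root}"/networks/*/*/logs; do
    # An unmatched glob is left as a literal string, so confirm that
    # the candidate is a real directory before printing it.
    if [ -d "${dir}" ]; then
      printf '%s\n' "${dir}"
    fi
  done
}

# Prints nothing when no networks have been started.
list_node_log_dirs
```

Each printed directory can then be passed to tools such as `ls` or `tail` to read a node's log files.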
+ ### Labels
+
+ The logs and metrics collected for temporary networks will have the
+ following labels applied:

  - `network_uuid`
+   - uniquely identifies a network across hosts
  - `node_id`
  - `is_ephemeral_node`
+   - 'ephemeral' nodes are expected to run for only a fraction of the
+     life of a network
  - `network_owner`
+   - an arbitrary string that can be used to differentiate results
+     when a CI job runs more than one network

- When a tmpnet network runs as part of github CI, the following
+ When a network runs as part of a GitHub CI job, the following
  additional labels will be applied:

  - `gh_repo`
@@ -274,3 +311,25 @@ additional labels will be applied:
  - `gh_run_number`
  - `gh_run_attempt`
  - `gh_job_id`
+
+ These labels are sourced from GitHub Actions' `github` context as per
+ https://docs.github.com/en/actions/learn-github-actions/contexts#github-context.
+
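Labels like these are typically used to filter results in queries. PromQL and LogQL both accept selectors of the form `{label="value"}`, so a selector for a given network can be assembled as sketched below. The helper name `label_selector` is hypothetical, and the label values shown are placeholders rather than values tmpnet is known to emit.

```bash
# Hypothetical helper: build a label selector from key=value arguments,
# in the {label="value",...} form accepted by PromQL and LogQL.
label_selector() {
  local selector="" pair key value
  for pair in "$@"; do
    key="${pair%%=*}"    # text before the first '='
    value="${pair#*=}"   # text after the first '='
    # Prefix a comma only when the selector already has an entry.
    selector+="${selector:+,}${key}=\"${value}\""
  done
  printf '{%s}\n' "${selector}"
}

# Placeholder values, for illustration only.
label_selector "network_uuid=example-uuid" "is_ephemeral_node=false"
# prints: {network_uuid="example-uuid",is_ephemeral_node="false"}
```

The sketch does not escape quotes or backslashes inside values, so it is suitable only for simple identifiers.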
+ ### Viewing
+
+ #### Local networks
+
+ When a network is started with tmpnet, a link to the [default grafana
+ instance](https://grafana-experimental.avax-dev.network) will be
+ emitted. The dashboards will only be populated if prometheus and
+ promtail are running locally (as per previous sections) to collect
+ metrics and logs.
+
+ #### CI
+
+ Collection of logs and metrics is enabled for CI jobs that use
+ tmpnet. Each job will execute a step titled `Notify of metrics
+ availability` that emits a link to grafana parameterized to show results
+ for the job. Additional links to grafana parameterized to show results
+ for individual networks will appear in the logs displaying the start of
+ those networks.