Commit 0f6b123

Add detail to tmpnet metrics documentation (#2854)
Signed-off-by: Stephen Buttolph <stephen@avalabs.org> Co-authored-by: Stephen Buttolph <stephen@avalabs.org>
1 parent f57f0f2 commit 0f6b123

1 file changed

+81
-22
lines changed

tests/fixture/tmpnet/README.md

Lines changed: 81 additions & 22 deletions
@@ -234,38 +234,75 @@ The process details of a node are written by avalanchego to
234234
process, the URI of the node's API, and the address other nodes can
235235
use to bootstrap themselves (aka staking address).
236236

237-
## Metrics
238-
239-
### Prometheus configuration
240-
241-
When nodes are started, prometheus configuration for each node is
242-
written to `~/.tmpnet/prometheus/file_sd_configs/` with a filename of
243-
`[network uuid]-[node id].json`. Prometheus can be configured to
244-
scrape the nodes as per the following example:
245-
246-
```yaml
247-
scrape_configs:
248-
- job_name: "avalanchego"
249-
metrics_path: "/ext/metrics"
250-
file_sd_configs:
251-
- files:
252-
- '/home/me/.tmpnet/prometheus/file_sd_configs/*.yaml'
237+
## Monitoring
238+
239+
Monitoring is an essential part of understanding the workings of a
240+
distributed system such as avalanchego. The tmpnet fixture enables
241+
collection of logs and metrics from temporary networks to a monitoring
242+
stack (prometheus+loki+grafana) to enable results to be analyzed and
243+
shared.
244+
245+
### Example usage
246+
247+
```bash
248+
# Start prometheus to collect metrics
249+
PROMETHEUS_ID=<id> PROMETHEUS_PASSWORD=<password> ./scripts/run_prometheus.sh
250+
251+
# Start promtail to collect logs
252+
LOKI_ID=<id> LOKI_PASSWORD=<password> ./scripts/run_promtail.sh
253+
254+
# Network start emits link to grafana displaying collected logs and metrics
255+
./build/tmpnetctl start-network
253256
```
254257

255-
### Viewing metrics
258+
### Metrics collection
259+
260+
When a node is started, configuration enabling collection of metrics
261+
from the node is written to
262+
`~/.tmpnet/prometheus/file_sd_configs/[network uuid]-[node id].json`.
263+
264+
The `scripts/run_prometheus.sh` script starts prometheus in agent mode
265+
configured to scrape metrics from configured nodes and forward the
266+
metrics to a persistent prometheus instance. The script requires that
267+
the `PROMETHEUS_ID` and `PROMETHEUS_PASSWORD` env vars be set. By
268+
default the prometheus instance at
269+
https://prometheus-experimental.avax-dev.network will be targeted and
270+
this can be overridden via the `PROMETHEUS_URL` env var.
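
The environment handling described above can be checked before launching the script. The following is a minimal sketch, not part of the repository: the helper names `effective_prometheus_url` and `check_prometheus_env` are hypothetical, and treating an empty `PROMETHEUS_URL` as unset is an assumption about the desired behavior rather than a statement of what `scripts/run_prometheus.sh` does.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helpers; not part of the avalanchego repo.

# Default target, taken from the README text.
DEFAULT_PROMETHEUS_URL="https://prometheus-experimental.avax-dev.network"

# Print the URL that would be targeted: PROMETHEUS_URL if set and
# non-empty (assumption), otherwise the documented default.
effective_prometheus_url() {
  printf '%s\n' "${PROMETHEUS_URL:-$DEFAULT_PROMETHEUS_URL}"
}

# Return 0 only if both required credentials are set and non-empty.
check_prometheus_env() {
  local missing=0 name
  for name in PROMETHEUS_ID PROMETHEUS_PASSWORD; do
    if [ -z "${!name:-}" ]; then
      echo "missing required env var: ${name}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A possible use is `check_prometheus_env && ./scripts/run_prometheus.sh`, so that a missing credential is reported before the script starts.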
271+
272+
### Log collection
256273

257-
When a network is started with `tmpnet`, a grafana link for the
258-
network's metrics will be emitted.
274+
Node logs are stored at `~/.tmpnet/networks/[network id]/[node
275+
id]/logs` by default, and can optionally be forwarded to loki with
276+
promtail.
259277

260-
The metrics emitted by temporary networks configured with tmpnet will
261-
have the following labels applied:
278+
When a node is started, promtail configuration enabling
279+
collection of logs for the node is written to
280+
`~/.tmpnet/promtail/file_sd_configs/[network
281+
uuid]-[node id].json`.
282+
283+
The `scripts/run_promtail.sh` script starts promtail configured to
284+
collect logs from configured nodes and forward the results to loki. The
285+
script requires that the `LOKI_ID` and `LOKI_PASSWORD` env vars be
286+
set. By default the loki instance at
287+
https://loki-experimental.avax-dev.network will be targeted and this
288+
can be overridden via the `LOKI_URL` env var.
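
The default log location described in this section can be derived from a network id and a node id. The helper below is a hypothetical sketch (the name `node_log_dir` and its optional third argument are illustrative, not something tmpnet provides); it only builds the path and does not check that the directory exists.

```bash
# Hypothetical helper: print the default log directory for a node.
# Arguments: <network id> <node id> [tmpnet root, defaults to ~/.tmpnet]
node_log_dir() {
  local network_id="$1" node_id="$2" root="${3:-$HOME/.tmpnet}"
  printf '%s/networks/%s/%s/logs\n' "$root" "$network_id" "$node_id"
}
```

The printed path can be passed to tools such as `ls` or `tail` to inspect a node's logs locally, whether or not promtail is forwarding them to loki.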
289+
290+
### Labels
291+
292+
The logs and metrics collected for temporary networks will have the
293+
following labels applied:
262294

263295
- `network_uuid`
296+
- uniquely identifies a network across hosts
264297
- `node_id`
265298
- `is_ephemeral_node`
299+
- 'ephemeral' nodes are expected to run for only a fraction of the
300+
life of a network
266301
- `network_owner`
302+
- an arbitrary string that can be used to differentiate results
303+
when a CI job runs more than one network
267304

268-
When a tmpnet network runs as part of github CI, the following
305+
When a network runs as part of a GitHub CI job, the following
269306
additional labels will be applied:
270307

271308
- `gh_repo`
@@ -274,3 +311,25 @@ additional labels will be applied:
274311
- `gh_run_number`
275312
- `gh_run_attempt`
276313
- `gh_job_id`
314+
315+
These labels are sourced from GitHub Actions' `github` context as per
316+
https://docs.github.com/en/actions/learn-github-actions/contexts#github-context.
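
For orientation, the sketch below writes an illustrative Prometheus file-based service discovery entry carrying the non-CI labels listed above. The JSON shape (a list of objects with `targets` and `labels`, all label values as strings) follows the Prometheus `file_sd_configs` format. Everything else is an assumption for illustration: the README does not show the contents tmpnet actually writes, and the function name, target address, and label values are placeholders.

```bash
# Hypothetical helper: write an example file_sd entry to the given path.
# The target and all label values are placeholders, not real tmpnet output.
write_example_sd_config() {
  local out="$1"
  cat > "$out" <<'EOF'
[
  {
    "targets": ["127.0.0.1:9650"],
    "labels": {
      "network_uuid": "00000000-0000-0000-0000-000000000000",
      "node_id": "NodeID-example",
      "is_ephemeral_node": "false",
      "network_owner": "example-owner"
    }
  }
]
EOF
}
```

Comparing a real file under `~/.tmpnet/prometheus/file_sd_configs/` against this shape is one way to confirm which labels a running node will report.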
317+
318+
### Viewing
319+
320+
#### Local networks
321+
322+
When a network is started with tmpnet, a link to the [default grafana
323+
instance](https://grafana-experimental.avax-dev.network) will be
324+
emitted. The dashboards will only be populated if prometheus and
325+
promtail are running locally (as per previous sections) to collect
326+
metrics and logs.
327+
328+
#### CI
329+
330+
Collection of logs and metrics is enabled for CI jobs that use
331+
tmpnet. Each job will execute a step titled `Notify of metrics
332+
availability` that emits a link to grafana parameterized to show results
332+
for the job. Additional links to grafana parameterized to show results
333+
for individual networks will appear in the logs displaying the start of
335+
those networks.
