Has anyone thought of adding internal app metrics to show whether problem daemons are having any issues?
Following on from #1003, I have added a few internal log events in various places inside the kmsg watcher so that I can track how often watch loops are starting and watchers are being revived.
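Each of these events is just a message sent on the watcher's log channel, so the repeated pattern could be factored into one helper that keeps the `[npd-internal]` prefix consistent. This is a minimal sketch. It assumes a local `Log` type that mirrors only the `Message` and `Timestamp` fields of `logtypes.Log`. The names `emitInternal` and `internalPrefix` are hypothetical and are not part of node-problem-detector:

```go
package main

import (
	"fmt"
	"time"
)

// Log is a local stand-in for node-problem-detector's logtypes.Log;
// only the Message and Timestamp fields used in this issue are mirrored.
type Log struct {
	Timestamp time.Time
	Message   string
}

// internalPrefix (hypothetical) tags self-generated events so that rule
// patterns such as `\[npd-internal\] Entering watch loop.*` can match them.
const internalPrefix = "[npd-internal] "

// emitInternal (hypothetical) sends a prefixed, timestamped event on logCh.
// Like a bare channel send, it blocks until there is room or a receiver.
func emitInternal(logCh chan<- *Log, msg string) {
	logCh <- &Log{
		Message:   internalPrefix + msg,
		Timestamp: time.Now(),
	}
}

func main() {
	logCh := make(chan *Log, 2) // buffered so main can send without a reader
	emitInternal(logCh, "Entering watch loop")
	emitInternal(logCh, "Reviving kmsg parser")
	close(logCh)
	for l := range logCh {
		fmt.Println(l.Message)
	}
}
```

The blocking send means no event is silently dropped. Wrapping it in a `select` with a `default` branch would stop a full channel from stalling the watcher, but the restart counts could then be too low.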
Simple things, like adding
```go
k.logCh <- &logtypes.Log{
	Message:   "[npd-internal] Entering watch loop",
	Timestamp: time.Now(),
}
```
when we start the watch loop, or
```go
k.logCh <- &logtypes.Log{
	Message:   "[npd-internal] Reviving kmsg parser",
	Timestamp: time.Now(),
}
```
whenever we revive the kmsg parser from inside the watcher. Paired with a config like:
```json
{
  "plugin": "kmsg",
  "pluginConfig": {
    "revive": "true"
  },
  "logPath": "/dev/kmsg",
  "lookback": "5m",
  "bufferSize": 1000,
  "source": "kernel-monitor",
  "conditions": [
    ...
    ...
    ...
  ],
  "rules": [
    {
      "type": "temporary",
      "reason": "WatchLoopStarted",
      "pattern": "\\[npd-internal\\] Entering watch loop.*"
    },
    {
      "type": "temporary",
      "reason": "ParserRevived",
      "pattern": "\\[npd-internal\\] Reviving.*parser.*"
    },
    ...
    ...
    ...
  ]
}
```
we get Prometheus metrics like these when the exporter is enabled (the default):
```text
# HELP problem_counter Number of times a specific type of problem have occurred.
# TYPE problem_counter counter
...
...
...
problem_counter{reason="ParserRevived"} 1
...
...
...
problem_counter{reason="WatchLoopStarted"} 2
...
...
...
```
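The following is a simplified, self-contained simulation of the path from rule to metric. Each message that matches a rule pattern increments a per-reason counter, and the counters are then rendered in Prometheus text format. It is a sketch under stated assumptions and not node-problem-detector's implementation. The `rule` type, `newRule`, `countProblems`, `exposition`, the full-match semantics, and the message stream are all hypothetical:

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// rule mirrors the "reason" and "pattern" fields of a temporary rule;
// it is a simplified stand-in, not node-problem-detector's own type.
type rule struct {
	reason  string
	pattern *regexp.Regexp
}

// newRule compiles pattern so it must match the whole message
// (full-match semantics are an assumption made for this sketch).
func newRule(reason, pattern string) rule {
	return rule{reason: reason, pattern: regexp.MustCompile(`^(?:` + pattern + `)$`)}
}

// countProblems increments a per-reason counter for every message that
// matches a rule, roughly how each temporary event feeds problem_counter.
func countProblems(rules []rule, messages []string) map[string]int {
	counts := make(map[string]int)
	for _, msg := range messages {
		for _, r := range rules {
			if r.pattern.MatchString(msg) {
				counts[r.reason]++
			}
		}
	}
	return counts
}

// exposition renders the counters in Prometheus text format, sorted by
// reason. %q quoting is adequate for plain label values like these.
func exposition(counts map[string]int) string {
	reasons := make([]string, 0, len(counts))
	for reason := range counts {
		reasons = append(reasons, reason)
	}
	sort.Strings(reasons)
	var b strings.Builder
	b.WriteString("# TYPE problem_counter counter\n")
	for _, reason := range reasons {
		fmt.Fprintf(&b, "problem_counter{reason=%q} %d\n", reason, counts[reason])
	}
	return b.String()
}

func main() {
	rules := []rule{
		newRule("WatchLoopStarted", `\[npd-internal\] Entering watch loop.*`),
		newRule("ParserRevived", `\[npd-internal\] Reviving.*parser.*`),
	}
	// Hypothetical message stream: two watch-loop starts, one revive, one unrelated line.
	messages := []string{
		"[npd-internal] Entering watch loop",
		"[npd-internal] Reviving kmsg parser",
		"[npd-internal] Entering watch loop",
		"kernel: some unrelated line",
	}
	fmt.Print(exposition(countProblems(rules, messages)))
}
```

With this hypothetical stream, the sketch prints the `problem_counter{reason="ParserRevived"} 1` and `problem_counter{reason="WatchLoopStarted"} 2` lines shown above. The unrelated kernel line matches neither rule.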