Description
In the current NPD logMonitor architecture, the pluginConfig.message regex captures a string that is included in the node condition or Event when a fault is detected. This feature is very useful: node conditions and Events can carry specific, actionable information from the log event that indicated a problem, so an administrator or automation system does not have to dig through log messages to find out exactly what went wrong.
However, as demonstrated in the sample configuration https://github.com/kubernetes/node-problem-detector/blob/master/config/disk-log-message-filelog.json, because pluginConfig.message must match ALL the possible conditions that may be matched by rules[].pattern, the regex can quickly grow long and complex, and ultimately ends up duplicating the work of the rules[].pattern regexes.
It would be much more convenient if the pluginConfig.message regex were used only for initial filtering, and not for message capture. Instead, the message would be extracted from the rules[].pattern regex. That way, the regexes for each rule serve a more targeted purpose, with the data extracted for that specific detected condition/event configured right there in the rule. Then pluginConfig.message becomes a higher-level filter, which could, at its simplest, be empty, meaning "send all log events through for rules[] evaluation". It would then also be useful to leverage pluginConfig.message for additional purposes:
- Acting as an initial, broad filter that ensures only a subset of log events make it through to be evaluated by rules[].pattern. For a high-volume log stream this might be important for resource optimization, because a log event that doesn't match the initial pluginConfig.message regex never has to be evaluated against multiple rules[].pattern regexes.
- Allowing for simpler rules[].pattern regexes, because they only have to be written to match against messages that have already passed through the pluginConfig.message regex. This can make problem detection more reliable by more carefully controlling the shape and structure of input messages that are passed through to rule patterns.
I propose that the logMonitor be updated in the following ways:
- Preserve pluginConfig.message with its current behavior, to keep backward compatibility. If it is defined in a logMonitor JSON config, then pluginConfig.prefilter is ignored, and the current behavior of extracting the message from the top-level filter is preserved.
- Add a new pluginConfig.prefilter regex whose only purpose is to prefilter the log stream before it is evaluated by rules[].pattern. If this is defined (or if neither pluginConfig.message nor pluginConfig.prefilter is defined), then a node condition or event message is extracted from the matching rules[].pattern regex, not pluginConfig.message.