Allow transforming and aggregating statistics on events in tap via VRL
#11806
Labels
domain: administration
domain: cli
domain: observability
domain: vrl
needs: rfc
type: enhancement
Context
Currently, users can run `vector tap` to examine the data coming out of components, whether to inspect events and understand their payloads, to ensure that transformations are being applied correctly, and so on.

In many cases, inspecting the data is a means to an end: a user is trying to make sure the right data is coming through, whether that means checking that a certain field is present or figuring out which values of a given tag occur. Often, what a user cares about when using `vector tap` is only a very small part of the overall event. Allowing users to extract just that part would increase the signal-to-noise ratio of the console output.

Additionally, there are times when a user wants to sample the data stream to check data quality. This could take the form of a count of every unique value of a field or tag, a histogram of the values measured by a metric, and so on.
An interesting enhancement for `vector tap` would be to allow applying transformations and aggregations on the fly, using VRL.

Proposal
We should allow specifying VRL when attaching a tap so that it can transform or aggregate events before they are displayed to the user. Two modes would be provided: transformation and aggregation.
Transformation
As mentioned above, sometimes a user only cares about a specific value within an event, so scoping the output to show only that field can significantly increase the signal-to-noise ratio. In this mode, the user would supply a VRL program, perhaps as an inline snippet or a file path. It would look just like a normal VRL program with access to the full event, but it would emit only the field the user cares about, or perhaps a newly constructed object that combines several relevant fields.
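To make the transformation mode concrete, here is a minimal sketch in Rust (the language Vector is written in) of a tap applying a user program to each event and printing only what the program emits. It does not use VRL or Vector's real event types: the flat `Event` map, `project_fields` (standing in for a VRL program that keeps selected fields), and `tap_transform` are all hypothetical.

```rust
use std::collections::BTreeMap;

/// Hypothetical flat log event (field name -> value). Vector's real events
/// carry nested values, metrics, and traces; this is only a stand-in.
type Event = BTreeMap<String, String>;

/// Stand-in for a user-supplied VRL program that keeps only the named fields.
/// Returning `None` means "print nothing for this event".
fn project_fields(fields: Vec<String>) -> impl Fn(&Event) -> Option<Event> {
    move |event: &Event| {
        let projected: Event = fields
            .iter()
            .filter_map(|f| event.get(f).map(|v| (f.clone(), v.clone())))
            .collect();
        if projected.is_empty() {
            None
        } else {
            Some(projected)
        }
    }
}

/// Runs the program over every tapped event and renders each result as one line.
fn tap_transform<F>(events: &[Event], program: F) -> Vec<String>
where
    F: Fn(&Event) -> Option<Event>,
{
    events
        .iter()
        .filter_map(|event| program(event))
        .map(|out| format!("{:?}", out))
        .collect()
}

fn main() {
    let mut request = Event::new();
    request.insert("message".to_string(), "GET /health 200".to_string());
    request.insert("service".to_string(), "api".to_string());
    request.insert("status".to_string(), "200".to_string());

    let mut heartbeat = Event::new();
    heartbeat.insert("message".to_string(), "worker alive".to_string());

    // Only the `status` field is printed; events without it are skipped.
    for line in tap_transform(&[request, heartbeat], project_fields(vec!["status".to_string()])) {
        println!("{line}");
    }
}
```

Modelling the program as a closure that may return nothing keeps the tap loop independent of what the program does; under this proposal, that seems like a reasonable contract for a VRL snippet or file to satisfy, though the actual design would be settled in the RFC.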
Aggregation
Additionally, for situations where the user wants to measure a statistical property of the event stream, they would use the aggregation mode. This mode would act as an opinionated, auto-configured aggregator: the user would specify an aggregation program, again written in VRL, that describes how to measure the desired property of events within the stream.
The configuration would essentially express which visualization to generate and how to extract, from each event, the value to be measured. For example, a user might have log events where the application/service is passed in via a tag. They could generate a top-K table of logs by application/service tag value to quantify the portion of the event load that each application/service is responsible for. In another example, a user might want to examine the distribution of a value in a common field, such as response latency. They could configure the tap to emit a rolling histogram of response latencies after extracting them from a log or metric.
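The two aggregation examples can be sketched under simplifying assumptions. In this Rust sketch, the extraction step that a VRL aggregation program would perform is assumed to have already happened: `top_k` receives tag values pulled from log events, and the hypothetical `RollingHistogram` receives latencies in milliseconds. Neither the names, the bucket bounds, nor the count-based window come from Vector; they are illustrative choices.

```rust
use std::collections::{HashMap, VecDeque};

/// Top-K table: counts events per tag value and keeps the `k` most frequent,
/// breaking ties alphabetically so the printed order is stable.
fn top_k<'a, I>(tag_values: I, k: usize) -> Vec<(String, u64)>
where
    I: IntoIterator<Item = &'a str>,
{
    let mut counts: HashMap<&str, u64> = HashMap::new();
    for value in tag_values {
        *counts.entry(value).or_insert(0) += 1;
    }
    let mut rows: Vec<(String, u64)> = counts
        .into_iter()
        .map(|(value, count)| (value.to_string(), count))
        .collect();
    rows.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    rows.truncate(k);
    rows
}

/// Rolling histogram over the most recent `window` samples. The window is
/// count-based for simplicity; a real tap might roll by time instead.
struct RollingHistogram {
    bounds: Vec<f64>, // inclusive bucket upper bounds, ascending (e.g. ms)
    window: usize,
    samples: VecDeque<f64>,
}

impl RollingHistogram {
    fn new(bounds: Vec<f64>, window: usize) -> Self {
        let window = window.max(1); // a zero-sized window would never evict
        Self { bounds, window, samples: VecDeque::with_capacity(window) }
    }

    fn observe(&mut self, value: f64) {
        if self.samples.len() == self.window {
            self.samples.pop_front(); // evict the oldest sample
        }
        self.samples.push_back(value);
    }

    /// Entry i counts samples <= bounds[i] that no earlier bucket took;
    /// the final entry counts samples above every bound.
    fn buckets(&self) -> Vec<u64> {
        let mut counts = vec![0u64; self.bounds.len() + 1];
        for &sample in &self.samples {
            let idx = self
                .bounds
                .iter()
                .position(|&bound| sample <= bound)
                .unwrap_or(self.bounds.len());
            counts[idx] += 1;
        }
        counts
    }
}

fn main() {
    // Hypothetical `service` tag values pulled out of tapped log events.
    let services = ["api", "worker", "api", "db", "api", "worker"];
    for (service, count) in top_k(services.iter().copied(), 2) {
        println!("{service:<10} {count}");
    }

    // Hypothetical response latencies (ms) extracted from logs or metrics.
    let mut latencies = RollingHistogram::new(vec![10.0, 50.0, 100.0], 4);
    for ms in [5.0, 42.0, 120.0, 8.0, 75.0, 9.0] {
        latencies.observe(ms);
    }
    println!("{:?}", latencies.buckets());
}
```

Splitting "extract a value" from "aggregate values" mirrors the proposal's premise: the VRL program would only need to say what to measure, while the tap owns the opinionated aggregation and rendering.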
If no specific aggregation program is given, `vector tap` would switch to a default mode that provides a canned set of properties of the event stream: a histogram of event size, the number of metrics vs. logs vs. traces, and so on.
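The default mode could amount to a fixed set of aggregators run over every tapped event. This Rust sketch uses a hypothetical three-variant `Event` enum, measures size as the length of a serialized payload, and buckets sizes by powers of two; all three are assumptions for illustration rather than anything Vector defines.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified event model: each variant carries its serialized
/// payload. Vector's real log, metric, and trace types are far richer.
enum Event {
    Log(String),
    Metric(String),
    Trace(String),
}

impl Event {
    fn kind(&self) -> &'static str {
        match self {
            Event::Log(_) => "log",
            Event::Metric(_) => "metric",
            Event::Trace(_) => "trace",
        }
    }

    /// Approximate size in bytes, taken from the serialized payload.
    fn approx_size(&self) -> usize {
        match self {
            Event::Log(p) | Event::Metric(p) | Event::Trace(p) => p.len(),
        }
    }
}

/// Canned statistics shown when no aggregation program is supplied.
#[derive(Default)]
struct DefaultStats {
    by_kind: BTreeMap<&'static str, u64>,
    /// Key = smallest power of two >= event size; value = number of events.
    size_histogram: BTreeMap<usize, u64>,
}

impl DefaultStats {
    fn observe(&mut self, event: &Event) {
        *self.by_kind.entry(event.kind()).or_insert(0) += 1;
        let bucket = event.approx_size().next_power_of_two();
        *self.size_histogram.entry(bucket).or_insert(0) += 1;
    }
}

fn main() {
    let events = vec![
        Event::Log("GET /health 200".to_string()),
        Event::Log("user login ok".to_string()),
        Event::Metric("requests_total 42".to_string()),
        Event::Trace("span checkout".to_string()),
    ];
    let mut stats = DefaultStats::default();
    for event in &events {
        stats.observe(event);
    }
    println!("events by type: {:?}", stats.by_kind);
    println!("events by size bucket (bytes): {:?}", stats.size_histogram);
}
```

Power-of-two buckets keep the size histogram compact no matter how large events get, which suits a quick console summary.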