Allow transforming and aggregating statistics on events in tap via VRL. #11806

Open
tobz opened this issue Mar 11, 2022 · 3 comments
Labels
domain: administration Anything related to administration/operation domain: cli Anything related to Vector's CLI domain: observability Anything related to monitoring/observing Vector domain: vrl Anything related to the Vector Remap Language needs: rfc Needs an RFC before work can begin. type: enhancement A value-adding code change that enhances its existing functionality.

Comments

@tobz
Contributor

tobz commented Mar 11, 2022

Context

Currently, users have the ability to run vector tap to examine the data out of components for the purposes of inspecting events to understand their payload, or ensure that transformations are being applied correctly, and so on.

In many cases, inspecting the data is a means to an end: a user is trying to make sure the right data is coming through, whether that means checking that a certain field is present, or figuring out which values of a given tag are present. Often, the thing a user cares about when using vector tap is only a very small part of the overall event. It would be beneficial to allow extracting only that part to increase the signal-to-noise ratio of the console output.

Additionally, there can be times when the user wants to sample the data stream in order to ensure data quality. This could take the form of getting a count of all unique values for a field/tag, or a histogram of the values measured by a metric, and so on.

An interesting enhancement for vector tap would be to allow applying transformations and aggregations on-the-fly, using VRL.

Proposal

We should allow specifying VRL when attaching a tap such that it can transform or aggregate the events before they're displayed to the user. Two flavors/modes would be provided: transformation and aggregation.

Transformation

As mentioned above, sometimes a user only cares about a specific value within an event, and so scoping the output to only show that field can significantly increase the signal-to-noise ratio. In this mode, the user would supply a VRL program, perhaps as an inline snippet or a file path, that would look just like a normal VRL program with access to the full event. The program would emit only the specific field they cared about, or perhaps a newly-constructed object that grabs multiple relevant fields, and so on.
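To make the intended effect concrete, here is a minimal sketch in Python rather than VRL, since the proposal does not yet define any syntax or flags. It only illustrates the idea of running a user-supplied program over each tapped event and printing just what the program returns. The names `tap_transform` and `sample_events`, and the event fields, are hypothetical.

```python
from typing import Any, Callable, Iterable, Iterator

Event = dict[str, Any]


def tap_transform(
    events: Iterable[Event], program: Callable[[Event], Any]
) -> Iterator[Any]:
    """Run a user-supplied program on each tapped event and yield only its result.

    `program` stands in for the VRL program described above; it sees the
    full event and decides what to emit.
    """
    for event in events:
        yield program(event)


# Hypothetical events, standing in for what a tap might observe.
sample_events = [
    {"message": "GET /health", "status": 200, "host": "web-1", "latency_ms": 12},
    {"message": "GET /login", "status": 500, "host": "web-2", "latency_ms": 340},
]

# Emit a single field per event.
only_status = list(tap_transform(sample_events, lambda e: e["status"]))

# Emit a newly constructed object that gathers several relevant fields.
summary = list(
    tap_transform(sample_events, lambda e: {"host": e["host"], "status": e["status"]})
)
```

In this sketch `only_status` holds just the status codes and `summary` holds one small object per event, which is the kind of reduced console output the transformation mode is aiming for.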

Aggregation

Additionally, for situations where the user is looking to measure a statistical property of the event stream, they would use the aggregation mode. This mode would be somewhat of an opinionated auto-configured aggregator. The basic premise is that the user would specify an aggregation program -- again, using VRL -- that described how to measure the desired property of events within the event stream.

The configuration would essentially express the visualization to generate, and the way to get the data to calculate the value to measure from an event. For example, maybe a user has log events where the application/service is passed in via a tag. They could generate a top-K table of logs by application/service tag value as a way to quantify the portion of event load that each application/service is responsible for. In another example, maybe a user cares about examining the distribution of a value in a common field, such as response latency. They could configure the tap to emit a rolling histogram of response latencies after extracting them from a log or metric.
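The two examples above can be sketched in Python as a rough illustration. This is not a proposed implementation or VRL syntax. It assumes events are plain dictionaries, and `top_k`, `RollingHistogram`, the `service` tag, and the bucket bounds are all hypothetical. The "rolling" behavior is modeled here as a fixed-size window over the most recent observations, which is only one possible interpretation.

```python
from bisect import bisect_left
from collections import Counter, deque


def top_k(events, key, k):
    """Count events by the value `key` extracts and return the k most common."""
    counts = Counter(key(event) for event in events)
    return counts.most_common(k)


class RollingHistogram:
    """Histogram over the most recent `window` observations.

    `bounds` are sorted, inclusive upper bounds; one extra overflow bucket
    collects values above the last bound.
    """

    def __init__(self, bounds, window):
        self.bounds = sorted(bounds)
        self.values = deque(maxlen=window)  # old observations fall off the left

    def observe(self, value):
        self.values.append(value)

    def buckets(self):
        counts = [0] * (len(self.bounds) + 1)
        for value in self.values:
            counts[bisect_left(self.bounds, value)] += 1
        return counts


# Hypothetical log events tagged with the service that produced them.
logs = [{"service": s} for s in ["api", "api", "auth", "api", "billing", "auth"]]
top_services = top_k(logs, key=lambda e: e["service"], k=2)

# Hypothetical response latencies in milliseconds, with a window of 4.
hist = RollingHistogram(bounds=[50, 100, 500], window=4)
for latency_ms in [10, 60, 120, 700, 40]:
    hist.observe(latency_ms)
latency_buckets = hist.buckets()
```

The `key` argument plays the role of the user's aggregation program: it describes how to pull the measured value out of each event, while the choice of `top_k` or `RollingHistogram` corresponds to the visualization to generate.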

If no specific aggregation program was given, vector tap would switch to a default mode that provided a default/canned set of properties of the event stream: histogram of event size, number of metrics vs logs vs traces, and so on.
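As a hedged illustration of what such a default mode might compute, the Python sketch below tallies events by type and buckets them by serialized size. It assumes each event is a dictionary carrying a hypothetical `kind` field, measures size as the length of its JSON encoding, and uses arbitrary byte bounds. None of these details come from Vector itself.

```python
import json
from bisect import bisect_left
from collections import Counter

# Hypothetical inclusive upper bounds, in bytes, for the size histogram.
SIZE_BOUNDS = [64, 256, 1024]


def default_summary(events):
    """Return counts per event kind and a histogram of encoded event sizes."""
    kinds = Counter()
    size_counts = [0] * (len(SIZE_BOUNDS) + 1)  # last bucket is overflow
    for event in events:
        kinds[event.get("kind", "unknown")] += 1
        size = len(json.dumps(event).encode("utf-8"))
        size_counts[bisect_left(SIZE_BOUNDS, size)] += 1
    return {"kinds": dict(kinds), "size_histogram": size_counts}


# Hypothetical mixed stream of events.
stream = [
    {"kind": "log", "message": "hi"},
    {"kind": "metric", "name": "x", "value": 1},
    {"kind": "log", "message": "a" * 300},
]
result = default_summary(stream)
```

Here `result["kinds"]` separates logs from metrics and `result["size_histogram"]` shows two small events and one in the 257 to 1024 byte bucket.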

@tobz tobz added type: enhancement A value-adding code change that enhances its existing functionality. domain: cli Anything related to Vector's CLI domain: observability Anything related to monitoring/observing Vector needs: rfc Needs an RFC before work can begin. domain: administration Anything related to administration/operation domain: vrl Anything related to the Vector Remap Language labels Mar 11, 2022
@tobz tobz changed the title from "Allow transforming and aggregating statistics events in tap via VRL." to "Allow transforming and aggregating statistics on events in tap via VRL." Mar 11, 2022
@spencergilbert
Contributor

I'd note that this would also be nice from a config writing perspective, as you could test your remap against real events flowing through your observability pipeline

@tobz
Contributor Author

tobz commented Mar 11, 2022

> I'd note that this would also be nice from a config writing perspective, as you could test your remap against real events flowing through your observability pipeline

Yeah, definitely. Blackhole sink, just set up all the sources you want to ingest, and then you could have a fast turnaround dev cycle of: edit the VRL, run vector tap ..., see how it looks, and rinse and repeat.

@hhromic
Contributor

hhromic commented Mar 11, 2022

This is a nice idea to get built-in. Wanted to chime-in to note that currently, to achieve a similar functionality, I simply pipe the output of vector tap through jq or yq (if using -f yaml). Both of these tools provide very powerful transformations as well.
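For readers who want the same kind of post-processing without jq, a small Python filter can do a similar projection. This sketch assumes the tap output arrives as one JSON object per line, which is an assumption rather than something confirmed in this thread. The function name `project` and the field names are hypothetical.

```python
import json


def project(lines, fields):
    """Parse JSON-lines input and keep only the requested top-level fields.

    Missing fields are reported as None; blank lines are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        yield {field: event.get(field) for field in fields}


# Hypothetical captured output, one JSON event per line.
captured = ['{"a": 1, "b": 2, "c": 3}\n', "\n", '{"a": 3}\n']
rows = list(project(captured, ["a", "b"]))
```

Passing `sys.stdin` as `lines` would let the same function sit at the end of a shell pipe, in the spot where jq is used above.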
