Feature request: Pivot aggregator #5203

Closed
prydin opened this issue Dec 29, 2018 · 8 comments
Labels
discussion Topics for discussion

Comments

@prydin
Contributor

prydin commented Dec 29, 2018

Feature Request

Add a pivot aggregator for creating aggregated metrics based on a pivot field.

Opening a feature request kicks off a discussion.

Proposal:

Assume you have a stream of metrics for hosts. Each host is part of a cluster, identified by the tag cluster. Furthermore, assume each host has a cpuload metric. I would like to synthesize a per-cluster cpuload metric with max, min, average, etc. aggregations.

Unlike other aggregators, this one would preserve timestamps and aggregate across a pivot element rather than over a time period.

We are planning on writing this plugin, but wanted to file an issue for tracking and commenting purposes.

Example

Input:

host=foo cluster=prod load=1 1475583980000000000
host=bar cluster=prod load=1 1475583980000000000
host=monkey cluster=dev load=1 1475583980000000000
host=giraffe cluster=dev load=3 1475583980000000000

Output:

cluster=prod load_avg=1 1475583980000000000
cluster=dev load_avg=2 1475583980000000000

Note that the plugin would offer more aggregations than the average; the others are omitted for brevity.
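
To make the proposal concrete, here is a minimal Python sketch of the grouping described above. It is not Telegraf code: the function name `pivot_avg` and the `(tags, fields, timestamp)` tuple layout are assumptions made for illustration. Samples are grouped by the pivot tag value and the timestamp, and the field is averaged within each group.

```python
from collections import defaultdict

# Each sample is (tags, fields, timestamp_ns), mirroring the example input.
samples = [
    ({"host": "foo", "cluster": "prod"}, {"load": 1.0}, 1475583980000000000),
    ({"host": "bar", "cluster": "prod"}, {"load": 1.0}, 1475583980000000000),
    ({"host": "monkey", "cluster": "dev"}, {"load": 1.0}, 1475583980000000000),
    ({"host": "giraffe", "cluster": "dev"}, {"load": 3.0}, 1475583980000000000),
]


def pivot_avg(samples, pivot_tag, field):
    """Average `field` over samples sharing a pivot tag value and timestamp."""
    groups = defaultdict(list)
    for tags, fields, ts in samples:
        # Skip samples that lack the pivot tag or the field being aggregated.
        if pivot_tag in tags and field in fields:
            groups[(tags[pivot_tag], ts)].append(fields[field])
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


result = pivot_avg(samples, "cluster", "load")
for (cluster, ts), avg in result.items():
    print(f"cluster={cluster} load_avg={avg:g} {ts}")
```

Other aggregations (min, max, sum) would replace the averaging expression in the final dictionary comprehension.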

Current behavior:

No pivot aggregator exists

Desired behavior:

A pivot aggregator exists

Use case:

There are countless use cases for this. The one that's the most pressing for us right now is the ability to synthesize vSphere cluster metrics from hosts. But since pivoting is a very common statistics operator, I'm confident this plugin would find many other uses.

@prydin
Contributor Author

prydin commented Dec 29, 2018

Thinking a little more about it, I'm not sure you always want to preserve the timestamps. Maybe we should make that an option. If you don't preserve the timestamp, the resulting value should be an average (or other operation) across all elements AND timestamps that share the same pivot element value.
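
A rough Python sketch of what such an option could look like, purely as an illustration: the function `pivot_mean` and the flag `preserve_timestamps` are hypothetical names, not an existing Telegraf setting. With the flag off, the grouping key omits the timestamp, so values are averaged across all elements and all timestamps for a pivot value.

```python
from collections import defaultdict
from statistics import mean


def pivot_mean(samples, pivot_tag, field, preserve_timestamps=True):
    """Average `field` per pivot value, optionally keeping timestamps apart."""
    groups = defaultdict(list)
    for tags, fields, ts in samples:
        if pivot_tag not in tags or field not in fields:
            continue
        pivot_value = tags[pivot_tag]
        # The timestamp is part of the group key only when it is preserved.
        key = (pivot_value, ts) if preserve_timestamps else (pivot_value,)
        groups[key].append(fields[field])
    return {key: mean(vals) for key, vals in groups.items()}


# Two hosts in one cluster, sampled at two (made-up) timestamps.
samples = [
    ({"host": "foo", "cluster": "prod"}, {"load": 1.0}, 10),
    ({"host": "bar", "cluster": "prod"}, {"load": 3.0}, 10),
    ({"host": "foo", "cluster": "prod"}, {"load": 5.0}, 20),
    ({"host": "bar", "cluster": "prod"}, {"load": 7.0}, 20),
]

per_timestamp = pivot_mean(samples, "cluster", "load", preserve_timestamps=True)
overall = pivot_mean(samples, "cluster", "load", preserve_timestamps=False)
```

Here `per_timestamp` holds one average per (cluster, timestamp) pair, while `overall` holds a single average per cluster.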

We also need some missing value management. You would probably want an option to handle missing values by repeating the latest known value for a couple of cycles. That saves you from having skewed numbers because of occasional dropped samples.
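
One possible reading of "repeating the latest known value for a couple of cycles" is a bounded forward fill. The Python sketch below is an assumption about how that might work, not a description of any existing plugin; `fill_missing` and `max_repeats` are illustrative names, and `None` stands in for a dropped sample.

```python
def fill_missing(series, max_repeats=2):
    """Forward-fill dropped samples (None) for at most `max_repeats` cycles."""
    filled = []
    last = None      # latest known value, if any
    repeats = 0      # how many consecutive cycles `last` has been reused
    for value in series:
        if value is not None:
            last, repeats = value, 0
            filled.append(value)
        elif last is not None and repeats < max_repeats:
            repeats += 1
            filled.append(last)
        else:
            # No known value yet, or the gap is longer than allowed.
            filled.append(None)
    return filled


filled = fill_missing([1.0, None, None, None, 2.0, None], max_repeats=2)
```

Capping the number of repeats keeps a host that has actually gone away from contributing a stale value to the aggregate indefinitely.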

@danielnelson
Contributor

Would this be similar to doing a tagexclude, using host in the above example, with the basicstats aggregator? BTW, might be best to test with 1.8 until #5209 is fixed.
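
The idea behind that suggestion can be sketched in Python, under the assumption that the aggregator treats metrics with identical remaining tags as one series. This is a simulation rather than Telegraf's implementation, and `exclude_tags` and `basic_stats` are illustrative names: once the host tag is dropped, the hosts in a cluster collapse into a single series that can be summarized.

```python
from collections import defaultdict


def exclude_tags(tags, exclude):
    """Return a copy of `tags` without the excluded keys."""
    return {k: v for k, v in tags.items() if k not in exclude}


def basic_stats(samples, exclude, field):
    """Group samples by their remaining tags and compute min, max, and mean."""
    groups = defaultdict(list)
    for tags, fields in samples:
        # Samples whose remaining tags match end up in the same series.
        series_key = tuple(sorted(exclude_tags(tags, exclude).items()))
        groups[series_key].append(fields[field])
    return {
        key: {"min": min(v), "max": max(v), "mean": sum(v) / len(v)}
        for key, v in groups.items()
    }


samples = [
    ({"host": "foo", "cluster": "prod"}, {"load": 1.0}),
    ({"host": "bar", "cluster": "prod"}, {"load": 1.0}),
    ({"host": "monkey", "cluster": "dev"}, {"load": 1.0}),
    ({"host": "giraffe", "cluster": "dev"}, {"load": 3.0}),
]

stats = basic_stats(samples, {"host"}, "load")
```

Unlike the original proposal, this approach aggregates over the aggregator's period, so individual timestamps are not preserved.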

@prydin
Contributor Author

prydin commented Jan 3, 2019

We needed some extra functionality, such as metric name substitution, but the immediate need for this aggregator has more or less gone away, so I'm closing this for now.

@prydin prydin closed this as completed Jan 3, 2019
@puckpuck
Contributor

I actually have a need for something like what is described above. In this instance we are trying to aggregate multiple detail metrics that share a tag value into summaries (sum and avg). I can do this with basicstats using taginclude, but I would need to create a different configuration instance for each value of that tag. Perhaps we can enhance basicstats so that we can provide a list of tags to group by when doing the aggregations. When using the feature, the output metrics would only include the tags that you requested for grouping.

For example:
group_tags = ["workload", "system"]

Would result in all metrics being grouped by workload and system, and the output metrics would only include those 2 tags.

@danielnelson
Contributor

It seems to me that group_tags would work the same as taginclude, I might be missing something so here is an example:

[[aggregators.basicstats]]
  period = "10s"
  drop_original = false
  stats = ["sum"]
  namepass = ["foo"]
  taginclude = ["workload", "system"]

# 2 lines of input
foo,host=alice,system=b,workload=y value=42 1548283697864704757
foo,host=bob,system=b,workload=y value=42 1548283698947241533

# output by aggregator
foo,system=b,workload=y value_sum=84 1548283706000000000

Even if this is the same, we should at least document this technique because it isn't very obvious.

@danielnelson danielnelson reopened this Jan 23, 2019
@puckpuck
Contributor

puckpuck commented Jan 24, 2019

Played with this some more, and it turns out that taginclude does produce what you are looking for.

One side effect, though I'm not sure whether it is intentional, is that I also get a single metric from the aggregator that aggregates all the metrics that did not carry any of the tags in my taginclude list.
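
That side effect is what you would expect if series are keyed by whatever tags remain after filtering. The Python simulation below makes that assumption explicit; it is not Telegraf's code, `keep_tags` and `sum_by_included_tags` are illustrative names, and the `carol` and `dave` hosts are made up. Metrics with none of the included tags end up with an empty tag set, so they all fall into the same group.

```python
from collections import defaultdict


def keep_tags(tags, include):
    """Return only the tags whose keys are in `include`."""
    return {k: v for k, v in tags.items() if k in include}


def sum_by_included_tags(samples, include, field):
    """Sum `field` per series, where a series is keyed by its remaining tags."""
    totals = defaultdict(float)
    for tags, fields in samples:
        series_key = tuple(sorted(keep_tags(tags, include).items()))
        totals[series_key] += fields[field]
    return dict(totals)


samples = [
    ({"host": "alice", "system": "b", "workload": "y"}, {"value": 42.0}),
    ({"host": "bob", "system": "b", "workload": "y"}, {"value": 42.0}),
    # Hypothetical metrics that carry neither included tag.
    ({"host": "carol"}, {"value": 1.0}),
    ({"host": "dave"}, {"value": 2.0}),
]

totals = sum_by_included_tags(samples, {"workload", "system"}, "value")
```

The empty key `()` in `totals` is the catch-all series. If that series is unwanted, one option might be to also filter with tagpass so that only metrics carrying the grouping tags reach the aggregator, though I have not verified that combination here.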

I'm happy to take a stab at cleaning up the doc mentioned above to showcase how taginclude can be leveraged to create dynamic groups with aggregator plugins.

(edited because i confused taginclude with tagpass... smh)

@danielnelson
Contributor

> I'm happy to take a stab at cleaning up the doc mentioned above to showcase how taginclude can be leveraged to create dynamic groups with aggregator plugins.

Thanks, that would be great! Sadly, taginclude is the equivalent of fieldpass, which I find very confusing.

@danielnelson
Contributor

Closing now that the new documentation is in place. I also have something different that we might want to use the pivot name for: #5629.

@danielnelson danielnelson added the discussion Topics for discussion label Mar 26, 2019

3 participants