
Aggregate Stackdriver output requests #5404

Closed
Legogris opened this issue Feb 11, 2019 · 13 comments
Labels
area/gcp (Google Cloud plugins including cloud_pubsub, cloud_pubsub_push, stackdriver), bug (unexpected problem or unintended behavior)
Milestone
1.10.0
Comments

@Legogris
Contributor

Feature Request

The Stackdriver output currently sends a single request for each TimeSeries in a write. This hits Stackdriver's rate limits fairly quickly. The feedback we got from Google was that the current version was abusing their API.

Proposal:

Aggregate separate TimeSeries in batch requests

Current behavior:

Each TimeSeries is sent as a single request:

timeSeriesRequest := &monitoringpb.CreateTimeSeriesRequest{

Desired behavior:

TimeSeries are aggregated together. The reason this is not as simple as aggregating all TimeSeries passed to Write is that if more than one of them contains points for the same TimeSeries, Stackdriver returns an error: "Duplicate TimeSeries encountered. Only one point can be written per TimeSeries per request." This refers to an attempt to write more than one point to a single time series in the same request, which is not allowed.

It should also be noted that each request can contain a maximum of 200 TimeSeries.

danielnelson added the bug (unexpected problem or unintended behavior) and area/gcp (Google Cloud plugins including cloud_pubsub, cloud_pubsub_push, stackdriver) labels on Feb 11, 2019
@danielnelson
Contributor

It seems that we first need to group all metrics by minute, and then aggregate the metrics. Do you have any thoughts as to what aggregation would be most appropriate? We may want to just do a "last" aggregation to send the most recent point.

Just to clarify with respect to the 200 time series per request, you are suggesting that we pack the full 200 time series into the request before calling CreateTimeSeries instead of the current strategy of sending one telegraf.Metric per call?

@Legogris
Contributor Author

Legogris commented Feb 11, 2019

> It seems that we first need to group all metrics by minute, and then aggregate the metrics. Do you have any thoughts as to what aggregation would be most appropriate? We may want to just do a "last" aggregation to send the most recent point.

I'm really not an expert here, but is there anything special about minutes here?

> Just to clarify with respect to the 200 time series per request, you are suggesting that we pack the full 200 time series into the request before calling CreateTimeSeries instead of the current strategy of sending one telegraf.Metric per call?

Ideally each request should contain as many TimeSeries as possible; I just wanted to note the limit in case an implementer misses it and attempts to send more than 200 TimeSeries per request.

@Legogris
Contributor Author

I made a first attempt in #5407, happy to hear your thoughts on the best way to go about this @danielnelson .

@danielnelson
Copy link
Contributor

In the quotas documentation I found:

> Rate at which data can be written to a single time series: one point per minute

@Legogris
Contributor Author

Legogris commented Feb 11, 2019

> In the quotas documentation I found:
>
> Rate at which data can be written to a single time series: one point per minute

Good point. For users, this can be solved by simply setting interval/flush_interval to >= 60s, but I guess there are people who need more granularity.
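For reference, a minimal agent configuration along those lines might look like the following sketch (interval and flush_interval are Telegraf's standard agent settings; the project value is a placeholder):

```toml
[agent]
  ## One collection and flush per minute keeps each time series at or
  ## under Stackdriver's quota of one written point per minute.
  interval = "60s"
  flush_interval = "60s"

[[outputs.stackdriver]]
  project = "my-gcp-project"  # placeholder project id
  namespace = "telegraf"
```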

@danielnelson
Contributor

It would be nice if we could have the inputs set to a lower interval and the output would still behave nicely. This would be handy when sending to multiple outputs, an example being sending some data to Stackdriver but most of the data to InfluxDB, which is the configuration I expect most people will use with this plugin.

We had a similar issue with the Azure Monitor output, where the data needs to be sent at a lower resolution; however, Azure Monitor supports aggregate values without needing to use a distribution data type. In the case of Stackdriver, it may make sense to do something similar but always send the most recent value each minute.

@Legogris
Contributor Author

Legogris commented Feb 11, 2019

One thought:

  1. Construct TimeSeries from fields (order does not matter)
  2. Group TimeSeries by hash (metric name / field name / tag values)
  3. Merge duplicates into points on same TimeSeries
  4. Sort TimeSeries by time
  5. Chunk into requests of maximum 200.

Thoughts on this approach?

@danielnelson
Contributor

I think that's close, but we should also keep in mind that you can only send one point per time series per request. We want to merge points that occur during the same minute, but if they are further apart we want to send them in another request.

@Legogris
Contributor Author

@danielnelson It's the other way around, right?
Within the same minute => aggregate to a single point, or send in a separate request
Outside the same minute => add as a point to the same TimeSeries, in the same request

@danielnelson
Contributor

Not in the same request though, from the same link as before:

> You can write only one data point for each time series in a request

@Legogris
Contributor Author

Right, I forgot about that. Then it shouldn't be that tricky to get something working; revised from above:

  1. Construct TimeSeries from fields (order does not matter)
  2. Group TimeSeries by hash (metric name / field name / tag values)
  3. Construct requests by taking the first element of each group, then the second, and so on. Start a new request when looping back to the same hash or when the limit of 200 is reached.

@Legogris
Contributor Author

#5407 is now updated by @nicbaz according to the proposal above.

danielnelson added this to the 1.10.0 milestone on Feb 20, 2019
@danielnelson
Contributor

Closed in #5407
