Concurrent flush across batches #3192

Closed
@codebien

Description

What

Move the concurrency for flushing metrics from per-flush to per-batch.

The expected architecture is one goroutine doing the following operations:

  • Fetch the buckets from the buckets queue
  • Split the time series into batches
    • Encode each batch as protobuf
    • Enqueue the batch as a job to be pushed to the remote service

And a series of concurrent goroutines doing the following operations:

  • Fetch a job
  • Invoke the metricsClient.push operation

Why

We have seen suboptimal handling when a test has a large number of active time series (> 100k). The flush operation splits them into batches and then pushes the batches sequentially; some quick math like the following shows why a single flush operation could take more than 10 seconds.

Example

100k time series
1k time series as batch limit

that generates 100 batches

If we don't have perfect networking (e.g. 100 ms per request), then we end up with a total of 10 seconds for flushing a single iteration of 100k active series (100 batches * 100 ms), and it can grow even further in worse cases.
