Remove outputs blocking inputs when output is slow #4938

Merged: 28 commits, Nov 5, 2018


@danielnelson (Contributor) commented Oct 30, 2018

This pull request removes blocking during flushes and writes metrics immediately to the metric buffer. This should improve throughput, especially when outputs are slow, and degrade more gracefully when overloaded.
closes #2919
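
A minimal sketch of the non-blocking idea, not the code in this PR: adding to a bounded buffer returns immediately, and overflow is handled by dropping rather than by making the input wait on a slow output. The `Buffer` type, its methods, and the drop-oldest policy are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// Buffer is a hypothetical fixed-capacity metric buffer. Add never blocks:
// when the buffer is full, the oldest entries are dropped to make room.
type Buffer struct {
	mu      sync.Mutex
	items   []string
	limit   int
	dropped int
}

// NewBuffer returns a buffer that holds at most limit metrics.
func NewBuffer(limit int) *Buffer {
	return &Buffer{limit: limit}
}

// Add appends metrics and returns immediately, dropping the oldest on overflow.
func (b *Buffer) Add(metrics ...string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.items = append(b.items, metrics...)
	if over := len(b.items) - b.limit; over > 0 {
		b.items = append([]string(nil), b.items[over:]...)
		b.dropped += over
	}
}

// Batch removes and returns up to n of the oldest metrics for one write attempt.
func (b *Buffer) Batch(n int) []string {
	b.mu.Lock()
	defer b.mu.Unlock()
	if n > len(b.items) {
		n = len(b.items)
	}
	batch := append([]string(nil), b.items[:n]...)
	b.items = b.items[n:]
	return batch
}

// Dropped reports how many metrics were discarded because the buffer was full.
func (b *Buffer) Dropped() int {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.dropped
}

func main() {
	buf := NewBuffer(3)
	buf.Add("m1", "m2", "m3", "m4", "m5") // returns at once; m1 and m2 are dropped
	fmt.Println(buf.Batch(2), buf.Dropped()) // prints "[m3 m4] 2"
}
```

In this sketch a slow output only delays its own calls to `Batch`; inputs keep calling `Add` at their normal rate.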

To avoid overconsumption in queue consumers, it introduces a way to track metrics and be notified when they are delivered, allowing for durable handling in the consumer inputs. This is done by reference counting the metrics as they move through Telegraf, which adds some implementation requirements to processors and consumer inputs.
closes #3984
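
The reference counting described above might look roughly like the following sketch. It is not the API added by this PR: `TrackedMetric`, `Track`, `Copy`, and `Done` are hypothetical names. The point is that every copy shares one counter, and the consumer is notified only when the last copy has been handled.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// tracker is a hypothetical reference counter shared by a metric and its copies.
type tracker struct {
	refs   int32
	notify func()
}

// TrackedMetric is a hypothetical stand-in for a metric that reports delivery.
type TrackedMetric struct {
	Name string
	t    *tracker
}

// Track wraps a metric name with a tracker holding a single reference.
func Track(name string, onDelivered func()) *TrackedMetric {
	return &TrackedMetric{Name: name, t: &tracker{refs: 1, notify: onDelivered}}
}

// Copy returns a second handle on the same tracker and adds a reference.
// A processor or the agent would need to do this whenever it duplicates a metric.
func (m *TrackedMetric) Copy() *TrackedMetric {
	atomic.AddInt32(&m.t.refs, 1)
	return &TrackedMetric{Name: m.Name, t: m.t}
}

// Done releases one reference; the last release fires the notification.
func (m *TrackedMetric) Done() {
	if atomic.AddInt32(&m.t.refs, -1) == 0 {
		m.t.notify()
	}
}

func main() {
	acked := false
	m := Track("cpu", func() { acked = true }) // e.g. acknowledge the queue message
	forSecondOutput := m.Copy()

	m.Done()
	fmt.Println(acked) // prints "false": one copy is still in flight

	forSecondOutput.Done()
	fmt.Println(acked) // prints "true": every copy has been handled
}
```

A consumer input built this way could cap the number of unacknowledged messages and read more only after notifications arrive. A processor that drops or copies a metric without calling `Done` or `Copy` would break the count, which is the kind of added implementation requirement mentioned above.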

The agent was refactored to fix issues restarting and shutting down when under load:
closes #4283
closes #4457
closes #4610

As part of the agent refactor, the timing alignment is fixed:
closes #3968

Adds per-output flush_interval, metric_buffer_limit, and metric_batch_size settings.
closes #4717
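
One way per-output settings could be resolved is sketched below, assuming that an unset per-output value falls back to the agent-level value. The fallback rule, the `AgentConfig` and `OutputConfig` types, and the `Effective` function are assumptions for illustration, not code from this PR.

```go
package main

import (
	"fmt"
	"time"
)

// AgentConfig holds the hypothetical agent-wide defaults.
type AgentConfig struct {
	FlushInterval     time.Duration
	MetricBatchSize   int
	MetricBufferLimit int
}

// OutputConfig holds hypothetical per-output overrides; zero means "not set".
type OutputConfig struct {
	Name              string
	FlushInterval     time.Duration
	MetricBatchSize   int
	MetricBufferLimit int
}

// Effective fills any unset per-output value from the agent defaults.
func Effective(agent AgentConfig, out OutputConfig) OutputConfig {
	if out.FlushInterval == 0 {
		out.FlushInterval = agent.FlushInterval
	}
	if out.MetricBatchSize == 0 {
		out.MetricBatchSize = agent.MetricBatchSize
	}
	if out.MetricBufferLimit == 0 {
		out.MetricBufferLimit = agent.MetricBufferLimit
	}
	return out
}

func main() {
	agent := AgentConfig{
		FlushInterval:     10 * time.Second,
		MetricBatchSize:   1000,
		MetricBufferLimit: 10000,
	}
	// A slow output gets a longer flush interval and a larger buffer.
	slow := Effective(agent, OutputConfig{
		Name:              "slow_output",
		FlushInterval:     30 * time.Second,
		MetricBufferLimit: 50000,
	})
	fmt.Println(slow.FlushInterval, slow.MetricBatchSize, slow.MetricBufferLimit)
}
```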

What still needs to be done:

  • Update contributing documentation WRT tracking data.
  • Add tracking support to mqtt_consumer
  • Add tracking support to nats_consumer
  • Add tracking support to nsq_consumer

Required for all PRs:

  • Signed CLA.
  • Associated README.md updated.
  • Has appropriate unit tests.

Write(metrics []Metric) error

// Start the "service" that will provide an Output
Start() error
Contributor

Do service plugins replace Start and Stop implementations with Connect and Close?

Contributor Author

Yes. I'm not sure what the point of this extra interface was; only the prometheus output was a ServiceOutput.

@danielnelson danielnelson force-pushed the flush-no-block branch 2 times, most recently from 0d8c445 to b44090b Compare November 1, 2018 23:18
@danielnelson danielnelson removed the wip label Nov 3, 2018