-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove queue from the Prometheus remote write #2951
Conversation
Prometheus remote write (PWR) exporter is supporting queued retry out of the box, but due to the strict requirements of PWR, the queue causes "out of order sample" errors. The queue won't be suitable for Prometheus unless it has a way to shard the data by timeseries. Fixes #2949.
Codecov Report
@@ Coverage Diff @@
## main #2951 +/- ##
==========================================
- Coverage 91.68% 91.68% -0.01%
==========================================
Files 312 312
Lines 15339 15337 -2
==========================================
- Hits 14063 14061 -2
Misses 870 870
Partials 406 406
Continue to review full report at Codecov.
|
cc @odeke-em |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you @rakyll! I added a suggestion but also would be nice to add the issue you filed, as a reference to the code comments. Thank you.
Kindly cc-ing @bogdandrutu! |
Added a link to the issue, thanks for the suggestion @odeke-em. |
Nice, thank you @rakyll! LGTM. |
Can we instead force queued_retry to have a single consumer for this exporter? That should fix the problem while keeping the retrying and queuing capabilities, right? I am not sure how important is that for this exporter though. |
@tigrannajaryan Retrying is critical in the context of remote write, it's not easy to have a highly available Prometheus, Cortex or Thanos. We want to reimplement the retrying options Prometheus server implements natively (https://prometheus.io/docs/practices/remote_write/) to provide a drop-in replacement and help users to reuse their fine tuned configuration. |
Sounds good. Do you also want to add a TODO or an issue (if it does not already exist) describing what you plan to do? |
I updated https://github.com/open-telemetry/opentelemetry-collector/issues/2259 to make sure we're following up with the necessary changes. Thanks much. |
…2974) * Enable queue for the Prometheus Remote Write Exporter internally This is a follow up to #2951. When we disable the queue completely, it causes the export to happen not in a consumer goroutine. Enable the queue internally that export gets its own goroutine not to block the entire collector pipeline. * Set a default queue size
* Remove sending_queue from AWS Prometheus Remote Write Exporter This is a follow up of open-telemetry/opentelemetry-collector#2951. Fixes #3163. * Fix the test
…telemetry#3186) * Remove sending_queue from AWS Prometheus Remote Write Exporter This is a follow up of open-telemetry/opentelemetry-collector#2951. Fixes open-telemetry#3163. * Fix the test
Prometheus remote write (PWR) exporter is supporting queued retry
out of the box, but due to the strict requirements of PWR, the queue
causes "out of order sample" errors. The queue won't be suitable
for Prometheus unless it has a way to shard the data by timeseries.
This is not going to break the existing users because if they have
this feature enabled, they already can't use the PRW exporter correctly.
Fixes #2949.