-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[pkg/stanza] Support back-pressure from downstream consumers #20511
Comments
Pinging code owners for pkg/stanza: @djaglowski. See Adding Labels via Comments if you do not have permissions to add labels yourself. |
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself. |
@dmitryax one related thing I would like to decide is if that configuration option needs to be per receiver or we want it to be per pipeline, for example as a memory_limiter processor config option? If it is memory_limiter option then memory_limiter can easily make the decision to drop the data and return "success" to the caller. If the option is to retry then memory_limiter should return non-permanent error to the caller, preferably a distinct non-permanent error that indicates the retrying should not be immediate, but after a while (since in memory limited situations we want receivers to slow down). It may be also worth looking into defining a well-known RetryableError type which also carries RetryAfter hint. This can be used both by memory_limiter and by exporterqueue/exporter. Some exporters can receive "retry hints" from their destinations and it would nice to propagate this hint all the way back to the receivers, which in turn can propagate it back to their senders (which is e.g. part of otlp protocol throttling) or wait themselves (which may be more applicable to receivers like filelog which can just pause and then retry and continue from the same place in the file). |
Contributes to open-telemetry#1084 - Clarify what the memory limiter does. - Set expectations from receivers, how they are supposed to react when the memory limiter refuses the data. All receivers must adhere to this contract. See for example an issue opened against filelog receiver: open-telemetry/opentelemetry-collector-contrib#20511 Future work: one additional thing we can do is implement a backoff logic in the memory limiter. When in memory limited mode the processor can introduce pauses before it returns from the ConsumeLogs/Traces/Metrics call. This will allow to slow down the inflow of data into the Collector and give time for the pipeline to clear and memory usage to return to the normal. This needs to be explored further.
Contributes to open-telemetry#1084 - Clarify what the memory limiter does. - Set expectations from receivers, how they are supposed to react when the memory limiter refuses the data. All receivers must adhere to this contract. See for example an issue opened against filelog receiver: open-telemetry/opentelemetry-collector-contrib#20511 Note that there are no functional changes to the memory limiter. Future work: one additional thing we can do is implement a backoff logic in the memory limiter. When in memory limited mode the processor can introduce pauses before it returns from the ConsumeLogs/Traces/Metrics call. This will allow to slow down the inflow of data into the Collector and give time for the pipeline to clear and memory usage to return to the normal. This needs to be explored further.
Contributes to open-telemetry#1084 - Clarify what the memory limiter does. - Set expectations from receivers, how they are supposed to react when the memory limiter refuses the data. - Add a test that demonstrates that memory limiter does not lose data if the receiver and exporter behave according to the contract. All receivers must adhere to this contract. See for example an issue opened against filelog receiver: open-telemetry/opentelemetry-collector-contrib#20511 Note that there are no functional changes to the memory limiter. Future work: one additional thing we can do is implement a backoff logic in the memory limiter. When in memory limited mode the processor can introduce pauses before it returns from the ConsumeLogs/Traces/Metrics call. This will allow to slow down the inflow of data into the Collector and give time for the pipeline to clear and memory usage to return to the normal. This needs to be explored further.
Contributes to open-telemetry#1084 - Clarify what the memory limiter does. - Set expectations from receivers, how they are supposed to react when the memory limiter refuses the data. - Add a test that demonstrates that memory limiter does not lose data if the receiver and exporter behave according to the contract. All receivers must adhere to this contract. See for example an issue opened against filelog receiver: open-telemetry/opentelemetry-collector-contrib#20511 Note that there are no functional changes to the memory limiter. Future work: one additional thing we can do is implement a backoff logic in the memory limiter. When in memory limited mode the processor can introduce pauses before it returns from the ConsumeLogs/Traces/Metrics call. This will allow to slow down the inflow of data into the Collector and give time for the pipeline to clear and memory usage to return to the normal. This needs to be explored further.
…etry#7459) Contributes to open-telemetry#1084 - Clarify what the memory limiter does. - Set expectations from receivers, how they are supposed to react when the memory limiter refuses the data. - Add a test that demonstrates that memory limiter does not lose data if the receiver and exporter behave according to the contract. All receivers must adhere to this contract. See for example an issue opened against filelog receiver: open-telemetry/opentelemetry-collector-contrib#20511 Note that there are no functional changes to the memory limiter. Future work: one additional thing we can do is implement a backoff logic in the memory limiter. When in memory limited mode the processor can introduce pauses before it returns from the ConsumeLogs/Traces/Metrics call. This will allow to slow down the inflow of data into the Collector and give time for the pipeline to clear and memory usage to return to the normal. This needs to be explored further.
I think the memory limiter should not automatically drop itself and instead provide more specific error messages such as RetryableError, as you suggested. My reasoning is that the decision to drop or retry can significantly vary among different receivers. Even the same receiver type can have different modes of operation, such as the filelog receiver which may choose to drop certain application logs but never miss any audit logs. Given that multiple receivers are typically used in a single pipeline, setting this configuration on the memory limiter would require splitting the pipeline. |
Currently, we drop logs on any error from the downstream consumer. This behavior is not desired in case of temporary errors. Users might want to slow down reading the log files instead of dropping the data.
We should provide a way to have such behavior that a configuration option can control.
The text was updated successfully, but these errors were encountered: