exporterhelper support for back-pressure #8245

Open

jmacd opened this issue Aug 16, 2023 · 4 comments · May be fixed by #11248

Comments

@jmacd
Contributor

jmacd commented Aug 16, 2023

Is your feature request related to a problem? Please describe.

For an exporter that uses exporterhelper, I have the following requirements.

  • When a data item is consumed by the exporter, the exporter should have an option to block until the export succeeds
  • When a data item is consumed by the exporter and the request fails, the exporter should return the error to the caller

These two requirements need to be met whether or not retries are enabled and whether or not there is a queue, that is, regardless of whether QueueSettings.Enabled is true. The problem is that when QueueSettings.Enabled is true, which enables parallelism (i.e., NumConsumers > 1), the caller does not get the error from the downstream receiver and has no way to block until the receiver returns.
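For reference, here are the settings in question, with arbitrary values (an illustrative snippet, not a recommended configuration):

```go
package example

import "go.opentelemetry.io/collector/exporter/exporterhelper"

// Illustrative queue settings. With Enabled==true and NumConsumers > 1,
// sends become asynchronous and parallel: exactly the case where the
// downstream error stops reaching the caller.
var queueCfg = exporterhelper.QueueSettings{
	Enabled:      true,
	NumConsumers: 10,
	QueueSize:    1000,
}
```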

Here is where today's exporterhelper, with QueueSettings.Enabled==false, blocks until the call returns and returns the error to the caller:

err := qrs.consumerSender.send(req)

Here is where, when QueueSettings.Enabled==true and the queue is full, the code does not block and instead returns a generic error immediately.

Here is where, when QueueSettings.Enabled==true and the queue is not full, the code returns nil without waiting to hear from the downstream receiver.

As it stands, the only way for an exporter to block on the downstream service and receive its responses synchronously is to disable queueing. When queueing is enabled, there are two problems: (a) there is no option to block when the queue is full, and (b) there is no option to wait for the response.
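Reduced to a sketch (illustrative types and names, not the actual exporterhelper code), the enqueue path behaves like this:

```go
package example

import "errors"

var errSendingQueueIsFull = errors.New("sending queue is full")

// request stands in for the exporter's internal request type.
type request struct{}

// queuedSender models the enqueue path described above: it fails fast
// when the queue is full, and otherwise returns nil without waiting for
// the downstream result.
type queuedSender struct {
	queue chan request
}

func (qs *queuedSender) send(req request) error {
	select {
	case qs.queue <- req:
		// Enqueued: report success immediately; the downstream
		// response never reaches this caller (problem b).
		return nil
	default:
		// Queue full: return an error immediately; there is no option
		// to block until space frees up (problem a).
		return errSendingQueueIsFull
	}
}
```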

Describe the solution you'd like
Optional behavior that enables blocking on a queue full and waiting for a response.
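One possible shape for this, sketched as hypothetical additions to the queue settings (the Blocking and WaitForResult fields are illustrative only; they do not exist in exporterhelper):

```go
package example

// QueueSettings sketches how the existing queue configuration could be
// extended. Enabled, NumConsumers, and QueueSize mirror the existing
// exporterhelper fields; Blocking and WaitForResult are hypothetical.
type QueueSettings struct {
	Enabled      bool
	NumConsumers int
	QueueSize    int

	// Blocking (hypothetical): when the queue is full, block the caller
	// until space is available instead of failing immediately.
	Blocking bool

	// WaitForResult (hypothetical): hold the caller until the downstream
	// request completes, and return its error to the caller.
	WaitForResult bool
}
```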

Describe alternatives you've considered
No alternatives come to mind.

Additional context
The OTel-Arrow project has developed a bridging mode for OpenTelemetry Collectors that enables high compression on the network link between two collectors. For this setup to work reliably under high load, we need the otelarrow exporter to be configured with a blocking pipeline. We would recommend both of the optional behaviors described in this issue to otelarrow users.

@dmitryax
Member

I'm not sure I fully understand the ask here. Enabling the queue introduces asynchronous behavior by design. Why do you need the queue if there is a requirement for the blocking/synchronous pipeline? What's the purpose of the queue in that case?

@jmacd
Contributor Author

jmacd commented Aug 17, 2023

I am more-or-less requesting a feature that I would expect from the batchprocessor.

As we expect to move batchprocessor functionality into the exporterhelper, I expect the queue to become the place where requests wait to be batched and sent. When I disable the queue, there is no limit on the number of concurrent senders and no possibility of batching.

When I enable the queue, I expect to get batching. More importantly, it puts a limit on the number of outgoing RPCs (i.e., NumConsumers), which can be important for load balancing.
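To make the NumConsumers point concrete, here is a sketch (not the collector's code) of the queue-consumer model, where the number of consumers bounds how many export RPCs are in flight:

```go
package example

import "context"

type request struct{}

// startConsumers sketches why NumConsumers bounds outgoing RPCs: only
// numConsumers goroutines ever call export, so at most that many
// requests are in flight downstream at any time.
func startConsumers(ctx context.Context, queue <-chan request, numConsumers int,
	export func(context.Context, request) error) {
	for i := 0; i < numConsumers; i++ {
		go func() {
			for req := range queue {
				// In today's queue sender the result stops here; it
				// is never propagated back to the producer.
				_ = export(ctx, req)
			}
		}()
	}
}
```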

@jmacd
Contributor Author

jmacd commented Aug 17, 2023

I will put this another way. I believe the batching facility should be independent of the facility for backpressure.

When the exporterhelper has the back-pressure option I'm looking for and queueing is enabled, the caller should wait until each data item is processed, receive the resulting error, and remain subject to the caller's context timeout.

I've been working on the OTel Arrow project, which has a gRPC-streaming exporter that does the same sort of thing. For each incoming request, it allocates a make(chan error, 1) to receive the response, places that channel into the queue along with the data, and then falls into a select awaiting either the response or the timeout.

See the code: https://github.com/open-telemetry/otel-arrow/blob/95211c5a139c84f7117905531dd45e2122778f06/collector/exporter/otelarrowexporter/internal/arrow/stream.go#L423
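Reduced to a self-contained sketch (names are illustrative; the linked stream.go is the real implementation), the pattern looks like this:

```go
package example

import "context"

type request struct{}

// item pairs the data with a channel for its response, following the
// pattern described above.
type item struct {
	data     request
	response chan error // buffered with capacity 1
}

// sendAndWait enqueues the request together with a response channel,
// then blocks until the consumer reports a result or the caller's
// context expires. The consumer side runs it.response <- export(it.data).
func sendAndWait(ctx context.Context, queue chan<- item, req request) error {
	it := item{data: req, response: make(chan error, 1)}

	// Block on enqueue as well, so a full queue exerts back-pressure.
	select {
	case queue <- it:
	case <-ctx.Done():
		return ctx.Err()
	}

	// Await either the downstream response or the timeout.
	select {
	case err := <-it.response:
		return err
	case <-ctx.Done():
		return ctx.Err()
	}
}
```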

@jmacd
Contributor Author

jmacd commented Jun 18, 2024

I am happy with the latest developments in the batching support. We still require the queue sender to get concurrency in the batching path: without it we have back-pressure, but with an unacceptable loss of concurrency. Therefore, I will leave this open. @moh-osman3 is looking into a concurrency-sender option that would let us replace the concurrentbatchprocessor.
