Async/callback based interface between Span/Log Processor and Span/Log Exporter #1239
To start with, this new interface can be quickly integrated with the Zipkin and OTLP HTTP exporters, as the existing ext::http::client supports a callback mechanism.
This also applies to the Log Processor and Log Exporter.
Hi @lalitb, I would like to explore this, obviously with your guidance, if that is ok.
Thanks, @DebajitDas. Have assigned it to you now. Feel free to ask any questions.
@DebajitDas Also, you may want to look into a related PR #63, which was an attempt to add async support using libevent. The PR adds support for file I/O and timers; networking was to follow. The PR was never merged for various reasons. JFYI, feel free to decide on the best approach for adding the async support :)
Thanks @lalitb. I am looking at the C++ async mechanism to see if it can be used to achieve an asynchronous Export call.
I created another issue #1243 to implement async send. Some frameworks implement their own event and timer management (e.g. Redis). It needs a different poller for each system: IOCP for Windows, epoll for Linux (maybe io_uring for modern Linux kernels), kqueue for BSD, evport for Solaris, etc. I'm thinking of using
@DebajitDas You are right. The abstract method http::client::Session::SendRequest takes a callback function as an argument, but its curl implementation http::client::curl::Session::SendRequest is synchronous, i.e., the method returns only after the request is completed and the callback function has been called. This should probably change once @owent implements #1243. But we can take full advantage of that change only once Exporter::Export() also takes a callback as an argument. Also, once we support multiple HTTP clients along with curl (#1145 and #1146), these clients may support request handling through polling, and it would be good if the SDK is already prepared to handle that.
@owent I agree on this. Any async mechanism requires an event loop to run, and this is missing in C++. std::async potentially uses a separate thread to run the operation, which is not a true async mechanism. That is the reason PR #63, which uses libevent, is something we can think about. It would also be useful for the metrics implementation, where we need to poll metrics from multiple instruments simultaneously. What are your thoughts on using libevent?
In my opinion, we should let the user decide which event loop to use. Should we also implement an adaptor, just like hiredis does? And we can use libevent by default.
I also use libwebsockets in our project; it also supports using libuv, libevent, libev, libubox/uloop, etc. as the event loop. I think libuv and libevent are the most widely used now.
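To make the "pluggable event loop" idea concrete, here is a minimal sketch of what such an adaptor interface could look like (all names here are hypothetical and not part of the SDK); a libevent-, libuv- or application-provided loop would then sit behind this abstraction:

```cpp
#include <chrono>
#include <functional>

// Hypothetical adaptor: the SDK talks only to this abstraction, and users
// plug in libevent, libuv, or their framework's own loop behind it.
class EventLoopAdaptor
{
public:
  virtual ~EventLoopAdaptor() = default;

  // Watch a raw socket fd and invoke the callback when it becomes readable.
  virtual void AddFdWatch(int fd, std::function<void(int)> on_readable) = 0;
  virtual void RemoveFdWatch(int fd) = 0;

  // One-shot timer, e.g. for request timeouts or periodic metric collection.
  virtual void AddTimer(std::chrono::milliseconds delay, std::function<void()> on_timeout) = 0;

  // Run/stop the loop; a libevent-backed adaptor would map these to
  // event_base_dispatch() / event_base_loopbreak().
  virtual void Run()  = 0;
  virtual void Stop() = 0;
};
```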
From the interface perspective of Processor ---> Exporter, a callback mechanism could be achieved along the following lines:
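A rough sketch of one possible shape for that interface, stripped down to standard-library types for brevity (the real signature would use nostd::span, nostd::function_ref and sdk::common::ExportResult, as in the exporter example further down):

```cpp
#include <functional>
#include <memory>
#include <vector>

// Placeholder types standing in for the SDK's Recordable and ExportResult.
enum class ExportResult { kSuccess, kFailure };
struct Recordable {};

// Callback-based exporter interface: instead of blocking and returning an
// ExportResult, Export() receives a callback and invokes it whenever the
// result of the (possibly asynchronous) export becomes known.
class CallbackSpanExporter
{
public:
  virtual ~CallbackSpanExporter() = default;

  virtual void Export(std::vector<std::unique_ptr<Recordable>> &&spans,
                      std::function<bool(ExportResult)> result_callback) noexcept = 0;
};
```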
I might be wrong in my understanding and not fully up to speed on the context. Please correct me if I am missing something.
I would prefer to offload most of the complexity of thread/concurrency management to the Span/Log Processor and keep the exporter logic focused on serialization and transport. This will also ease the work for exporter developers. If we can scope this issue to only the interface change, and think about concurrency separately, the rough changes for ElasticsearchLogExporter would be (not tested):

```cpp
class ResponseHandler : public http_client::EventHandler
{
public:
  /**
   * Creates a response handler that reports the export result through the
   * processor-supplied callback; by default it doesn't log to the console.
   */
  ResponseHandler(nostd::function_ref<bool(ExportResult)> result_callback,
                  std::shared_ptr<Session> session,
                  bool console_debug = false)
      : console_debug_{console_debug},
        result_callback_{result_callback},
        session_{std::move(session)}
  {}

  /**
   * Automatically called when the response is received; finish the session
   * and report the result through the callback.
   */
  void OnResponse(http_client::Response &response) noexcept override
  {
    // Log an error if the response body contains one
    session_->FinishSession();
    result_callback_(sdk::common::ExportResult::kSuccess);
  }

  void OnEvent(http_client::SessionState state, nostd::string_view reason) noexcept override
  {
    // Log the reason/state; in this rough sketch every session event is treated
    // as a failure, a real handler would only fail on error states
    session_->FinishSession();
    result_callback_(sdk::common::ExportResult::kFailure);
  }

private:
  bool console_debug_;
  // function_ref is non-owning; keeping the handler alive until the callback
  // fires is the lifetime question discussed further below
  nostd::function_ref<bool(ExportResult)> result_callback_;
  std::shared_ptr<Session> session_;
};
void ElasticsearchLogExporter::Export(
    const nostd::span<std::unique_ptr<sdklogs::Recordable>> &records,
    nostd::function_ref<bool(ExportResult)> result_callback) noexcept
{
  // Create a connection to the ElasticSearch instance
  auto session = http_client_->CreateSession(options_.host_ + std::to_string(options_.port_));
  auto request = session->CreateRequest();
  request->SetUri(options_.index_ + "/_bulk?pretty");
  request->SetMethod(http_client::Method::Post);
  request->AddHeader("Content-Type", "application/json");
  request->SetTimeoutMs(std::chrono::milliseconds(1000 * options_.response_timeout_));
  auto body_vec = serialise(records);  // serialise() stands in for the existing JSON body construction
  request->SetBody(body_vec);
  // NOTE: keeping the handler alive beyond this scope is the ownership/lifetime
  // question discussed further down in this thread
  std::unique_ptr<ResponseHandler> handler(
      new ResponseHandler(result_callback, session, options_.console_debug_));
  session->SendRequest(*handler);
}
```

Again, it would be good to have more thoughts before finalizing; we can also discuss this in the community meeting.
Agree. Can we also provide an internal event loop and let the exporters decide whether to use it or spawn a new thread to finish the export?
Yes, we definitely need to have an internal event loop. In the case of the HTTP exporter, the decision to use the event loop for I/O should be with the ext::http::client implementation. So if ext::http::curl wants to use curl_multi_poll, it should use the event loop if available. Do you think it would be beneficial for non-HTTP exporters?
Yes. And if other exporters use any SDK that allows users to get the raw socket fd, we can also use our internal event loop to handle it.
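For reference, this is roughly what the curl multi interface looks like when several transfers are driven from a single loop (plain libcurl, requires 7.66+ for curl_multi_poll; the URL is just an example, and this is not the SDK's HTTP client code):

```cpp
#include <curl/curl.h>

// Minimal multi-interface loop: easy handles are driven from one thread;
// curl_multi_poll() sleeps until any transfer has activity.
int main()
{
  curl_global_init(CURL_GLOBAL_DEFAULT);
  CURLM *multi = curl_multi_init();

  CURL *easy = curl_easy_init();
  curl_easy_setopt(easy, CURLOPT_URL, "http://localhost:9411/api/v2/spans");  // example endpoint
  curl_multi_add_handle(multi, easy);

  int still_running = 0;
  do
  {
    curl_multi_perform(multi, &still_running);

    int numfds = 0;
    // Wait (up to 1000 ms) for activity on any of the transfers.
    curl_multi_poll(multi, nullptr, 0, 1000, &numfds);

    // Completed transfers show up here; this is where a per-request
    // callback (e.g. the exporter's result_callback) could be fired.
    CURLMsg *msg;
    int msgs_left = 0;
    while ((msg = curl_multi_info_read(multi, &msgs_left)) != nullptr)
    {
      if (msg->msg == CURLMSG_DONE)
        curl_multi_remove_handle(multi, msg->easy_handle);
    }
  } while (still_running > 0);

  curl_easy_cleanup(easy);
  curl_multi_cleanup(multi);
  curl_global_cleanup();
  return 0;
}
```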
@lalitb Thanks for clarifying the scope of this issue. With the above, we need to ensure that the ResponseHandler object remains valid even when the callback is called on a separate thread. My follow-up question now is: who owns the handler and cleans it up once the callback has been invoked?
It's a good point. Probably we need to pass ownership of the ResponseHandler from the exporter to the HttpClient, and let it clean up after invoking the callback. This would need changes in the HTTP client library.
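A sketch of what an ownership-transferring variant could look like (hypothetical signatures, not the current ext::http::client API):

```cpp
#include <memory>

// Sketch of the relevant parts of the HTTP client session interface.
class EventHandler
{
public:
  virtual ~EventHandler() = default;
  // OnResponse()/OnEvent() as in the existing ext::http::client interface.
};

class Session
{
public:
  virtual ~Session() = default;

  // Current style: the caller must keep the handler alive for the whole request.
  virtual void SendRequest(EventHandler &handler) noexcept = 0;

  // Possible addition: the session takes ownership of the handler and destroys
  // it once the request has finished and the last callback has been delivered,
  // so the exporter's Export() can safely return right away.
  virtual void SendRequest(std::unique_ptr<EventHandler> handler) noexcept = 0;
};
```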
Yes, we need to do that for backward compatibility. Though maintaining ABI compatibility would be tricky, as any external exporter (built against the old SDK) would no longer work with the new SDK, and would need recompilation.
True.
Thanks @lalitb for clarifying the doubts. After the new interface is in place, should we start with one of the exporters?
If it is possible to start with one of the exporters, say Zipkin, that would be good. Zipkin uses the HTTP sync client as of now, so it is a good use case to convert it to use the HTTP async client and see how it works. But if we are modifying the HTTP client library (say, to transfer ownership of the EventHandler and manage callbacks), this may require changes in other exporters.
Have created #1250 for supporting the event loop framework. Feel free to modify the issue or add more data there.
As pointed out in https://github.com/open-telemetry/opentelemetry-specification/issues/2434#issuecomment-1076997385, I use the variable name max_concurrent_requests. What's your suggestion about the default value?
Yes, I had that in mind regarding the default value for max_concurrent_requests; to start with, we can go with 8 as the default value.
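For illustration, the knob could live in the exporter or HTTP-client options along these lines (the struct and the other fields are hypothetical; only the max_concurrent_requests name and the default of 8 come from the discussion above):

```cpp
#include <chrono>
#include <cstddef>
#include <string>

// Illustrative options struct (not an existing SDK type): max_concurrent_requests
// bounds how many exports may be in flight at once, with 8 as the tentative default.
struct AsyncHttpExporterOptions
{
  std::string url;
  std::chrono::milliseconds timeout{30000};
  std::size_t max_concurrent_requests = 8;
};
```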
This issue was marked as stale due to lack of activity. It will be closed in 7 days if no further activity occurs.
According to the latest https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md, Export() will never be called concurrently.
#1413 fixes this.
As of now, the flow of spans from Span Processor to Exporter is synchronous, which means the processor has to wait for the upload/export status of the ongoing request before invoking a new request.
Application -> Span::OnEnd() -> SimpleProcessor::OnEnd -> Mutex Lock -> Exporter::Export()
The application has to wait for the span to be uploaded to continue further with business logic.
The Batch processor simplifies this by batching/caching the spans in the processor and uploading them in a separate thread.
This allows the application to continue processing business logic without waiting for spans to export, and Span::OnEnd() will not block even while an export is ongoing in a separate thread (there is no lock, thanks to the use of a CircularBuffer to store spans).
With the BatchProcessor, while the interface between application and processor is fast (no wait), the interface between processor and exporter is still synchronous, as the processor needs to wait for the ongoing span export to finish before invoking a new export. This causes a bottleneck if the application creates spans faster than the exporter can export them, and eventually the CircularBuffer gets full. Once the CircularBuffer is full, Span::OnEnd() needs to wait/block until the buffer is available for writing.
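To make the bottleneck concrete, here is a minimal, self-contained sketch of a batch worker under today's blocking Export() contract (placeholder types and a mutex-guarded vector instead of the SDK's lock-free CircularBuffer; this is not the actual BatchSpanProcessor code):

```cpp
#include <atomic>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Placeholder types standing in for sdk::trace::Recordable and SpanExporter.
struct Recordable {};
struct SpanExporter
{
  virtual ~SpanExporter() = default;
  // Blocking contract: returns only once the upload has finished.
  virtual void Export(std::vector<std::unique_ptr<Recordable>> batch) noexcept = 0;
};

// Because Export() blocks, spans leave the buffer only as fast as the exporter
// can upload them; if the application produces spans faster than that, the
// buffer fills up and Span::OnEnd() starts blocking on it.
class BatchWorker
{
public:
  explicit BatchWorker(std::unique_ptr<SpanExporter> exporter)
      : exporter_(std::move(exporter)), worker_([this] { Run(); })
  {}

  ~BatchWorker()
  {
    shutdown_ = true;
    cv_.notify_all();
    worker_.join();
  }

  // Called from Span::OnEnd() on the application thread.
  void Enqueue(std::unique_ptr<Recordable> span)
  {
    std::lock_guard<std::mutex> lk(mu_);
    buffer_.push_back(std::move(span));
    cv_.notify_one();
  }

private:
  void Run()
  {
    while (!shutdown_)
    {
      std::vector<std::unique_ptr<Recordable>> batch;
      {
        std::unique_lock<std::mutex> lk(mu_);
        cv_.wait(lk, [this] { return shutdown_ || !buffer_.empty(); });
        batch.swap(buffer_);
      }
      if (!batch.empty())
        exporter_->Export(std::move(batch));  // worker is blocked here until the upload finishes
    }
  }

  std::unique_ptr<SpanExporter> exporter_;
  std::mutex mu_;
  std::condition_variable cv_;
  std::vector<std::unique_ptr<Recordable>> buffer_;
  std::atomic<bool> shutdown_{false};
  std::thread worker_;
};
```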
The current design of the interface between processor and exporter is driven by the specification as below.
So, the solution could be to return the Exporter::Export() result through a callback or an async mechanism.
This allows the exporter to invoke the result_callback when the export status is available, and the processor to invoke the next export without waiting for the ongoing export to complete.
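A rough sketch of what the processor side could look like with such a callback-based exporter (the class name and the in-flight bookkeeping are illustrative only; the exporter example earlier in the thread uses nostd::function_ref<bool(ExportResult)> for the callback):

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Placeholder types; the real interface would use sdk::common::ExportResult,
// nostd::span and nostd::function_ref.
enum class ExportResult { kSuccess, kFailure };
struct Recordable {};
struct AsyncExporter
{
  virtual ~AsyncExporter() = default;
  // Returns immediately; result_callback fires once the upload has finished.
  virtual void Export(std::vector<std::unique_ptr<Recordable>> batch,
                      std::function<void(ExportResult)> result_callback) noexcept = 0;
};

// Processor-side bookkeeping: new exports start without waiting for the
// previous one, bounded by a hypothetical max_concurrent_requests limit.
class AsyncBatchDispatcher
{
public:
  explicit AsyncBatchDispatcher(std::unique_ptr<AsyncExporter> exporter,
                                std::size_t max_concurrent_requests = 8)
      : exporter_(std::move(exporter)), max_in_flight_(max_concurrent_requests)
  {}

  void ExportBatch(std::vector<std::unique_ptr<Recordable>> batch)
  {
    if (in_flight_.load() >= max_in_flight_)
    {
      // Back-pressure policy (wait, drop, or retry later) is not shown here.
      return;
    }
    ++in_flight_;
    exporter_->Export(std::move(batch), [this](ExportResult /*result*/) {
      // Invoked by the exporter, possibly on another thread, when done.
      --in_flight_;
    });
  }

private:
  std::unique_ptr<AsyncExporter> exporter_;
  std::size_t max_in_flight_;
  std::atomic<std::size_t> in_flight_{0};
};
```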
Also, the interface change should be backward compatible and shouldn't break existing/external exporters. One option would be to have the processor decide which export mechanism to use based on a configurable parameter. The default should be the existing behaviour (i.e., synchronous export).