Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should exports be async #1422

Closed
huntc opened this issue Jul 16, 2020 · 36 comments
Closed

Should exports be async #1422

huntc opened this issue Jul 16, 2020 · 36 comments

Comments

@huntc
Copy link
Contributor

huntc commented Jul 16, 2020

I’m writing an export class for Reactive streams, as I did for OpenTracing and Dropwizard Metrics.

It occurs me that the export method should be returning a CompletionStage of a result instead of just a result. An export implementation is often performing some long lived action such as communicating over the network. I note that the Jaeger export function actually blocks.

What’s the appetite for making the export method non-blocking?

@jkwatson
Copy link
Contributor

Hi @huntc . I think our general approach is to leave this up to the exporter implementation. For example, the New Relic exporter returns immediately and handles all the processing on a background thread. If you want an async exporter, then my recommendation would be to make yours do the work in the background.

The jaeger exporter should probably be changed to do the same, I'd guess.

@huntc
Copy link
Contributor Author

huntc commented Jul 16, 2020

Thanks, @jkwatson. What's the effect of the returned status indicating failure?

If knowing whether the export is successful is important then I'd suggest that the method declaration should reflect async possibilities for implementations.

If it isn't important to know if the export is successful then the return type should be Void.

WDYT?

@jkwatson
Copy link
Contributor

We've had this discussion in the specs SIG. At the moment, there is no effect of returning the failure status. We made the method return an enum value in case in the future we want to support a robust retry-system in the SDK. We didn't want to potentially break existing exporters if we did decide to enhance the capabilities of the SDK in this way. Does that make sense?

@huntc
Copy link
Contributor Author

huntc commented Jul 16, 2020

Thanks, John.

Returning a status that yields no current benefit and is generally only attainable by blocking does not make sense to me.

If we may want to have a robust retry system in the future then I'd suggest that we need to use a completion stage now... blocking the thread isn't a great way to go and we know that exporters perform long-lived actions.

That said, returning a Void perhaps reflects the correct semantics. I agree that letting the exporters deal with retries in their own way makes sense. Different types of connectivity demand different retry policies e.g. if I'm using an exporter from a node known to have poor internet connectivity and low power requirements, sending on a best-effort basis is fine. If I'm using an exporter within a data center then retrying may make more sense. Then there are circuit-breaker considerations that apply to some environments, but not others...

Now feels the time to introduce API breaking changes before going GM.

Please feel free to point me at another forum if there's a wider conversation to be had. I would have thought it'd be language-specific though given the existence of coroutines etc. in other languages.

@jkwatson
Copy link
Contributor

I don't necessarily disagree with you, but we'd need to get the spec updated first, if we want to change it, since this is something that is specified across all languages: https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/trace/sdk.md#exportbatch

@huntc
Copy link
Contributor Author

huntc commented Jul 16, 2020 via email

@jkwatson
Copy link
Contributor

Returns: ExportResult:

ExportResult is one of:

Success - The batch has been successfully exported. For protocol exporters this typically means that the data is sent over the wire and delivered to the destination server.
Failure - exporting failed. The batch must be dropped. For example, this can happen when the batch contains bad data and cannot be serialized.

@huntc
Copy link
Contributor Author

huntc commented Jul 16, 2020 via email

@jkwatson
Copy link
Contributor

I don't disagree that this would be better, but I also don't see wiggle room in the spec, as currently written.
Also, unfortunately, CompletionStage is java 8+, so we'd need to choose one of the older async paradigms.

@huntc
Copy link
Contributor Author

huntc commented Jul 16, 2020 via email

@jkwatson
Copy link
Contributor

Astounding, indeed! We are supporting java 7 currently, so that grpc libraries can be instrumented with the APIs.

@jkwatson
Copy link
Contributor

The decision on the current exporter API to include a non-async return value predates my involvement in the project. I'll write a little spec tweak PR that will make me more comfortable with changing the exporter API in this case.

@huntc
Copy link
Contributor Author

huntc commented Jul 16, 2020

The decision on the current exporter API to include a non-async return value predates my involvement in the project. I'll write a little spec tweak PR that will make me more comfortable with changing the exporter API in this case.

Thanks John. There’s a precedent as mentioned given the Go and .NET implementations. Would you like me to raise a PR here to change the return type to a Future?

@jkwatson
Copy link
Contributor

open-telemetry/opentelemetry-specification#707

Yes, a PR would be very welcome!

@huntc
Copy link
Contributor Author

huntc commented Jul 16, 2020

I’ll get the PR done first thing. Meanwhile, in case you’re interested, here’s my work influencing the conversation. titanclass/reactive-streams-telemetry#6

@huntc
Copy link
Contributor Author

huntc commented Jul 17, 2020

The PR is in @jkwatson (took me longer than expected!): #1428

@huntc
Copy link
Contributor Author

huntc commented Aug 8, 2020

I think the label for this should indicate releasing before GA as it’ll have API implications.

@Oberon00
Copy link
Member

Oberon00 commented Aug 9, 2020

Should exports be async

My take on this is: It doesn't matter. Exporter + SpanProcessor are often a closely coupled unit at this point in practice. For example, an exporter that syncronously exports the batch of spans it gets (which is often the easiest to implement), cannot sensibly be used with a SimpleSpanProcessor because it would do a network requests for every single ended span on the critical path of the application. On the other hand, the BatchSpanProcessor already does all the boilerplate of setting up a background thread and batching spans together for the exporter.

There may be exporters where the threading model of the BatchSpanProcessor does not fit and neither does the SimpleSpanProcessor's. One choice you could make, is to do what the New Relic exporter does according to @jkwatson (#1422 (comment)): Use the SimpleSpan processor and then do threading/synchronicity and batching/streaming yourself. An equivalent approach would probably be to make a NewRelicSpanProcessor that does the export. Then you would not need to return a "fake" status code (but override an unused OnStart method instead).

IMHO the only reason to use the exporter interface is when you want to make use of the more complex SpanProcessor implementations to ease implementation of your exporter. If your exporter only works with SimpleSpanProcessor (which is really just an adapter from the SpanProcessor interface to the exporter interface), you'd better implement the SpanProcessor interface directly to avoid your exporter being accidentally used with a suboptimal SpanProcessor implementation.

In summary, I think exporter is just a convenience interface and is not really important. The SpanProcessor interface is the actually important one.

@Oberon00
Copy link
Member

Oberon00 commented Aug 9, 2020

One more point about async-ness: We currently struggle to find a use case for doing anything with the success/failure return value of the exporter. What is a span processor supposed to do with the "completed"/"still in progress" information that an "async" method would add? What code would it run in a completion callback?

@huntc
Copy link
Contributor Author

huntc commented Aug 9, 2020

Thanks for the feedback.

Having now spent almost a month pondering my original question and working on a PR, a couple of things stand out:

  1. Types are important and the Exporters types should reflect the semantics of the interface - no matter as to there being an implementation gain or not. Types are documentation and therefore assist the new user of these interfaces in understanding the intentions of a class.

  2. Blocking a thread should be avoided as it may consume resources unnecessarily. The use of work-stealing and other thread pool types, shared with other parts of a program, is crucial so that memory and other resources are used diligently. OTEL has a responsibility to have a sharp focus on resource usage being what should become a critical piece of infrastructure.

On the utility of having a result code, chaining exports can only be sensibly achieved by knowing the result of a previous export In a chain. Chaining exports is presently achievable.

I hope that this helps.

@carlosalberto
Copy link
Contributor

@huntc From the last Java meeting, we will be trying out your API (I will do a deep review myself later this week of your PR, but focusing on the code, not the idea/design itself ;) ).

The points that @Oberon00 raises are very interesting though. To me, also for Java it's not as important to have this new API as it may be for, say, Scala frameworks (who do a fair amount of asynchronous calls).

@Oberon00
Copy link
Member

Types are important and the Exporters types should reflect the semantics of the interface

That's true. One thing to note is that when we change the API to be async, we have to consider how much sense the BatchSpanProcessor implementation still makes. If exports are async, then probably it should not have a background thread but invoke the exporter directly in OnEnd once a batch is full/the timeout is expired (or do we still need a background thread for the case where no new span is started/ended for quite some time?).

@jkwatson
Copy link
Contributor

@Oberon00 : Your point about SpanProcessors being the more important construct is very interesting. Maybe what we really need is an alternative async-friendly SpanProcessor, if the disruptor version isn't working.

@huntc would it make sense to have a scala-friendly SpanProcessor implementation, maybe even written in scala, that would work with a new scala-friendly exporter interface? This seems like it might be a better long-term solution than trying to shoe-horn a scala-friendly exporter into the existing java-oriented SpanProcessor model. Thoughts?

@trask
Copy link
Member

trask commented Aug 10, 2020

If exports are async, then probably it should not have a background thread but invoke the exporter directly in OnEnd once a batch is full/the timeout is expired

Even with an async exporter, I don't think you'd want to invoke it on the application execution thread. I think the idea is for the async exporter to do marshalling on the same thread, and then call an async http/gRPC framework, at which point it would return to the caller instead of waiting for synchronous http/gRPC.

@huntc
Copy link
Contributor Author

huntc commented Aug 10, 2020

To me, also for Java it's not as important to have this new API as it may be for, say, Scala frameworks (who do a fair amount of asynchronous calls).

I do not see the motivation for my original question being influenced by the language being used, be it Scala, Java, Kotlin, Clojure etc. My take is that the Java OTEL project represents the JVM as a whole. Java is the common denominator.

Async non-blocking code is as important in Java-land as anywhere else.

@huntc
Copy link
Contributor Author

huntc commented Aug 10, 2020

If exports are async, then probably it should not have a background thread but invoke the exporter directly in OnEnd once a batch is full/the timeout is expired (or do we still need a background thread for the case where no new span is started/ended for quite some time?).

Yes, the latter.

I think the batch span processor makes sense depending on how the exporter meets one’s use case. In my case, the exporter I’m writing will encourage the use of the simple span processor. It’s objective is to stream data to a collector as fast as it can and, in the case of back pressure, will drop older telemetry.

@huntc
Copy link
Contributor Author

huntc commented Aug 10, 2020

@huntc would it make sense to have a scala-friendly SpanProcessor implementation, maybe even written in scala, that would work with a new scala-friendly exporter interface? This seems like it might be a better long-term solution than trying to shoe-horn a scala-friendly exporter into the existing java-oriented SpanProcessor model. Thoughts?

I see no distinction here between Java and Scala.

@jkwatson
Copy link
Contributor

Leaving the Java/Scala differences aside for the moment, what do you think of @Oberon00 's core thesis that perhaps a new SpanProcessor that better supports async models, with an more natively async contract with exporters would be a alternative approach to modifying the existing non-async-friendly BatchSpanProcessor?

@huntc
Copy link
Contributor Author

huntc commented Aug 10, 2020

...what do you think of @Oberon00 's core thesis that perhaps a new SpanProcessor that better supports async models, with an more natively async contract with exporters would be a alternative approach to modifying the existing non-async-friendly BatchSpanProcessor?

Firstly, I don't see that another type of span processor would change the contract of the exporter i.e. the exporter's types should still convey the behaviour expected.

Secondly, adding another type of span processor will perhaps just increase confusion. If anything, I'd stick with the simple span processor and do away with the batch one altogether. This then places more emphasis on the exporters taking responsibility for back-pressure, which I think is reasonable.

In summary, I feel that the user of these APIs should only concern themselves with which exporter to use. Exporters are easy to comprehend, whereas "span processor" is a bit of a foreign concept. Choosing both a span processor and an exporter seems a burden on the user.

@jkwatson
Copy link
Contributor

Cool. Just wanted to think through all the options. Thanks for humoring me.

@anuraaga
Copy link
Contributor

For a bit of color, I would say here Exporter is the low level primitive, and span processor is a higher level abstraction. The higher level generally can't "do more" than the primitives it has available. So the span processor, if wanting to go async, has to workaround by e.g. delegating to a disruptor thread (even if the exporter has its own event loop, it's a situation that makes me cry a little). The opposite wouldn't be true, a span processor can present synchronously if the exporter is async by just waiting on the result - we didn't add .join here but if anyone thinks this is important it's trivial to add.

Avoiding exporter interface when wanting async is an option, but then why do we even have exporters? The first thing I'd want to do is to stop implementing Exporter for the otlp ones since it makes sense to issue the gRPC calls asynchronously to reduce thread parking.

@Oberon00
Copy link
Member

Oberon00 commented Aug 11, 2020

@huntc

Firstly, I don't see that another type of span processor would change the contract of the exporter

We can have different exporter interfaces for use with different SpanProcessors. E.g. async-friendly SpanProcessors could use an AsyncExporter interface while others use the old exporter interface.

Secondly, adding another type of span processor will perhaps just increase confusion. If anything, I'd stick with the simple span processor and do away with the batch one altogether.

That's a slippery slope to removing the exporter/span processor distinction altogether. Especially if we are thinking about chained exports / chained span processors.

@anuraaga

Avoiding exporter interface when wanting async is an option, but then why do we even have exporters?

Yeah, I don't think this is currently very clearly motivated, especially if you only use the SimpleSpanProcessor as opposed to one that offers a different threading model to your exporter (like BatchSpanProcessor).

@Oberon00
Copy link
Member

Oberon00 commented Aug 11, 2020

So the span processor, if wanting to go async, has to workaround by e.g. delegating to a disruptor thread (even if the exporter has its own event loop, it's a situation that makes me cry a little).

I don't see this as a workaround, it's the value that it provides to the exporter, since it doesn't have to have an event loop/background thread itself. Of course this is bad if the exporter needs a different threading model, but would an async interface help here? You would still need to select your SpanProcessor implementation depending on whether the exporter is actually async or not.

@trask
Copy link
Member

trask commented Aug 11, 2020

Cherry picking from above:

I feel that the user of these APIs should only concern themselves with which exporter to use. Exporters are easy to comprehend, whereas "span processor" is a bit of a foreign concept. Choosing both a span processor and an exporter seems a burden on the user.

You would still need to select your SpanProcessor implementation depending on whether the exporter is actually async or not.

Is there some way we can make Simple/Disruptor/BatchSpanProcessor an internal implementation detail of exporters? And make SpanProcessor all about enhancing/filtering spans?

@huntc
Copy link
Contributor Author

huntc commented Aug 11, 2020

Is there some way we can make Simple/Disruptor/BatchSpanProcessor an internal implementation detail of exporters? And make SpanProcessor all about enhancing/filtering spans?

This is exactly what I was alluding to above:

In summary, I feel that the user of these APIs should only concern themselves with which exporter to use. Exporters are easy to comprehend, whereas "span processor" is a bit of a foreign concept. Choosing both a span processor and an exporter seems a burden on the user.

Abolish span processors.

However, we are getting off track with this particular issue. My original question was about the appetite for making the exporter’s types reflect the nature of the operations they perform. This stands. It’d be great to close out this issue with a yay or nay wrt the exporter’s types. Let’s open another issue to discuss the relevance of span processors.

@jkwatson
Copy link
Contributor

Closing this, as the PR to make exporters async has been merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants