Description
Feature Request
Better support for tail sampling based on computed properties, such as HTTP response codes or other things known only later in the request lifecycle (as opposed to things known up front, like a level filter).
Motivation
High-volume systems might want to drop uninteresting collections of spans to reduce load on other systems. This is already possible to some extent, either by using filtering to do probabilistic sampling up front or by using a level filter to reduce the number of spans in a trace. But for looking at things like HTTP response codes and dropping uninteresting things like nominal 2xx responses, there doesn't seem to be much support at all.
My particular use case is actually for a Kafka consumer which is uninterested in a large fraction of messages on the topic.
Proposal
The way I've been thinking about this problem is that a Subscriber/Collector implementation could buffer spans, events, etc. before handing off to a new Export / Report type to actually report on the trace. In some ways, this is not too dissimilar from what the tracing-opentelemetry::Layer impl is already doing -- data is buffered in the Layer until the span is closed, at which point the opentelemetry builder is finalized and the span is started/dropped and then handed off to the opentelemetry library to export.
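To make the buffering idea concrete, here is a minimal std-only sketch. It does not use the real tracing or opentelemetry APIs: SpanRecord, TraceBuffer, and on_close are hypothetical names invented for illustration, and the sketch assumes each span already knows its trace id and whether it is the root.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified record of a closed span (not a `tracing` type).
#[derive(Debug, Clone, PartialEq)]
pub struct SpanRecord {
    pub trace_id: u64,
    pub name: String,
    pub is_root: bool,
}

/// Buffers closed spans per trace and releases the whole group
/// only once the root span of that trace has closed.
#[derive(Default)]
pub struct TraceBuffer {
    pending: HashMap<u64, Vec<SpanRecord>>,
}

impl TraceBuffer {
    /// Record a closed span. Returns the complete trace when `span` is the
    /// root; otherwise keeps buffering and returns `None`.
    pub fn on_close(&mut self, span: SpanRecord) -> Option<Vec<SpanRecord>> {
        let trace_id = span.trace_id;
        let is_root = span.is_root;
        self.pending.entry(trace_id).or_default().push(span);
        if is_root {
            // Root closed: hand the whole group to whatever comes next.
            self.pending.remove(&trace_id)
        } else {
            None
        }
    }
}
```

A real Layer would presumably do this bookkeeping from its span-close callback and would also need some bound on how long incomplete traces stay buffered; both are left out of this sketch.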
This could be achieved without any additional traits by implementing a Layer as described and chaining it with other reporting layers. However, that is merely convention rather than a hard abstraction. Based on my current understanding of the code, it seems to me the best approach would be to separate out trace collection from trace exporting more formally.
The benefit of an extra Report trait that operates on groups of spans is that it makes it easy to implement arbitrary tail sampling logic between, e.g., the collection step and the opentelemetry reporter.
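As a rough illustration of what I mean, here is a std-only sketch of such a trait with a tail sampler sitting in front of another reporter. None of this exists in tracing today: Report, FinishedSpan, TailSampler, CollectingReport, and has_non_2xx are all hypothetical names, and the http_status field is an assumption about what a span might carry by the time it closes.

```rust
/// Hypothetical summary of one finished span (not a `tracing` type).
#[derive(Debug, Clone)]
pub struct FinishedSpan {
    pub name: String,
    pub http_status: Option<u16>,
}

/// Hypothetical `Report` trait: receives a whole group of spans at once.
pub trait Report {
    fn report(&mut self, trace: Vec<FinishedSpan>);
}

/// Tail sampler: forwards a trace to `inner` only if `keep` returns true.
pub struct TailSampler<R, F> {
    inner: R,
    keep: F,
}

impl<R, F> TailSampler<R, F>
where
    R: Report,
    F: Fn(&[FinishedSpan]) -> bool,
{
    pub fn new(inner: R, keep: F) -> Self {
        Self { inner, keep }
    }

    /// Give back the wrapped reporter.
    pub fn into_inner(self) -> R {
        self.inner
    }
}

impl<R, F> Report for TailSampler<R, F>
where
    R: Report,
    F: Fn(&[FinishedSpan]) -> bool,
{
    fn report(&mut self, trace: Vec<FinishedSpan>) {
        // The decision is made after the whole trace is known.
        if (self.keep)(&trace) {
            self.inner.report(trace);
        }
    }
}

/// Example policy: keep a trace if any span recorded a non-2xx status.
pub fn has_non_2xx(trace: &[FinishedSpan]) -> bool {
    trace
        .iter()
        .any(|s| matches!(s.http_status, Some(code) if !(200..300).contains(&code)))
}

/// Stand-in for an exporter; it just collects whatever it is handed.
#[derive(Default)]
pub struct CollectingReport {
    pub traces: Vec<Vec<FinishedSpan>>,
}

impl Report for CollectingReport {
    fn report(&mut self, trace: Vec<FinishedSpan>) {
        self.traces.push(trace);
    }
}
```

With this shape, TailSampler::new(CollectingReport::default(), has_non_2xx) would drop a trace whose only span has status 200 and forward one containing a 500. Because the sampler is itself a Report, samplers and exporters could be stacked in any order.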
Alternatives
- Don't support this in the tracing library directly. As mentioned, it's possible to implement, although not in a user-friendly way. It took quite a bit of reading the tracing library code to build enough understanding to provide a proposal (and I'm still not particularly confident I haven't missed something obvious).
- Support this but without adding additional traits. A "buffering Collect" implementation could be provided, to which users could then chain other layers. This doesn't compose particularly well with the provided tracing-opentelemetry::Layer, however.
Closing notes
I actually came here to ask whether this was already possible and I was just missing something, but a question wasn't one of the options in the issue templates 🙃. I had done enough code reading and thinking about the problem, however, to make a proposal, so here it is. If this is something the project is interested in, I believe I could provide an implementation, given a bit of guidance.
Thanks!