Tail sampling #1768

Open
@jwilm

Description

Feature Request

Better support for tail sampling based on properties that are only known later in a request's lifecycle, such as HTTP response codes (as opposed to properties known up front, such as the level used by a level filter).

Motivation

High-volume systems might want to drop uninteresting collections of spans to reduce load on other systems. This is already possible to some extent, either by filtering to sample probabilistically up front or by using a level filter to reduce the number of spans in a trace. But for decisions that depend on things like HTTP response codes, such as dropping nominal 2xx responses, there doesn't seem to be much support at all.

My particular use case is actually for a Kafka consumer which is uninterested in a large fraction of messages on the topic.

Proposal

The way I've been thinking about this problem is that a Subscriber/Collector implementation could buffer spans, events, etc. before handing them off to a new Export / Report type that actually reports on the trace. In some ways, this is not too dissimilar from what the tracing-opentelemetry::Layer impl is already doing: data is buffered in the Layer until the span is closed, at which point the opentelemetry builder is finalized and the span is started/dropped and then handed off to the opentelemetry library to export.
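To make the buffering idea concrete, here is a minimal std-only sketch. It is not code from tracing or tracing-opentelemetry: `ClosedSpan`, `TraceBuffer`, and `on_close` are hypothetical names, and the sketch assumes each span carries a trace id and a flag marking the root span.

```rust
use std::collections::HashMap;

/// Hypothetical record of a closed span; not a type from `tracing`.
#[derive(Debug, Clone, PartialEq)]
struct ClosedSpan {
    trace_id: u64,
    name: &'static str,
    is_root: bool,
}

/// Buffers closed spans per trace and releases the whole group
/// once the root span of that trace closes.
#[derive(Default)]
struct TraceBuffer {
    pending: HashMap<u64, Vec<ClosedSpan>>,
}

impl TraceBuffer {
    /// Record a closed span. Returns the buffered trace (children first,
    /// root last) when the root closes, and `None` otherwise.
    fn on_close(&mut self, span: ClosedSpan) -> Option<Vec<ClosedSpan>> {
        let trace_id = span.trace_id;
        let is_root = span.is_root;
        self.pending.entry(trace_id).or_default().push(span);
        if is_root {
            self.pending.remove(&trace_id)
        } else {
            None
        }
    }
}

fn main() {
    let mut buffer = TraceBuffer::default();
    let child = ClosedSpan { trace_id: 1, name: "decode", is_root: false };
    let root = ClosedSpan { trace_id: 1, name: "consume_message", is_root: true };

    // The child closes first and is only buffered.
    assert!(buffer.on_close(child).is_none());

    // Closing the root releases the complete trace for a later reporting step.
    let trace = buffer.on_close(root).expect("root close releases the trace");
    for span in &trace {
        println!("buffered span: {}", span.name);
    }
}
```

A real implementation would also need a timeout or size bound so that traces whose root never closes do not accumulate forever; the sketch leaves that out.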

This could be achieved without any additional traits by implementing a Layer as described and chaining it with other reporting layers. However, that is merely convention rather than a hard abstraction. Based on my current understanding of the code, it seems to me the best approach would be to separate out trace collection from trace exporting more formally.

The benefit of an extra Report trait that operates on groups of spans is that it makes it easy to implement arbitrary tail sampling logic between, e.g., the collection step and the opentelemetry reporter.
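As a rough illustration of how such a trait could compose, the std-only sketch below puts a sampling wrapper in front of another reporter. Everything here is an assumption made for illustration: `Report`, `FinishedSpan`, `TailSampler`, and `CollectingReporter` are hypothetical names, not existing tracing or opentelemetry APIs, and the `http_status` field stands in for whatever computed property the sampling decision needs.

```rust
/// Hypothetical summary of a finished span; not a `tracing` type.
#[derive(Debug, Clone)]
struct FinishedSpan {
    name: String,
    http_status: Option<u16>,
}

/// Hypothetical trait from the proposal: receives a complete group of spans.
trait Report {
    fn report(&mut self, trace: Vec<FinishedSpan>);
}

/// Wraps another reporter and forwards only the traces accepted by `keep`.
struct TailSampler<R, F> {
    inner: R,
    keep: F,
}

impl<R: Report, F: Fn(&[FinishedSpan]) -> bool> Report for TailSampler<R, F> {
    fn report(&mut self, trace: Vec<FinishedSpan>) {
        if (self.keep)(&trace) {
            self.inner.report(trace);
        }
    }
}

/// Stand-in for a real exporter: it just stores what it receives.
#[derive(Default)]
struct CollectingReporter {
    exported: Vec<Vec<FinishedSpan>>,
}

impl Report for CollectingReporter {
    fn report(&mut self, trace: Vec<FinishedSpan>) {
        self.exported.push(trace);
    }
}

/// Sampling rule: keep a trace only if some span recorded a non-2xx status.
fn has_non_2xx(trace: &[FinishedSpan]) -> bool {
    trace
        .iter()
        .any(|s| matches!(s.http_status, Some(code) if !(200..300).contains(&code)))
}

fn main() {
    let mut sampler = TailSampler {
        inner: CollectingReporter::default(),
        keep: has_non_2xx,
    };

    sampler.report(vec![FinishedSpan { name: "GET /health".into(), http_status: Some(200) }]);
    sampler.report(vec![FinishedSpan { name: "GET /orders".into(), http_status: Some(503) }]);

    // Only the trace containing the 503 reaches the inner reporter.
    for trace in &sampler.inner.exported {
        for span in trace {
            println!("exported: {} ({:?})", span.name, span.http_status);
        }
    }
}
```

Because the sampler is itself a `Report`, samplers can be stacked, and the reporter at the end of the chain does not need to know that any sampling happened.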

Alternatives

  • Don't support this in the tracing library directly. As mentioned, it's possible to implement, although not in a user-friendly way. It took quite a bit of reading the tracing library code to build enough understanding to provide a proposal (and I'm still not particularly confident I haven't missed something obvious).
  • Support this, but without adding additional traits. A "buffering Collect" implementation could be provided, to which users could then chain other layers. This doesn't compose particularly well with the provided tracing-opentelemetry::Layer, however.

Closing notes

I actually came here to ask whether this was already possible and I was just missing something, but a question wasn't one of the options in the issue templates 🙃. I had done enough code reading and thinking about the problem, however, to make a proposal, so here it is. If this is something the project is interested in, I believe I could provide an implementation, given a bit of guidance.

Thanks!
