Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tail sampling processor: add a way to sample all spans that have a span link to a sampled span. #33568

Open
isobelormiston opened this issue Jun 14, 2024 · 10 comments
Labels
enhancement New feature or request processor/tailsampling Tail sampling processor Stale

Comments

@isobelormiston
Copy link

isobelormiston commented Jun 14, 2024

Component(s)

processor/tailsampling

Is your feature request related to a problem? Please describe.

I often have spans X and Y that are part of different traces but are linked via a span link. There appears to be no way to configure the tail sampling processor such that whenever one of these spans is sampled, the other is also sampled.

Describe the solution you'd like

Suppose I have two spans, X and Y, that are part of different traces but are linked via a span link. I want to be able to set up the tail sampling processor such that whenever X is sampled, Y is also sampled, as is the whole trace that Y belongs to.
e.g.

processors:
  tail_sampling:
    sample_linked_spans: true
    ...

A (possibly simpler) alternative to achieve this: we can update our code to give the two spans the same value of an attribute some-new-attribute. So, we will be able to achieve what we need if the tail sampling processor can be updated so that we can specify an attribute, and if X is sampled, any spans that share that attribute are also sampled (along with their whole trace).
e.g.

processors:
  tail_sampling:
    policies: [{
        name: some-new-attribute-policy
        type: shared_attribute
        attribute: some-new-attribute
    },...

Describe alternatives you've considered

No response

Additional context

open-telemetry/opentelemetry-specification#2918 is a similar problem for head sampling

It appears that the probabilistic sampler can be configured to sample based on some other attribute other than the trace ID. This is similar to what we might need: as well as the processor collecting together all spans that share a trace ID, and sampling all these, it would collect together all spans that share a custom attribute, and sample all these.

@isobelormiston isobelormiston added enhancement New feature or request needs triage New item requiring triage labels Jun 14, 2024
@github-actions github-actions bot added the processor/tailsampling Tail sampling processor label Jun 14, 2024
Copy link
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@isobelormiston
Copy link
Author

Update: I asked this on the Cloud Native Computing Foundation slack channel. The main problem with this is that linked traces won't necessarily be processed by the same collector instance. So this would likely require an update to both the load balancing processor and the tail sampling processor.

@jpkrohling
Copy link
Member

@isobelormiston, I believe the caching feature that @jamesmoessis just built for the tail-sampling would work well for that. I agree that further changes to the load-balancing would also be needed. If you are familiar with that component, would you be able to create a proposal for it?

@jamesmoessis
Copy link
Contributor

This is an interesting and tricky problem. I'm unsure of the best way to solve this.

Since only one span in the trace knows about the span link, it seems infeasible to route the span to the same node that the linked trace would be routed to. It also doesn't consider that the linked trace could have also been linked to yet another trace. So, I think we need to keep strictly sharding by trace ID.

The only feasible way I could think of is a cache (e.g. Redis) which holds the span link information and sampling decisions. This could struggle if there were a lot of linked spans though.

@isobelormiston I think if you have the capability, the best solution would be to make sure that sampling.priority is set on your linked spans in uniform, and then look for that attribute on the tail sampler using numeric_attribtue policy (0 == not sampled, > 0 == sample). Similar to how you described.

I think it's the easiest and most reasonable course for action if you have access to the code / instrumentation to add those attributes.

Copy link
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@Darkemon
Copy link

I wonder, if TraceIdRatioBasedSampler on client and collector sides behaves identically and is deterministic - the same result for the same id with specific ratio. Then I could predict if a span is going to be sampled by collector and add sampling.priority: 1 to the linked spans.

@github-actions github-actions bot removed the Stale label Aug 31, 2024
@jpkrohling
Copy link
Member

I believe @jmacd was working on having them compatible among themselves, but I'm not 100% what's the current state.

@Darkemon
Copy link

Darkemon commented Sep 3, 2024

@jmacd could you confirm the sampler works this way?

At least golang implementation of the sampler is deterministic https://github.com/open-telemetry/opentelemetry-go/blob/932a4d8a5f2536645618d7aee8e5da6b8e3b6751/sdk/trace/sampling.go#L71C1-L84C2

@isobelormiston
Copy link
Author

That sounds reasonable, but in our particular case we're not sampling with the trace ID sampling policy. We're using the error-policy and latency-policy, so unfortunately there's no way we'll be able to predict in advance whether a span will be sampled.

@jamesmoessis on using the numeric_attribute policy, and making sure we've an attribute set consistently across all linked spans: the difficulty here is again that we don't know in advance whether we're going to want to sample span X, so we wouldn't know what to set the sampling.priority to ahead of time for the two linked spans.

Since only one span in the trace knows about the span link, it seems infeasible to route the span to the same node that the linked trace would be routed to.

This is a good point, I think this would make it very difficult to implement the required changes to the load balancer exporter.

Copy link
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Dec 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request processor/tailsampling Tail sampling processor Stale
Projects
None yet
Development

No branches or pull requests

5 participants