Specify Span ID creation with sampling. #998

Oberon00 · 2020-09-24T15:26:03Z

Partial fix for #864 (remainder would require an API change to not pass a trace ID to the sampler if there is none).

Changes

Clarifies how span IDs are generated for DROP and RECORD_ONLY.

Oberon00 · 2020-09-24T16:24:48Z

specification/trace/sdk.md

+   note: this must be done before calling `ShouldSample`, because it expects
+   a valid trace Id as input).
+2. Query the `Sampler`'s [`ShouldSample`](#shouldsample) method.
+3. If the decision is `DROP`, use the parent span ID or all-zero if there is none.


That means: If there is no parent and the decision is drop, we will have a span with the new trace ID but an invalid span ID. The SpanContext as a whole is thus invalid. We could also specify that the generated trace ID should be discarded in that case.

Actually this may be worth re-considering: Generating trace+span ID for an unsampled root has implications if the span is injected. If we generate an invalid spancontext (with zero span ID), we will inject nothing as we have an invalid traceparent header as per W3C (cf. also #753). This causes downstream components to potentially create traces / run sampling logic that they would not if they got an incoming unsampled traceparent header.

> We could also specify that the generated trace ID should be discarded in that case.

I'd be fine with this.

> we will inject nothing as we have an invalid traceparent header as per W3C (cf. also #753). This causes downstream components to potentially create traces

Could you elaborate?

@carlosalberto We never inject an invalid SpanContext, at least using W3C. If we generated a new ID instead of using zero, we would inject a valid SpanContext with an unset sampled flag. This will likely cause the receiver to suppress the trace. On the other hand, if we use zero, the downstream component will not receive any tracestate and will likely start a new trace.

Now that I write this, I realize that this even applies to local child spans: Normally if the root span is not sampled, local child spans would be unsampled too, but with this they would become root spans.
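To make the injection argument concrete, here is a minimal sketch of a W3C `traceparent` injector. The `SpanContext` class and the `inject` helper are simplified, hypothetical stand-ins rather than any SDK's actual API. With an all-zero span ID the context is invalid and nothing is injected; with a generated span ID and the sampled flag unset, the receiver gets a valid header that marks the trace as unsampled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpanContext:
    """Simplified, hypothetical stand-in for an SDK SpanContext."""
    trace_id: int  # 128-bit integer; 0 means invalid
    span_id: int   # 64-bit integer; 0 means invalid
    sampled: bool

    @property
    def is_valid(self) -> bool:
        return self.trace_id != 0 and self.span_id != 0


def inject(ctx: SpanContext, carrier: dict) -> None:
    """Write a W3C traceparent header, or nothing if ctx is invalid."""
    if not ctx.is_valid:
        return  # invalid contexts are never injected
    flags = 0x01 if ctx.sampled else 0x00
    carrier["traceparent"] = (
        f"00-{ctx.trace_id:032x}-{ctx.span_id:016x}-{flags:02x}"
    )


# Dropped root with an all-zero span ID: downstream sees no header at all.
zero_id_headers: dict = {}
inject(SpanContext(trace_id=0xABC, span_id=0, sampled=False), zero_id_headers)

# Dropped root with a generated span ID: downstream sees an unsampled trace.
new_id_headers: dict = {}
inject(SpanContext(trace_id=0xABC, span_id=0x123, sampled=False), new_id_headers)
```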

This is bad. I think we will have to create Span and Trace ID even for "dropped" root spans after all. It seems there is a dependency of #864 "Option to allow "default" IDs for unsampled traces" on #753 "Current "Invalid" SpanContext definition precludes TraceState-only SpanContext". We would have to have some sentinel SpanContext that is "valid, unsampled, but without any IDs". That's out of scope here of course.

Updated in 4006816. Maybe this demonstrates how easy it is to mess up here and that it is thus important to have this specified clearly? 😃

Oberon00 · 2020-09-24T16:26:50Z

specification/trace/sdk.md

+   a valid trace Id as input).
+2. Query the `Sampler`'s [`ShouldSample`](#shouldsample) method.
+3. If the decision is `DROP`, use the parent span ID or all-zero if there is none.
+   Otherwise (if the decision is not `DROP`) a new span ID MUST be generated.


That means: RECORD_ONLY spans get a new span ID, always. I don't know what the purpose of RECORD_ONLY is (should we remove it?), but having different spans with the same trace+span ID seems worse than breaking the trace in case any children of the RECORD_ONLY span become RECORD_AND_SAMPLE.

andrewhsu · 2020-09-25T16:29:12Z

from the issue triage mtg today, labelling this as after-ga since the corresponding issue is after-ga.

however, if this is just an editorial change that does not affect the trace freeze, please speak up.

Oberon00 · 2020-09-25T16:35:02Z

@andrewhsu As stated in the issue comment, this PR is exactly the part of the issue that is not doable after-ga.

tigrannajaryan · 2020-09-25T18:08:37Z

@Oberon00 will you be able to get the approvals today? Other options are not doing this in 1.0 at all or asking for an exception from spec freeze.

Oberon00 · 2020-09-25T18:54:49Z

@tigrannajaryan I won't. In that case, it seems we will have to look around after GA and document all the paths taken by any SDKs as allowed and maybe introduce some options.

github-actions · 2020-10-03T03:23:39Z

This PR was marked stale due to lack of activity. It will be closed in 7 days.

Oberon00 · 2020-10-06T09:09:44Z

I changed the corresponding issue to a new issue #1060, so we can re-evaluate what to do here for GA.

Oberon00 · 2020-10-06T16:15:06Z

Relabeled to allowed-for-ga to match #1060.

jmacd · 2020-10-28T05:51:08Z

specification/trace/sdk.md

+3. If the decision is `DROP` and there is a valid parent span ID, use it.
+   Otherwise (if the decision is not `DROP` or there was no valid parent span ID)
+   a new span ID MUST be generated.


Yes. This is tricky, but I like @Oberon00's proposal that DROP sampling decisions at the root of a trace should generate a new trace ID and new span ID without the sampled bit set. Am I understanding?
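The step 3 wording in the diff above can be sketched roughly as follows. This is only an illustration of the rule as proposed at this point in the review; `Decision`, `choose_span_id`, `generate_span_id`, and `INVALID_SPAN_ID` are hypothetical names, and the injectable `generate` parameter exists only to make the sketch easy to test.

```python
import random
from enum import Enum
from typing import Callable


class Decision(Enum):
    DROP = "DROP"
    RECORD_ONLY = "RECORD_ONLY"
    RECORD_AND_SAMPLE = "RECORD_AND_SAMPLE"


INVALID_SPAN_ID = 0  # the all-zero span ID is invalid


def generate_span_id() -> int:
    """Return a random non-zero 64-bit span ID."""
    span_id = INVALID_SPAN_ID
    while span_id == INVALID_SPAN_ID:
        span_id = random.getrandbits(64)
    return span_id


def choose_span_id(
    decision: Decision,
    parent_span_id: int,
    generate: Callable[[], int] = generate_span_id,
) -> int:
    """Pick the span ID for a new span after the sampler has decided."""
    # DROP with a valid parent: reuse the parent's span ID.
    if decision is Decision.DROP and parent_span_id != INVALID_SPAN_ID:
        return parent_span_id
    # Any other decision, or no valid parent: a new ID is generated.
    return generate()
```

Under this rule a dropped root span still gets a fresh, valid span ID, so its SpanContext stays valid and can be injected with the sampled bit unset.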

carlosalberto · 2020-10-28T16:30:47Z

@open-telemetry/specs-approvers Please review this one. We bumped its related issue as "Required-for-GA" so we definitely need reviews ;)

jkwatson · 2020-10-28T20:54:41Z

specification/trace/sdk.md

+   (note: this must be done before calling `ShouldSample`, because it expects
+   a valid trace ID as input).
+2. Query the `Sampler`'s [`ShouldSample`](#shouldsample) method.
+3. If the decision is `DROP` and there is a valid parent span ID, reuse it as the new `Span`'s span ID.


Doesn't this mean two spans with the same id? I can imagine this could become extremely confusing to someone trying to debug things. As a user, I think I'd be super confused if I saw a new Span created with the same ID as the parent, even if that span might not end up going anywhere.

It's two spans with the same ID the same way as calling extract and then inject creates a copy of the injected span (with the same ID). It is more of a hollow shell of a span than an actual span.

> even if that span might not end up going anywhere

This span cannot end up going anywhere. A dropped span is always non-recording, so there is nothing that could end up anywhere besides the span+trace ID itself (not even a parent span ID). What is more, SpanProcessors are not invoked with dropped spans.

RECORD_ONLY spans are a different story, and they do get a new ID, see #998 (comment)
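As a rough illustration of the difference between the three decisions, the sketch below maps each one to the recording and sampled flags and only notifies processors for recording spans. `Decision`, `RecordingProcessor`, and `start_span` are hypothetical names used for this sketch, not an SDK API.

```python
from enum import Enum
from typing import List, Tuple


class Decision(Enum):
    DROP = "DROP"
    RECORD_ONLY = "RECORD_ONLY"
    RECORD_AND_SAMPLE = "RECORD_AND_SAMPLE"


def is_recording(decision: Decision) -> bool:
    """DROP is the only decision that produces a non-recording span."""
    return decision in (Decision.RECORD_ONLY, Decision.RECORD_AND_SAMPLE)


def is_sampled(decision: Decision) -> bool:
    """Only RECORD_AND_SAMPLE sets the sampled flag."""
    return decision is Decision.RECORD_AND_SAMPLE


class RecordingProcessor:
    """Toy processor that remembers which spans it was told about."""

    def __init__(self) -> None:
        self.started: List[str] = []

    def on_start(self, name: str) -> None:
        self.started.append(name)


def start_span(
    name: str, decision: Decision, processors: List[RecordingProcessor]
) -> Tuple[bool, bool]:
    """Return (is_recording, is_sampled); skip processors for dropped spans."""
    recording = is_recording(decision)
    if recording:
        for processor in processors:
            processor.on_start(name)
    return recording, is_sampled(decision)
```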

All that being said, I think I could agree that this may make the mental model of sampling a bit more complicated than just always creating a new ID. But I hope that the benefits, namely the ability to resume sampling in a child and having the closest sampled span as parent, and a slight(?) performance improvement would be worth it.

Personally, I think I'd prefer attacking this problem via a smarter SpanContext that allowed propagation of the relevant bits, rather than generating weird internal spans for the sake of propagating the SpanContext.

Or, better yet, move the TraceState and TraceFlags out of the span altogether and propagate them separately in the Context, rather than hanging them off a span.

What should startSpan return for a dropped span then?

> regeneration looks like a safer choice that will not break SDK going forward

I think changing from any of them to the other will break some uses.

> and is easier to explain

Fair enough.

> the idea is that instrumented library needs a reliable unique ID of the operation

If you have an ID that is not associated with any trace on the tracing backend, will that be more helpful than the "inexact" ID of the parent operation? Note, you will have a valid span ID in any case.

I would say we go GA with this, and discuss how to improve on it using things like adding a GetUniqueOperationId function to Span that returns the span ID if it was generated for that span or generates a new ID that is cached and propagated to children if it is not. I reckon propagating this ID cross-process may be difficult, but the same goes for propagating the ID of the last sampled span, and it seems unlikely that you call GetUniqueOperationId on an extracted span directly (it would not be very unique, since the parent's process would have the same ID if it was sampled at least).

The assumption that there is a trace backend may not be true for everybody. And your caller's trace backend having a record of a parent span ID does not help solve your problem. So the decision we made at @open-telemetry/technical-committee is to go with span ID regeneration. This creates a nice guarantee of local uniqueness of a span ID that various telemetry signals may take a dependency on.
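A minimal sketch of what span ID regeneration means in practice, assuming a simplified `SpanContext` and hypothetical helpers `random_id` and `start_span_context` (none of these are an SDK's real API): the trace ID is inherited from the parent when there is one, while the span ID is always newly generated, even when the span is not sampled.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class SpanContext:
    """Simplified, hypothetical stand-in for an SDK SpanContext."""
    trace_id: int
    span_id: int
    sampled: bool


def random_id(bits: int) -> int:
    """Return a random non-zero ID with the given number of bits."""
    value = 0
    while value == 0:
        value = random.getrandbits(bits)
    return value


def start_span_context(
    parent: Optional[SpanContext],
    sampled: bool,
    new_id: Callable[[int], int] = random_id,
) -> SpanContext:
    """Build the context for a new span; the span ID is never reused."""
    # Inherit the trace ID from the parent, or start a new trace.
    trace_id = parent.trace_id if parent is not None else new_id(128)
    # Always generate a new span ID, regardless of the sampling decision.
    return SpanContext(trace_id=trace_id, span_id=new_id(64), sampled=sampled)
```

With this approach a dropped child shares the trace ID of its dropped parent but never its span ID, which is the local uniqueness guarantee described above.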

How strongly do you disagree?

Not very strongly. I do think my proposed solution is the better trade-off but I concede that always generating a new ID is also a reasonable option.

@Oberon00 Thanks for the answer. I think we will go with the TC decision for now then - if you have time, would you mind updating this PR to match the proposal? Else, I can help by creating a spin-off of this PR. Let me know ;)

I think it would be best to create a spin-off PR. That way we clearly document that this option was rejected.

carlosalberto · 2020-11-11T01:15:14Z

Ping @yurishkuro

bogdandrutu · 2020-11-11T16:29:01Z

@SergeyKanzhelev you mentioned during the specification sig that you will document the decision that was discussed during the TC meeting.

bogdandrutu · 2020-11-11T16:29:16Z

@Oberon00 please rebase.

SergeyKanzhelev

> @SergeyKanzhelev you mentioned during the specification sig that you will document the decision that was discussed during the TC meeting.

Sorry for the delay. This comment summarizes it: #998 (comment), but I was thinking of expanding on it a bit.

Specify Span ID creation with sampling.

77bbb73

Oberon00 requested review from a team September 24, 2020 15:26

github-actions bot assigned carlosalberto Sep 24, 2020

Oberon00 added 3 commits September 24, 2020 17:28

Add CHANGELOG.

48b76e4

Merge branch 'master' into sdk-spanid-sampling

db30eaf

Mismerge/lint.

57a4e25

Oberon00 mentioned this pull request Sep 24, 2020

Option to allow "default" IDs for unsampled traces #864

Open

Oberon00 added the area:sampling (Related to trace sampling), area:sdk (Related to the SDK), and spec:trace (Related to the specification/trace directory) labels Sep 24, 2020

Period.

c59c207

Oberon00 commented Sep 24, 2020

andrewhsu added the release:after-ga (Not required before GA release, and not going to work on before GA) label Sep 25, 2020

Merge branch 'master' into sdk-spanid-sampling

e31741a

github-actions bot added the Stale label Oct 3, 2020

Oberon00 mentioned this pull request Oct 6, 2020

Missing spec for Span ID creation for dropped/record-only spans #1060

Closed

Merge branch 'master' into sdk-spanid-sampling

b6afd59

Oberon00 added the release:allowed-for-ga (Editorial changes that can still be added before GA since they don't require action by SIGs) label and removed the release:after-ga (Not required before GA release, and not going to work on before GA) label Oct 6, 2020

bogdandrutu previously approved these changes Oct 6, 2020

Oberon00 and others added 4 commits October 22, 2020 13:29

Update specification/trace/sdk.md

08b15ed

Co-authored-by: Armin Ruech <armin.ruech@dynatrace.com>

Add compliance matrix entry.

bfd652f

Merge remote-tracking branch 'upstream/master' into sdk-spanid-sampling

83e0410

Conflicts: spec-compliance-matrix.md

Fix compliance matrix.

6f04106

andrewhsu removed the Stale and release:allowed-for-ga (Editorial changes that can still be added before GA since they don't require action by SIGs) labels Oct 27, 2020

jmacd approved these changes Oct 28, 2020

Oberon00 added 2 commits October 28, 2020 10:29

Apply suggestions from code review

34263a3

Fix dead internal link.

9933136

yurishkuro approved these changes Oct 28, 2020

carlosalberto approved these changes Oct 28, 2020

jkwatson reviewed Oct 28, 2020

Merge branch 'master' into sdk-spanid-sampling

bc9ef43

yurishkuro reviewed Nov 3, 2020

specification/trace/sdk.md

yurishkuro reviewed Nov 3, 2020

specification/trace/sdk.md (outdated)

Oberon00 added 3 commits November 3, 2020 22:29

Add forward-ref to ParentBased sampler.

eb989b7

Merge branch 'master' into sdk-spanid-sampling

17a58fb

Typo

9c9d652

yurishkuro mentioned this pull request Nov 3, 2020

Support restarting the trace with a different trace ID #1188

Open

bogdandrutu added the release:required-for-ga (Must be resolved before GA release, or nice to have before GA) label Nov 10, 2020

Merge branch 'master' into sdk-spanid-sampling

24b6032

SergeyKanzhelev suggested changes Nov 11, 2020

carlosalberto mentioned this pull request Nov 13, 2020

Specify Span ID creation with sampling (non-recording spans included) #1225

Merged

carlosalberto closed this in #1225 Nov 16, 2020

arminru deleted the sdk-spanid-sampling branch November 16, 2020 17:48

Oberon00 mentioned this pull request May 19, 2022

A span with a new span ID MUST be created even for a sampling decision of DROP. open-telemetry/opentelemetry-dotnet#3290

Open
