OpenTelemetry TraceIdRatioBased sampler requirements following OTEP 235 #4166

Open
wants to merge 43 commits into
base: main
0524a3d
OpenTelemetry trace SDK requirements for probability sampling followi…
jmacd Jul 26, 2024
c5453f8
linebreaks
jmacd Jul 30, 2024
25a61fd
Merge branch 'main' of github.com:open-telemetry/opentelemetry-specif…
jmacd Aug 7, 2024
68fa270
Add a migration section
jmacd Aug 7, 2024
51f9794
Merge branch 'main' of github.com:open-telemetry/opentelemetry-specif…
jmacd Aug 15, 2024
ba5a47b
lowercase hex
jmacd Aug 15, 2024
49673b7
spec-compliance-matrix.md
jmacd Aug 15, 2024
e51bea6
merge w/ removed file
jmacd Aug 15, 2024
4afe1c7
chlog
jmacd Aug 15, 2024
2f0dc0b
reverse inequality
jmacd Aug 29, 2024
f333b71
Apply suggestions from code review
jmacd Aug 29, 2024
b7376bd
remove sci-note and reverse region
jmacd Aug 29, 2024
483b3fa
Merge branch 'main' of github.com:open-telemetry/opentelemetry-specif…
jmacd Aug 29, 2024
c40de50
Merge branch 'main' of github.com:open-telemetry/opentelemetry-specif…
jmacd Sep 12, 2024
15a9c6f
spec-compliance: AlwaysOn too
jmacd Sep 12, 2024
672fac2
edits for jpkrohling
jmacd Sep 25, 2024
3c80d97
Apply suggestions from code review
jmacd Sep 25, 2024
b2b37f7
Merge branch 'jmacd/otep235' of github.com:jmacd/opentelemetry-specif…
jmacd Sep 25, 2024
1bb0b31
algorithm
jmacd Sep 27, 2024
2f0e387
move a sentence; drop a paragraph
jmacd Sep 27, 2024
6e29b0e
more overview
jmacd Oct 4, 2024
77b51f8
nuance
jmacd Oct 4, 2024
a61fbdd
Update specification/trace/tracestate-probability-sampling.md
jmacd Oct 4, 2024
59c329d
Merge branch 'main' of github.com:open-telemetry/opentelemetry-specif…
jmacd Oct 10, 2024
d21f341
Merge branch 'jmacd/otep235' of github.com:jmacd/opentelemetry-specif…
jmacd Oct 10, 2024
4e05267
Apply suggestions from code review
jmacd Oct 16, 2024
d65ea09
Merge branch 'main' of github.com:open-telemetry/opentelemetry-specif…
jmacd Oct 21, 2024
92876f9
Use consistent terminology with 4162, e.g., OpenTelemetry TraceState …
jmacd Oct 21, 2024
1855839
Specify a compatibility warning for transition
jmacd Oct 21, 2024
44c8190
asymmetrical
jmacd Oct 21, 2024
66d190f
TOC
jmacd Oct 21, 2024
0aacc19
Merge branch 'main' of github.com:open-telemetry/opentelemetry-specif…
jmacd Oct 30, 2024
e6dc409
AlwaysOn should respect sampling threshold
jmacd Oct 30, 2024
c75a010
Revert "AlwaysOn should respect sampling threshold"
jmacd Nov 1, 2024
87fb314
Merge branch 'main' of github.com:open-telemetry/opentelemetry-specif…
jmacd Nov 13, 2024
f3693fc
do not change AlwaysOnSampler spec
jmacd Nov 13, 2024
b0a840a
Merge branch 'main' of github.com:open-telemetry/opentelemetry-specif…
jmacd Feb 4, 2025
dd33a6f
Merge branch 'main' of github.com:open-telemetry/opentelemetry-specif…
jmacd Feb 12, 2025
194552b
Merge branch 'main' of github.com:open-telemetry/opentelemetry-specif…
jmacd Feb 25, 2025
b50a6bd
remove 'one-time'
jmacd Feb 25, 2025
1349e4f
Merge branch 'main' of github.com:open-telemetry/opentelemetry-specif…
jmacd Feb 25, 2025
af635a6
lint
jmacd Feb 25, 2025
d486784
lint
jmacd Feb 25, 2025
move a sentence; drop a paragraph
jmacd committed Sep 27, 2024
commit 2f0e387816330b9f3adb1be2ef44f0f71b679fc8
2 changes: 1 addition & 1 deletion specification/trace/sdk.md
@@ -401,7 +401,7 @@ For respecting the parent `SampledFlag`, see the `ParentBased` sampler specified
##### `TraceIdRatioBased` sampler configuration

The `TraceIdRatioBased` sampler is typically configured using a 32-bit or 64-bit floating point number to express the sampling ratio.
-The minimum valid sampling ratio is `2**-56`, and the maximum valid sampling ratio is 1.0.
+The minimum valid sampling ratio is `2^-56`, and the maximum valid sampling ratio is 1.0.
From an input sampling ratio, a rejection threshold value is calculated; see [consistent-probability sampler requirements][CONSISTENTSAMPLING] for details on converting sampling ratios into thresholds with variable precision.

[CONSISTENTSAMPLING]: ./tracestate-probability-sampling.md
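
As a rough illustration of the configuration rule above, the following Python sketch validates a ratio against the stated bounds and derives a 56-bit rejection threshold. The helper name `ratio_to_threshold` and the rounding strategy are assumptions made for this example; the linked consistent-probability sampler requirements remain the authority on precision handling.

```python
MAX_ADJUSTED_COUNT = 2**56  # number of distinct 56-bit values
MIN_RATIO = 2.0**-56        # smallest valid sampling ratio per the text above


def ratio_to_threshold(ratio: float) -> int:
    """Hypothetical helper: map a sampling ratio to a rejection threshold."""
    # NaN fails both comparisons, so it is rejected here as well.
    if not (MIN_RATIO <= ratio <= 1.0):
        raise ValueError(f"sampling ratio {ratio!r} is outside [2^-56, 1.0]")
    # Scale the ratio first and subtract afterwards, so that very small
    # ratios are not lost to floating-point rounding in (1 - ratio).
    kept = round(ratio * MAX_ADJUSTED_COUNT)
    return MAX_ADJUSTED_COUNT - kept
```

With this sketch, a ratio of 1.0 yields threshold 0 (nothing rejected), and the minimum ratio yields the largest 56-bit threshold.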
2 changes: 1 addition & 1 deletion specification/trace/tracestate-handling.md
@@ -125,7 +125,6 @@ For example, the following TraceState value identifies a trace with 100% samplin
tracestate: ot=th:0
```

-In sampling, the term _adjusted count_ refers to the effective number of items represented by a sampled item of telemetry.
To calculate sampling probability from the rejection threshold, define a constant `MaxAdjustedCount` equal to 2^56, the number of distinct 56-bit values.
The sampling probability is defined:

@@ -139,6 +138,7 @@ Threshold can be calculated from Probability:
Threshold = MaxAdjustedCount * (1 - Probability)
```
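
The relationships between threshold, probability, and adjusted count can be sketched in Python as follows. The function names are illustrative only. The sketch uses just the `MaxAdjustedCount` constant and the threshold formula quoted above, plus their algebraic inverses, and it assumes the threshold is below `MaxAdjustedCount` (a threshold equal to `MaxAdjustedCount` would mean zero probability and an undefined adjusted count).

```python
MAX_ADJUSTED_COUNT = 2**56  # number of distinct 56-bit values


def probability_to_threshold(probability: float) -> float:
    # Threshold = MaxAdjustedCount * (1 - Probability), as quoted above.
    return MAX_ADJUSTED_COUNT * (1 - probability)


def threshold_to_probability(threshold: int) -> float:
    # Algebraic inverse of the threshold formula.
    return (MAX_ADJUSTED_COUNT - threshold) / MAX_ADJUSTED_COUNT


def adjusted_count(threshold: int) -> float:
    # Inverse of the sampling probability: the number of items that one
    # sampled item represents.
    return MAX_ADJUSTED_COUNT / (MAX_ADJUSTED_COUNT - threshold)
```

For example, a threshold of `2**55` corresponds to a probability of 0.5 and an adjusted count of 2.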

+In sampling, the term _adjusted count_ refers to the effective number of items represented by a sampled item of telemetry.
The adjusted count of a span is the inverse of its sampling probability and can be derived from the threshold as follows.

```
28 changes: 11 additions & 17 deletions specification/trace/tracestate-probability-sampling.md
@@ -72,7 +72,7 @@ For example, if the sampling probability is 100% (keep all spans), the rejection

Similarly, if the sampling probability is 1% (drop 99% of spans), the rejection threshold with 5 digits of precision would be (1-0.01) * 2^56 = 4458562600304640 = 0xfd70a00000000.

-We refer to this rejection threshold conceptually as `T`. We represent it using the key `th`. This must be propagated in both the `tracestate` header and in the TraceState attribute of each span. In the example above, the `th` key has `fd70a00000000` as the value.
+We refer to this rejection threshold conceptually as `T`. We represent it using the OpenTelemetry TraceState key `th`, where the value is propagated and also stored with each span. In the example above, the `th` key has `fd70a00000000` as the value.

See [tracestate handling](./tracestate-handling.md#sampling-threshold-value-th) for details about encoding threshold values.

@@ -113,19 +113,10 @@ This section defines the behavior for these two categories of samplers.

A head sampler is responsible for computing the `rv` and `th` values in a new span's initial [`TraceState`](https://github.com/open-telemetry/opentelemetry-specification/blob/v1.29.0/specification/trace/api.md#tracestate). The main inputs to that computation include the parent span's trace state (if a parent span exists), the new span's trace ID, and possibly the trace flags (to know if the trace ID has been generated in a random manner).

-First, a consistent probability `Sampler` may choose its own sampling rate. The higher the chosen sampling rate, the lower the rejection threshold (T). It MAY select any value of T. If a valid `SpanContext` is provided in the call to `ShouldSample` (indicating that the span being created will be a child span), there are two possibilities:
+When a span is sampled by in accordance with this specification, the output TraceState SHOULD be set to convey probability sampling:

-- **The child span chooses a T greater than the parent span's T**: The parent span may be *kept* but it is possible that its child, the current span, may be dropped because of the lower sampling rate. At the same time, in the case where the decision for the child span is to *keep* it, the decision for the parent span would have also been to *keep* (due to our consistent sampling approach) since the parent's sampling rate is greater than the child's sampling rate.
-- **The child span chooses a T less than or equal to the parent span's T**: The parent span might have been *dropped* but it is possible that its child, the current span, may be *kept* because of the higher sampling rate. At the same time, in case where the parent span is *kept*, the child span would be *kept* as well (due to our consistent sampling approach) since the child's sampling rate is greater than the parent's sampling rate.
-
-Note that while both the above cases can result in incomplete traces, they still meet the consistent sampling goals.
-
-For the output TraceState,
-
-- The `th` key MUST be defined with a value corresponding to the sampling probability the sampler used.
-- The `rv` value, if present on the input TraceState, MUST be defined and equal to the incoming span context's `rv` value, including the root context.
-
-Trace SDKs are responsible for synthesizing `rv` values in the OpenTelemetry TraceState root span contexts.
+- The `th` key MUST be defined with a threshold value corresponding to the sampling probability the sampler used.
+- If trace randomness was derived from a TraceState `rv` value, the same `rv` value MUST be defined and equal to the incoming Context's `rv` value.
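
A minimal Python sketch of how a head sampler might apply these two rules is shown below. It rests on several assumptions that are not stated in this hunk: that a span is kept when its 56-bit randomness value `R` is greater than or equal to the rejection threshold `T`, that an explicit `rv` value is 14 hexadecimal digits, and that otherwise the least-significant 56 bits of the trace ID serve as randomness. The names `randomness` and `head_sample` are hypothetical.

```python
from typing import Dict, Optional, Tuple

MAX_ADJUSTED_COUNT = 2**56  # number of distinct 56-bit values


def randomness(trace_id: int, rv: Optional[str]) -> int:
    """Return the 56-bit randomness value R for the span being created."""
    if rv is not None:
        # Assumption: an explicit rv value is 14 hex digits (56 bits).
        return int(rv, 16)
    # Assumption: the least-significant 56 bits of the trace ID are random.
    return trace_id & (MAX_ADJUSTED_COUNT - 1)


def head_sample(
    trace_id: int, threshold: int, rv: Optional[str] = None
) -> Tuple[bool, Dict[str, str]]:
    """Return (sampled, OpenTelemetry TraceState fields for the new span)."""
    r = randomness(trace_id, rv)
    # Assumption: T is a rejection threshold, so the span is kept when R >= T.
    sampled = r >= threshold
    fields: Dict[str, str] = {}
    if sampled:
        # th conveys the threshold the sampler used (trailing zeros dropped).
        fields["th"] = format(threshold, "014x").rstrip("0") or "0"
        if rv is not None:
            # Randomness came from rv, so the same rv value is carried over.
            fields["rv"] = rv
    return sampled, fields
```

For instance, `head_sample(0, 2**55, rv="c0000000000000")` keeps the span and yields `th` of `8` together with the unchanged `rv`, while the same call without `rv` drops the span because the trace ID supplies a randomness value of 0.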

### Downstream samplers

@@ -181,7 +172,7 @@ func ProbabilityToThresholdWithPrecision(probability float64, precision int) str
}
```

-To translate directly from floating point probability into a 56-bit unsigned integer representation using `math.Round()` and shift operations, see the [OpenTelemetry Collector-Contrib `pkg/sampling` package][PKGSAMPLING] package demonstrates this form of directly calculating integer thresholds from probabilities.
+To translate directly from floating point probability into a 56-bit unsigned integer representation using `math.Round()` and shift operations, see the [OpenTelemetry Collector-Contrib `pkg/sampling` package][PKGSAMPLING] package. This package demonstrates how to directly calculate integer thresholds from probabilities.

[PKGSAMPLING]: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/sampling/README.md

@@ -209,9 +200,12 @@ OpenTelemetry SDKs are recommended to use 4 digits of precision by default. The
To convert a 56-bit integer threshold value to the t-value representation, emit it as a hexadecimal value (without a leading '0x'), optionally with trailing zeros omitted:

```py
-h = hex(tvalue).rstrip('0')
-# remove leading 0x
-tv = 'tv='+h[2:]
+if tvalue == 0:
+    add_otel_trace_state('tv:0')
+else:
+    h = hex(tvalue).rstrip('0')
+    # remove leading 0x
+    add_otel_trace_state('tv:'+h[2:])
```
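
One detail worth checking when adapting the snippet above: Python's `hex()` does not zero-pad, so a threshold whose leading hex digits are zero would lose them before the trailing zeros are stripped. The sketch below assumes the 56-bit threshold is always rendered as 14 hex digits before trimming and shows a matching decoder. The names `encode_threshold` and `decode_threshold` are hypothetical, and only the value portion is produced, not the key prefix.

```python
def encode_threshold(threshold: int) -> str:
    """Render a 56-bit threshold as hex with trailing zeros omitted."""
    if not 0 <= threshold < 2**56:
        raise ValueError("threshold must fit in 56 bits")
    if threshold == 0:
        return "0"
    # Pad to 14 hex digits so leading zeros survive, then drop trailing zeros.
    return format(threshold, "014x").rstrip("0")


def decode_threshold(text: str) -> int:
    """Inverse of encode_threshold: restore the omitted trailing zeros."""
    if not 1 <= len(text) <= 14:
        raise ValueError("expected 1 to 14 hex digits")
    return int(text.ljust(14, "0"), 16)
```

Under these assumptions, `encode_threshold(2**55)` gives `8`, and a threshold of `2**47` encodes as `008` rather than `8`, so decoding returns the original integer.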

### Testing randomness vs threshold