
New component: pkg/sampling #29738

Closed
jmacd opened this issue Dec 11, 2023 · 5 comments
Labels
Accepted Component (New component has been sponsored), Stale

Comments

@jmacd
Contributor

jmacd commented Dec 11, 2023

The purpose and use-cases of the new component

New pkg/sampling component will contain common library code for:

  1. Parsing a W3C tracestate header
  2. APIs for extracting, modifying, and interpreting an OpenTelemetry tracestate header
  3. APIs for the Threshold and Randomness types, including conversion to and from probability values
  4. A consistent probability sampling mechanism built on all of the above.
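As a rough illustration of items 1 and 2, here is a minimal sketch that splits a W3C `tracestate` header into its `key=value` list-members and then splits the OpenTelemetry `ot` entry into `key:value` sub-keys separated by `;`. The helpers `parseTraceState` and `parseOTValue` are hypothetical and are not the `pkg/sampling` API; the sketch also skips the W3C character-set validation and the 32-member limit.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// member is one key=value list-member of a W3C tracestate header.
type member struct {
	Key, Value string
}

// parseTraceState splits a tracestate header into list-members.
// Hypothetical helper: it trims optional whitespace and skips empty
// members, but does not validate keys/values or the member limit.
func parseTraceState(header string) ([]member, error) {
	var out []member
	for _, raw := range strings.Split(header, ",") {
		m := strings.TrimSpace(raw)
		if m == "" {
			continue // empty list-members are allowed and ignored
		}
		k, v, ok := strings.Cut(m, "=")
		if !ok || k == "" {
			return nil, fmt.Errorf("malformed list-member %q", m)
		}
		out = append(out, member{Key: k, Value: v})
	}
	return out, nil
}

// parseOTValue splits the value of the "ot" entry into sub-keys
// separated by ';', each written as key:value. Hypothetical helper.
func parseOTValue(v string) (map[string]string, error) {
	fields := map[string]string{}
	for _, kv := range strings.Split(v, ";") {
		k, val, ok := strings.Cut(kv, ":")
		if !ok || k == "" {
			return nil, errors.New("malformed ot sub-key: " + kv)
		}
		fields[k] = val
	}
	return fields, nil
}

func main() {
	members, err := parseTraceState("vendor=abc, ot=th:c;rv:5a1b2c3d4e5f60")
	if err != nil {
		panic(err)
	}
	for _, m := range members {
		if m.Key == "ot" {
			fields, _ := parseOTValue(m.Value)
			fmt.Println(fields["th"], fields["rv"]) // prints: c 5a1b2c3d4e5f60
		}
	}
}
```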

Example configuration for the component

No direct configuration in YAML. This module supports a strictly defined translation between IEEE 754 floating-point values and so-called "Threshold" values.
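In OTEP 235 terms, the translation can be written as follows, with $p$ the sampling probability, $T$ the 56-bit rejection threshold, and $R$ the 56-bit randomness value. This restates the proposal's convention as an aid to reading, not this module's exact code:

```math
T = \operatorname{round}\big((1 - p)\cdot 2^{56}\big), \qquad
p = 1 - \frac{T}{2^{56}}, \qquad
\text{sample} \iff R \ge T
```

If $R$ is uniform on $[0, 2^{56})$, then $P(R \ge T) = (2^{56} - T)/2^{56} = p$.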

A simple component that uses this library (e.g., probabilisticsamplerprocessor) will extract probabilities from its YAML configuration and convert them into Threshold values using one of two methods:

  1. ProbabilityToThreshold: exact translation from floating point to threshold that preserves all significant bits, so Threshold-to-Probability is an exact round trip.
  2. ProbabilityToThresholdWithPrecision: approximate translation from floating point to threshold that rounds the probability value to limit the encoding size, so Threshold-to-Probability is not an exact round trip.
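The two methods can be sketched as follows, assuming the OTEP 235 convention of a 56-bit rejection threshold `T = round((1 - p) * 2^56)` and the `th` encoding (up to 14 hex digits with trailing zeros dropped). The helpers `probabilityToThreshold`, `probabilityToThresholdWithPrecision`, `thresholdToProbability`, and `encodeThreshold` are hypothetical stand-ins, not the `pkg/sampling` API. In particular, the precision variant here simply truncates to a fixed number of leading hex digits, which may not match the library's rounding rule.

```go
package main

import (
	"errors"
	"fmt"
	"math"
	"strings"
)

const maxAdjustedCount = 1 << 56 // size of the 56-bit threshold space

// probabilityToThreshold maps p in [2^-56, 1] to T = 2^56 - round(p * 2^56).
// Scaling by a power of two is exact, so no mantissa bits are lost there.
func probabilityToThreshold(p float64) (uint64, error) {
	if !(p >= 1.0/maxAdjustedCount && p <= 1) {
		return 0, errors.New("probability out of range")
	}
	kept := uint64(math.Round(p * maxAdjustedCount))
	return maxAdjustedCount - kept, nil
}

// probabilityToThresholdWithPrecision keeps only the first `digits`
// (1..14) hex digits of T by truncation. Lowering T raises the
// probability slightly; this is a simplification for illustration.
func probabilityToThresholdWithPrecision(p float64, digits int) (uint64, error) {
	if digits < 1 || digits > 14 {
		return 0, errors.New("digits must be in [1, 14]")
	}
	t, err := probabilityToThreshold(p)
	if err != nil {
		return 0, err
	}
	drop := uint(4 * (14 - digits))
	return t >> drop << drop, nil
}

// thresholdToProbability inverts the mapping: p = (2^56 - T) / 2^56.
func thresholdToProbability(t uint64) float64 {
	return float64(maxAdjustedCount-t) / maxAdjustedCount
}

// encodeThreshold renders T as 14 hex digits with trailing zeros removed,
// using "0" for T == 0 (probability 1).
func encodeThreshold(t uint64) string {
	if t == 0 {
		return "0"
	}
	return strings.TrimRight(fmt.Sprintf("%014x", t), "0")
}

func main() {
	for _, p := range []float64{1, 0.5, 0.25, 0.1} {
		t, _ := probabilityToThreshold(p)
		fmt.Println(p, encodeThreshold(t), thresholdToProbability(t) == p)
	}
	t2, _ := probabilityToThresholdWithPrecision(0.1, 2)
	fmt.Println(encodeThreshold(t2), thresholdToProbability(t2)) // e6 0.1015625
}
```

In this sketch, 0.1 survives the exact round trip because its 53-bit mantissa fits within 56 bits, while the 2-digit variant encodes as `e6` and decodes to 0.1015625.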

Telemetry data types supported

Tracing is supported directly: Randomness can be extracted from `pcommon.TraceID`. However, the Randomness type can also be used on its own, and this package may be extended to apply consistent probability sampling to other data types.
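A minimal sketch of a consistent decision for a span, assuming (as in W3C Trace Context Level 2 and OTEP 235) that randomness is the least-significant 56 bits, i.e. the last 7 bytes, of the 16-byte trace ID. A plain `[16]byte` stands in for `pcommon.TraceID` to keep the example dependency-free, and the helper names are hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// randomnessFromTraceID returns the low 56 bits of the trace ID:
// the last 8 bytes read big-endian, with the top byte masked off.
func randomnessFromTraceID(id [16]byte) uint64 {
	return binary.BigEndian.Uint64(id[8:]) & (1<<56 - 1)
}

// shouldSample applies the consistent rule: keep iff R >= T.
// A higher threshold keeps a subset of what a lower one keeps.
func shouldSample(id [16]byte, threshold uint64) bool {
	return randomnessFromTraceID(id) >= threshold
}

func main() {
	var id [16]byte
	id[9] = 0xa0                 // most significant byte of the randomness
	half := uint64(1) << 55      // threshold for p = 0.5 ("th:8")
	quarter := uint64(3) << 54   // threshold for p = 0.25 ("th:c")
	fmt.Println(shouldSample(id, half), shouldSample(id, quarter)) // true false
}
```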

Is this a vendor-specific component?

  • This is a vendor-specific component
  • If this is a vendor-specific component, I am proposing to contribute and support it as a representative of the vendor.

Code Owner(s)

jmacd, kentquirk

Sponsor (optional)

codeboten

Additional context

This implements the draft specification proposal in open-telemetry/oteps#235
This implements the already-specified rules for tracestate handling: https://opentelemetry.io/docs/specs/otel/trace/tracestate-handling/

This is one step toward the umbrella issue that states the overall goal: open-telemetry/opentelemetry-specification#1413

@codeboten
Contributor

I'm happy to sponsor this module

@codeboten added the Accepted Component (New component has been sponsored) label and removed the Sponsor Needed (New component seeking sponsor) and needs triage (New item requiring triage) labels Dec 13, 2023
@TylerHelmuth
Member

I'd also like to sponsor. Go sampling!

@jiekun
Member

jiekun commented Dec 15, 2023

I would like to subscribe to this as well.

A probabilistic sampling policy is also available in the tail-sampling processor, and I've seen discussion on Slack complaining about it: #27044.

So I have been looking forward to probabilistic sampling being unified across SDKs and processors and managed consistently.

jpkrohling added a commit that referenced this issue Jan 31, 2024
…29720)

**Description:** This is the `pkg/sampling` portion of
#24811.

**Link to tracking Issue:** 
#29738

open-telemetry/opentelemetry-specification#1413

**Testing:** Complete.

**Documentation:** New README added.

---------

Co-authored-by: Juraci Paixão Kröhling <juraci.github@kroehling.de>
Co-authored-by: Kent Quirk <kentquirk@gmail.com>
cparkins pushed a commit to AmadeusITGroup/opentelemetry-collector-contrib that referenced this issue Feb 1, 2024
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@github-actions github-actions bot added the Stale label Feb 14, 2024
@jpkrohling
Member

This was done as part of #29720.

jpkrohling added a commit that referenced this issue Jun 13, 2024
…rt OTEP 235) (#31894)

**Description:** Creates new sampler modes named "equalizing" and
"proportional". Preserves existing functionality under the mode named
"hash_seed".

Fixes #31918

This is the final step in a sequence; the whole of this work was
factored into 3+ PRs, including the new `pkg/sampling` and the previous
step, #31946. The two new Sampler modes enable mixing OTel sampling SDKs
with Collectors in a consistent way.
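A rough sketch of how the two new modes might update a threshold, assuming (following OTEP 235) that an item arriving without a threshold is treated as T = 0, i.e. probability 1. Equalizing raises the threshold to the configured one unless it is already higher; proportional multiplies the incoming probability by the configured ratio. The helper names are hypothetical and this is not the processor's code.

```go
package main

import (
	"fmt"
	"math"
)

const maxAdjustedCount = 1 << 56 // size of the 56-bit threshold space

// equalizingThreshold returns max(incoming, configured): items already
// sampled below the target probability pass through unchanged.
func equalizingThreshold(incoming, configured uint64) uint64 {
	if incoming > configured {
		return incoming
	}
	return configured
}

// proportionalThreshold multiplies the incoming probability by ratio
// (0 < ratio <= 1), reducing every item's probability by the same factor.
func proportionalThreshold(incoming uint64, ratio float64) uint64 {
	kept := float64(maxAdjustedCount-incoming) * ratio
	return maxAdjustedCount - uint64(math.Round(kept))
}

func main() {
	// An item sampled at 50% upstream (th:8) reaches a collector
	// configured for 25% (th:c).
	incoming := uint64(1) << 55
	configured := uint64(3) << 54
	fmt.Printf("%x\n", equalizingThreshold(incoming, configured)) // c0000000000000
	fmt.Printf("%x\n", proportionalThreshold(incoming, 0.25))     // e0000000000000
}
```

In both cases the resulting threshold is then compared with the item's randomness (R >= T); under proportional mode the item above ends up at probability 0.125 rather than 0.25.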

The existing hash_seed mode is also a consistent sampling mode, which
makes it possible to have a 1:1 mapping between its decisions and the
OTEP 235 randomness and threshold values. Specifically, the 14-bit hash
value and sampling probability are mapped into 56-bit R-value and
T-value encodings, so that all sampling decisions in all modes include
threshold information.
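The 14-bit mapping can be illustrated with a small sketch. It assumes the hash_seed rule keeps an item when its 14-bit hash `h` is below a scaled rate `s` (out of 2^14 buckets). Because OTEP 235 keeps an item when R >= T, one construction that preserves every decision reflects both values, R' = 2^14 - 1 - h and T' = 2^14 - s, and places them in the top 14 of the 56 bits. The helper names are hypothetical; the sketch leaves the low 42 bits of R at zero, whereas a real implementation could fill them with other hash bits without changing the decision, since T's low 42 bits are zero.

```go
package main

import "fmt"

const (
	numHashBucketsLg2 = 14
	numHashBuckets    = 1 << numHashBucketsLg2 // 16384
	shift             = 56 - numHashBucketsLg2 // 42
)

// legacyDecision is the hash_seed rule: keep when h < s.
func legacyDecision(h, s uint64) bool { return h < s }

// toR56 maps a 14-bit hash h (0..16383) to a 56-bit R-value.
func toR56(h uint64) uint64 { return (numHashBuckets - 1 - h) << shift }

// toT56 maps a scaled rate s (0..16384) to a 56-bit T-value.
func toT56(s uint64) uint64 { return (numHashBuckets - s) << shift }

func main() {
	// Exhaustively compare both rules at a 10% rate.
	s := uint64(numHashBuckets / 10) // 1638 buckets
	mismatches := 0
	for h := uint64(0); h < numHashBuckets; h++ {
		if legacyDecision(h, s) != (toR56(h) >= toT56(s)) {
			mismatches++
		}
	}
	fmt.Println("mismatches:", mismatches) // mismatches: 0
}
```

The equivalence follows from (2^14 - 1 - h) >= (2^14 - s) exactly when h < s, so the same items are kept under either encoding.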

This implements the semantic conventions of
open-telemetry/semantic-conventions#793, namely
the `sampling.randomness` and `sampling.threshold` attributes used for
logs where there is no tracestate.

The default sampling mode remains HashSeed. We consider a future change
of default to Proportional to be desirable, because:

1. Sampling probability is the same, only the hashing algorithm changes
2. Proportional respects and preserves information about earlier
sampling decisions, which HashSeed can't do, so it has greater
interoperability with OTel SDKs that may also adopt OTEP 235 samplers.

**Link to tracking Issue:** 

Draft for open-telemetry/opentelemetry-specification#3602.
Previously #24811; see also open-telemetry/oteps#235.
Part of #29738

**Testing:** New testing has been added.

**Documentation:** ✅

---------

Co-authored-by: Juraci Paixão Kröhling <juraci.github@kroehling.de>