fix(sampling): ensure agent based sampling is not reset after forking and on tracer.configure #13560

mabdinur · 2025-06-02T14:38:29Z

Builds on 7fbdc9f

Fix: Avoids reinitializing the SpanAggregator on tracer.configure(...) and when an application forks. Instead, SpanAggregator._reset() is called, which re-applies global configurations, resets the trace buffer, and recreates the trace writer. Because the aggregator itself is preserved, agent based sampling rules are no longer reset.
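To illustrate why resetting in place keeps agent-supplied state while reinitializing loses it, here is a minimal sketch. It does not use ddtrace: `ToyAggregator`, `ToySampler`, `ToyWriter`, `reset`, and the two `reconfigure_*` helpers are hypothetical stand-ins, and the real `SpanAggregator._reset()` may differ in signature and behavior.

```python
from typing import Dict, List, Optional


class ToySampler:
    """Hypothetical stand-in: holds sampling rates received from the agent."""

    def __init__(self) -> None:
        self.agent_rates: Dict[str, float] = {}


class ToyWriter:
    """Hypothetical stand-in for a trace writer."""


class ToyAggregator:
    """Hypothetical stand-in for an aggregator that owns sampler, buffer, and writer."""

    def __init__(self, partial_flush: bool = False) -> None:
        self.partial_flush = partial_flush
        self.sampler = ToySampler()
        self.buffer: List[str] = []
        self.writer = ToyWriter()

    def reset(self, partial_flush: Optional[bool] = None) -> None:
        # Re-apply configuration, drop buffered traces, and recreate the writer.
        if partial_flush is not None:
            self.partial_flush = partial_flush
        self.buffer.clear()
        self.writer = ToyWriter()
        # The sampler is deliberately left untouched, so agent rates survive.


def reconfigure_by_reinit(old: ToyAggregator, partial_flush: bool) -> ToyAggregator:
    # Building a new aggregator also builds a new, empty sampler.
    return ToyAggregator(partial_flush=partial_flush)


def reconfigure_by_reset(old: ToyAggregator, partial_flush: bool) -> ToyAggregator:
    # Resetting in place keeps the same object and its sampler.
    old.reset(partial_flush=partial_flush)
    return old
```

In this sketch, rates stored on `sampler.agent_rates` are gone after `reconfigure_by_reinit` but still present after `reconfigure_by_reset`, while the buffer and writer are refreshed in both cases.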
Clean up
- Removes the writer parameter from SpanAggregator.__init__(...). With this change, the initialization of the global writer is an implementation detail of the SpanAggregator; there is no longer a need to supply the SpanAggregator with a writer when initializing the global tracer.
- Removes the initialization of the SpanAggregator from _default_span_processors_factory. With this change, the global tracer's SpanAggregator is never re-created. It is only modified when tracer.configure(...) is used.
- Renames the DatadogSampler._service_based_samplers property to DatadogSampler._agent_sampling_rules to improve clarity. These sampling rules are no longer supplied via environment variables or a programmatic API; they can only be set by the Datadog Agent.
- Splits SpanAggregator.trace_processors into two properties, SpanAggregator.dd_processors and SpanAggregator.user_processors. SpanAggregator.user_processors is set after application startup via Tracer.configure(...), while SpanAggregator.dd_processors is internal to the ddtrace library and should only be set by ddtrace components. This separation allows us to avoid recreating all trace processors when tracer.configure(...) is called.
- Adds a more descriptive release note to an unreleased fix.
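
The processor split described in the list above can be sketched as follows. This is an illustrative toy, not the ddtrace implementation: `ToyAggregator`, `configure`, `process`, `tag_service`, and `drop_health_checks` are hypothetical names, and running library processors before user processors is an assumption made for the example.

```python
from typing import Callable, Dict, List, Optional

Trace = List[Dict[str, str]]
Processor = Callable[[Trace], Optional[Trace]]


def tag_service(trace: Trace) -> Optional[Trace]:
    # Hypothetical library-owned processor: fills in a default service name.
    for span in trace:
        span.setdefault("service", "toy")
    return trace


def drop_health_checks(trace: Trace) -> Optional[Trace]:
    # Hypothetical user-supplied processor: drops health check traces.
    if any(span["name"] == "GET /health" for span in trace):
        return None
    return trace


class ToyAggregator:
    def __init__(self, dd_processors: List[Processor]) -> None:
        # Library-owned processors are created once and kept for the process lifetime.
        self.dd_processors: List[Processor] = list(dd_processors)
        # User processors start empty and are set later through configure().
        self.user_processors: List[Processor] = []

    def configure(self, user_processors: List[Processor]) -> None:
        # Only the user list is replaced; dd_processors are not recreated.
        self.user_processors = list(user_processors)

    def process(self, trace: Trace) -> Optional[Trace]:
        # Assumed ordering for this sketch: library processors first, then user ones.
        for proc in self.dd_processors + self.user_processors:
            result = proc(trace)
            if result is None:
                return None
            trace = result
        return trace
```

Calling `configure([drop_health_checks])` on a `ToyAggregator([tag_service])` swaps only the user list, so any state held by the library-owned processors is unaffected.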

Checklist

PR author has checked that all the criteria below are met
The PR description includes an overview of the change
The PR description articulates the motivation for the change
The change includes tests OR the PR description describes a testing strategy
The PR description notes risks associated with the change, if any
Newly-added code is easy to change
The change follows the library release note guidelines
The change includes or references documentation updates if necessary
Backport labels are set (if applicable)

Reviewer Checklist

Reviewer has checked that all the criteria below are met
Title is accurate
All changes are related to the pull request's stated goal
Avoids breaking API changes
Testing strategy adequately addresses listed risks
Newly-added code is easy to change
Release note makes sense to a user of the library
If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
Backport labels are set in a manner that is consistent with the release branch maintenance policy

github-actions · 2025-06-02T14:38:56Z

CODEOWNERS have been resolved as:

releasenotes/notes/fix-agent-based-sampling-8877694a37053e51.yaml       @DataDog/apm-python
ddtrace/_trace/processor/__init__.py                                    @DataDog/apm-sdk-api-python
ddtrace/_trace/sampler.py                                               @DataDog/apm-sdk-api-python
ddtrace/_trace/tracer.py                                                @DataDog/apm-sdk-api-python
ddtrace/internal/ci_visibility/recorder.py                              @DataDog/ci-app-libraries
ddtrace/internal/writer/writer.py                                       @DataDog/apm-core-python
ddtrace/opentracer/tracer.py                                            @DataDog/apm-sdk-api-python
tests/ci_visibility/test_ci_visibility.py                               @DataDog/ci-app-libraries
tests/integration/test_priority_sampling.py                             @DataDog/apm-sdk-api-python
tests/tracer/test_processors.py                                         @DataDog/apm-sdk-api-python
tests/tracer/test_sampler.py                                            @DataDog/apm-sdk-api-python
tests/tracer/test_writer.py                                             @DataDog/apm-sdk-api-python
tests/utils.py                                                          @DataDog/python-guild

github-actions · 2025-06-02T14:59:05Z

Bootstrap import analysis

Comparison of import times between this PR and base.

Summary

The average import time from this PR is: 275 ± 4 ms.

The average import time from base is: 277 ± 4 ms.

The import time difference between this PR and base is: -1.1 ± 0.2 ms.

Import time breakdown

The following import paths have grown:

ddtrace.auto 0.213 ms (0.08%)

ddtrace 0.149 ms (0.05%)

ddtrace.trace 0.149 ms (0.05%)

ddtrace._trace.tracer 0.107 ms (0.04%)

ddtrace.internal.schema.processor 0.107 ms (0.04%)

ddtrace._trace.filters 0.042 ms (0.02%)

ddtrace._trace.processor 0.042 ms (0.02%)

ddtrace.bootstrap.sitecustomize 0.063 ms (0.02%)

ddtrace.bootstrap.preload 0.063 ms (0.02%)

ddtrace.internal.products 0.063 ms (0.02%)

importlib.metadata 0.063 ms (0.02%)

importlib.metadata._collections 0.063 ms (0.02%)

The following import paths have shrunk:

ddtrace.auto 2.078 ms (0.75%)

ddtrace.bootstrap.sitecustomize 1.334 ms (0.48%)

ddtrace.bootstrap.preload 1.334 ms (0.48%)

ddtrace.internal.remoteconfig.client 0.632 ms (0.23%)

ddtrace.internal.products 0.062 ms (0.02%)

importlib.metadata 0.062 ms (0.02%)

importlib.metadata._itertools 0.062 ms (0.02%)

ddtrace 0.745 ms (0.27%)

ddtrace.trace 0.101 ms (0.04%)

ddtrace._trace.tracer 0.101 ms (0.04%)

ddtrace.settings.peer_service 0.101 ms (0.04%)

pr-commenter · 2025-06-02T15:29:20Z

Benchmarks

Benchmark execution time: 2025-06-17 04:04:14

Comparing candidate commit 7101cbd in PR branch munir/cleanup-agent-based-sampling-fix with baseline commit 02341fe in branch main.

Found 0 performance improvements and 5 performance regressions! Performance is the same for 556 metrics, 3 unstable metrics.

scenario:iastaspects-upper_aspect

🟥 execution_time [+205.781ns; +239.780ns] or [+9.277%; +10.809%]

scenario:iastaspectsospath-ospathjoin_aspect

🟥 execution_time [+874.583ns; +968.053ns] or [+14.197%; +15.715%]

scenario:iastaspectsospath-ospathsplitext_aspect

🟥 execution_time [+353.219ns; +538.090ns] or [+7.852%; +11.961%]

scenario:telemetryaddmetric-1-distribution-metric-1-times

🟥 execution_time [+318.413ns; +378.706ns] or [+10.940%; +13.011%]

scenario:telemetryaddmetric-flush-1-metric

🟥 execution_time [+337.574ns; +483.705ns] or [+8.169%; +11.706%]

brettlangdon

would be nice to have a regression test for this, otherwise lgtm

ddtrace/_trace/tracer.py

emmettbutler

Nice work. Is there a test for how this works in subprocesses that I'm missing?

emmettbutler · 2025-06-16T17:27:13Z

ddtrace/_trace/sampler.py

@@ -101,7 +102,7 @@ def __init__(
        else:
            self.rules: List[SamplingRule] = rules or []
        # Set Agent based samplers
-        self._by_service_samplers: Dict[str, RateSampler] = {}
+        self._agent_sampling_rules = agent_sampling_rules or {}


_agent_based_samplers might be better here, since the use of the term "rules" conflicts with the way it's used elsewhere in the library as opposed to "sampler".

datadog-datadog-prod-us1 · 2025-06-17T03:15:51Z

tests/tracer/test_processors.py

+            return trace
+
+    class UserProc(TraceProcessor):
+        def process_trace(self, trace):


⚪ Code Quality Violation

class names should be PascalCase

Class names should be PascalCase and not camelCase or snake_case.

Learn more: PEP8 style guide

datadog-datadog-prod-us1 · 2025-06-17T03:15:51Z

tests/tracer/test_processors.py

+
+    class UserProc(TraceProcessor):
+        def process_trace(self, trace):
+            return trace


⚪ Code Quality Violation

return outside of a function

All return statements must be within a function. Putting a return statement outside of a function may have unexpected behaviors (such as exiting the program early).

mabdinur added 2 commits June 2, 2025 10:33

chore(sampling): clean up agent based sampling fix

b0e7019

ensure response_callback is not reset

3ceca2a

mabdinur and others added 2 commits June 2, 2025 11:30

fix dummy tracer

e288f92

Merge branch 'main' into munir/cleanup-agent-based-sampling-fix

ddf2848

brettlangdon reviewed Jun 13, 2025

ddtrace/_trace/tracer.py

mabdinur added 4 commits June 14, 2025 19:26

avoid resetting agent sampling rules

5abe4b6

rename _by_service_samplers to _agent_sampling_rules

db7530b

simplify span aggregator create and recreate logic

f0f7899

remove writer arg from SpanAggregator

7e14fee

mabdinur force-pushed the munir/cleanup-agent-based-sampling-fix branch from 4d50f44 to 7e14fee Compare June 15, 2025 01:56

mabdinur changed the title ~~chore(sampling): clean up agent based sampling fix~~ fix(sampling): clean up agent based sampling fix Jun 15, 2025

mabdinur changed the title ~~fix(sampling): clean up agent based sampling fix~~ fix(sampling): ensure agent service based sampling is not reset after forks and on tracer.configure Jun 15, 2025

fmt

146a13c

mabdinur changed the title ~~fix(sampling): ensure agent service based sampling is not reset after forks and on tracer.configure~~ fix(sampling): ensure agent service based sampling is not reset after forking and on tracer.configure Jun 15, 2025

fix userprocessor overwriting

cbfe917

mabdinur force-pushed the munir/cleanup-agent-based-sampling-fix branch from 6afec51 to cbfe917 Compare June 15, 2025 17:38

mabdinur changed the title ~~fix(sampling): ensure agent service based sampling is not reset after forking and on tracer.configure~~ fix(sampling): ensure agent based sampling is not reset after forking and on tracer.configure Jun 15, 2025

mabdinur marked this pull request as ready for review June 15, 2025 22:49

mabdinur requested review from a team as code owners June 15, 2025 22:49

mabdinur requested review from ZStriker19, anmarchenko and erikayasuda June 15, 2025 22:49

add tests

4ef4b21

add regression tests

d790205

emmettbutler approved these changes Jun 16, 2025

rename agent_sampling_rules to agent_based_samplers to improve clarity

4d311a8

mabdinur force-pushed the munir/cleanup-agent-based-sampling-fix branch from ef76a21 to 4d311a8 Compare June 17, 2025 00:17

move span aggregator implementation details to reset method

7101cbd

datadog-datadog-prod-us1 bot reviewed Jun 17, 2025

mabdinur force-pushed the munir/cleanup-agent-based-sampling-fix branch from 376430f to 7101cbd Compare June 17, 2025 03:15
