
fix(di): dynamic function discovery fallback #13947

Merged
P403n1x87 merged 2 commits into main from fix/di-dynamic-discovery-fallback
Jul 15, 2025

Conversation

@P403n1x87 P403n1x87 commented Jul 10, 2025

We implement a dynamic function discovery fallback for when function-from-code resolution via the GC fails. This can happen if the target application has interacted with the GC, e.g. by freezing it at a point that prevents the current discovery from resolving the function from the referenced code object.
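
For readers unfamiliar with function-from-code resolution via the GC, here is a minimal sketch of the general technique, not the ddtrace implementation. It assumes CPython, where a function object is a GC-tracked referrer of its own code object. The helper name `function_from_code` is hypothetical.

```python
import gc
from types import CodeType, FunctionType


def function_from_code(code: CodeType) -> FunctionType:
    """Hypothetical helper: find the unique function whose __code__ is `code`.

    Asks the GC for every tracked object that refers to `code` and keeps
    only function objects that own it. Raises ValueError if the lookup is
    ambiguous or finds nothing.
    """
    candidates = [
        ref
        for ref in gc.get_referrers(code)
        if isinstance(ref, FunctionType) and ref.__code__ is code
    ]
    if len(candidates) != 1:
        raise ValueError(
            f"expected exactly one function for {code.co_name!r}, "
            f"found {len(candidates)}"
        )
    return candidates[0]


def sample() -> int:
    return 42


# Resolve the function object starting from its code object only.
resolved = function_from_code(sample.__code__)
```

The lookup depends entirely on what `gc.get_referrers` is able to see, which is why application-level GC manipulation can make it fail.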

Testing Strategy

The original issue was reproducible with a local deployment of synapse. Investigation showed that it was caused by the way the application interacts with the GC: https://github.com/element-hq/synapse/blob/1dc29563c1504a2523e467aa7bef6a7ac05cc60c/synapse/app/_base.py#L623C1-L629C35. Commenting out those lines makes the issue disappear. We tested the fix against the unmodified application and verified that it resolves the issue.
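
The following sketch can be used to probe the effect locally. It assumes the GC interaction in question is a call to `gc.freeze()`, as the word "freezing" in the description suggests, and it does not reproduce the synapse code. The helper names are hypothetical. Whether frozen objects are hidden from `gc.get_referrers` depends on the CPython version and build, so the script only reports what it observes.

```python
import gc
from types import CodeType, FunctionType
from typing import List


def function_referrers(code: CodeType) -> List[FunctionType]:
    """Functions that the GC reports as referring to `code`."""
    return [
        ref
        for ref in gc.get_referrers(code)
        if isinstance(ref, FunctionType) and ref.__code__ is code
    ]


def visible_while_frozen(func: FunctionType) -> bool:
    """Check whether `func` can be found from its code object while frozen.

    gc.freeze() moves all tracked objects to the permanent generation.
    Note that gc.unfreeze() also releases anything frozen earlier by the
    application, so only run this in a throwaway process.
    """
    gc.collect()
    gc.freeze()
    try:
        return any(f is func for f in function_referrers(func.__code__))
    finally:
        gc.unfreeze()


def target() -> str:
    return "hello"


found_before = any(f is target for f in function_referrers(target.__code__))
found_frozen = visible_while_frozen(target)
found_after = any(f is target for f in function_referrers(target.__code__))

print(f"before freeze: {found_before}")
print(f"while frozen:  {found_frozen}")
print(f"after unfreeze: {found_after}")
```

If the middle line prints `False`, the GC-based lookup cannot see functions created before the freeze, which matches the failure mode described above.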

Refs: DYNIS-28

Checklist

  • PR author has checked that all the criteria below are met
  • The PR description includes an overview of the change
  • The PR description articulates the motivation for the change
  • The change includes tests OR the PR description describes a testing strategy
  • The PR description notes risks associated with the change, if any
  • Newly-added code is easy to change
  • The change follows the library release note guidelines
  • The change includes or references documentation updates if necessary
  • Backport labels are set (if applicable)

Reviewer Checklist

  • Reviewer has checked that all the criteria below are met
  • Title is accurate
  • All changes are related to the pull request's stated goal
  • Avoids breaking API changes
  • Testing strategy adequately addresses listed risks
  • Newly-added code is easy to change
  • Release note makes sense to a user of the library
  • If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  • Backport labels are set in a manner that is consistent with the release branch maintenance policy

@P403n1x87 P403n1x87 added the Dynamic Instrumentation label Jul 10, 2025

CODEOWNERS have been resolved as:

releasenotes/notes/fix-di-dynamic-discovery-fallback-3a5623e18584cd79.yaml  @DataDog/apm-python
ddtrace/debugging/_function/discovery.py                                @DataDog/debugger-python

@P403n1x87 P403n1x87 force-pushed the fix/di-dynamic-discovery-fallback branch from e9ca96c to e42f5ac Compare July 10, 2025 17:02
@P403n1x87 P403n1x87 marked this pull request as ready for review July 10, 2025 17:12
@P403n1x87 P403n1x87 requested review from a team as code owners July 10, 2025 17:12

github-actions bot commented Jul 10, 2025

Bootstrap import analysis

Comparison of import times between this PR and base.

Summary

The average import time from this PR is: 286 ± 6 ms.

The average import time from base is: 297 ± 6 ms.

The import time difference between this PR and base is: -11.3 ± 0.3 ms.

Import time breakdown

The following import paths have grown:

ddtrace.auto 0.061 ms (0.02%)
ddtrace.bootstrap.sitecustomize 0.061 ms (0.02%)
ddtrace._trace.trace_handlers 0.061 ms (0.02%)
ddtrace.contrib.trace_utils 0.061 ms (0.02%)
ddtrace.contrib.internal.trace_utils 0.061 ms (0.02%)
ddtrace.contrib.internal.trace_utils_base 0.061 ms (0.02%)

The following import paths have shrunk:

ddtrace.auto 5.381 ms (1.88%)
ddtrace.bootstrap.sitecustomize 2.905 ms (1.02%)
ddtrace.bootstrap.preload 2.647 ms (0.92%)
ddtrace.internal.remoteconfig.client 0.828 ms (0.29%)
multiprocessing.sharedctypes 0.211 ms (0.07%)
multiprocessing.heap 0.211 ms (0.07%)
mmap 0.211 ms (0.07%)
ddtrace.internal.products 0.177 ms (0.06%)
importlib.metadata 0.177 ms (0.06%)
zipfile 0.177 ms (0.06%)
zipfile._path 0.177 ms (0.06%)
ddtrace.settings.profiling 0.074 ms (0.03%)
ddtrace.vendor.psutil 0.074 ms (0.03%)
ddtrace.internal.remoteconfig._connectors 0.051 ms (0.02%)
ctypes 0.051 ms (0.02%)
_ctypes 0.051 ms (0.02%)
ddtrace.internal.remoteconfig.worker 0.046 ms (0.02%)
ddtrace.internal.symbol_db.remoteconfig 0.030 ms (0.01%)
ddtrace.internal.symbol_db.symbols 0.030 ms (0.01%)
ddtrace.settings.symbol_db 0.030 ms (0.01%)
ddtrace._trace.trace_handlers 0.181 ms (0.06%)
ddtrace.ext.db 0.096 ms (0.03%)
ddtrace.contrib.internal.subprocess.constants 0.077 ms (0.03%)
ddtrace 2.477 ms (0.87%)
ddtrace._logger 1.171 ms (0.41%)
ddtrace.internal.telemetry 1.171 ms (0.41%)
ddtrace.internal.telemetry.writer 0.705 ms (0.25%)
http.client 0.395 ms (0.14%)
ssl 0.187 ms (0.07%)
_ssl 0.061 ms (0.02%)
email.parser 0.152 ms (0.05%)
email.feedparser 0.152 ms (0.05%)
email._policybase 0.152 ms (0.05%)
email.header 0.152 ms (0.05%)
email.charset 0.152 ms (0.05%)
ddtrace.settings._telemetry 0.076 ms (0.03%)
ddtrace.settings._inferred_base_service 0.032 ms (0.01%)
ddtrace.internal.telemetry.metrics_namespaces 0.052 ms (0.02%)
ddtrace.internal.telemetry.data 0.050 ms (0.02%)
ddtrace.internal.packages 0.050 ms (0.02%)
_sysconfigdata__linux_x86_64-linux-gnu 0.050 ms (0.02%)
ddtrace.internal.encoding 0.047 ms (0.02%)
ddtrace.internal._encoding 0.047 ms (0.02%)
ddtrace.internal.runtime 0.035 ms (0.01%)
uuid 0.035 ms (0.01%)
ddtrace.settings._agent 0.422 ms (0.15%)
ddtrace.settings 0.367 ms (0.13%)
ddtrace.settings.http 0.272 ms (0.09%)
ddtrace.internal.utils.cache 0.235 ms (0.08%)
inspect 0.235 ms (0.08%)
ddtrace.internal.utils.http 0.037 ms (0.01%)
dataclasses 0.037 ms (0.01%)
ddtrace.settings.integration 0.095 ms (0.03%)
ddtrace.vendor.debtcollector 0.095 ms (0.03%)
ddtrace.vendor.debtcollector.moves 0.056 ms (0.02%)
ddtrace.vendor 0.039 ms (0.01%)
ddtrace.internal.module 0.039 ms (0.01%)
ddtrace.internal.wrapping.context 0.039 ms (0.01%)
socket 0.055 ms (0.02%)
_socket 0.055 ms (0.02%)
ddtrace.internal.utils.formats 0.044 ms (0.02%)
ddtrace.internal.compat 0.044 ms (0.02%)
pathlib 0.044 ms (0.02%)
urllib.parse 0.044 ms (0.02%)
ddtrace.trace 0.341 ms (0.12%)
ddtrace._trace.filters 0.257 ms (0.09%)
ddtrace._trace.processor 0.257 ms (0.09%)
ddtrace._trace.sampler 0.086 ms (0.03%)
ddtrace._trace.span 0.086 ms (0.03%)
ddtrace.internal._rand 0.029 ms (0.01%)
ddtrace.internal.writer 0.065 ms (0.02%)
ddtrace.internal.writer.writer 0.065 ms (0.02%)
ddtrace.internal.dogstatsd 0.050 ms (0.02%)
ddtrace.vendor.dogstatsd 0.050 ms (0.02%)
ddtrace.vendor.dogstatsd.base 0.050 ms (0.02%)
ddtrace.internal._unpatched 0.224 ms (0.08%)
subprocess 0.177 ms (0.06%)
contextlib 0.177 ms (0.06%)
json 0.046 ms (0.02%)
json.decoder 0.046 ms (0.02%)
re 0.046 ms (0.02%)
enum 0.046 ms (0.02%)
types 0.046 ms (0.02%)
ddtrace._monkey 0.045 ms (0.02%)
ddtrace.settings._config 0.026 ms (0.01%)
ddtrace.internal.schema 0.026 ms (0.01%)

@tylfin tylfin requested a review from Copilot July 10, 2025 17:52

@Copilot Copilot AI left a comment


Pull Request Overview

This PR implements a dynamic fallback for function resolution when the GC-based lookup fails and updates the discovery logic to use it.

  • Adds a private _resolve_pair method that first tries pair.resolve() and, on ValueError, walks the module’s attributes to find the function.
  • Updates by_name to call _resolve_pair for both direct and name-index lookups.
  • Includes a release note entry for the new fallback behavior.
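
The try-then-fall-back flow summarized in the bullets above can be sketched as follows. This is an illustration under stated assumptions, not the code from `discovery.py`: every function name here is hypothetical, and the walk only covers module-level functions, classes defined in the module, and a few common descriptors.

```python
import gc
from types import CodeType, FunctionType, ModuleType
from typing import Optional, Set


def resolve_via_gc(code: CodeType) -> FunctionType:
    """Primary path (hypothetical): ask the GC who refers to `code`."""
    found = [
        ref
        for ref in gc.get_referrers(code)
        if isinstance(ref, FunctionType) and ref.__code__ is code
    ]
    if len(found) != 1:
        raise ValueError(f"cannot resolve {code.co_name!r} via the GC")
    return found[0]


def _unwrap(obj: object) -> object:
    """Peel common descriptors to reach the underlying function, if any."""
    if isinstance(obj, (staticmethod, classmethod)):
        return obj.__func__
    if isinstance(obj, property):
        return obj.fget
    return obj


def walk_for_code(
    namespace: object, code: CodeType, module_name: str, seen: Set[int]
) -> Optional[FunctionType]:
    """Fallback path (hypothetical): search attributes for a matching function."""
    if id(namespace) in seen:
        return None
    seen.add(id(namespace))
    for value in list(vars(namespace).values()):
        value = _unwrap(value)
        if isinstance(value, FunctionType):
            if value.__code__ is code:
                return value
        elif isinstance(value, type) and value.__module__ == module_name:
            # Only descend into classes defined by the module itself.
            found = walk_for_code(value, code, module_name, seen)
            if found is not None:
                return found
    return None


def resolve_with_fallback(code: CodeType, module: ModuleType) -> FunctionType:
    """Try the GC lookup first; on ValueError, walk the module's attributes."""
    try:
        return resolve_via_gc(code)
    except ValueError:
        found = walk_for_code(module, code, module.__name__, set())
        if found is None:
            raise
        return found


# Build a small throwaway module to exercise both paths.
demo = ModuleType("demo")
exec(
    "def greet():\n"
    "    return 'hi'\n"
    "\n"
    "class Greeter:\n"
    "    def hello(self):\n"
    "        return 'hello'\n",
    demo.__dict__,
)

via_fallback = walk_for_code(demo, demo.Greeter.hello.__code__, "demo", set())
via_either = resolve_with_fallback(demo.greet.__code__, demo)
```

The attribute walk does not depend on GC state, at the cost of missing functions that are not reachable from the module namespace (for example, closures created at call time).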

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| releasenotes/notes/fix-di-dynamic-discovery-fallback-3a5623e18584cd79.yaml | Adds a release note about the dynamic function discovery fallback. |
| ddtrace/debugging/_function/discovery.py | Introduces _resolve_pair and refactors by_name to leverage the fallback. |

pr-commenter bot commented Jul 10, 2025

Benchmarks

Benchmark execution time: 2025-07-11 11:15:21

Comparing candidate commit 5f6c000 in PR branch fix/di-dynamic-discovery-fallback with baseline commit 9e78c42 in branch main.

Found 0 performance improvements and 2 performance regressions! Performance is the same for 546 metrics, 2 unstable metrics.

scenario:iastaspects-replace_aspect

  • 🟥 execution_time [+711.000ns; +780.087ns] or [+14.911%; +16.360%]

scenario:telemetryaddmetric-1-distribution-metric-1-times

  • 🟥 execution_time [+277.198ns; +342.442ns] or [+9.415%; +11.631%]

@P403n1x87 P403n1x87 enabled auto-merge (squash) July 15, 2025 09:20
@P403n1x87 P403n1x87 merged commit b55f08f into main Jul 15, 2025
416 of 418 checks passed
@P403n1x87 P403n1x87 deleted the fix/di-dynamic-discovery-fallback branch July 15, 2025 09:20
github-actions bot pushed a commit that referenced this pull request Jul 15, 2025
(cherry picked from commit b55f08f)
github-actions bot pushed a commit that referenced this pull request Jul 15, 2025
(cherry picked from commit b55f08f)