feat(llmobs): add bedrock agents tracing #13439

Yun-Kim · 2025-05-16T19:14:46Z

MLOB-2660
This PR adds tracing for AWS Bedrock agents runtime.

There are 2 components to this PR:

APM+LLMObs: Trace bedrock agents runtime invokeAgent() call, i.e. the end-to-end agent request.
LLMObs-only: if AWS bedrock traces are returned by the response (i.e. the user adds enableTrace=True in the request params), then we translate the returned bedrock traces into LLMObs span events.

Translating AWS Bedrock Traces to Datadog LLMObs Spans

The main idea here is that we are receiving AWS bedrock "trace" events and converting/stitching them into Datadog LLMObs spans. We do three levels of tracing:

E2E Agent Request: AWS Bedrock traces don't have a top-level agent "trace" concept, so we are capturing this directly by tracing the invokeAgent call.
Trace steps: AWS Bedrock traces events are not hierarchical in nature and represent steps in the workflow, and contain a combination of LLM/tool/reasoning steps (instead of Datadog LLMObs and other common LLMObs frameworks which separate out each step into its own span). We are explicitly creating subtrace spans to represent each trace step by stitching together the internal trace step events' timestamps, and annotating with the input of the first inner step, and the output of the last inner step.
Inner trace step: Based on the bedrock trace event, we stitch events together into LLM/Tool/Task spans.

Trace steps and inner trace Steps

As of the time of writing, Bedrock Agent Traces can consist of the following trace types, each with their own internal structure and unique types:

Failure
Guardrail
Orchestration
PreProcessing
PostProcessing
Routing
CustomOrchestration
Observation
Inner trace step events also include but are not limited to modelInvocationInput, modelInvocationOutput, actionGroupInvocationInput, actionGroupInvocationOutput. These events need to be stitched together to create LLMObs span events. I've also made rationale and guardrail trace step events standalone spans. Also, since we're translating events to LLMObs span dictionaries, things like exceptions specified in the trace step events need to be raised explicitly in our span events in order to retain the traceback information to propagate to the root agent span.

For more information on the Bedrock Agent Trace structure/formats, see their docs.

Checklist

PR author has checked that all the criteria below are met
The PR description includes an overview of the change
The PR description articulates the motivation for the change
The change includes tests OR the PR description describes a testing strategy
The PR description notes risks associated with the change, if any
Newly-added code is easy to change
The change follows the library release note guidelines
The change includes or references documentation updates if necessary
Backport labels are set (if applicable)

Reviewer Checklist

Reviewer has checked that all the criteria below are met
Title is accurate
All changes are related to the pull request's stated goal
Avoids breaking API changes
Testing strategy adequately addresses listed risks
Newly-added code is easy to change
Release note makes sense to a user of the library
If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
Backport labels are set in a manner that is consistent with the release branch maintenance policy

github-actions · 2025-05-16T19:15:16Z

CODEOWNERS have been resolved as:

ddtrace/contrib/internal/botocore/services/bedrock_agents.py            @DataDog/ml-observability
ddtrace/llmobs/_integrations/bedrock_agents.py                          @DataDog/ml-observability
ddtrace/llmobs/_integrations/bedrock_utils.py                           @DataDog/ml-observability
releasenotes/notes/feat-bedrock-agents-efb725a0860eae87.yaml            @DataDog/apm-python
tests/contrib/botocore/bedrock_cassettes/agent_invoke.yaml              @DataDog/ml-observability
tests/contrib/botocore/bedrock_cassettes/agent_invoke_trace_disabled.yaml  @DataDog/ml-observability
tests/contrib/botocore/conftest.py                                      @DataDog/apm-core-python @DataDog/apm-idm-python
tests/contrib/botocore/test_bedrock_agents.py                           @DataDog/ml-observability
tests/contrib/botocore/test_bedrock_agents_llmobs.py                    @DataDog/ml-observability
tests/snapshots/tests.contrib.botocore.test_bedrock_agents.test_agent_invoke.json  @DataDog/apm-python
.github/CODEOWNERS                                                      @DataDog/python-guild @DataDog/apm-core-python
ddtrace/contrib/integration_registry/registry.yaml                      @DataDog/apm-core-python @DataDog/apm-idm-python
ddtrace/contrib/internal/botocore/patch.py                              @DataDog/apm-core-python @DataDog/apm-idm-python
ddtrace/contrib/internal/botocore/services/bedrock.py                   @DataDog/ml-observability
ddtrace/llmobs/_integrations/bedrock.py                                 @DataDog/ml-observability
ddtrace/llmobs/_llmobs.py                                               @DataDog/ml-observability
riotfile.py                                                             @DataDog/apm-python
supported_versions_output.json                                          @DataDog/apm-core-python
supported_versions_table.csv                                            @DataDog/apm-core-python
tests/contrib/botocore/bedrock_utils.py                                 @DataDog/ml-observability
tests/contrib/botocore/test_bedrock.py                                  @DataDog/ml-observability
tests/contrib/botocore/test_bedrock_llmobs.py                           @DataDog/ml-observability
.riot/requirements/14fceda.txt                                          @DataDog/apm-python
.riot/requirements/1558546.txt                                          @DataDog/apm-python
.riot/requirements/160bd16.txt                                          @DataDog/apm-python
.riot/requirements/17fe359.txt                                          @DataDog/apm-python
.riot/requirements/1ada48c.txt                                          @DataDog/apm-python

github-actions · 2025-05-16T19:35:17Z

Bootstrap import analysis

Comparison of import times between this PR and base.

Summary

The average import time from this PR is: 279 ± 3 ms.

The average import time from base is: 281 ± 4 ms.

The import time difference between this PR and base is: -1.6 ± 0.2 ms.

Import time breakdown

The following import paths have shrunk:

ddtrace.auto 1.945 ms (0.70%)

ddtrace.bootstrap.sitecustomize 1.272 ms (0.46%)

ddtrace.bootstrap.preload 1.272 ms (0.46%)

ddtrace.internal.remoteconfig.client 0.628 ms (0.23%)

ddtrace 0.674 ms (0.24%)

ddtrace.internal._unpatched 0.030 ms (0.01%)

json 0.030 ms (0.01%)

json.decoder 0.030 ms (0.01%)

re 0.030 ms (0.01%)

enum 0.030 ms (0.01%)

types 0.030 ms (0.01%)

pr-commenter · 2025-05-16T20:07:51Z

Benchmarks

Benchmark execution time: 2025-06-16 21:37:25

Comparing candidate commit f6a4841 in PR branch yunkim/llmobs-bedrock-agent with baseline commit 5692392 in branch main.

Found 0 performance improvements and 3 performance regressions! Performance is the same for 558 metrics, 3 unstable metrics.

scenario:iastaspectsospath-ospathsplitdrive_aspect

🟥 execution_time [+375.834ns; +431.276ns] or [+10.992%; +12.614%]

scenario:iastaspectsospath-ospathsplitext_aspect

🟥 execution_time [+658.342ns; +720.485ns] or [+15.334%; +16.781%]

scenario:iastaspectssplit-splitlines_aspect

🟥 execution_time [+129.528ns; +159.169ns] or [+8.855%; +10.882%]

ddtrace/llmobs/_integrations/bedrock_agents.py

…tep spans

… add defensive instrumentation try/catch

tests/contrib/botocore/test_bedrock_agents_llmobs.py

sabrenner

did a first pass, mostly nits, i think actionable for when you're back. it largely makes sense to me. i'll do a second pass at the tests and an additional pass on the bedrock agents file (that's the one i spent the most time looking at 😅)

ddtrace/contrib/internal/botocore/services/bedrock_agents.py

ddtrace/llmobs/_integrations/bedrock.py

riotfile.py

ddtrace/llmobs/_integrations/bedrock_agents.py

tests/contrib/botocore/test_bedrock_agents.py

sabrenner

just a couple questions/nits but am good to approve after they're addressed! nice work with this in such short time 😄

ddtrace/llmobs/_integrations/bedrock_agents.py

sabrenner

🚢

brettlangdon · 2025-06-16T17:45:51Z

ddtrace/contrib/internal/botocore/services/bedrock_agents.py

+        span.finish()
+        raise


Suggested change

span.finish()

raise

raise

finally:

span.finish()

?

We only want to finish the span if there's an error, since we're always expecting a streamed response otherwise (which we'll finish in the TracedStreamResponse wrapper class)

brettlangdon · 2025-06-16T17:46:55Z

ddtrace/contrib/internal/botocore/services/bedrock_agents.py

+    try:
+        result = original_func(*args, **kwargs)
+        result = handle_bedrock_agent_response(result, integration, span, args, kwargs)
+        return result
+    except Exception:
+        integration.llmobs_set_tags(span, args, kwargs, result, operation="agent")
+        span.set_exc_info(*sys.exc_info())
+        span.finish()
+        raise


Suggested change

try:

result = original_func(*args, **kwargs)

result = handle_bedrock_agent_response(result, integration, span, args, kwargs)

return result

except Exception:

integration.llmobs_set_tags(span, args, kwargs, result, operation="agent")

span.set_exc_info(*sys.exc_info())

span.finish()

raise

try:

result = original_func(*args, **kwargs)

result = handle_bedrock_agent_response(result, integration, span, args, kwargs)

return result

finally:

integration.llmobs_set_tags(span, args, kwargs, result, operation="agent")

?

Unfortunately here we only want to call the finish/llmobs_set_tags methods in case of error, otherwise we handle them once the wrapped StreamResponse object consumes all chunks. This is just an error case handling here, I'll add a comment to clarify this

tests/contrib/botocore/conftest.py

quinna-h · 2025-06-16T19:57:46Z

riotfile.py

@@ -1377,7 +1377,7 @@ def select_pys(min_version: str = MIN_PYTHON_VERSION, max_version: str = MAX_PYT
                ),
                Venv(
                    pys=select_pys(min_version="3.9"),
-                    pkgs={"vcrpy": "==7.0.0", "botocore": ">=1.34.131", "boto3": ">=1.34.131"},
+                    pkgs={"vcrpy": "==7.0.0", "botocore": "==1.38.26", "boto3": "==1.38.26"},


note that we might have to change the min compatible version here: https://github.com/DataDog/dd-trace-py/pull/13613/files#diff-1f8812e8f134ceee7717f2f35d62d2ad3c455933bac5899e932e96c39e5cdf80 - I think @wconti27 was doing something with enforcing minimum version for SSI?

Do we? I thought the lines above there's a base riot Venv that tests an even older version of botocore 1.34.49

quinna-h

LGTM but will defer to the MLObs team for correctness of the implementation

Yun-Kim added 2 commits May 14, 2025 20:08

Refactor patch functions routing for botocore

0b028be

WIP bedrock agents tracing

2e51f4a

Yun-Kim and others added 2 commits May 21, 2025 21:20

Merge branch 'main' into yunkim/llmobs-bedrock-agent

6e5d9dd

Extract out trace types

5d8a259

datadog-datadog-prod-us1 bot reviewed May 28, 2025

View reviewed changes

ddtrace/llmobs/_integrations/bedrock_agents.py Outdated Show resolved Hide resolved

ddtrace/llmobs/_integrations/bedrock_agents.py Outdated Show resolved Hide resolved

Yun-Kim added 4 commits May 28, 2025 19:21

Move final output tagging to integration level

1cf5f9e

Fix broken spans for streamed outputs, add I/O annotation for trace s…

0a1ce50

…tep spans

fmt

c015488

Add agent tests, bump botocore tests to latest boto/botocore version,…

bff5d01

… add defensive instrumentation try/catch

datadog-datadog-prod-us1 bot reviewed May 30, 2025

View reviewed changes

tests/contrib/botocore/test_bedrock_agents_llmobs.py Show resolved Hide resolved

Yun-Kim and others added 2 commits May 30, 2025 14:56

Merge branch 'main' into yunkim/llmobs-bedrock-agent

7b898d9

fmt

36d2993

sabrenner reviewed Jun 5, 2025

View reviewed changes

Yun-Kim and others added 6 commits June 9, 2025 13:36

Merge branch 'main' into yunkim/llmobs-bedrock-agent

5747e00

Refactor

722b167

Handle model name parsing

fc63a48

Fix trace ID propagation

f2b54d3

Releasenote

1f3a6f8

fmt

4a179bb

Yun-Kim marked this pull request as ready for review June 12, 2025 16:52

Yun-Kim requested review from a team as code owners June 12, 2025 16:52

Yun-Kim requested review from mabdinur, quinna-h and brettlangdon June 12, 2025 16:52

Yun-Kim added 2 commits June 12, 2025 12:58

typing

ae95fb3

remove auth profile name

310717d

sabrenner reviewed Jun 12, 2025

View reviewed changes

ddtrace/llmobs/_integrations/bedrock_agents.py Outdated Show resolved Hide resolved

Yun-Kim and others added 3 commits June 13, 2025 17:08

Fix span kind, failing model provider/name

ae67e87

Merge branch 'main' into yunkim/llmobs-bedrock-agent

dc5df34

codeowners

0f78d03

Yun-Kim requested a review from a team as a code owner June 16, 2025 17:00

sabrenner approved these changes Jun 16, 2025

View reviewed changes

brettlangdon reviewed Jun 16, 2025

View reviewed changes

quinna-h reviewed Jun 16, 2025

View reviewed changes

tests/contrib/botocore/conftest.py Outdated Show resolved Hide resolved

quinna-h reviewed Jun 16, 2025

View reviewed changes

quinna-h approved these changes Jun 16, 2025

View reviewed changes

fmt

f6a4841

feat(llmobs): add bedrock agents tracing #13439

Are you sure you want to change the base?

feat(llmobs): add bedrock agents tracing #13439

Conversation

Yun-Kim commented May 16, 2025 • edited by sabrenner Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Translating AWS Bedrock Traces to Datadog LLMObs Spans

Trace steps and inner trace Steps

Checklist

Reviewer Checklist

Uh oh!

github-actions bot commented May 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented May 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Bootstrap import analysis

Summary

Import time breakdown

Uh oh!

pr-commenter bot commented May 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Benchmarks

scenario:iastaspectsospath-ospathsplitdrive_aspect

scenario:iastaspectsospath-ospathsplitext_aspect

scenario:iastaspectssplit-splitlines_aspect

Uh oh!

Uh oh!

Uh oh!

Uh oh!

sabrenner left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

sabrenner left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

sabrenner left a comment

Choose a reason for hiding this comment

Uh oh!

brettlangdon Jun 16, 2025

Choose a reason for hiding this comment

Uh oh!

Yun-Kim Jun 16, 2025

Choose a reason for hiding this comment

Uh oh!

brettlangdon Jun 16, 2025

Choose a reason for hiding this comment

Uh oh!

Yun-Kim Jun 16, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

quinna-h Jun 16, 2025

Choose a reason for hiding this comment

Uh oh!

Yun-Kim Jun 16, 2025

Choose a reason for hiding this comment

Uh oh!

quinna-h left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Yun-Kim commented May 16, 2025 •

edited by sabrenner

Loading

github-actions bot commented May 16, 2025 •

edited

Loading

github-actions bot commented May 16, 2025 •

edited

Loading

pr-commenter bot commented May 16, 2025 •

edited

Loading