Commit 821a80f

fix(llmobs): ensure langchain azure openai spans are not duplicate llm marked (#14939)
[MLOB-4230] This PR does three things:

1. (non-user-facing) Updates our docker-compose.yml and services.yml files to upgrade the testagent to the latest version (v1.36.0), and adds a `VCR_PROVIDER_MAP` env var to the testagent configs.
2. (user-facing) Fixes the langchain integration so that Azure OpenAI calls are no longer traced as duplicate LLM spans when the openai integration is also enabled. They are marked as generic workflow spans instead.
3. (non-user-facing) Adds langchain tests that call Azure OpenAI. These require the testagent upgrade and the `VCR_PROVIDER_MAP` env var so that the testagent VCR proxy can call the Azure OpenAI endpoint.

The langchain integration has logic to mark specific LLM calls as generic workflow spans (instead of the default llm span) when it detects that the corresponding provider integration (e.g. `openai` or `anthropic`) is also enabled and will produce a downstream LLM span. Our product experience breaks if multiple spans represent the same LLM call, and this logic previously did not cover Azure OpenAI.

[MLOB-4230]: https://datadoghq.atlassian.net/browse/MLOB-4230?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ

(cherry picked from commit 9f7d187)
1 parent de9dbc1 commit 821a80f

7 files changed, +261 -3 lines changed

.gitlab/services.yml

Lines changed: 2 additions & 1 deletion
@@ -12,7 +12,7 @@
       DD_REMOTE_CONFIGURATION_REFRESH_INTERVAL: 5s
       DD_DOGSTATSD_NON_LOCAL_TRAFFIC: true
   testagent:
-    name: registry.ddbuild.io/images/mirror/dd-apm-test-agent/ddapm-test-agent:v1.34.0
+    name: registry.ddbuild.io/images/mirror/dd-apm-test-agent/ddapm-test-agent:v1.36.0
     alias: testagent
     variables:
       LOG_LEVEL: ERROR
@@ -26,6 +26,7 @@
       DD_DISABLE_ERROR_RESPONSES: true
       ENABLED_CHECKS: trace_content_length,trace_stall,meta_tracer_version_header,trace_count_header,trace_peer_service,trace_dd_service
       SNAPSHOT_IGNORED_ATTRS: span_id,trace_id,parent_id,duration,start,metrics.system.pid,metrics.system.process_id,metrics.process_id,meta.runtime-id,meta._dd.p.tid,meta.pathway.hash,metrics._dd.tracer_kr,meta._dd.parent_id,meta.kafka.cluster_id
+      VCR_PROVIDER_MAP: azure_openai=https://llmobs-test-resource.openai.azure.com/
   mongo:
     name: registry.ddbuild.io/images/mirror/mongo:6.0.5
     alias: mongo

ddtrace/llmobs/_integrations/langchain.py

Lines changed: 2 additions & 1 deletion
@@ -62,6 +62,7 @@
 ANTHROPIC_PROVIDER_NAME = "anthropic"
 BEDROCK_PROVIDER_NAME = "amazon_bedrock"
 OPENAI_PROVIDER_NAME = "openai"
+AZURE_OAI_PROVIDER_NAME = "azure"
 VERTEXAI_PROVIDER_NAME = "vertexai"
 GEMINI_PROVIDER_NAME = "google_palm"

@@ -189,7 +190,7 @@ def _llmobs_set_tags(
             # only the llm interface for Gemini will get instrumented
             elif model_provider.startswith(GEMINI_PROVIDER_NAME) and operation == "llm":
                 llmobs_integration = "google_generativeai"
-            elif model_provider.startswith(OPENAI_PROVIDER_NAME):
+            elif any(provider in model_provider for provider in (OPENAI_PROVIDER_NAME, AZURE_OAI_PROVIDER_NAME)):
                 llmobs_integration = "openai"
             elif operation == "chat" and model_provider.startswith(ANTHROPIC_PROVIDER_NAME):
                 llmobs_integration = "anthropic"
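
The difference between the old prefix check and the new substring check can be illustrated with a minimal sketch. The two constants are taken from the hunk above. The provider strings passed in (`openai-chat`, `azure-openai-chat`, `anthropic-chat`) are assumptions chosen for illustration and are not confirmed by this commit, and the helper names `matches_before` and `matches_after` are hypothetical.

```python
OPENAI_PROVIDER_NAME = "openai"
AZURE_OAI_PROVIDER_NAME = "azure"


def matches_before(model_provider: str) -> bool:
    """Old condition: only provider strings that start with "openai" match."""
    return model_provider.startswith(OPENAI_PROVIDER_NAME)


def matches_after(model_provider: str) -> bool:
    """New condition: any provider string containing "openai" or "azure" matches."""
    return any(
        provider in model_provider
        for provider in (OPENAI_PROVIDER_NAME, AZURE_OAI_PROVIDER_NAME)
    )


if __name__ == "__main__":
    # Hypothetical provider strings, used only to compare the two conditions.
    for candidate in ("openai-chat", "azure-openai-chat", "anthropic-chat"):
        print(candidate, matches_before(candidate), matches_after(candidate))
```

The substring check is deliberately broader than the prefix check: any provider string that contains `openai` or `azure` anywhere is now mapped to the `openai` integration.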

docker-compose.yml

Lines changed: 2 additions & 1 deletion
@@ -121,7 +121,7 @@ services:
         volumes:
             - ddagent:/tmp/ddagent:rw
     testagent:
-        image: ghcr.io/datadog/dd-apm-test-agent/ddapm-test-agent:v1.34.0
+        image: ghcr.io/datadog/dd-apm-test-agent/ddapm-test-agent:v1.36.0
         ports:
             - "127.0.0.1:9126:8126"
         volumes:
@@ -131,6 +131,7 @@ services:
             - LOG_LEVEL=WARNING
             - SNAPSHOT_DIR=/snapshots
             - VCR_CASSETTES_DIRECTORY=/cassettes
+            - VCR_PROVIDER_MAP=azure_openai=https://llmobs-test-resource.openai.azure.com/
             - SNAPSHOT_CI=0
             - DD_POOL_TRACE_CHECK_FAILURES=true
             - DD_DISABLE_ERROR_RESPONSES=true
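
To make the role of `VCR_PROVIDER_MAP` concrete, here is a rough sketch of how a `name=url` mapping could route a proxied request to its upstream. It is not the test agent's actual implementation. The comma-separated entry format, the `/vcr/<provider>/...` path handling, and the function names `parse_provider_map` and `resolve_upstream` are all assumptions, inferred only from the env var value in this diff, the `http://localhost:9126/vcr/azure_openai` endpoint used by the new tests, and the upstream URI recorded in the cassettes.

```python
from typing import Dict


def parse_provider_map(raw: str) -> Dict[str, str]:
    """Parse "name=url" entries. Comma separation between entries is an assumption."""
    providers: Dict[str, str] = {}
    for entry in raw.split(","):
        # partition splits on the first "=", so "=" inside the URL is preserved.
        name, sep, url = entry.strip().partition("=")
        if sep and name and url:
            providers[name] = url
    return providers


def resolve_upstream(request_path: str, providers: Dict[str, str]) -> str:
    """Map a hypothetical /vcr/<provider>/<rest> proxy path to an upstream URL."""
    prefix = "/vcr/"
    if not request_path.startswith(prefix):
        raise ValueError(f"not a VCR proxy path: {request_path}")
    name, _, rest = request_path[len(prefix):].partition("/")
    if name not in providers:
        raise KeyError(f"no upstream configured for provider {name!r}")
    return providers[name].rstrip("/") + "/" + rest


if __name__ == "__main__":
    provider_map = parse_provider_map(
        "azure_openai=https://llmobs-test-resource.openai.azure.com/"
    )
    print(resolve_upstream(
        "/vcr/azure_openai/openai/deployments/gpt-4.1-mini/chat/completions",
        provider_map,
    ))
```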
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+---
+fixes:
+  - |
+    LLM Observability: Resolves an issue where the ``langchain`` integration would incorrectly mark Azure OpenAI calls as duplicate llm operations even if the ``openai`` integration was enabled.
+    The ``langchain`` integration will trace Azure OpenAI spans as workflow spans if there is an equivalent llm span from the ``openai`` integration.

tests/contrib/langchain/test_langchain_llmobs.py

Lines changed: 29 additions & 0 deletions
@@ -847,6 +847,11 @@ class TestTraceStructureWithLLMIntegrations(SubprocessTestCase):
         DD_API_KEY="<not-a-real-key>",
     )

+    azure_openai_env_config = dict(
+        OPENAI_API_VERSION="2024-12-01-preview",
+        AZURE_OPENAI_API_KEY=os.getenv("AZURE_OPENAI_API_KEY", "testing"),
+    )
+
     anthropic_env_config = dict(
         ANTHROPIC_API_KEY=os.getenv("ANTHROPIC_API_KEY", "testing"),
         DD_API_KEY="<not-a-real-key>",
@@ -891,6 +896,11 @@ def _call_openai_llm(OpenAI):
         llm = OpenAI(base_url="http://localhost:9126/vcr/openai")
         llm.invoke("Can you explain what Descartes meant by 'I think, therefore I am'?")

+    @staticmethod
+    def _call_azure_openai_chat(AzureChatOpenAI):
+        llm = AzureChatOpenAI(azure_endpoint="http://localhost:9126/vcr/azure_openai", deployment_name="gpt-4.1-mini")
+        llm.invoke("Can you explain what Descartes meant by 'I think, therefore I am'?")
+
     @staticmethod
     def _call_openai_embedding(OpenAIEmbeddings):
         embedding = OpenAIEmbeddings(base_url="http://localhost:9126/vcr/openai")
@@ -924,6 +934,15 @@ def test_llmobs_with_openai_enabled(self):
         self._call_openai_llm(OpenAI)
         self._assert_trace_structure_from_writer_call_args(["workflow", "llm"])

+    @run_in_subprocess(env_overrides=azure_openai_env_config)
+    def test_llmobs_with_openai_enabled_azure(self):
+        from langchain_openai import AzureChatOpenAI
+
+        patch(langchain=True, openai=True)
+        LLMObs.enable(ml_app="<ml-app-name>", integrations_enabled=False)
+        self._call_azure_openai_chat(AzureChatOpenAI)
+        self._assert_trace_structure_from_writer_call_args(["workflow", "llm"])
+
     @run_in_subprocess(env_overrides=openai_env_config)
     def test_llmobs_with_openai_enabled_non_ascii_value(self):
         """Regression test to ensure that non-ascii text values for workflow spans are not encoded."""
@@ -966,6 +985,16 @@ def test_llmobs_with_openai_disabled(self):
         self._call_openai_llm(OpenAI)
         self._assert_trace_structure_from_writer_call_args(["llm"])

+    @run_in_subprocess(env_overrides=azure_openai_env_config)
+    def test_llmobs_with_openai_disabled_azure(self):
+        from langchain_openai import AzureChatOpenAI
+
+        patch(langchain=True)
+
+        LLMObs.enable(ml_app="<ml-app-name>", integrations_enabled=False)
+        self._call_azure_openai_chat(AzureChatOpenAI)
+        self._assert_trace_structure_from_writer_call_args(["llm"])
+
     @run_in_subprocess(env_overrides=anthropic_env_config)
     def test_llmobs_with_anthropic_enabled(self):
         from langchain_anthropic import ChatAnthropic
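
The two new Azure tests assert different trace structures depending on whether the `openai` integration is patched. The rule behind those expectations can be sketched as a small standalone function. This is an illustration under stated assumptions, not ddtrace code: `langchain_span_kind` and `expected_trace` are hypothetical names, and the set of enabled integrations is passed in explicitly rather than looked up from the tracer.

```python
from typing import Iterable, List, Optional


def langchain_span_kind(
    downstream_integration: Optional[str], enabled_integrations: Iterable[str]
) -> str:
    """Return "workflow" when a downstream provider integration will emit the
    llm span itself, otherwise "llm"."""
    if downstream_integration is not None and downstream_integration in set(
        enabled_integrations
    ):
        return "workflow"
    return "llm"


def expected_trace(
    downstream_integration: Optional[str], enabled_integrations: Iterable[str]
) -> List[str]:
    """Span kinds expected for one LangChain model call, outermost span first."""
    kind = langchain_span_kind(downstream_integration, enabled_integrations)
    if kind == "workflow":
        # The LangChain span wraps the llm span emitted by the provider integration.
        return ["workflow", "llm"]
    return ["llm"]


if __name__ == "__main__":
    # Mirrors patch(langchain=True, openai=True) in the enabled test.
    print(expected_trace("openai", {"langchain", "openai"}))
    # Mirrors patch(langchain=True) in the disabled test.
    print(expected_trace("openai", {"langchain"}))
```

Before this fix, an Azure OpenAI call was not mapped to the `openai` integration, so in terms of this sketch it behaved like the `None` case and produced a second `llm` span even when `openai` was patched.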
@@ -0,0 +1,109 @@
+interactions:
+- request:
+    body: '{"messages":[{"content":"Can you explain what Descartes meant by ''I think,
+      therefore I am''?","role":"user"}],"model":"gpt-3.5-turbo","n":1,"stream":false,"temperature":0.7}'
+    headers:
+      ? !!python/object/apply:multidict._multidict.istr
+      - Accept
+      : - application/json
+      ? !!python/object/apply:multidict._multidict.istr
+      - Accept-Encoding
+      : - gzip, deflate
+      ? !!python/object/apply:multidict._multidict.istr
+      - Connection
+      : - keep-alive
+      Content-Length:
+      - '172'
+      ? !!python/object/apply:multidict._multidict.istr
+      - Content-Type
+      : - application/json
+      ? !!python/object/apply:multidict._multidict.istr
+      - User-Agent
+      : - AzureOpenAI/Python 1.109.1
+      ? !!python/object/apply:multidict._multidict.istr
+      - X-Stainless-Arch
+      : - arm64
+      ? !!python/object/apply:multidict._multidict.istr
+      - X-Stainless-Async
+      : - 'false'
+      ? !!python/object/apply:multidict._multidict.istr
+      - X-Stainless-Lang
+      : - python
+      ? !!python/object/apply:multidict._multidict.istr
+      - X-Stainless-OS
+      : - MacOS
+      ? !!python/object/apply:multidict._multidict.istr
+      - X-Stainless-Package-Version
+      : - 1.109.1
+      ? !!python/object/apply:multidict._multidict.istr
+      - X-Stainless-Runtime
+      : - CPython
+      ? !!python/object/apply:multidict._multidict.istr
+      - X-Stainless-Runtime-Version
+      : - 3.11.13
+      ? !!python/object/apply:multidict._multidict.istr
+      - x-stainless-retry-count
+      : - '0'
+    method: POST
+    uri: https://llmobs-test-resource.openai.azure.com/openai/deployments/gpt-4.1-mini/chat/completions?api-version=2024-12-01-preview
+  response:
+    body:
+      string: "{\"choices\":[{\"content_filter_results\":{\"hate\":{\"filtered\":false,\"severity\":\"safe\"},\"protected_material_code\":{\"filtered\":false,\"detected\":false},\"protected_material_text\":{\"filtered\":false,\"detected\":false},\"self_harm\":{\"filtered\":false,\"severity\":\"safe\"},\"sexual\":{\"filtered\":false,\"severity\":\"safe\"},\"violence\":{\"filtered\":false,\"severity\":\"safe\"}},\"finish_reason\":\"stop\",\"index\":0,\"logprobs\":null,\"message\":{\"annotations\":[],\"content\":\"Certainly!
+        The phrase **\\\"I think, therefore I am\\\"** (originally in Latin: *Cogito,
+        ergo sum*) was coined by the French philosopher Ren\xE9 Descartes. It appears
+        in his work *Discourse on the Method* (1637) and later in *Meditations on
+        First Philosophy* (1641).\\n\\n### What Descartes Meant:\\n\\n1. **Foundation
+        of Certainty:** \\n Descartes was searching for an undeniable foundation
+        for knowledge. He wanted to find something that could not be doubted, as many
+        beliefs could be mistaken.\\n\\n2. **Method of Doubt:** \\n He began by
+        doubting everything \u2014 the evidence of the senses, the existence of the
+        physical world, even mathematical truths \u2014 to see if anything remained
+        absolutely certain.\\n\\n3. **The Indubitable Truth:** \\n While doubting,
+        Descartes realized that the very act of doubting implied a thinking subject.
+        If he is doubting or thinking, then he must exist in some form to be doing
+        that thinking.\\n\\n4. **\\\"I think, therefore I am\\\":** \\n Therefore,
+        the one thing he could not doubt was that he exists as a thinking being. The
+        act of thinking itself proved his own existence. This statement became the
+        first principle in his philosophy.\\n\\n### In summary:\\n\\nDescartes meant
+        that the fact that you are consciously thinking is proof of your own existence.
+        Even if everything else is uncertain or illusory, the very experience of thought
+        confirms that there is a \\\"self\\\" doing the thinking. It\u2019s a foundational
+        claim about knowledge and existence.\\n\\nIf you want, I can also explain
+        how this idea influenced philosophy or its criticisms!\",\"refusal\":null,\"role\":\"assistant\"}}],\"created\":1760724000,\"id\":\"chatcmpl-CRj28rdRKqhnTBkcKCmXlz0vReldy\",\"model\":\"gpt-4.1-mini-2025-04-14\",\"object\":\"chat.completion\",\"prompt_filter_results\":[{\"prompt_index\":0,\"content_filter_results\":{\"hate\":{\"filtered\":false,\"severity\":\"safe\"},\"jailbreak\":{\"filtered\":false,\"detected\":false},\"self_harm\":{\"filtered\":false,\"severity\":\"safe\"},\"sexual\":{\"filtered\":false,\"severity\":\"safe\"},\"violence\":{\"filtered\":false,\"severity\":\"safe\"}}}],\"system_fingerprint\":\"fp_3dcd5944f5\",\"usage\":{\"completion_tokens\":342,\"completion_tokens_details\":{\"accepted_prediction_tokens\":0,\"audio_tokens\":0,\"reasoning_tokens\":0,\"rejected_prediction_tokens\":0},\"prompt_tokens\":24,\"prompt_tokens_details\":{\"audio_tokens\":0,\"cached_tokens\":0},\"total_tokens\":366}}\n"
+    headers:
+      Content-Length:
+      - '2780'
+      Content-Type:
+      - application/json
+      Date:
+      - Fri, 17 Oct 2025 18:00:13 GMT
+      Strict-Transport-Security:
+      - max-age=31536000; includeSubDomains; preload
+      apim-request-id:
+      - 381a6525-4f15-481a-8884-dd8ec1b0d6fc
+      azureml-model-session:
+      - d213-20251016082839
+      x-accel-buffering:
+      - 'no'
+      x-content-type-options:
+      - nosniff
+      x-ms-deployment-name:
+      - gpt-4.1-mini
+      x-ms-rai-invoked:
+      - 'true'
+      x-ms-region:
+      - East US 2
+      x-ratelimit-limit-requests:
+      - '250'
+      x-ratelimit-limit-tokens:
+      - '250000'
+      x-ratelimit-remaining-requests:
+      - '248'
+      x-ratelimit-remaining-tokens:
+      - '249979'
+      x-request-id:
+      - 8e52694a-22a0-42fa-9e2d-54539ebd2113
+    status:
+      code: 200
+      message: OK
+version: 1
@@ -0,0 +1,112 @@
+interactions:
+- request:
+    body: '{"messages":[{"content":"Can you explain what Descartes meant by ''I think,
+      therefore I am''?","role":"user"}],"model":null,"stream":false}'
+    headers:
+      ? !!python/object/apply:multidict._multidict.istr
+      - Accept
+      : - application/json
+      ? !!python/object/apply:multidict._multidict.istr
+      - Accept-Encoding
+      : - gzip, deflate, zstd
+      ? !!python/object/apply:multidict._multidict.istr
+      - Connection
+      : - keep-alive
+      Content-Length:
+      - '137'
+      ? !!python/object/apply:multidict._multidict.istr
+      - Content-Type
+      : - application/json
+      ? !!python/object/apply:multidict._multidict.istr
+      - User-Agent
+      : - langchain-partner-python-azure-openai
+      ? !!python/object/apply:multidict._multidict.istr
+      - X-Stainless-Arch
+      : - arm64
+      ? !!python/object/apply:multidict._multidict.istr
+      - X-Stainless-Async
+      : - 'false'
+      ? !!python/object/apply:multidict._multidict.istr
+      - X-Stainless-Lang
+      : - python
+      ? !!python/object/apply:multidict._multidict.istr
+      - X-Stainless-OS
+      : - MacOS
+      ? !!python/object/apply:multidict._multidict.istr
+      - X-Stainless-Package-Version
+      : - 1.109.1
+      ? !!python/object/apply:multidict._multidict.istr
+      - X-Stainless-Raw-Response
+      : - 'true'
+      ? !!python/object/apply:multidict._multidict.istr
+      - X-Stainless-Runtime
+      : - CPython
+      ? !!python/object/apply:multidict._multidict.istr
+      - X-Stainless-Runtime-Version
+      : - 3.11.13
+      ? !!python/object/apply:multidict._multidict.istr
+      - x-stainless-retry-count
+      : - '0'
+    method: POST
+    uri: https://llmobs-test-resource.openai.azure.com/openai/deployments/gpt-4.1-mini/chat/completions?api-version=2024-12-01-preview
+  response:
+    body:
+      string: "{\"choices\":[{\"content_filter_results\":{\"hate\":{\"filtered\":false,\"severity\":\"safe\"},\"protected_material_code\":{\"filtered\":false,\"detected\":false},\"protected_material_text\":{\"filtered\":false,\"detected\":false},\"self_harm\":{\"filtered\":false,\"severity\":\"safe\"},\"sexual\":{\"filtered\":false,\"severity\":\"safe\"},\"violence\":{\"filtered\":false,\"severity\":\"safe\"}},\"finish_reason\":\"stop\",\"index\":0,\"logprobs\":null,\"message\":{\"annotations\":[],\"content\":\"Certainly!
+        The phrase **\\\"I think, therefore I am\\\"** (originally in Latin: *Cogito,
+        ergo sum*) was coined by the French philosopher Ren\xE9 Descartes. It is a
+        fundamental element of Western philosophy and appears in his work *Meditations
+        on First Philosophy* (1641).\\n\\n**What Descartes meant:**\\n\\n1. **Starting
+        point of certainty:** Descartes was seeking an indubitable foundation for
+        knowledge. He embarked on a method of radical doubt, questioning everything
+        that could possibly be doubted\u2014his senses, the physical world, even mathematical
+        truths.\\n\\n2. **The act of thinking proves existence:** In the process of
+        doubting, he realized that the very act of doubting or thinking implies a
+        thinker. If he is doubting, then he must be thinking. If he is thinking, then
+        he must exist. Thus, the fact that he thinks is proof that he exists.\\n\\n3.
+        **Existence is confirmed through self-awareness:** This statement establishes
+        the self as a thinking thing (*res cogitans*). Descartes is not saying \\\"I
+        am a body,\\\" or \\\"I exist in the physical world,\\\" but rather affirming
+        the existence of the self as a conscious being\u2014one that thinks, doubts,
+        understands, wills, imagines, and senses.\\n\\n4. **Foundation for knowledge:**
+        From this fundamental truth, Descartes hoped to build further knowledge about
+        the world, God, and existence, by basing it on something certain and clear:
+        the existence of the self as a thinking entity.\\n\\nIn summary, **\\\"I think,
+        therefore I am\\\" means that the very act of thinking is proof enough of
+        one's existence and is the first principle of philosophy that cannot be doubted.**\",\"refusal\":null,\"role\":\"assistant\"}}],\"created\":1760724017,\"id\":\"chatcmpl-CRj2PkKuSFXMkYWzr4xxvmv5TtlQ9\",\"model\":\"gpt-4.1-mini-2025-04-14\",\"object\":\"chat.completion\",\"prompt_filter_results\":[{\"prompt_index\":0,\"content_filter_results\":{\"hate\":{\"filtered\":false,\"severity\":\"safe\"},\"jailbreak\":{\"filtered\":false,\"detected\":false},\"self_harm\":{\"filtered\":false,\"severity\":\"safe\"},\"sexual\":{\"filtered\":false,\"severity\":\"safe\"},\"violence\":{\"filtered\":false,\"severity\":\"safe\"}}}],\"system_fingerprint\":\"fp_3dcd5944f5\",\"usage\":{\"completion_tokens\":354,\"completion_tokens_details\":{\"accepted_prediction_tokens\":0,\"audio_tokens\":0,\"reasoning_tokens\":0,\"rejected_prediction_tokens\":0},\"prompt_tokens\":24,\"prompt_tokens_details\":{\"audio_tokens\":0,\"cached_tokens\":0},\"total_tokens\":378}}\n"
+    headers:
+      Content-Length:
+      - '2822'
+      Content-Type:
+      - application/json
+      Date:
+      - Fri, 17 Oct 2025 18:00:28 GMT
+      Strict-Transport-Security:
+      - max-age=31536000; includeSubDomains; preload
+      apim-request-id:
+      - ea05cb68-e4e4-4876-b0de-cca3a2bf4c63
+      azureml-model-session:
+      - d213-20251016082839
+      x-accel-buffering:
+      - 'no'
+      x-content-type-options:
+      - nosniff
+      x-ms-deployment-name:
+      - gpt-4.1-mini
+      x-ms-rai-invoked:
+      - 'true'
+      x-ms-region:
+      - East US 2
+      x-ratelimit-limit-requests:
+      - '250'
+      x-ratelimit-limit-tokens:
+      - '250000'
+      x-ratelimit-remaining-requests:
+      - '247'
+      x-ratelimit-remaining-tokens:
+      - '249962'
+      x-request-id:
+      - 33717a5b-b7e0-48d3-9347-0d8df2e5a48d
+    status:
+      code: 200
+      message: OK
+version: 1
