feat: Add OTel Gen AI semantic convention input / output messages #1666
brightsparc wants to merge 27 commits into pydantic:main from
Conversation
- Add TypedDicts in semconv.py (TextPart, ToolCallPart, ChatMessage, etc.)
- Add convert_anthropic_messages_to_semconv() + _convert_anthropic_content_part()
- Add convert_chat_completions_to_semconv() + helpers
- Add convert_responses_inputs_to_semconv()
- Set gen_ai.input.messages and gen_ai.system_instructions on spans

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add OutputMessage TypedDict in semconv.py
- Add convert_anthropic_response_to_semconv() for Anthropic
- Add convert_openai_response_to_semconv() for OpenAI chat completions
- Add convert_responses_outputs_to_semconv() for OpenAI Responses API
- Set gen_ai.output.messages on spans in on_response and stream state

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Anthropic tests still need manual updates due to inline-snapshot issues.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Combines both branches to add semantic convention attributes for:
- Input messages (gen_ai.input.messages)
- System instructions (gen_ai.system_instructions)
- Output messages (gen_ai.output.messages)

Resolves merge conflicts in:
- semconv.py: Combined InputMessages, SystemInstructions, OutputMessage types
- openai.py: Combined imports from both branches
- anthropic.py: Kept both input and output conversion functions
- test_anthropic.py: Combined expected snapshots with both input and output
- test_anthropic_bedrock.py: Combined expected snapshots

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
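For orientation, the TypedDicts named in these commits (TextPart, ToolCallPart, ChatMessage, OutputMessage) can be sketched roughly as follows. This is a hypothetical reconstruction from the names mentioned above, not the PR's actual semconv.py; the exact fields may differ:

```python
from typing import Any, TypedDict


class TextPart(TypedDict):
    type: str  # 'text'
    content: str


class ToolCallPart(TypedDict):
    type: str  # 'tool_call'
    id: str
    name: str
    arguments: Any


class ChatMessage(TypedDict):
    role: str  # e.g. 'user', 'assistant', 'system'
    parts: list[Any]


class OutputMessage(TypedDict):
    role: str
    parts: list[Any]
    finish_reason: str


# Example: a user message carrying a single text part.
msg: ChatMessage = {'role': 'user', 'parts': [{'type': 'text', 'content': 'Hi'}]}
print(msg)
```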
Codecov Report ✅ All modified and coverable lines are covered by tests.
Pull request overview
Adds OpenTelemetry Gen AI semantic convention message attributes to LLM provider instrumentation, including conversion of provider-specific message formats into standardized semconv messages/parts structures.
Changes:
- Introduces typed semconv definitions for message parts/messages and new attribute constants for input/output/system instructions.
- Updates OpenAI + Anthropic integrations to emit `gen_ai.input.messages`, `gen_ai.output.messages`, and `gen_ai.system_instructions` where applicable.
- Expands and updates OTel integration tests to validate semconv message conversion across text/tool/image and edge cases.
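The core conversion the review describes (provider-specific message dicts into semconv role/parts structures) can be sketched like this. The helper name and exact handling here are assumptions for illustration; the PR's real converters (e.g. convert_chat_completions_to_semconv()) operate on SDK objects and cover more part types:

```python
from typing import Any


def chat_message_to_semconv(message: dict[str, Any]) -> dict[str, Any]:
    """Sketch: convert a chat-completions-style message dict into a
    semconv-style message with a 'role' and a list of typed 'parts'."""
    parts: list[dict[str, Any]] = []
    content = message.get('content')
    if isinstance(content, str):
        # Plain string content becomes a single text part.
        parts.append({'type': 'text', 'content': content})
    for tc in message.get('tool_calls') or []:
        # Tool calls become tool_call parts with id/name/arguments.
        parts.append(
            {
                'type': 'tool_call',
                'id': tc['id'],
                'name': tc['function']['name'],
                'arguments': tc['function']['arguments'],
            }
        )
    return {'role': message['role'], 'parts': parts}


print(chat_message_to_semconv({'role': 'assistant', 'content': 'hi', 'tool_calls': []}))
# → {'role': 'assistant', 'parts': [{'type': 'text', 'content': 'hi'}]}
```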
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| logfire/_internal/integrations/llm_providers/semconv.py | Adds message attribute constants and TypedDict-based semconv message/part type definitions. |
| logfire/_internal/integrations/llm_providers/openai.py | Emits semconv input/output/system instruction attributes and adds conversion helpers for Chat Completions + Responses API. |
| logfire/_internal/integrations/llm_providers/anthropic.py | Emits semconv input/output/system instruction attributes and adds conversion helpers for Anthropic message formats. |
| tests/otel_integrations/test_openai.py | Updates snapshots and adds coverage for semconv conversion across tool calls/images/empty inputs/None finish_reason. |
| tests/otel_integrations/test_anthropic.py | Updates snapshots and adds new tests for missing system, missing content, and list/multi-part content conversion. |
| tests/otel_integrations/test_anthropic_bedrock.py | Updates mocked responses and snapshots to include semconv message attributes and finish reason. |
```python
if otel_span is not None and hasattr(otel_span, 'attributes') and otel_span.attributes:
    events_attr = otel_span.attributes.get('events')
    if isinstance(events_attr, list):  # pragma: no cover
        existing_events = cast(list[Any], events_attr)
```
In `on_response` for `Response`, the current logic reads existing events from `otel_span.attributes` only when the value is already a list. However, `LogfireSpan.set_attribute()` serializes non-OTel types (like a list of dicts) to a JSON string, so `events_attr` will typically be a string here. As a result, `existing_events` stays empty and the input-side events are overwritten, breaking the "keep 'events' for backward compatibility" intent.

Parse the existing events attribute when it is a JSON string (or read the pre-serialized value from `span._otlp_attributes`) and append `responses_output_events(response)` to that list before setting the attribute.
Suggested change:

```python
        existing_events = cast(list[Any], events_attr)
    elif isinstance(events_attr, str):
        # LogfireSpan.set_attribute() serializes non-OTel types (like list-of-dicts) to JSON strings.
        # Try to recover the original list from the JSON string if possible.
        with contextlib.suppress(Exception):
            parsed = json.loads(events_attr)
            if isinstance(parsed, list):
                existing_events = cast(list[Any], parsed)
    # If we still don't have existing events, try reading the pre-serialized value from _otlp_attributes.
    if not existing_events:
        otlp_attrs = getattr(span, '_otlp_attributes', None)
        if isinstance(otlp_attrs, dict):
            otlp_events = otlp_attrs.get('events')
            if isinstance(otlp_events, list):
                existing_events = cast(list[Any], otlp_events)
```
Made this change and resolved merge conflicts.
The `events` attribute was serialized to a JSON string by `prepare_otlp_attribute`, so the `isinstance(events_attr, list)` check always failed and input events were silently overwritten by output events. Parse the JSON string back to a list when reading existing events. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
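The failure mode this commit fixes is easy to reproduce in isolation: once a list of event dicts is serialized to JSON, an `isinstance(..., list)` check fails until the string is parsed back. A standalone illustration (not the PR's code):

```python
import json

# Events as originally built: a list of dicts.
events = [{'event.name': 'gen_ai.user.message', 'content': 'hi'}]

# Stored the way a non-OTel attribute value ends up: as a JSON string.
stored = json.dumps(events)
assert not isinstance(stored, list)  # the old check always failed here

# Parsing the JSON string recovers the list so output events can be appended.
recovered = json.loads(stored)
assert isinstance(recovered, list)
recovered.append({'event.name': 'gen_ai.choice'})
print(len(recovered))  # → 2
```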
The non-streaming Responses API tests now correctly show both input and output events in the backward-compat 'events' attribute. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Resolve conflicts between semconv message attributes (INPUT_MESSAGES, OUTPUT_MESSAGES, SYSTEM_INSTRUCTIONS) and upstream's new semconv attributes (INPUT_TOKENS, OUTPUT_TOKENS, RESPONSE_FINISH_REASONS, RESPONSE_ID, RESPONSE_MODEL). Both sets of attributes are now included. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
These paths handle message types (multimodal list content, tool calls in input messages, tool responses, named messages) that are not exercised by the existing VCR-based integration tests. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@alexmojaki ready for final review from you.
OpenAI's input_audio content part nests data under input_audio.data, not at the top level. Also narrow the type check to just 'input_audio' since that's the actual OpenAI type. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
```python
if isinstance(tc, ChatCompletionMessageFunctionToolCall):  # pragma: no cover
    func_args: Any = tc.function.arguments
    if isinstance(func_args, str):
        with contextlib.suppress(json.JSONDecodeError):
            func_args = json.loads(func_args)
    parts.append(
        ToolCallPart(
            type='tool_call',
            id=tc.id,
            name=tc.function.name,
            arguments=func_args,
        )
    )
```
🔴 Tool calls in OpenAI responses are silently dropped due to incorrect type check
The convert_openai_response_to_semconv function checks if tool calls are instances of ChatCompletionMessageFunctionToolCall, but standard OpenAI chat completion responses return ChatCompletionMessageToolCall objects in message.tool_calls.
Issue Details
At lines 402-417, the code iterates over message.tool_calls and checks:
```python
if isinstance(tc, ChatCompletionMessageFunctionToolCall):  # pragma: no cover
```

However, `ChatCompletionMessageFunctionToolCall` is a specialized type used for parsed/structured output scenarios, not the standard `ChatCompletionMessageToolCall` type that is actually returned in `message.tool_calls` for regular chat completions with tool use.
Impact
When an OpenAI assistant response includes tool calls, the isinstance check fails silently (the inner block is never executed), and the tool calls are not included in the gen_ai.output.messages semantic convention attribute. This results in incomplete telemetry data where tool call information is lost from the output messages.
Expected vs Actual
- Expected: Tool calls should be included in the output message parts
- Actual: Tool calls are silently skipped because the type check fails
Recommendation: Check for ChatCompletionMessageToolCall instead of ChatCompletionMessageFunctionToolCall, or remove the isinstance check entirely since all tool calls in message.tool_calls should have the expected structure with id, function.name, and function.arguments attributes.
The Devin review is incorrect. ChatCompletionMessageToolCall and ChatCompletionMessageFunctionToolCall are the same class — the SDK aliases them:
```python
>>> from openai.types.chat.chat_completion_message_function_tool_call import ChatCompletionMessageFunctionToolCall
>>> from openai.types.chat import ChatCompletionMessageToolCall
>>> ChatCompletionMessageFunctionToolCall is ChatCompletionMessageToolCall
True
```

The `tool_calls` field on `ChatCompletionMessage` is typed as `list[ChatCompletionMessageFunctionToolCall | ChatCompletionMessageCustomToolCall] | None`, so the isinstance check exists to filter out the `CustomToolCall` variant, which is intentional.
Adds OpenTelemetry Gen AI Semantic Convention message attributes including:

- `gen_ai.system_instructions`
- `gen_ai.input.messages`
- `gen_ai.output.messages`

Converts message formats (text, tool_use, tool_result, images) to the standardized semconv message part format.
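Putting the pieces together, an instrumented span might end up carrying attributes along these lines. This is a hedged illustration of the overall shape (role + typed parts, JSON-serialized per attribute); the authoritative schema is the OTel Gen AI semantic conventions, not this sketch:

```python
import json

input_messages = [
    {'role': 'user', 'parts': [{'type': 'text', 'content': "What's the weather?"}]},
]
output_messages = [
    {
        'role': 'assistant',
        'parts': [
            {
                'type': 'tool_call',
                'id': 'call_1',
                'name': 'get_weather',
                'arguments': {'city': 'Paris'},
            }
        ],
        'finish_reason': 'tool_calls',
    },
]

# Attribute values are JSON-serialized, since OTel attributes cannot
# hold lists of dicts directly.
attributes = {
    'gen_ai.system_instructions': json.dumps([{'type': 'text', 'content': 'You are helpful.'}]),
    'gen_ai.input.messages': json.dumps(input_messages),
    'gen_ai.output.messages': json.dumps(output_messages),
}
print(sorted(attributes))
```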