✨ Add OpenTelemetry GenAI auto-instrumentation for MLflow compatibility#114
pdettori merged 27 commits into kagenti:main on Feb 12, 2026
Conversation
This adds opentelemetry-instrumentation-openai to emit spans with gen_ai.* attributes alongside the existing OpenInference spans. The OTEL Collector transform processor converts GenAI semantic convention spans to OpenInference format before sending to MLflow, enabling trace visibility in both Phoenix and MLflow observability backends.
Changes:
- pyproject.toml: Add opentelemetry-instrumentation-openai dependency
- agent.py: Instrument OpenAI calls with GenAI semantic conventions
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Ladislav Smola <lsmola@redhat.com>
Signed-off-by: Ladislav Smola <lsmola@redhat.com>
Adds instrumentation to propagate A2A context_id, task_id, and user input to OTEL spans. This enables:
- Session/conversation grouping in MLflow UI
- Trace filtering by context_id
- Request/response visibility in trace details
Signed-off-by: Ladislav Smola <lsmola@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
Replace custom a2a.* attributes with standard GenAI semantic conventions:
- gen_ai.conversation.id (for session tracking in MLflow)
- gen_ai.agent.name, gen_ai.agent.id
- gen_ai.request.model, gen_ai.system
- gen_ai.prompt, gen_ai.completion
This allows MLflow to automatically parse and display session/trace info.
See: https://opentelemetry.io/docs/specs/semconv/gen-ai/
Signed-off-by: Ladislav Smola <lsmola@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
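The attribute set above can be sketched as a plain dictionary. This is an illustrative helper, not the PR's actual code; the attribute keys follow the OTel GenAI semantic conventions, but the function name and arguments are assumptions.

```python
# Illustrative builder for the GenAI semantic-convention attributes
# described in this commit; keys follow the OTel GenAI spec, the
# helper itself is hypothetical.
def build_genai_attributes(context_id, agent_name, agent_id, model,
                           prompt, completion):
    return {
        "gen_ai.conversation.id": context_id,  # session tracking in MLflow
        "gen_ai.agent.name": agent_name,
        "gen_ai.agent.id": agent_id,
        "gen_ai.request.model": model,
        "gen_ai.system": "openai",
        "gen_ai.prompt": prompt,
        "gen_ai.completion": completion,
    }

attrs = build_genai_attributes("ctx-123", "weather-agent", "agent-1",
                               "gpt-4o", "What's the weather?", "Sunny, 22C")
```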
Agents now emit only gen_ai.* attributes. The OTEL Collector transforms these to OpenInference (llm.*) for Phoenix and to MLflow metadata (mlflow.trace.session) for session tracking.
Signed-off-by: Ladislav Smola <lsmola@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
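A collector-side mapping like the one described could look roughly like the following transform processor fragment. The OTTL statements are an illustrative sketch, not the PR's exact configuration.

```yaml
# Illustrative OTEL Collector transform processor mapping GenAI
# attributes to OpenInference and MLflow equivalents (assumed config,
# not the PR's actual file).
processors:
  transform/genai_to_openinference:
    trace_statements:
      - context: span
        statements:
          - set(attributes["input.value"], attributes["gen_ai.prompt"]) where attributes["gen_ai.prompt"] != nil
          - set(attributes["output.value"], attributes["gen_ai.completion"]) where attributes["gen_ai.completion"] != nil
          - set(attributes["mlflow.trace.session"], attributes["gen_ai.conversation.id"]) where attributes["gen_ai.conversation.id"] != nil
```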
Add input.value and output.value span attributes for MLflow UI columns. These are set alongside gen_ai.prompt/completion for dual compatibility with Phoenix (OpenInference) and MLflow.
Signed-off-by: Claude <noreply@anthropic.com>
Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Ladislav Smola <lsmola@redhat.com>
Make gen_ai.agent.invoke a new root span instead of an A2A child span. This allows the MLflow UI to read Request/Response from the root span.
- Create a new trace context (no parent) for the agent span
- Add a link to the A2A parent span for debugging reference
- Set mlflow.spanInputs/Outputs on the root span directly
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Ladislav Smola <lsmola@redhat.com>
The previous approach using an empty Context() didn't break the trace chain. Use trace.set_span_in_context(INVALID_SPAN) to force a new trace_id.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Ladislav Smola <lsmola@redhat.com>
The start_as_current_span approach wasn't breaking the chain. Use start_span() with explicit context management:
- Create an empty Context() for the new trace
- Manually attach/detach the context for child spans
- Properly end the span in a finally block
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Ladislav Smola <lsmola@redhat.com>
Properly break the trace chain by attaching an empty context before creating the span. This ensures the tracer sees no parent span. Also fixes span lifecycle with a proper end() and context detach.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Ladislav Smola <lsmola@redhat.com>
Port observability.py from PR kagenti#105 to provide a clean abstraction for:
- setup_observability() for one-time tracer initialization
- create_agent_span() context manager for AGENT spans
- W3C Trace Context propagation for distributed tracing
Added OpenInference dependencies (semantic-conventions and instrumentation-langchain) to enable LangChain auto-instrumentation with AGENT span semantics for MLflow compatibility.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Ladislav Smola <lsmola@redhat.com>
Replace ~100 lines of manual OTEL context manipulation in agent.py with the new create_agent_span() context manager. Update __init__.py to call setup_observability() for centralized tracer configuration. This simplifies agent code while maintaining:
- LangChain/OpenAI auto-instrumentation
- New root span creation (breaks the A2A trace chain for the MLflow UI)
- GenAI/OpenInference semantic conventions
- W3C trace context propagation
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Ladislav Smola <lsmola@redhat.com>
Change the break_parent_chain default to False so our gen_ai.agent.invoke span becomes a proper child of A2A request spans. This enables:
- Full distributed trace visibility in Phoenix and MLflow
- A proper tree structure with the A2A -> Agent -> LangChain hierarchy
- All spans visible under a single trace instead of as orphaned roots
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Ladislav Smola <lsmola@redhat.com>
Instead of creating a child span, enrich_current_span() adds GenAI semantic conventions directly to the existing A2A span. This:
- Keeps the A2A span as the trace root
- Adds gen_ai.prompt, gen_ai.completion, etc. to the root span
- Avoids extra nesting in the trace tree
- Makes MLflow/Phoenix show GenAI attributes on the root span
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Ladislav Smola <lsmola@redhat.com>
Set mlflow.spanInputs, mlflow.spanOutputs, mlflow.spanType, mlflow.traceName, mlflow.user, and mlflow.source directly in the agent's observability.py. This removes the dependency on the OTEL Collector transform/genai_to_mlflow processor. The transform can be re-enabled as a fallback for agents that don't set these attributes.
Added helpers:
- set_span_output(): Sets gen_ai.completion, output.value, mlflow.spanOutputs
- set_token_usage(): Sets token count attributes
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Ladislav Smola <lsmola@redhat.com>
When the A2A SDK doesn't have an active span, get_current_span() returns a non-recording span and our attributes would be lost. Now enrich_current_span():
1. Checks if the current span is recording
2. If yes, enriches it with GenAI/MLflow attributes
3. If no, creates a new 'gen_ai.agent.invoke' span
This ensures traces are always captured regardless of A2A SDK state.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Ladislav Smola <lsmola@redhat.com>
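The enrich-or-create fallback can be sketched in pure Python. StubSpan stands in for an OTEL span, and this enrich_current_span is a simplified illustration under assumed names, not the PR's actual implementation.

```python
# Pure-Python sketch of the enrich-or-create fallback; StubSpan mimics
# the OTEL span interface (is_recording / set_attribute).
class StubSpan:
    def __init__(self, recording=True):
        self.recording = recording
        self.attributes = {}

    def is_recording(self):
        return self.recording

    def set_attribute(self, key, value):
        self.attributes[key] = value

def enrich_current_span(current_span, start_span, attributes):
    """Enrich the active span if it records; otherwise start a fresh one."""
    if current_span.is_recording():
        span = current_span                 # A2A gave us a real span: enrich it
    else:
        span = start_span("gen_ai.agent.invoke")  # non-recording: make our own
    for key, value in attributes.items():
        span.set_attribute(key, value)
    return span

# A non-recording current span forces creation of a new agent span.
span = enrich_current_span(StubSpan(recording=False),
                           lambda name: StubSpan(),
                           {"gen_ai.prompt": "hi"})
```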
Add Starlette middleware that creates a root span BEFORE A2A handlers:
- Our gen_ai.agent.invoke span becomes the true trace root
- A2A spans become children of our span
- Full control over MLflow/GenAI attributes on the root span
Also add static resource attributes for MLflow:
- mlflow.traceName, mlflow.source on the TracerProvider Resource
- gen_ai.agent.name, gen_ai.agent.version, gen_ai.system
The middleware parses the A2A JSON-RPC request/response to set:
- mlflow.spanInputs from the user message
- mlflow.spanOutputs from the agent response
- gen_ai.conversation.id for session tracking
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Ladislav Smola <lsmola@redhat.com>
The middleware now attaches an empty context before creating the span, ensuring our gen_ai.agent.invoke span is a true root span without inheriting a parent from W3C Trace Context headers.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Ladislav Smola <lsmola@redhat.com>
Signed-off-by: Ladislav Smola <lsmola@redhat.com>
Signed-off-by: Ladislav Smola <lsmola@redhat.com>
The middleware now works correctly: it creates a true root span by breaking the parent chain before creating the gen_ai.agent.invoke span.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Ladislav Smola <lsmola@redhat.com>
Adds span attributes for MLflow trace list columns:
- mlflow.user / enduser.id - from auth header or "anonymous"
- mlflow.traceName - agent name
- mlflow.runName - agent invoke name
- mlflow.source - service name
- mlflow.version - agent version
- mlflow.trace.session / gen_ai.conversation.id - from context_id or message_id
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Ladislav Smola <lsmola@redhat.com>
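The fallback rules above reduce to a small attribute builder. This is a sketch with assumed names and inputs, not the PR's actual helper.

```python
# Illustrative builder for the MLflow trace-list column attributes,
# including the "anonymous" and message_id fallbacks described above.
def mlflow_trace_attributes(agent_name, service_name, version,
                            auth_user=None, context_id=None, message_id=None):
    user = auth_user or "anonymous"          # auth header or "anonymous"
    session_id = context_id or message_id    # context_id or message_id
    return {
        "mlflow.user": user,
        "enduser.id": user,
        "mlflow.traceName": agent_name,
        "mlflow.source": service_name,
        "mlflow.version": version,
        "mlflow.trace.session": session_id,
        "gen_ai.conversation.id": session_id,
    }

attrs = mlflow_trace_attributes("weather-agent", "weather-service", "1.0",
                                auth_user=None, context_id=None,
                                message_id="msg-42")
```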
When reading the response body to extract output for MLflow attributes, we must always recreate the Response object, since the body_iterator is consumed and cannot be read again.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Ladislav Smola <lsmola@redhat.com>
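The underlying issue is generic iterator semantics: once a body iterator is drained, the original stream is empty and a replacement must be handed back to the client. A minimal, framework-free illustration (the helper name is hypothetical):

```python
# Why the Response must be rebuilt: an iterator can be consumed only
# once, so after reading the body for span attributes we return a
# fresh iterator over the buffered chunks.
def read_body(body_iterator):
    chunks = list(body_iterator)   # drains the original iterator
    body = b"".join(chunks)        # full body for span attributes
    return body, iter(chunks)      # replacement iterator for the client

stream = iter([b"hello ", b"world"])
body, replacement = read_body(stream)
```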
The tracing middleware cannot capture output from streaming responses. This fix calls set_span_output() in the agent's execute() method after extracting the final answer, which populates:
- mlflow.spanOutputs (for the MLflow Response column)
- output.value (for Phoenix/OpenInference)
- gen_ai.completion (for GenAI semantic conventions)
Uses trace.get_current_span() to get the active span created by the tracing middleware.
Fixes Task kagenti#8: Streaming response output capture.
Signed-off-by: Ladislav Smola <lsmola@redhat.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The previous approach used trace.get_current_span() in execute() to set mlflow.spanOutputs, but this returned the A2A span, not the middleware-created root span. MLflow reads attributes from the root span only, so the Response column was empty for streaming responses.
Fix:
- Add a ContextVar _root_span_var to store the middleware's root span
- Add a get_root_span() function to retrieve it in agent code
- Update agent.py to use get_root_span() instead of trace.get_current_span()
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Ladislav Smola <lsmola@redhat.com>
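The ContextVar hand-off can be sketched with the standard library alone; the names mirror the commit message, but the implementation (and FakeSpan) is illustrative, not the PR's code.

```python
# Sketch of the ContextVar hand-off between middleware and agent code.
from contextvars import ContextVar

_root_span_var: ContextVar = ContextVar("root_span", default=None)

def set_root_span(span):
    """Called by the tracing middleware once it creates the root span."""
    return _root_span_var.set(span)

def get_root_span():
    """Called from agent code (e.g. execute()) to reach the true root span."""
    return _root_span_var.get()

class FakeSpan:
    def __init__(self):
        self.attributes = {}
    def set_attribute(self, key, value):
        self.attributes[key] = value

# Middleware stores the root span; agent code later enriches it.
set_root_span(FakeSpan())
get_root_span().set_attribute("mlflow.spanOutputs", "final answer")
```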
Per OpenTelemetry GenAI Agent Spans specification:
- Span name: "invoke_agent {agent_name}" instead of "gen_ai.agent.invoke"
- SpanKind: INTERNAL for in-process agents (not SERVER)
- Add gen_ai.operation.name = "invoke_agent" (required)
- Add gen_ai.provider.name (required)
- Remove duplicate attribute settings
Ref: https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-agent-spans/
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Ladislav Smola <lsmola@redhat.com>
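The spec-conformant naming and required attributes above can be sketched as follows; the helper is hypothetical, while the span name pattern, kind, and required attribute keys come from the cited OTel GenAI agent-spans conventions.

```python
# Illustrative construction of a spec-conformant agent invoke span
# ("invoke_agent {gen_ai.agent.name}", INTERNAL kind, required attrs).
def agent_invoke_span(agent_name, provider):
    return {
        "name": f"invoke_agent {agent_name}",         # per the spec's naming rule
        "kind": "INTERNAL",                           # in-process agent, not SERVER
        "attributes": {
            "gen_ai.operation.name": "invoke_agent",  # required
            "gen_ai.provider.name": provider,         # required
            "gen_ai.agent.name": agent_name,
        },
    }

span = agent_invoke_span("weather-agent", "openai")
```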
ghcr.io/astral-sh/uv requires authentication, which fails on HyperShift clusters without GHCR pull secrets. Switch to python:3.12-slim-bookworm (Docker Hub, no auth needed) and install uv via pip.
Affects: weather_service, weather_tool (E2E test agents)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Ladislav Smola <lsmola@redhat.com>
Summary
This adds opentelemetry-instrumentation-openai to emit spans with gen_ai.* attributes alongside the existing OpenInference spans.
The OTEL Collector transform processor converts GenAI semantic convention spans to OpenInference format before sending to MLflow, enabling trace visibility in both Phoenix and MLflow observability backends.
Changes:
- pyproject.toml: Add opentelemetry-instrumentation-openai dependency
- agent.py: Instrument OpenAI calls with GenAI semantic conventions
Related issue(s)
kagenti/kagenti#569