marklysze commented Jan 1, 2026

Why are these changes needed?

This WIP PR introduces OpenTelemetry-based distributed tracing for AG2 multi-agent conversations. It enables observability into agent workflows, LLM calls, tool executions, and human-in-the-loop interactions.

Approach

OpenTelemetry GenAI Semantic Conventions

The implementation follows the OpenTelemetry GenAI Semantic Conventions with AG2-specific extensions. This ensures compatibility with standard observability tools (Grafana, Jaeger, Honeycomb, etc.) while capturing AG2-specific context.

Trace Hierarchy

```
initiate_chats (multi-chat workflow)
  └── conversation (initiate_chat / a_initiate_chat)
        ├── invoke_agent (generate_reply / a_generate_reply)
        │     ├── chat (LLM API call)
        │     ├── execute_tool (execute_function)
        │     ├── execute_code (code execution)
        │     └── speaker_selection (group chat)
        │           └── invoke_agent (internal)
        │                 └── chat (LLM API call)
        └── await_human_input (get_human_input)
```
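The nesting above comes from spans being opened inside the context of their parent. A minimal toy sketch of that mechanism (illustrative only; the real implementation uses the OpenTelemetry SDK, which parents spans via context propagation):

```python
# Toy span tree: each span opened inside another becomes its child.
# This only illustrates the nesting mechanism; it is not AG2 code.
import contextlib

class ToySpanTree:
    def __init__(self):
        self.lines = []
        self._depth = 0

    @contextlib.contextmanager
    def span(self, name):
        # Record the span at the current depth, then descend for children.
        self.lines.append("  " * self._depth + name)
        self._depth += 1
        try:
            yield
        finally:
            self._depth -= 1

tree = ToySpanTree()
with tree.span("conversation"):
    with tree.span("invoke_agent"):
        with tree.span("chat"):
            pass
    with tree.span("await_human_input"):
        pass

print("\n".join(tree.lines))
# conversation
#   invoke_agent
#     chat
#   await_human_input
```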

Instrumentation Points

| Span Type | Operation | What's Traced |
| --- | --- | --- |
| conversation | conversation | initiate_chat, a_initiate_chat, run_chat, a_run_chat |
| agent | invoke_agent | generate_reply, a_generate_reply, remote A2A calls |
| llm | chat | All LLM API calls via OpenAIWrapper.create() |
| tool | execute_tool | execute_function, a_execute_function |
| speaker_selection | speaker_selection | Group chat speaker selection |
| human_input | await_human_input | Human-in-the-loop wait time |
| code_execution | execute_code | Code block execution |
| multi_conversation | initiate_chats | Sequential/parallel multi-chat workflows |

Central LLM Instrumentation

All LLM providers (OpenAI, Anthropic, Gemini, Bedrock, Mistral, etc.) are instrumented through a single point: OpenAIWrapper.create(). This captures:

  • Provider and model names
  • Token usage (input/output)
  • Request parameters (temperature, max_tokens, etc.)
  • Response metadata (finish reasons, cost)
  • Optional input/output message capture
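The single-choke-point idea can be sketched as a plain wrapper around a `create()` callable. This is a hedged illustration, not the actual AG2 code: `fake_create` and its response shape are invented here, and the recorded dict stands in for real span attributes.

```python
# Sketch: instrument every provider through one point by wrapping
# a single create() callable. Names and shapes are hypothetical.
def instrument_create(create_fn, record):
    def traced_create(**params):
        response = create_fn(**params)
        # Capture request/usage data as GenAI-convention attributes.
        record.append({
            "gen_ai.request.model": params.get("model"),
            "gen_ai.usage.input_tokens": response["usage"]["prompt_tokens"],
            "gen_ai.usage.output_tokens": response["usage"]["completion_tokens"],
        })
        return response
    return traced_create

def fake_create(**params):
    # Stand-in for a provider call; returns an OpenAI-style usage block.
    return {"usage": {"prompt_tokens": 12, "completion_tokens": 5}}

spans = []
traced = instrument_create(fake_create, spans)
traced(model="gpt-4o-mini", temperature=0.2)
print(spans[0]["gen_ai.usage.input_tokens"])  # 12
```

Because every provider funnels through the same `create()` entry point, one wrapper covers all of them without per-provider instrumentation.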

Distributed Tracing (A2A)

For remote agents using the A2A protocol, trace context is automatically propagated via W3C Trace Context headers, enabling end-to-end traces across service boundaries.
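For reference, the W3C `traceparent` header carries `version-trace_id-parent_id-flags`. A small stdlib-only sketch of building and parsing that header (the helper names are made up for illustration; real propagation would use OpenTelemetry's propagator API):

```python
# Build and parse a W3C Trace Context `traceparent` header.
# Format per the W3C spec: version-trace_id-parent_id-flags.
import re
import secrets

def make_traceparent(trace_id=None, span_id=None, sampled=True):
    trace_id = trace_id or secrets.token_hex(16)  # 32 hex chars
    span_id = span_id or secrets.token_hex(8)     # 16 hex chars
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"

def parse_traceparent(header):
    m = re.fullmatch(r"(\d{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})", header)
    if not m:
        raise ValueError("malformed traceparent")
    _version, trace_id, span_id, flags = m.groups()
    return {"trace_id": trace_id, "parent_id": span_id, "sampled": flags == "01"}

hdr = make_traceparent(trace_id="a" * 32, span_id="b" * 16)
print(parse_traceparent(hdr))
```

A remote agent receiving this header can continue the same trace, which is what makes the end-to-end view across service boundaries possible.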

Current API (WIP)

```python
from autogen.instrumentation import (
    setup_instrumentation,
    instrument_agent,
    instrument_llm_wrapper,
    instrument_pattern,
    instrument_chats,
    instrument_a2a_server,
)

# 1. Set up the tracer
tracer = setup_instrumentation("my-service", "http://localhost:4317")

# 2. Instrument LLM calls (global, once)
instrument_llm_wrapper(tracer)

# 3. Instrument agents
instrument_agent(my_agent, tracer)

# 4. For group chats, instrument the pattern (auto-instruments all agents)
instrument_pattern(pattern, tracer)

# 5. For multi-chat workflows
instrument_chats(tracer)

# 6. For A2A remote agents
instrument_a2a_server(server, tracer)
```

Standard Attributes (OTEL GenAI)

  • gen_ai.operation.name - Operation type
  • gen_ai.agent.name - Agent name
  • gen_ai.provider.name - LLM provider
  • gen_ai.request.model / gen_ai.response.model
  • gen_ai.usage.input_tokens / gen_ai.usage.output_tokens
  • gen_ai.tool.name, gen_ai.tool.call.id, gen_ai.tool.call.arguments
  • gen_ai.input.messages / gen_ai.output.messages

AG2-Specific Extensions

  • ag2.span.type - Span classification
  • ag2.speaker_selection.candidates / ag2.speaker_selection.selected
  • ag2.human_input.prompt / ag2.human_input.response
  • ag2.code_execution.exit_code / ag2.code_execution.output
  • ag2.chats.count, ag2.chats.mode, ag2.chats.recipients
  • gen_ai.usage.cost - AG2 cost tracking
  • gen_ai.conversation.id / gen_ai.conversation.turns
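As a concrete illustration, a `speaker_selection` span might carry a mix of standard GenAI keys and `ag2.*` extensions. All values below are made up:

```python
# Hypothetical attribute set for one speaker_selection span.
# Keys follow the conventions listed above; values are invented.
speaker_selection_attrs = {
    "gen_ai.operation.name": "speaker_selection",
    "ag2.span.type": "speaker_selection",
    "ag2.speaker_selection.candidates": ["planner", "coder", "critic"],
    "ag2.speaker_selection.selected": "coder",
}

print(speaker_selection_attrs["ag2.speaker_selection.selected"])  # coder
```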

Files

Note: This draft PR contains temporary files under the root tracing folder that you can use to test out the implementation.

| File | Purpose |
| --- | --- |
| autogen/instrumentation.py | Core instrumentation functions |
| autogen/tracing/utils.py | Helper functions (message conversion, attribute extraction) |
| tracing/TRACING.md | Developer documentation |
| tracing/OTEL_GENAI_CONVENTION_AG2.md | Attribute reference |
| tracing/agents/*.py | Example scripts / playground |
| tracing/docker-compose.yaml | Local Tempo + Grafana stack |

Local Testing

```shell
cd tracing
docker-compose up -d                   # Start Tempo + Grafana
python -m tracing.agents.local_agents  # Run example

# View traces at http://localhost:3333 (Grafana)
```

Status

DRAFT: open for feedback! Suggestions on the approach, and in particular on the API design, are welcome.

Tracing examples

(Two screenshots of example traces, captured Jan 2, 2026.)

Related issue number

N/A

Checks


@marklysze marklysze changed the title feat: Instrumentation feat: Tracing and Instrumentation Jan 2, 2026

codecov bot commented Jan 2, 2026

Codecov Report

❌ Patch coverage is 12.24165% with 552 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| autogen/instrumentation.py | 10.98% | 478 Missing ⚠️ |
| autogen/tracing/utils.py | 15.47% | 71 Missing ⚠️ |
| autogen/a2a/server.py | 62.50% | 2 Missing and 1 partial ⚠️ |

| Files with missing lines | Coverage Δ |
| --- | --- |
| autogen/a2a/server.py | 93.22% <62.50%> (-4.90%) ⬇️ |
| autogen/tracing/utils.py | 15.47% <15.47%> (ø) |
| autogen/instrumentation.py | 10.98% <10.98%> (ø) |

... and 20 files with indirect coverage changes

