The unified observability layer for the AI Control Plane.
AITrace provides the unified observability layer for the AI Control Plane, transforming opaque, non-deterministic AI processes into fully interpretable and debuggable execution traces.
Its mission is to create an Elixir-native instrumentation library and a corresponding data model that captures the complete causal chain of an AI agent's reasoning process—from initial prompt to final output, including all thoughts, tool calls, and state changes—enabling a true "Execution Cinema" experience for developers and operators.
Debugging a simple web request is a solved problem. We have structured logs, metrics, and distributed tracing (like OpenTelemetry) that show the path of a request through a series of stateless services.
Debugging an AI agent is fundamentally different. It is like performing forensic analysis on a dream. The challenges are unique:
- Non-Determinism: The same input can produce different outputs and, more importantly, different reasoning paths.
- Deeply Nested Causality: A final answer may be the result of a multi-step chain of thought, where an LLM hallucinates, calls the wrong tool with the wrong arguments, misinterprets the result, and then tries to correct itself.
- Stateful Complexity: Agents are not stateless. Their behavior is conditioned by memory, scratchpads, and the history of the conversation. A simple log line is insufficient to capture the state that led to a decision.
- Polyglot Execution: An agent's "thought" may happen in Elixir, but its "action" (e.g., running a code interpreter) happens in a sandboxed Python environment. Tracing this flow across language boundaries is notoriously difficult.
Logger.info/1 is inadequate. Traditional APM tools provide a high-level view but lack the granular, AI-specific context needed to answer the most important question: "Why did the agent do that?"
AITrace is built on a few simple but powerful concepts, heavily inspired by OpenTelemetry but adapted for AI workflows.
- Trace: The complete, end-to-end record of a single transaction (e.g., one user message to an agent). It is identified by a unique trace_id. A trace is composed of a root Span and many nested Spans and Events.
- Span: A record of a timed operation with a distinct start and end. A span represents a unit of work. Examples: llm_call, tool_execution, prompt_rendering. Spans can be nested to represent a call graph. Each span has a name, start_time, end_time, and a key-value map of attributes.
- Event: A point-in-time annotation within a Span. It represents a notable occurrence that isn't a timed operation. Examples: agent_state_updated, validation_failed, tool_not_found.
- Context: An immutable Elixir map (%AITrace.Context{}) that carries the trace_id and the current span_id. This context is explicitly passed through the entire call stack of a traced operation, ensuring all telemetry is correctly correlated.
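To make the relationship between these concepts concrete, here is a minimal sketch of how such a data model could be expressed as Elixir structs. The Sketch.* module names, the field layout, and the ID lengths are assumptions chosen for illustration; they are not taken from AITrace's source and may differ from its real structs.

```elixir
# Hypothetical sketch of the data model. Everything under Sketch.* is
# illustrative and is NOT AITrace's actual implementation.
defmodule Sketch.Context do
  @enforce_keys [:trace_id]
  defstruct [:trace_id, :span_id]

  # Start a fresh context with a random 16-byte trace id (32 hex characters).
  def new do
    %__MODULE__{trace_id: random_hex(16), span_id: nil}
  end

  # Return a copy pointing at a new span; the original context is unchanged.
  def with_new_span(%__MODULE__{} = ctx) do
    %{ctx | span_id: random_hex(8)}
  end

  defp random_hex(bytes) do
    bytes |> :crypto.strong_rand_bytes() |> Base.encode16(case: :lower)
  end
end

defmodule Sketch.Event do
  # A point-in-time annotation: a name, a timestamp, and free-form attributes.
  defstruct [:name, :timestamp, attributes: %{}]
end

defmodule Sketch.Span do
  # A timed unit of work that can carry attributes and events.
  defstruct [:span_id, :parent_span_id, :name, :start_time, :end_time,
             attributes: %{}, events: []]

  # Append an event to the span, returning the updated span.
  def add_event(%__MODULE__{} = span, name, attributes \\ %{}) do
    event = %Sketch.Event{
      name: name,
      timestamp: System.monotonic_time(),
      attributes: attributes
    }

    %{span | events: span.events ++ [event]}
  end
end
```

Because every function returns a new value rather than mutating its input, a context or span can be handed to nested calls without the caller's copy changing underneath it.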
Add aitrace to your mix.exs dependencies:
def deps do
  [
    {:aitrace, "~> 0.1.0"}
  ]
end

defmodule MyApp.Agent do
  require AITrace # Required to use the macros

  def handle_user_message(message, state) do
    # 1. Start a new trace for the entire transaction
    AITrace.trace "agent.handle_message" do
      # 2. Add point-in-time events with rich metadata
      AITrace.add_event("request_received", %{message_length: String.length(message)})

      # 3. Wrap discrete, timed operations in spans
      response = AITrace.span "reasoning_loop" do
        # Add attributes to the current span
        AITrace.with_attributes(%{model: "gpt-4", temperature: 0.7})

        # Perform reasoning
        think_about(message)
      end

      AITrace.add_event("reasoning_complete", %{token_usage: response.tokens})
      {:reply, response.answer, update_state(state)}
    end
  end
end

AITrace.trace "operation_name" do
  # Your code here - context is stored in process dictionary
end

AITrace.span "span_name" do
  # Timed operation - duration is automatically measured
end

AITrace.add_event("event_name", %{key: "value"})
AITrace.add_event("simple_event") # No attributes

AITrace.with_attributes(%{user_id: 42, region: "us-west"})

ctx = AITrace.get_current_context()
IO.inspect(ctx.trace_id)
IO.inspect(ctx.span_id)

Configure exporters in your application config:
# config/config.exs
config :aitrace,
  exporters: [
    {AITrace.Exporter.Console, verbose: true, color: true},
    {AITrace.Exporter.File, directory: "./traces"}
  ]

AITrace.Exporter.Console - Prints human-readable traces to stdout
- Options: verbose (show attributes/events), color (ANSI colors)
AITrace.Exporter.File - Writes JSON traces to files
- Options: directory (output directory, default: "./traces")
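The exporter configuration is a list of {module, options} tuples. As a rough illustration of how such a list could be initialised and then fed finished traces, the sketch below defines a hypothetical behaviour with init/1, export/2, and shutdown/1 callbacks, an in-memory exporter, and a small dispatcher. All Sketch.* names and the shape of the trace map are assumptions for the example; AITrace's internal dispatch may work differently.

```elixir
# Hypothetical sketch: none of these modules are part of AITrace.
defmodule Sketch.Exporter do
  @callback init(opts :: keyword()) :: {:ok, term()}
  @callback export(trace :: map(), state :: term()) :: {:ok, term()}
  @callback shutdown(state :: term()) :: :ok
end

defmodule Sketch.MemoryExporter do
  @behaviour Sketch.Exporter

  # Keep a label from the options and a list of trace ids seen so far.
  @impl true
  def init(opts), do: {:ok, %{label: Keyword.get(opts, :label, "default"), seen: []}}

  # Record the trace id (most recent first) and return the updated state.
  @impl true
  def export(trace, state), do: {:ok, %{state | seen: [trace.trace_id | state.seen]}}

  @impl true
  def shutdown(_state), do: :ok
end

defmodule Sketch.Pipeline do
  # Initialise every configured {module, opts} exporter once.
  def start(exporters) do
    Enum.map(exporters, fn {mod, opts} ->
      {:ok, state} = mod.init(opts)
      {mod, state}
    end)
  end

  # Hand a finished trace to each exporter, threading each exporter's state through.
  def export(pipeline, trace) do
    Enum.map(pipeline, fn {mod, state} ->
      {:ok, new_state} = mod.export(trace, state)
      {mod, new_state}
    end)
  end
end
```

Calling Sketch.Pipeline.start/1 with a list such as [{Sketch.MemoryExporter, [label: "test"]}] and then Sketch.Pipeline.export/2 for each finished trace keeps every exporter's state separate, which is one simple way to let several exporters run side by side.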
Implement the AITrace.Exporter behavior:
defmodule MyApp.CustomExporter do
  @behaviour AITrace.Exporter

  @impl true
  def init(opts), do: {:ok, opts}

  @impl true
  def export(trace, state) do
    # Send trace to your backend
    IO.inspect(trace)
    {:ok, state}
  end

  @impl true
  def shutdown(_state), do: :ok
end

See examples/basic_usage.exs for a complete working example:
mix run examples/basic_usage.exs

Output:
Trace: b37b73325dbd626481e0ff3e89de02c8
  ▸ reasoning (10.84ms) ✓
    Attributes: %{model: "gpt-4", temperature: 0.7}
    • reasoning_complete
      %{thought_count: 3}
  ▸ tool_execution (5.95ms) ✓
    Attributes: %{tool: "web_search"}
  ▸ response_generation (8.98ms) ✓
    Attributes: %{tokens: 150}
- AITrace.Context - Carries trace_id and span_id through the call stack
- AITrace.Span - Timed operations with start/end times, attributes, and events
- AITrace.Event - Point-in-time annotations within spans
- AITrace.Trace - Complete trace containing all spans
- AITrace.Collector - In-memory Agent storing active traces
- AITrace.Application - Supervision tree managing the collector
- Context stored in process dictionary for implicit propagation
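To illustrate what implicit propagation through the process dictionary and automatic span timing can look like, here is a minimal function-based sketch. It deliberately uses plain functions instead of macros, stores the context under a made-up key, and returns the measured duration alongside the result; all of these are assumptions for the example rather than a description of AITrace's internals.

```elixir
# Hypothetical sketch: Sketch.Tracer is illustrative, not AITrace's API.
defmodule Sketch.Tracer do
  @ctx_key :sketch_trace_context

  # Read the current context from the calling process's dictionary (nil if none).
  def current_context, do: Process.get(@ctx_key)

  # Run `fun` inside a new trace and restore the previous context afterwards,
  # even if `fun` raises.
  def trace(_name, fun) when is_function(fun, 0) do
    previous = Process.get(@ctx_key)
    Process.put(@ctx_key, %{trace_id: random_hex(16), span_id: nil})

    try do
      fun.()
    after
      restore(previous)
    end
  end

  # Run `fun` inside a child span and return {result, duration_in_microseconds}.
  def span(_name, fun) when is_function(fun, 0) do
    parent = Process.get(@ctx_key) || raise "span/2 called outside of a trace"
    Process.put(@ctx_key, %{parent | span_id: random_hex(8)})
    started = System.monotonic_time()

    try do
      result = fun.()
      elapsed = System.monotonic_time() - started
      {result, System.convert_time_unit(elapsed, :native, :microsecond)}
    after
      # Put the parent context back so sibling spans see the right parent.
      Process.put(@ctx_key, parent)
    end
  end

  defp restore(nil), do: Process.delete(@ctx_key)
  defp restore(ctx), do: Process.put(@ctx_key, ctx)

  defp random_hex(bytes) do
    bytes |> :crypto.strong_rand_bytes() |> Base.encode16(case: :lower)
  end
end
```

Code running inside the function passed to Sketch.Tracer.span/2 can call Sketch.Tracer.current_context/0 without receiving the context as an argument. The trade-off is that the process dictionary is local to one process, so work handed to another process does not see the context unless it is copied over.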
AITrace is designed to integrate with other AI infrastructure:
- DSPex - Automatic instrumentation for LLM calls and prompt rendering
- Altar - Tool execution tracing with arguments and results
- Snakepit - Cross-language tracing via gRPC metadata
- Phoenix Channels - Real-time trace streaming to web UIs
- OpenTelemetry - Export to standard observability platforms
✅ Implemented (v0.1.0)
- Core data structures (Context, Span, Event, Trace)
- Trace and span macros with automatic timing
- Event and attribute APIs
- Console exporter (human-readable output)
- File exporter (JSON format)
- Comprehensive test suite (80 tests)
- Working examples
🚧 Planned
- Phoenix Channel exporter for real-time streaming
- OpenTelemetry exporter
- OTP integration helpers (GenServer, Oban)
- Cross-process context propagation
- "Execution Cinema" web UI with waterfall views
- DSPex, Altar, and Snakepit integrations
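Cross-process context propagation matters because a process dictionary is not inherited by spawned processes, so a context stored there is invisible inside a Task. Until built-in support exists, one possible manual approach is to capture the context in the parent and re-install it in the child. The sketch below shows that idea with a hypothetical Sketch.Propagation module and a made-up dictionary key; it is not part of AITrace.

```elixir
# Hypothetical sketch: Sketch.Propagation is illustrative, not AITrace's API.
defmodule Sketch.Propagation do
  @ctx_key :sketch_propagated_context

  def put_context(ctx), do: Process.put(@ctx_key, ctx)
  def current_context, do: Process.get(@ctx_key)
  def clear_context, do: Process.delete(@ctx_key)

  # Capture the caller's context now, then install it inside the spawned task
  # before running `fun`, so the task sees the same trace_id and span_id.
  def async_with_context(fun) when is_function(fun, 0) do
    ctx = Process.get(@ctx_key)

    Task.async(fn ->
      if ctx, do: Process.put(@ctx_key, ctx)
      fun.()
    end)
  end
end
```

After Sketch.Propagation.put_context/1 has been called in the parent, a plain Task.async/1 that calls current_context/0 gets nil, while a task started through async_with_context/1 gets the parent's context back.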
# Run all tests
mix test
# Run with coverage
mix test --cover
# Run example
mix run examples/basic_usage.exs

MIT - See LICENSE for details.
AITrace is part of the AI Control Plane ecosystem. Contributions welcome!