
feat: add otel traceability #330

Merged
nicoloboschi merged 14 commits into main from traceability
Feb 10, 2026
Conversation

@nicoloboschi
Collaborator

No description provided.

- Add tool execution spans for reflect operations
- Add tool call information (names, params) to spans
- Change verification scope from 'test' to 'verification'
- Add hindsight.reflect_generation span for done() processing
- Implement no-op tracer for improved code readability
- Update documentation for OTEL configuration
- Resolve merge conflicts from rebase
- Add _serialize_for_span() helper to handle Pydantic models
- Update all providers to use the helper function
- Fixes test failures with 'Object of type X is not JSON serializable'

Add Grafana LGTM (Loki, Grafana, Tempo, Mimir) as the recommended
local development observability stack. This provides traces, metrics,
and logs in a single Docker container instead of separate tools.

Changes:
- Add scripts/dev/grafana/ with docker-compose and README
- Add scripts/dev/start-grafana.sh startup script
- Update .env.example to reference Grafana LGTM
- Update configuration docs to emphasize Grafana LGTM as primary option
- Reorder OTLP backend list to show Grafana LGTM first

Benefits:
- Single container vs multiple separate tools (Jaeger, SigNoz, etc.)
- ~515MB image with full observability stack
- Compatible with existing OTLP configuration
- Simpler local development setup
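The compose file itself is not reproduced in this PR summary; a minimal sketch, assuming the public `grafana/otel-lgtm` image and the standard ports mentioned later in the PR (Grafana on 3000, OTLP on 4317/4318), might look like:

```yaml
# Hypothetical sketch of a Grafana LGTM docker-compose.yaml
services:
  lgtm:
    image: grafana/otel-lgtm
    ports:
      - "3000:3000"   # Grafana UI
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP
```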

Remove SigNoz observability stack in favor of Grafana LGTM as the
sole recommended local development tracing solution.

Changes:
- Delete scripts/dev/signoz/ directory and all SigNoz configurations
- Delete scripts/dev/start-signoz.sh startup script
- Remove SigNoz references from .env.example
- Remove SigNoz from OTLP backends list in configuration docs

Grafana LGTM provides the same capabilities (traces, metrics, logs)
in a simpler single-container setup.

Add parent-child span structure for consolidation operations:
- hindsight.consolidation: Parent span for each memory being processed
- hindsight.consolidation_recall: Child span for finding related observations
- LLM call span: Automatically created by LLM provider (scope="consolidation")

This enables detailed timing breakdown in Grafana Tempo:
- Total consolidation time per memory
- Time spent in recall
- Time spent in LLM call
- Time spent executing actions (create/update)

All consolidation tests pass (31/31).
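The timing breakdown above can be illustrated with a toy stdlib tracer (the span names are from the commit; the `span()` helper below is a stand-in for this illustration, not the project's actual tracing module):

```python
import time
from contextlib import contextmanager

durations = {}  # span name -> elapsed seconds


@contextmanager
def span(name):
    """Toy stand-in for tracer.start_as_current_span(): records wall time."""
    start = time.perf_counter()
    try:
        yield
    finally:
        durations[name] = time.perf_counter() - start


# The parent span covers the whole consolidation of one memory; child
# spans break the total down into recall time and LLM time, mirroring
# the hierarchy shown in Grafana Tempo.
with span("hindsight.consolidation"):
    with span("hindsight.consolidation_recall"):
        time.sleep(0.01)  # stand-in for finding related observations
    with span("llm.call"):  # in the PR this span is created by the LLM provider
        time.sleep(0.01)  # stand-in for the consolidation LLM call
```

Because the children run sequentially inside the parent, the parent's duration is at least the sum of the children's, which is exactly the breakdown Tempo visualizes.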

Add comprehensive metrics and dashboarding to the Grafana LGTM stack:

Metrics Collection:
- Configure Prometheus to scrape Hindsight API /metrics endpoint
- Scrape interval: 10 seconds
- Targets hindsight-api on host.docker.internal:8888

GenAI Dashboard:
- Pre-configured dashboard with 6 panels:
  - LLM call rate (by provider/model)
  - LLM call duration (p50/p95 by scope)
  - Token usage - input tokens/sec by scope
  - Token usage - output tokens/sec by scope
  - Operations rate (retain/recall/reflect/consolidation)
  - Operation duration p95 by operation type

Configuration:
- Mount prometheus.yml for metrics scraping
- Mount dashboards directory for auto-provisioning
- Add host.docker.internal mapping for container->host access
- Dashboard provisioning with auto-reload every 10s

Documentation:
- Updated README with metrics viewing instructions
- Added PromQL query examples
- Documented dashboard access and navigation
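The PromQL examples are not reproduced here; a sketch of the kind of query the LLM-call-rate panel might use, assuming `provider` and `model` label names (the metric name `hindsight_llm_calls_total` is confirmed elsewhere in this PR, the labels are inferred from the panel titles):

```promql
# LLM call rate by provider/model over a 5-minute window
sum by (provider, model) (rate(hindsight_llm_calls_total[5m]))
```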

This provides full observability: traces (Tempo) + metrics (Prometheus/Mimir) + dashboards (Grafana)

Consolidate the separate scripts/dev/grafana/ setup into the existing
scripts/dev/monitoring/ stack, using Grafana LGTM (Loki, Grafana, Tempo, Mimir).

Changes:
- Remove separate scripts/dev/grafana/ directory and start-grafana.sh
- Rewrite scripts/dev/monitoring/start.sh to use Docker + Grafana LGTM
  (was: download native Prometheus/Grafana binaries)
- Add docker-compose.yaml for Grafana LGTM container
- Add prometheus.yml for scraping Hindsight API metrics
- Mount existing dashboards from monitoring/grafana/dashboards/
- Add comprehensive README.md

Benefits:
- Single unified monitoring command: ./scripts/dev/start-monitoring.sh
- Uses existing dashboard files (hindsight-operations, hindsight-llm, hindsight-api-service)
- Simpler setup: Docker-based vs downloading/running native binaries
- Full observability: traces + metrics + logs + dashboards in one container
- Standard ports: Grafana on 3000, OTLP on 4317/4318

Architecture:
- Grafana LGTM container (~515MB) provides all components
- Dashboards auto-provisioned from monitoring/grafana/dashboards/
- Prometheus scrapes host.docker.internal:8888/metrics
- Shared hindsight-network for future service-to-service tracing

Change docker-compose from detached (-d) to foreground mode.
Users can now stop the stack with Ctrl+C instead of needing
to run docker-compose down separately.

- Remove GF_DASHBOARDS_DEFAULT_HOME_DASHBOARD_PATH environment variable
  (it pointed to the wrong path, causing a 'Failed to load home dashboard' error)
- Remove the obsolete 'version' field from docker-compose.yaml
  (Compose v2+ no longer requires it)

Mount Hindsight dashboard JSON files and custom provisioning config
to make dashboards visible in Grafana.

Changes:
- Mount hindsight-operations.json, hindsight-llm.json, hindsight-api-service.json to /otel-lgtm/
- Create grafana-dashboards.yaml with all dashboard providers (default + Hindsight)
- Mount custom provisioning config to override LGTM default

All 3 Hindsight dashboards now appear in Grafana UI with metrics
from Prometheus scraping the Hindsight API /metrics endpoint.
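The provisioning file is not shown in the PR; a hypothetical sketch of a `grafana-dashboards.yaml` using Grafana's file-based dashboard provider, assuming the `/otel-lgtm/` mount path mentioned above:

```yaml
# Hypothetical sketch of grafana-dashboards.yaml
apiVersion: 1
providers:
  - name: hindsight
    orgId: 1
    type: file
    updateIntervalSeconds: 10   # periodic re-scan of the dashboards directory
    options:
      path: /otel-lgtm
```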

Update prometheus.yml to include both OTLP receiver config (from LGTM)
and scrape_configs for pulling metrics from Hindsight API.

Changes:
- Mount prometheus.yml to /otel-lgtm/prometheus.yaml (where LGTM reads it)
- Add scrape_configs section to pull from host.docker.internal:8888/metrics
- Keep OTLP receiver configuration for trace metrics
- Set scrape_interval to 5s
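A sketch of the scrape portion of that prometheus.yml, using only details stated in this commit (target, path, interval); the LGTM-provided OTLP receiver settings the commit says are kept alongside it are omitted here:

```yaml
# Sketch of the scrape_configs added to the mounted prometheus.yml
global:
  scrape_interval: 5s
scrape_configs:
  - job_name: hindsight-api
    metrics_path: /metrics
    static_configs:
      - targets: ["host.docker.internal:8888"]
```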

Verified: Prometheus now successfully scrapes hindsight_llm_calls_total
and other Hindsight metrics. Dashboards now show live data!

…_model_refresh spans

- Add recall operation tracing with parent-child span hierarchy
  - Parent: hindsight.recall with attributes (bank_id, query, fact_types, etc.)
  - Children: recall_embedding, recall_retrieval, recall_fusion, recall_rerank
  - Fixed context propagation using start_as_current_span()

- Improve reflect tracing spans
  - Remove reflect_generation spans, use reflect instead
  - Change done() tool processing to hindsight.reflect_tool_call

- Fix mental_model_refresh span nesting
  - Add _skip_span parameter to reflect_async to avoid duplicate hindsight.reflect spans
  - Mental model refresh now has clean span hierarchy without nested reflect parent

- Add comprehensive tracing verification tests
  - Test span hierarchy and attributes for all operations
  - Verify parent-child relationships
  - 5 passing tests covering recall, reflect, consolidation, and mental_model_refresh
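The context-propagation fix via `start_as_current_span()` can be illustrated with a toy contextvars-based tracer (span names are from the commit; the recorder itself is an illustration, not the project's code): setting the new span as the context's current span is what makes spans started inside the block pick it up as their parent.

```python
import contextvars
from contextlib import contextmanager

_current = contextvars.ContextVar("current_span", default=None)
spans = []  # (span_name, parent_name) pairs, recorded in start order


@contextmanager
def start_as_current_span(name):
    """Record the span with the context's current span as its parent,
    then make this span current for the duration of the block."""
    spans.append((name, _current.get()))
    token = _current.set(name)
    try:
        yield name
    finally:
        _current.reset(token)


# Parent-child hierarchy from the commit: hindsight.recall with four children.
with start_as_current_span("hindsight.recall"):
    for child in ("recall_embedding", "recall_retrieval",
                  "recall_fusion", "recall_rerank"):
        with start_as_current_span(child):
            pass
```

Without the "set as current" step, each child would see no parent and appear as a detached root span, which is the nesting bug this commit fixes.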

nicoloboschi marked this pull request as ready for review on February 10, 2026 at 10:07

- Remove all is_tracing_enabled() conditional checks before tracing calls
- NoOpTracer/NoOpSpan handle disabled tracing automatically
- Simplify code by always calling tracer methods directly
- Fix NoOpTracer.start_as_current_span() to yield NoOpSpan instead of None

Changes:
- memory_engine.py: Remove 5 is_tracing_enabled checks in recall spans
- agent.py: Remove 2 is_tracing_enabled checks in reflect tool spans
- tracing.py: Fix NoOpTracer context manager to yield proper NoOpSpan

This eliminates ~50 lines of redundant conditional code while maintaining
identical behavior.
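The no-op pattern described above can be sketched in a few lines (class names are from the commit; method bodies are an assumption): the key fix is that `start_as_current_span()` yields a `NoOpSpan` rather than `None`, so call sites can use the span unconditionally.

```python
from contextlib import contextmanager


class NoOpSpan:
    """Accepts span API calls and does nothing."""

    def set_attribute(self, key, value):
        pass

    def record_exception(self, exc):
        pass


class NoOpTracer:
    @contextmanager
    def start_as_current_span(self, name, **kwargs):
        # Yield a NoOpSpan (not None) so callers never need an
        # is_tracing_enabled() guard before touching the span.
        yield NoOpSpan()


tracer = NoOpTracer()
with tracer.start_as_current_span("hindsight.recall") as span:
    span.set_attribute("bank_id", "demo")  # safe no-op, never AttributeError
```

With a real tracer and this no-op tracer sharing one interface, every conditional check at the call site becomes redundant, which is where the ~50 removed lines come from.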

- Make tracing documentation more concise
- Focus on span hierarchy and attributes
- Remove verbose troubleshooting and performance sections
- Keep configuration.md for env vars only

nicoloboschi merged commit 69dec8e into main on Feb 10, 2026
28 checks passed