
feat: add otel traceability #330

Merged
nicoloboschi merged 14 commits into main from traceability
Feb 10, 2026
Conversation

@nicoloboschi
Collaborator

No description provided.

- Add tool execution spans for reflect operations
- Add tool call information (names, params) to spans
- Change verification scope from 'test' to 'verification'
- Add hindsight.reflect_generation span for done() processing
- Implement no-op tracer for improved code readability
- Update documentation for OTEL configuration
- Resolve merge conflicts from rebase
- Add _serialize_for_span() helper to handle Pydantic models
- Update all providers to use the helper function
- Fixes test failures with 'Object of type X is not JSON serializable'

Add Grafana LGTM (Loki, Grafana, Tempo, Mimir) as the recommended
local development observability stack. This provides traces, metrics,
and logs in a single Docker container instead of separate tools.

Changes:
- Add scripts/dev/grafana/ with docker-compose and README
- Add scripts/dev/start-grafana.sh startup script
- Update .env.example to reference Grafana LGTM
- Update configuration docs to emphasize Grafana LGTM as primary option
- Reorder OTLP backend list to show Grafana LGTM first

Benefits:
- Single container vs multiple separate tools (Jaeger, SigNoz, etc.)
- ~515MB image with full observability stack
- Compatible with existing OTLP configuration
- Simpler local development setup
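The compose file itself is not reproduced in this PR summary; a minimal sketch, assuming the public `grafana/otel-lgtm` image and the standard ports mentioned later in the PR (Grafana on 3000, OTLP on 4317/4318), might look like:

```yaml
# Hypothetical sketch of a Grafana LGTM docker-compose.yaml
services:
  lgtm:
    image: grafana/otel-lgtm
    ports:
      - "3000:3000"   # Grafana UI
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP
```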

Remove SigNoz observability stack in favor of Grafana LGTM as the
sole recommended local development tracing solution.

Changes:
- Delete scripts/dev/signoz/ directory and all SigNoz configurations
- Delete scripts/dev/start-signoz.sh startup script
- Remove SigNoz references from .env.example
- Remove SigNoz from OTLP backends list in configuration docs

Grafana LGTM provides the same capabilities (traces, metrics, logs)
in a simpler single-container setup.

Add parent-child span structure for consolidation operations:
- hindsight.consolidation: Parent span for each memory being processed
- hindsight.consolidation_recall: Child span for finding related observations
- LLM call span: Automatically created by LLM provider (scope="consolidation")

This enables detailed timing breakdown in Grafana Tempo:
- Total consolidation time per memory
- Time spent in recall
- Time spent in LLM call
- Time spent executing actions (create/update)

All consolidation tests pass (31/31).
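The timing breakdown above can be illustrated with a toy stdlib tracer (the span names are from the commit; the `span()` helper below is a stand-in for this illustration, not the project's actual tracing module):

```python
import time
from contextlib import contextmanager

durations = {}  # span name -> elapsed seconds


@contextmanager
def span(name):
    """Toy stand-in for tracer.start_as_current_span(): records wall time."""
    start = time.perf_counter()
    try:
        yield
    finally:
        durations[name] = time.perf_counter() - start


# The parent span covers the whole consolidation of one memory; child
# spans break the total down into recall time and LLM time, mirroring
# the hierarchy shown in Grafana Tempo.
with span("hindsight.consolidation"):
    with span("hindsight.consolidation_recall"):
        time.sleep(0.01)  # stand-in for finding related observations
    with span("llm.call"):  # in the PR this span is created by the LLM provider
        time.sleep(0.01)  # stand-in for the consolidation LLM call
```

Because the children run sequentially inside the parent, the parent's duration is at least the sum of the children's, which is exactly the breakdown Tempo visualizes.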

Add comprehensive metrics and dashboarding to the Grafana LGTM stack:

Metrics Collection:
- Configure Prometheus to scrape Hindsight API /metrics endpoint
- Scrape interval: 10 seconds
- Targets hindsight-api on host.docker.internal:8888

GenAI Dashboard:
- Pre-configured dashboard with 6 panels:
  - LLM call rate (by provider/model)
  - LLM call duration (p50/p95 by scope)
  - Token usage - input tokens/sec by scope
  - Token usage - output tokens/sec by scope
  - Operations rate (retain/recall/reflect/consolidation)
  - Operation duration p95 by operation type

Configuration:
- Mount prometheus.yml for metrics scraping
- Mount dashboards directory for auto-provisioning
- Add host.docker.internal mapping for container->host access
- Dashboard provisioning with auto-reload every 10s

Documentation:
- Updated README with metrics viewing instructions
- Added PromQL query examples
- Documented dashboard access and navigation
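The PromQL examples are not reproduced here; a sketch of the kind of query the LLM-call-rate panel might use, assuming `provider` and `model` label names (the metric name `hindsight_llm_calls_total` is confirmed elsewhere in this PR, the labels are inferred from the panel titles):

```promql
# LLM call rate by provider/model over a 5-minute window
sum by (provider, model) (rate(hindsight_llm_calls_total[5m]))
```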

This provides full observability: traces (Tempo) + metrics (Prometheus/Mimir) + dashboards (Grafana)

Consolidate the separate scripts/dev/grafana/ setup into the existing
scripts/dev/monitoring/ stack, using Grafana LGTM (Loki, Grafana, Tempo, Mimir).

Changes:
- Remove separate scripts/dev/grafana/ directory and start-grafana.sh
- Rewrite scripts/dev/monitoring/start.sh to use Docker + Grafana LGTM
  (was: download native Prometheus/Grafana binaries)
- Add docker-compose.yaml for Grafana LGTM container
- Add prometheus.yml for scraping Hindsight API metrics
- Mount existing dashboards from monitoring/grafana/dashboards/
- Add comprehensive README.md

Benefits:
- Single unified monitoring command: ./scripts/dev/start-monitoring.sh
- Uses existing dashboard files (hindsight-operations, hindsight-llm, hindsight-api-service)
- Simpler setup: Docker-based vs downloading/running native binaries
- Full observability: traces + metrics + logs + dashboards in one container
- Standard ports: Grafana on 3000, OTLP on 4317/4318

Architecture:
- Grafana LGTM container (~515MB) provides all components
- Dashboards auto-provisioned from monitoring/grafana/dashboards/
- Prometheus scrapes host.docker.internal:8888/metrics
- Shared hindsight-network for future service-to-service tracing

Change docker-compose from detached (-d) to foreground mode.
Users can now stop the stack with Ctrl+C instead of needing
to run docker-compose down separately.

- Remove GF_DASHBOARDS_DEFAULT_HOME_DASHBOARD_PATH environment variable
  (it pointed to the wrong path, causing a 'Failed to load home dashboard' error)
- Remove the obsolete 'version' field from docker-compose.yaml
  (Compose v2+ no longer requires it)

Mount Hindsight dashboard JSON files and custom provisioning config
to make dashboards visible in Grafana.

Changes:
- Mount hindsight-operations.json, hindsight-llm.json, hindsight-api-service.json to /otel-lgtm/
- Create grafana-dashboards.yaml with all dashboard providers (default + Hindsight)
- Mount custom provisioning config to override LGTM default

All 3 Hindsight dashboards now appear in Grafana UI with metrics
from Prometheus scraping the Hindsight API /metrics endpoint.
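The provisioning file is not shown in the PR; a hypothetical sketch of a `grafana-dashboards.yaml` using Grafana's file-based dashboard provider, assuming the `/otel-lgtm/` mount path mentioned above:

```yaml
# Hypothetical sketch of grafana-dashboards.yaml
apiVersion: 1
providers:
  - name: hindsight
    orgId: 1
    type: file
    updateIntervalSeconds: 10   # periodic re-scan of the dashboards directory
    options:
      path: /otel-lgtm
```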

Update prometheus.yml to include both OTLP receiver config (from LGTM)
and scrape_configs for pulling metrics from Hindsight API.

Changes:
- Mount prometheus.yml to /otel-lgtm/prometheus.yaml (where LGTM reads it)
- Add scrape_configs section to pull from host.docker.internal:8888/metrics
- Keep OTLP receiver configuration for trace metrics
- Set scrape_interval to 5s
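A sketch of the scrape portion of that prometheus.yml, using only details stated in this commit (target, path, interval); the LGTM-provided OTLP receiver settings the commit says are kept alongside it are omitted here:

```yaml
# Sketch of the scrape_configs added to the mounted prometheus.yml
global:
  scrape_interval: 5s
scrape_configs:
  - job_name: hindsight-api
    metrics_path: /metrics
    static_configs:
      - targets: ["host.docker.internal:8888"]
```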

Verified: Prometheus now successfully scrapes hindsight_llm_calls_total
and other Hindsight metrics. Dashboards now show live data!

…_model_refresh spans

- Add recall operation tracing with parent-child span hierarchy
  - Parent: hindsight.recall with attributes (bank_id, query, fact_types, etc.)
  - Children: recall_embedding, recall_retrieval, recall_fusion, recall_rerank
  - Fixed context propagation using start_as_current_span()

- Improve reflect tracing spans
  - Remove reflect_generation spans, use reflect instead
  - Change done() tool processing to hindsight.reflect_tool_call

- Fix mental_model_refresh span nesting
  - Add _skip_span parameter to reflect_async to avoid duplicate hindsight.reflect spans
  - Mental model refresh now has clean span hierarchy without nested reflect parent

- Add comprehensive tracing verification tests
  - Test span hierarchy and attributes for all operations
  - Verify parent-child relationships
  - 5 passing tests covering recall, reflect, consolidation, and mental_model_refresh
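The context-propagation fix via `start_as_current_span()` can be illustrated with a toy contextvars-based tracer (span names are from the commit; the recorder itself is an illustration, not the project's code): setting the new span as the context's current span is what makes spans started inside the block pick it up as their parent.

```python
import contextvars
from contextlib import contextmanager

_current = contextvars.ContextVar("current_span", default=None)
spans = []  # (span_name, parent_name) pairs, recorded in start order


@contextmanager
def start_as_current_span(name):
    """Record the span with the context's current span as its parent,
    then make this span current for the duration of the block."""
    spans.append((name, _current.get()))
    token = _current.set(name)
    try:
        yield name
    finally:
        _current.reset(token)


# Parent-child hierarchy from the commit: hindsight.recall with four children.
with start_as_current_span("hindsight.recall"):
    for child in ("recall_embedding", "recall_retrieval",
                  "recall_fusion", "recall_rerank"):
        with start_as_current_span(child):
            pass
```

Without the "set as current" step, each child would see no parent and appear as a detached root span, which is the nesting bug this commit fixes.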

nicoloboschi marked this pull request as ready for review on February 10, 2026 at 10:07

- Remove all is_tracing_enabled() conditional checks before tracing calls
- NoOpTracer/NoOpSpan handle disabled tracing automatically
- Simplify code by always calling tracer methods directly
- Fix NoOpTracer.start_as_current_span() to yield NoOpSpan instead of None

Changes:
- memory_engine.py: Remove 5 is_tracing_enabled checks in recall spans
- agent.py: Remove 2 is_tracing_enabled checks in reflect tool spans
- tracing.py: Fix NoOpTracer context manager to yield proper NoOpSpan

This eliminates ~50 lines of redundant conditional code while maintaining
identical behavior.
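The no-op pattern described above can be sketched in a few lines (class names are from the commit; method bodies are an assumption): the key fix is that `start_as_current_span()` yields a `NoOpSpan` rather than `None`, so call sites can use the span unconditionally.

```python
from contextlib import contextmanager


class NoOpSpan:
    """Accepts span API calls and does nothing."""

    def set_attribute(self, key, value):
        pass

    def record_exception(self, exc):
        pass


class NoOpTracer:
    @contextmanager
    def start_as_current_span(self, name, **kwargs):
        # Yield a NoOpSpan (not None) so callers never need an
        # is_tracing_enabled() guard before touching the span.
        yield NoOpSpan()


tracer = NoOpTracer()
with tracer.start_as_current_span("hindsight.recall") as span:
    span.set_attribute("bank_id", "demo")  # safe no-op, never AttributeError
```

With a real tracer and this no-op tracer sharing one interface, every conditional check at the call site becomes redundant, which is where the ~50 removed lines come from.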

- Make tracing documentation more concise
- Focus on span hierarchy and attributes
- Remove verbose troubleshooting and performance sections
- Keep configuration.md for env vars only

nicoloboschi merged commit 69dec8e into main on Feb 10, 2026
28 checks passed