
feat: export LLM traces for all call sites, not just deriver#529

Open
3un01a wants to merge 1 commit into `main` from `yuya/trace_generation`

Conversation


@3un01a 3un01a commented Apr 8, 2026

[Summary]

  • Remove the `trace_name` guard in `honcho_llm_call` so that all LLM generation calls are logged to the JSONL traces file when `REASONING_TRACES_FILE` is set, not just the ones that explicitly opt in
  • Add `trace_name` tags to the three call sites that were previously untraced: the summarizer (`short_summary`, `long_summary`) and the dreamer specialists (`dreamer_deduction`, `dreamer_induction`)
  • Calls without an explicit `trace_name` are logged with `task_type: "untagged"` as a fallback

[Motivation]
Previously, only the minimal deriver and dialectic chat paths emitted traces. This meant there was no visibility into LLM inputs/outputs for summarization or dream-cycle calls, which made it harder to debug memory quality, audit model behavior, or benchmark across the full pipeline. With this change, setting a single env var captures every generative LLM call with its module-level tag, input/output pairs, token counts, and tool call history.

This PR gives us a standardized way to collect the trace data we need for training models, benchmarking (this also affects excadrill), and downstream analysis of the individual modules. It enables easy benchmarking of components like the summarizer and dreamer, as per the wishlist.

[Traced call sites]

| Module | `task_type` tag |
| --- | --- |
| Deriver | `minimal_deriver` |
| Dialectic | `dialectic_chat` |
| Dreamer (deduction) | `dreamer_deduction` |
| Dreamer (induction) | `dreamer_induction` |
| Summarizer (short) | `short_summary` |
| Summarizer (long) | `long_summary` |
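The tagging pattern behind the table can be sketched as follows. The class and constant names are illustrative stand-ins; only the tag strings and the `f"dreamer_{self.name}"` pattern come from the PR:

```python
class DreamerSpecialist:
    """Illustrative stand-in for a dreamer specialist, which tags its LLM
    calls dynamically from its own name (e.g. "deduction" -> "dreamer_deduction")."""

    def __init__(self, name: str) -> None:
        self.name = name

    @property
    def trace_name(self) -> str:
        # Dynamic tag, matching the f"dreamer_{self.name}" pattern in the PR.
        return f"dreamer_{self.name}"


# The summarizer call sites use static tags instead of dynamic ones:
SHORT_SUMMARY_TRACE = "short_summary"
LONG_SUMMARY_TRACE = "long_summary"
```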

[Test plan]

  • Set `REASONING_TRACES_FILE=traces.jsonl` and run the server with message ingestion + dialectic queries
  • Verify the JSONL file contains entries for all six `task_type` values
  • Verify each entry has both input (prompt/messages + tokens) and output (content + tokens) fields
  • Verify that leaving `REASONING_TRACES_FILE` unset produces no trace file (no regression)
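The JSONL checks above could be scripted roughly like this. The `input`/`output` field names are an assumption about the trace schema; the six expected tags come from the table in this PR:

```python
import json

EXPECTED_TAGS = {
    "minimal_deriver", "dialectic_chat",
    "dreamer_deduction", "dreamer_induction",
    "short_summary", "long_summary",
}


def audit_traces(path: str) -> set[str]:
    """Return the task_type tags missing from the trace file, asserting
    along the way that each entry carries both input and output payloads."""
    seen: set[str] = set()
    with open(path, encoding="utf-8") as f:
        for line in f:
            entry = json.loads(line)
            # Assumed schema: every record has input/output fields.
            assert "input" in entry and "output" in entry, f"incomplete entry: {entry}"
            seen.add(entry["task_type"])
    return EXPECTED_TAGS - seen
```

An empty return set means every traced module showed up at least once during the test run.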

Summary by CodeRabbit

  • Chores
    • Enhanced internal tracing and logging infrastructure to improve system diagnostics and monitoring capabilities across core components.

@3un01a 3un01a requested a review from VVoruganti April 8, 2026 21:16

coderabbitai bot commented Apr 8, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b3d8643a-086c-4ee3-91db-3c267eb7c754

📥 Commits

Reviewing files that changed from the base of the PR and between 5b6bd59 and 56d26b2.

📒 Files selected for processing (3)
  • src/dreamer/specialists.py
  • src/utils/clients.py
  • src/utils/summarizer.py

Walkthrough

Three files are modified to add trace_name parameter propagation to honcho_llm_call invocations across specialists and summarization functions. Additionally, the clients utility now unconditionally logs reasoning traces for HonchoLLMCallResponse results instead of gating on trace name presence.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Trace Naming for LLM Calls**<br>`src/dreamer/specialists.py`, `src/utils/summarizer.py` | Added `trace_name` arguments to `honcho_llm_call` invocations: specialists use dynamic naming via `f"dreamer_{self.name}"`, and summarizer functions use the static names `"short_summary"` and `"long_summary"`. |
| **Unconditional Trace Logging**<br>`src/utils/clients.py` | Removed the conditional gating on `trace_name` for reasoning-trace logging; traces are now logged unconditionally for `HonchoLLMCallResponse` results, with `task_type` set to `trace_name` or `"untagged"`. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 Traces hop through code so neat,
Each call now wears a name complete,
From dreamers' steps to summaries grand,
We track them all across the land!
No more gates to slow the flow,
Let all the reasoning traces glow!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The PR title accurately reflects the main objective: enabling LLM trace exports across all call sites rather than just the deriver module. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

