Goal
Produce one ordered, human-readable record of what a learning run did: "synthesized N rules → iteration 1 patched X (accepted, +3 F1) → audit merged A,B → pruned C → final 22 rules." Today these facts are computed and printed in passing but never collected into a single reconstructable trace.
The pieces already exist (collect, don't recompute)
rulechef/learner.py::_log_patch_decision (L640) — per patch: candidate rules, accept/reject, metric delta (already written as trajectory records).
rulechef/pipeline.py::_apply_audit (L296) — applies merge/remove actions and has the before/after rules.
rulechef/coordinator.py::AuditResult (L26) and critique_rules — the audit actions and critic feedback objects.
rulechef/training_logger.py — the raw prompts/responses per step.
What to do
- Thread a run-scoped trace collector through learn_rules (the Pipeline in rulechef/pipeline.py is the natural owner) that appends one entry per step:
  - synthesis: initial rules (names + count)
  - iteration N: failure summary, patch added, accepted/rejected, F1 before→after
  - audit: actions taken (merged [a,b]->c, removed [d])
  - prune: rules dropped
  - final: surviving rules
- Write it to <storage>/<dataset>.trace.json and add a pretty-printer (reuse the style of print_ranking_report in rulechef/ranking.py).
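A minimal sketch of what the collector and printer could look like. Everything here is hypothetical: TraceCollector, TraceEntry, format_trace and the field names inside each entry (rules, patch, f1_before, and so on) are illustrative and do not exist in rulechef today. Only the file name pattern and the print_trace name come from this issue.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Any


@dataclass
class TraceEntry:
    step: str  # "synthesis", "iteration", "audit", "prune" or "final"
    data: dict[str, Any] = field(default_factory=dict)


@dataclass
class TraceCollector:
    """Hypothetical run-scoped collector: one entry per pipeline step."""

    entries: list[TraceEntry] = field(default_factory=list)

    def add(self, step: str, **data: Any) -> None:
        self.entries.append(TraceEntry(step, data))

    def to_dict(self) -> dict[str, Any]:
        return {"entries": [asdict(e) for e in self.entries]}

    def save(self, storage: str, dataset: str) -> Path:
        # Writes <storage>/<dataset>.trace.json
        path = Path(storage) / f"{dataset}.trace.json"
        path.write_text(json.dumps(self.to_dict(), indent=2))
        return path


def format_trace(trace: dict[str, Any]) -> list[str]:
    """Render one numbered line per entry (field names are assumptions)."""
    lines = []
    for i, entry in enumerate(trace["entries"], start=1):
        step, d = entry["step"], entry["data"]
        if step == "synthesis":
            detail = f"{len(d['rules'])} rules: {', '.join(d['rules'])}"
        elif step == "iteration":
            verdict = "accepted" if d["accepted"] else "rejected"
            detail = (
                f"#{d['n']} patch {d['patch']} {verdict}, "
                f"F1 {d['f1_before']:.2f} -> {d['f1_after']:.2f}"
            )
        elif step == "audit":
            detail = "; ".join(d["actions"])
        elif step == "prune":
            detail = "dropped " + ", ".join(d["dropped"])
        elif step == "final":
            detail = f"{len(d['rules'])} rules"
        else:
            # Unknown step: fall back to the raw payload.
            detail = json.dumps(d, sort_keys=True)
        lines.append(f"{i}. {step}: {detail}")
    return lines


def print_trace(trace: dict[str, Any]) -> None:
    print("\n".join(format_trace(trace)))
```

Splitting format_trace from print_trace keeps the rendering testable without capturing stdout. The real printer would presumably borrow its layout from print_ranking_report rather than the numbered lines used here.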
Gotchas
- Keep it opt-in or cheap: don't store full doc text per step, just rule names/ids and metrics (the prompts are already in training_logger).
- The agentic vs simple coordinator paths differ; make sure both feed the collector.
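One way to address both gotchas at once, sketched under assumptions: a single record helper that is a no-op when tracing is off, called from both coordinator paths. The functions run_simple and run_agentic are stand-ins invented for illustration, not the real coordinator code, and rules are modelled as plain dicts with a "name" key.

```python
from typing import Any, Optional

Trace = list  # a plain list of {"step", "data"} dicts, for illustration only


def record(trace: Optional[Trace], step: str, **data: Any) -> None:
    """Append one entry, or do nothing when tracing is switched off."""
    if trace is None:
        return
    trace.append({"step": step, "data": data})


def names_only(rules: list[dict]) -> list[str]:
    """Keep rule names and drop everything else (patterns, doc text)."""
    return [r["name"] for r in rules]


def run_simple(rules: list[dict], trace: Optional[Trace] = None) -> list[dict]:
    # Stand-in for the simple coordinator path.
    record(trace, "synthesis", rules=names_only(rules))
    record(trace, "final", rules=names_only(rules))
    return rules


def run_agentic(rules: list[dict], trace: Optional[Trace] = None) -> list[dict]:
    # Stand-in for the agentic path: it audits, but logs through the same helper.
    record(trace, "synthesis", rules=names_only(rules))
    kept = rules[:1]
    record(trace, "audit", actions=[f"removed [{r['name']}]" for r in rules[1:]])
    record(trace, "final", rules=names_only(kept))
    return kept
```

Because both paths go through record, neither can drift into a different entry shape, and passing trace=None costs one comparison per step.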
Acceptance
A learning run writes <storage>/<dataset>.trace.json, which reconstructs the step-by-step story, and print_trace(trace) renders it. Test on a tiny in-memory dataset (no LLM needed if you stub synthesis), asserting that the trace has synthesis → … → final entries in order.
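The ordering assertion could be sketched roughly as follows. stub_learning_run is a hypothetical stand-in for learn_rules with synthesis stubbed out, and the phase order is taken from the example in the Goal section. If audits can interleave with iterations in the real pipeline, the rank check would need relaxing.

```python
PHASE_RANK = {"synthesis": 0, "iteration": 1, "audit": 2, "prune": 3, "final": 4}


def is_well_ordered(steps: list[str]) -> bool:
    """True if steps run synthesis -> iteration* -> audit* -> prune* -> final."""
    if any(s not in PHASE_RANK for s in steps):
        return False
    if steps.count("synthesis") != 1 or steps.count("final") != 1:
        return False
    ranks = [PHASE_RANK[s] for s in steps]
    return ranks == sorted(ranks)


def stub_learning_run() -> list[dict]:
    """Stand-in for a learning run; no LLM involved, all values are made up."""
    rules = ["r1", "r2", "r3"]
    trace = [{"step": "synthesis", "data": {"rules": list(rules), "count": len(rules)}}]
    trace.append({"step": "iteration", "data": {"n": 1, "patch": "r4", "accepted": True}})
    rules.append("r4")
    trace.append({"step": "audit", "data": {"actions": ["merged [r1,r2]->r12"]}})
    rules = ["r12", "r3", "r4"]
    trace.append({"step": "prune", "data": {"dropped": ["r3"]}})
    rules.remove("r3")
    trace.append({"step": "final", "data": {"rules": rules}})
    return trace


def test_trace_is_ordered() -> None:
    trace = stub_learning_run()
    steps = [e["step"] for e in trace]
    assert is_well_ordered(steps)
    assert trace[-1]["data"]["rules"] == ["r12", "r4"]
```

In the real test, stub_learning_run would be replaced by a call into the pipeline with a stubbed synthesizer, and the steps would be read back from the written trace file.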
Pointers
rulechef/pipeline.py (run, _apply_audit L296), rulechef/learner.py (_log_patch_decision L640), rulechef/coordinator.py (AuditResult L26), rulechef/training_logger.py.