Tags: caiopizzol/comment-bench
feat: initial public release

A benchmark that measures whether inline comments steer AI coding agents on small surgical edits. Four scenarios in the gift-card refund domain vary where the protected invariant (24h cap) lives: branch, accumulator, helper, or comment-only. Each scenario has seven comment treatments. The deliverable is comment-policy.md. RESULTS.md describes the result patterns the benchmark probes for.
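
As a rough illustration of the design described above, the four invariant locations and seven comment treatments can be laid out as a grid of benchmark cells. This is only a sketch, not the repository's code: the release note does not name the treatments, so the `treatment-N` labels are hypothetical placeholders, and `INVARIANT_LOCATIONS`, `COMMENT_TREATMENTS`, and `build_matrix` are names assumed for illustration.

```python
from itertools import product

# The four places the release note says the protected invariant (24h cap) can live.
INVARIANT_LOCATIONS = ["branch", "accumulator", "helper", "comment-only"]

# The release note mentions seven comment treatments per scenario but does not
# list them, so these labels are hypothetical placeholders.
COMMENT_TREATMENTS = [f"treatment-{i}" for i in range(1, 8)]


def build_matrix():
    """Return every (location, treatment) pair as a list of tuples."""
    return list(product(INVARIANT_LOCATIONS, COMMENT_TREATMENTS))


matrix = build_matrix()
print(len(matrix))  # 4 scenarios x 7 treatments = 28
```

Keeping the two axes as separate lists makes it easy to compare one treatment across all four invariant locations, or all treatments within a single location.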