Comparing changes

base repository: Chen17-sq/clearscript
base: v0.0.19
head repository: Chen17-sq/clearscript
compare: v0.0.20
  • 1 commit
  • 15 files changed
  • 1 contributor

Commits on May 23, 2026

  1. v0.0.20 — self-review pass (2nd LLM call) + L3.5 routine

    User feedback after v0.0.19's prompt rewrite: "keep going, revise much
    more deeply". This release goes deeper: not just better prompts, but a
    structural change to the pipeline itself.
    
    ## The biggest single quality multiplier we can ship
    
    A second LLM call on the stitched output targets the 20-30% of L3
    errors that the first pass misses. This is the single biggest quality
    lift short of changing the model. Default ON.
    
    ## Self-review (Stage 6) — wired into iter_events
    
    After all chunks complete:
    
    1. Build a payload of {edited_markdown, change_log, library_context,
       briefing}
    2. Call the model with the self_review system prompt
    3. Parse JSON: {additional_corrections, rollbacks,
       promotions_to_user_review, data_conflicts, format_issues}
    4. Apply additional_corrections to the markdown via string replace
       (only if `old` actually appears in the doc — model hallucination
       guard)
    5. Append review changes to change_log with stage="self_review"
    6. Surface diagnostics on the `complete` event's `self_review` field
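
Steps 4 and 5 above can be sketched as follows. This is a minimal illustration, not the code in this commit: the `new` key, the `ReviewResult` container, and the choice to replace every occurrence of `old` are assumptions made for the example. Only the `old` key, the string-replace approach, and `stage="self_review"` come from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewResult:
    """Hypothetical container for the outcome of applying review fixes."""
    markdown: str
    applied: list[dict] = field(default_factory=list)
    ignored: list[dict] = field(default_factory=list)


def apply_additional_corrections(markdown: str, corrections: list[dict]) -> ReviewResult:
    """Apply {old, new} corrections, skipping any whose `old` is absent."""
    result = ReviewResult(markdown=markdown)
    for item in corrections:
        old, new = item.get("old"), item.get("new")
        usable = isinstance(old, str) and isinstance(new, str) and old != ""
        # Hallucination guard: only replace text that really occurs in the doc.
        if not usable or old not in result.markdown:
            result.ignored.append(item)
            continue
        # Assumption: every occurrence is replaced, not just the first one.
        result.markdown = result.markdown.replace(old, new)
        # Tag the change so it can be appended to the change log.
        result.applied.append({**item, "stage": "self_review"})
    return result
```

Corrections that land in `ignored` never touch the document, so a model that invents a span cannot corrupt the stitched output.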
    
    Cost: 1 extra LLM call per Run regardless of chunk count. Auto-skipped
    when stitched output > 100k chars to bound cost on monster transcripts.
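
The skip rule could look roughly like the gate below. The constant and function names are hypothetical. Only the 100k-character threshold and the `enable_self_review` flag are taken from this message.

```python
# Threshold stated in the commit message; the constant name is hypothetical.
SELF_REVIEW_MAX_CHARS = 100_000


def should_run_self_review(stitched: str, enable_self_review: bool = True) -> bool:
    """Return True when the second pass should run on the stitched output."""
    if not enable_self_review:
        return False  # caller opted out explicitly
    # Auto-skip only when the output is strictly larger than the threshold.
    return len(stitched) <= SELF_REVIEW_MAX_CHARS
```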
    
    SSE events: self_review_start / self_review_done / self_review_error.
    UI shows "↻ Self-review — re-reading for missed corrections…" then
    "+N fixes · M data conflicts to check · K ambiguous items flagged".
    
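One way the events and the summary line might be produced is sketched below. The mapping of N, M and K to `additional_corrections`, `data_conflicts` and `promotions_to_user_review` is a guess, and both function names are hypothetical. The frame layout follows the standard `event:` / `data:` Server-Sent Events format, which may differ from what the pipeline actually emits.

```python
import json


def sse_frame(event: str, data: dict) -> str:
    """Encode one Server-Sent Events frame (standard event/data fields)."""
    payload = json.dumps(data, ensure_ascii=False)
    return f"event: {event}\ndata: {payload}\n\n"


def summarize_self_review(review: dict) -> str:
    """Build the UI summary line from the parsed review (assumed key mapping)."""
    fixes = len(review.get("additional_corrections", []))
    conflicts = len(review.get("data_conflicts", []))
    flagged = len(review.get("promotions_to_user_review", []))
    return (
        f"+{fixes} fixes · {conflicts} data conflicts to check"
        f" · {flagged} ambiguous items flagged"
    )
```
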
    Pipeline param: `enable_self_review: bool = True`. Tests that exercise
    chunking/main-edit alone opt out explicitly.
    
    ## Self-review prompt (06_self_review.md)
    
    Rewritten from a thin checklist into a 5-check routine:
    
    1. Proper noun audit (the headline reason; first pass misses 20-30%)
    2. Speaker consistency across full document
    3. Cross-section data consistency (ARR / headcount / funding agree)
    4. Format hygiene (leftover [Speaker N], mixed punctuation, etc.)
    5. Over-correction rollbacks (confidence < 0.7 sanity check)
    
    Hard JSON output schema. "Don't second-guess high-confidence work."
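
A defensive parser for that schema might look like the sketch below. It assumes each of the five fields is a JSON array and that an unusable response should degrade to "no findings". The function name is hypothetical, and the real parser may validate more strictly.

```python
import json

# Field names from the self-review schema described above.
REVIEW_KEYS = (
    "additional_corrections",
    "rollbacks",
    "promotions_to_user_review",
    "data_conflicts",
    "format_issues",
)


def parse_self_review(raw: str) -> dict:
    """Parse the model's reply; fall back to empty lists instead of raising."""
    empty = {key: [] for key in REVIEW_KEYS}
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return empty  # garbage response: behave as if nothing was found
    if not isinstance(data, dict):
        return empty
    # Keep only the known keys, and only when they hold a list.
    return {
        key: data[key] if isinstance(data.get(key), list) else []
        for key in REVIEW_KEYS
    }
```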
    
    ## L3.5 rewritten
    
    Same treatment as L3 got in v0.0.19. The conservative L3.5 table
    now wraps a mandatory 7-check routine:
    
    1. Sentence boundary correctness
    2. Stutter dedup (exact X X only)
    3. Word-order garbling
    4. Missing function words
    5. Speaker-switch swallowed mid-paragraph
    6. Number/letter confusion in spoken digits
    7. Same-sound substitution destroying meaning (cohort vs co-host)
    
    Hard confidence threshold table per change type. Explicit "what
    L3.5 does NOT do" list keeps the model from over-correcting.
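
L3.5 itself is a prompt routine, but the "exact X X only" rule for stutter dedup can be illustrated in code. The regex below is an assumption about what "exact" means: same word, same case, separated only by whitespace. It would also collapse legitimate repeats such as "had had", and it assumes whitespace-separated tokens.

```python
import re

# A whole word followed one or more times by whitespace and the same word.
_STUTTER = re.compile(r"\b(\w+)(\s+\1\b)+")


def dedup_stutter(text: str) -> str:
    """Collapse exact, case-sensitive adjacent word repeats to one copy."""
    return _STUTTER.sub(r"\1", text)
```

For example, `dedup_stutter("I I think the the plan works")` yields `"I think the plan works"`, while `"The the"` is left alone because the case differs.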
    
    ## Tests
    
    284 → 292 (+8). All passing. Ruff clean.
    
    test_pipeline_self_review.py covers: event ordering, additional
    corrections actually applied, token counts include both passes,
    opt-out flag, auto-skip for huge output, garbage response doesn't
    crash, diagnostics surfaced in `complete`, corrections ignored when
    `old` is not in the doc (hallucination guard).
    
    Existing tests that asserted exact token counts updated to reflect
    the new 2-pass total.
    Chen17-sq committed May 23, 2026
    Commit d6af598