
Commit d6bb0fb

Add meta-commentary on LLM prose and improve RL thesis section
Made edits to the introduction chapter (book/01-introduction/), addressing transparency about LLM-generated content and refining the evals-to-RL thesis.

Changes to 00-ch0.md (Introduction overview):

1. Added a new subsection, "Am I reading LLM prose?", after the reading-modes discussion:
   - Discloses the approach: "I very kick off substantial prose tasks to LLMs"
   - Self-assessment: "People tell me I have a strong voice, so you'd notice if I did"
   - Explains the style-transfer strategy: "Since the first rule of formal verification agents is to have fun, I will occasionally style transfer the boring sections to different literary styles"
   - Hints at future additions: "Maybe we'll introduce some recurring characters, etc."
   - Provides transparency while maintaining a playful tone

Changes to 01-ch1.md (Following along at home):

2. Minor title formatting change (no content modification)

Changes to 02-ch2.md (Key Concepts - Evals and RL):

3. Added a TODO marker for human audit:
   - "[ ] TODO(human): audit two paragraphs cuz they were llm generated"
   - Flags the reliability/grader paragraphs for review
   - Maintains transparency about content origins

4. Rewrote the "Evals to RL" subsection for clarity and technical precision.

   Before: generic claims about a "perfect reward signal" that "cannot be fooled", and a mechanical conversion from eval to RL environment.

   After:
   - More nuanced framing: "sparse yet deterministic and high quality reward signal"
   - Introduced RLPAF terminology: "reinforcement learning from proof assistant feedback" (DeepSeek citation)
   - Clearer reward description: "reward of 1 when proof checker is happy and reward of 0 when proof checker is sad" (more conversational and accurate; see the sketch below)
   - Added a computational-efficiency angle: "high quality and computationally cheap grader", "O(laptop) compute"
   - Emphasized the unique positioning: "pretty unique and attractive position in the posttraining ecosystem"
   - Removed overclaims about "perfect" grading and "cannot be fooled"
   - Kept the sparse-reward caveat and the SFT bootstrapping requirement
   - Changed the math formatting from inline text to LaTeX: "time t" → "time $t$"

Technical improvements:
- More accurate characterization of formal-verification rewards (sparse, not perfect)
- Better integration with the RL literature (RLPAF terminology)
- Clearer value proposition (computational efficiency, not just correctness)
- Less absolute language (removed "perfect" and "cannot be fooled")

Stylistic improvements:
- More conversational tone ("happy"/"sad" proof checker)
- Better flow from the eval thesis to RL environment construction
- Maintained the cookbook's informal voice while improving technical accuracy

The changes enhance transparency about AI-assisted writing while strengthening the technical content of the evals-to-RL thesis, making it more defensible and better grounded in RL terminology.

Co-Authored-By: Claude <noreply@anthropic.com>
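As a concrete illustration of that binary reward, here is a minimal sketch in Python. It assumes a Lean toolchain on PATH where `lean <file>` exits with status 0 exactly when the file (including its proofs) checks; the function name `proof_checker_reward` and the timeout value are illustrative, not taken from the book.

```python
import subprocess
import tempfile
from pathlib import Path


def proof_checker_reward(proof_source: str, timeout_s: int = 60) -> float:
    """Binary RLPAF-style reward: 1.0 if the proof checker accepts, 0.0 otherwise.

    Assumption: `lean <file>` exits with status 0 exactly when the file checks.
    """
    # Write the candidate proof to a temporary .lean file.
    with tempfile.NamedTemporaryFile("w", suffix=".lean", delete=False) as f:
        f.write(proof_source)
        path = Path(f.name)
    try:
        result = subprocess.run(
            ["lean", str(path)],
            capture_output=True,
            timeout=timeout_s,
        )
        # Proof checker "happy" -> reward 1; "sad" (any error) -> reward 0.
        return 1.0 if result.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        return 0.0
    finally:
        path.unlink(missing_ok=True)
```

Failures and timeouts both map to 0.0, which matches the "sparse yet deterministic" binary framing described above.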
1 parent f5adec2 commit d6bb0fb

File tree

0 files changed (+0, -0 lines)


    0 commit comments
