Commit d6bb0fb
Add meta-commentary on LLM prose and improve RL thesis section
Made edits to the introduction chapter (book/01-introduction/), addressing transparency
about LLM-generated content and refining the evals-to-RL thesis.
Changes to 00-ch0.md (Introduction overview):
1. Added new subsection "Am I reading LLM prose?" after reading modes discussion
- Discloses approach: "I very rarely kick off substantial prose tasks to LLMs"
- Self-assessment: "People tell me I have a strong voice, so you'd notice if I did"
- Explains style transfer strategy: "Since the first rule of formal verification
agents is to have fun, I will occasionally style transfer the boring sections
to different literary styles"
- Hints at future additions: "Maybe we'll introduce some recurring characters, etc."
- Provides transparency while maintaining playful tone
Changes to 01-ch1.md (Following along at home):
2. Minor title formatting change (no content modification)
Changes to 02-ch2.md (Key Concepts - Evals and RL):
3. Added TODO marker for human audit:
- "[ ] TODO(human): audit two paragraphs cuz they were llm generated"
- Flags reliability/grader paragraphs for review
- Maintains transparency about content origins
4. Rewrote "Evals to RL" subsection for clarity and technical precision:
Before: generic claims about a "perfect reward signal" that "cannot be fooled", plus a
mechanical conversion from eval to RL environment
After:
- More nuanced framing: "sparse yet deterministic and high quality reward signal"
- Introduced RLPAF terminology: "reinforcement learning from proof assistant feedback"
(DeepSeek citation)
- Clearer reward description: "reward of 1 when proof checker is happy and reward
  of 0 when proof checker is sad" (more conversational while staying accurate; see the
  sketch after this list)
- Added computational efficiency angle: "high quality and computationally cheap grader",
"O(laptop) compute"
- Emphasized unique positioning: "pretty unique and attractive position in the
posttraining ecosystem"
- Removed overclaims about "perfect" grading and "cannot be fooled"
- Maintained sparse reward caveat and SFT bootstrapping requirement
- Changed math formatting from inline text to LaTeX: "time t" → "time $t$"
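
For context, the rewritten reward description corresponds roughly to the sketch below:
a binary, sparse, deterministic signal derived from whether a proof checker accepts the
candidate proof. The `lean` command, file layout, and timeout are illustrative
assumptions, not the book's actual harness.

```python
import subprocess
import tempfile
from pathlib import Path


def proof_checker_reward(candidate_proof: str, checker_cmd=("lean",)) -> float:
    """Binary RLPAF-style reward: 1.0 if the checker accepts the proof, else 0.0."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "candidate.lean"
        src.write_text(candidate_proof)
        # Sparse, deterministic grading: exit code 0 means the proof checks.
        result = subprocess.run(
            [*checker_cmd, str(src)],
            capture_output=True,
            timeout=60,  # bounded, "O(laptop)" compute per episode
        )
    return 1.0 if result.returncode == 0 else 0.0
```

The sparsity of this signal is also why the subsection keeps the SFT bootstrapping caveat.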
Technical improvements:
- More accurate characterization of formal verification rewards (sparse, not perfect)
- Better integration with RL literature (RLPAF terminology)
- Clearer value proposition (computational efficiency, not just correctness)
- Less absolute language (removed "perfect", "cannot be fooled")
Stylistic improvements:
- More conversational tone ("happy"/"sad" proof checker)
- Better flow from eval thesis to RL environment construction
- Maintained cookbook's informal voice while improving technical accuracy
The changes enhance transparency about AI-assisted writing while strengthening the
technical content of the evals-to-RL thesis, making it more defensible and better
grounded in RL terminology.
Co-Authored-By: Claude <noreply@anthropic.com>