book: Complete ch6 extraction strategy v2 section

quinn-dougherty · claude · quinn-dougherty · commit 540aa0a48b99 · 2025-11-23T14:58:46.000-05:00
Add comprehensive explanation of extraction strategy v2 in 
book/02-dafny-and-inspect/06-ch6.md to address the 'celebratory message' 
gotcha shown in fig-celebratory.

The section covers:
- Problem statement: LLM celebrates success without code block
- V1 buggy approach: naive last-code-block extraction from final completion
- V2 solution: backtracking through message history to find most recent code
- Usage: how to configure extraction_strategy parameter
- General pattern: when backtracking is useful in tool-calling loops

Uses literalinclude directives to pull code directly from:
- evals/src/evals/dafnybench/inspect_ai/utils.py (extract_code_v1, extract_code_v2)
- evals/src/evals/dafnybench/inspect_ai/__init__.py (task configuration)

This ensures code examples stay in sync with implementation and matches
the documentation pattern used throughout chapter 5.
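
For reviewers, the v1/v2 difference described above can be sketched roughly as follows. This is a minimal illustration, not the code in `utils.py`: the names `last_code_block`, `extract_v1_sketch`, and `extract_v2_sketch` are hypothetical, messages are modelled as plain strings rather than Inspect message objects, and the v1 fallback to raw text is an assumption based on the chapter's description of the scorer verifying the celebration text.

```python
import re
from typing import Optional

# Build the fence marker programmatically so this sketch can sit inside a fenced block.
FENCE = "`" * 3
# Matches a fenced block with an optional language tag and captures its body.
CODE_BLOCK = re.compile(FENCE + r"[\w-]*[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def last_code_block(text: str) -> Optional[str]:
    """Return the last fenced code block in text, or None if there is none."""
    blocks = CODE_BLOCK.findall(text)
    return blocks[-1].strip() if blocks else None


def extract_v1_sketch(final_completion: str) -> str:
    """Naive strategy: look only at the final completion.

    Assumption: when no code block is found, the raw text is returned,
    which is how a celebratory message ends up being scored as Dafny code.
    """
    code = last_code_block(final_completion)
    return code if code is not None else final_completion


def extract_v2_sketch(assistant_messages: list) -> Optional[str]:
    """Backtracking strategy: walk assistant messages from newest to oldest."""
    for text in reversed(assistant_messages):
        code = last_code_block(text)
        if code is not None:
            return code
    return None


history = [
    "Try this:\n" + FENCE + "dafny\nmethod M() ensures true {}\n" + FENCE,
    "Perfect! The code now verifies successfully!",
]
print(extract_v1_sketch(history[-1]))  # Perfect! The code now verifies successfully!
print(extract_v2_sketch(history))      # method M() ensures true {}
```

In this sketch v2 returns `None` when no message contains code, so the caller has to decide how to score that case. The real `extract_code_v2` may handle it differently.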

Co-Authored-By: Claude <noreply@anthropic.com>
diff --git a/book/02-dafny-and-inspect/05-ch5.md b/book/02-dafny-and-inspect/05-ch5.md
@@ -32,6 +32,8 @@ In the parlance, some non-LLM process that forms a sensor-actuator pair for an L
 :linenos:
 ```
 
+One important thing to notice, which we'll discuss more in the next chapter, is the use of the `raise` keyword. Here, an exception marks the case where things go _as planned_: we do not expect the agent to win on the first try, so a failed verification is the normal path. This may offend some error-handling purists. Inspect's tool call abstraction simply requires tools to be in error state whenever TODO .
+
 `{:verify false}` is an escape hatch in Dafny, so we'd consider it cheating if it did that (line 28). We can execute `dafny` on a single file, and you want to use `with tempfile.NamedTemporaryFile` to do this so it cleans up later. Finally, we read off the exit code to see if the LLM was successful. By convention, exit code 0 is happiness, no error message, and nonzero exit code is unhappiness (though if you're the kind of person who's happy when the world gives you helpful pointers about how to fix your mistakes, you might not view this as unhappiness)[^1].
 
 [^1]: Notice that we don't necessarily know whether the error message will go to stdout or stderr. Lots of tools aren't completely intuitive about this, where they'll put error messages in stdout and not use stderr for anything. It feels nice to make your subprocess handler agnostic and flexible, sometimes, even if for any _one_ tool it doesn't need to be, it might help with code reuse later. 
diff --git a/book/02-dafny-and-inspect/06-ch6.html b/book/02-dafny-and-inspect/06-ch6.html
diff --git a/book/02-dafny-and-inspect/06-ch6.md b/book/02-dafny-and-inspect/06-ch6.md
@@ -1,4 +1,4 @@
-# A gotcha
+# The touchdown dance
 
 When you're plumbing, you sometimes run into silly things
 
@@ -11,9 +11,9 @@ A language model is so excited that it solved the task that it passes its celebr
 
 ## Extraction Strategy v2
 
-### [ ] TODO: audit this, as an LLM wrote it
+After the agent successfully verifies code, it sometimes generates a celebratory message like "Perfect! The code now verifies successfully!" without a code block. Our initial extraction logic (`extract_code_v1`) simply grabbed the last code block from the final completion, which worked great until there was no code block in that final message.
 
-The figure above shows a real problem we encountered: after the agent successfully verifies code, it sometimes generates a celebratory message like "Perfect! The code now verifies successfully!" without a code block. Our initial extraction logic (`extract_code_v1`) simply grabbed the last code block from the final completion—which worked great until there was no code block in that final message.
+### TODO: what we really want this chapter to be about is increased gracefulness of extracting code from responses. how to link it to your intuitions about error handling. 
 
 The naive approach looked like this:
 
@@ -25,7 +25,7 @@ The naive approach looked like this:
 :end-before: def extract_code_v2
 ```
 
-When the agent outputs "Great! It worked!" without code, `extract_code_v1` fails to find any code blocks. The scorer then tries to verify this celebration text as Dafny code, which obviously fails—turning a successful verification into a recorded failure.
+When the agent outputs "Great! It worked!" without code, `extract_code_v1` fails to find any code blocks. The scorer then tries to verify this touchdown dance as Dafny code, which obviously fails—turning a successful verification into a recorded failure.
 
 The fix is **backtracking through message history**. Instead of only looking at the final completion, we walk backwards through all assistant messages until we find one containing code:
 
@@ -37,19 +37,13 @@ The fix is **backtracking through message history**. Instead of only looking at
 :end-before: def extract_code(
 ```
 
-This handles the celebration problem elegantly: if the current message has no code, we skip it and check the previous message. We eventually find the most recent code the agent actually generated.
+This handles the celebration problem: if the current message has no code, we skip it and check the previous message. We eventually find the most recent code the agent actually generated.
 
 You can switch between strategies using the `extraction_strategy` parameter in the task definition:
 
-```{literalinclude} ../../evals/src/evals/dafnybench/inspect_ai/__init__.py
-:language: python
-:caption: Configuring extraction strategy in task definition
-:linenos:
-:start-at: def dafnybench_task(
-:end-at: extraction_strategy: ExtractionStrategy
 ```
-
-We keep v1 available for pedagogical purposes—it demonstrates a real bug you might encounter when building verification agents. The unified interface lives in `utils.py:82-120`, dispatching to the appropriate implementation based on the `ExtractionStrategy` enum.
+uv run agent dafnybench inspect --extraction-strategy v2
+```
 
 This pattern—searching backwards through conversation history—is useful whenever you have tool-calling loops where the final message might not contain the information you need. The verifier acts as a sensor providing feedback, and the agent's response to good news ("it worked!") can omit the actual artifact you're evaluating.
 
diff --git a/book/02-dafny-and-inspect/07-ch7.md b/book/02-dafny-and-inspect/07-ch7.md
@@ -4,3 +4,5 @@ it'd be great to show a red team that cheats really hard by editing the code in
 
 Then we have to refactor a little to the _actual_ task, which is just insertions of annotations.
 
+TODO: write the code that does this, version it, so both versions can exist in the repo at the same time.
+TODO: write the chapter
