Commit 9242c8e

sjarmak and claude committed
docs: update sg_only documentation for v2 clone-at-verify pattern
Replace all references to the old /repo_full/ backup pattern with the new clone-at-verify approach using clone manifests.

Updates:
- docs/CONFIGS.md: rewrite MCP-Full Docker Environment section
- CLAUDE.md + AGENTS.md: update verifier description
- templates/mcp_unique_task/Dockerfile.sg_only.j2: fix stale comment

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent a0ea3ca commit 9242c8e

4 files changed, +49 -24 lines changed

AGENTS.md

Lines changed: 3 additions & 2 deletions
@@ -62,8 +62,9 @@ Standard pairing: **baseline-local-direct** (full local code, no MCP) and
 **mcp-remote-direct** (source deleted, Sourcegraph MCP). Artifact evaluation
 uses **baseline-local-artifact** + **mcp-remote-artifact** (review.json output).
 MCP configs use `Dockerfile.sg_only` or `Dockerfile.artifact_only` so the
-agent must discover code via MCP tools. The verifier restores the full repo
-before scoring. See `docs/CONFIGS.md` for the full config matrix.
+agent must discover code via MCP tools. The verifier clones the mirror repo
+at verification time and overlays agent changes before scoring.
+See `docs/CONFIGS.md` for the full config matrix.
 
 ## Standard Workflow
 0. **Before commit or push:** Run `python3 scripts/repo_health.py` (or `--quick`). Fix any failures so main stays clean and drift is caught early (see `docs/REPO_HEALTH.md`).

CLAUDE.md

Lines changed: 3 additions & 2 deletions
@@ -62,8 +62,9 @@ Standard pairing: **baseline-local-direct** (full local code, no MCP) and
 **mcp-remote-direct** (source deleted, Sourcegraph MCP). Artifact evaluation
 uses **baseline-local-artifact** + **mcp-remote-artifact** (review.json output).
 MCP configs use `Dockerfile.sg_only` or `Dockerfile.artifact_only` so the
-agent must discover code via MCP tools. The verifier restores the full repo
-before scoring. See `docs/CONFIGS.md` for the full config matrix.
+agent must discover code via MCP tools. The verifier clones the mirror repo
+at verification time and overlays agent changes before scoring.
+See `docs/CONFIGS.md` for the full config matrix.
 
 ## Standard Workflow
 0. **Before commit or push:** Run `python3 scripts/repo_health.py` (or `--quick`). Fix any failures so main stays clean and drift is caught early (see `docs/REPO_HEALTH.md`).

docs/CONFIGS.md

Lines changed: 42 additions & 19 deletions
@@ -103,40 +103,63 @@ discovery. This is the standard execution model for all `*_2config.sh` runs.
 1. Each task provides a `Dockerfile.sg_only` alongside its regular `Dockerfile`.
 2. The config script copies `Dockerfile.sg_only` over `Dockerfile` before the
 MCP-Full run (baseline uses the original `Dockerfile`).
-3. `Dockerfile.sg_only` clones the repo, backs up the full workspace to
-`/repo_full/`, then **truncates all source files** in `/workspace/`
-(zero-byte files preserve directory structure but contain no code).
+3. `Dockerfile.sg_only` creates an empty or truncated workspace (no usable
+source code) and writes a **clone manifest** to
+`/tmp/.sg_only_clone_manifest.json` telling the verifier which sg-benchmarks
+mirror(s) to clone at verification time.
 4. A sentinel file `/tmp/.sg_only_mode` is written at build time.
-5. The agent runs with truncated source — local `Read`, `Grep`, `Glob` return
-empty/useless results, forcing reliance on MCP tools.
+5. The agent runs with empty/truncated source — local `Read`, `Grep`, `Glob`
+return empty/useless results, forcing reliance on MCP tools.
 6. At verification time, `test.sh` detects `/tmp/.sg_only_mode` and sources
-`sgonly_verifier_wrapper.sh`, which restores the full repo from `/repo_full/`
-and overlays any files the agent wrote, so the verifier runs against the
-correct codebase.
+`sgonly_verifier_wrapper.sh`, which clones the mirror repo(s) from the
+manifest, optionally re-runs defect injection, overlays agent-written files,
+and then hands off to the verifier.
+
+**Clone manifest format** (`/tmp/.sg_only_clone_manifest.json`):
+
+```json
+{"workdir":"/workspace","repos":[{"mirror":"sg-benchmarks/django--674eda1c","target_dir":"."}]}
+```
+
+Multi-repo tasks list multiple entries; code-review tasks add `"inject_defects"`.
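
To make the manifest format concrete, here is a minimal Python sketch of how a verifier might turn manifest entries into shallow-clone commands. It is not the code in `sgonly_verifier_wrapper.sh`, which this commit does not show. The function name `clone_commands` and the GitHub URL template are assumptions for illustration; only the manifest keys and `--depth 1` come from the docs above.

```python
import json
import posixpath

# Example manifest, copied from the documented format.
MANIFEST_JSON = (
    '{"workdir":"/workspace","repos":'
    '[{"mirror":"sg-benchmarks/django--674eda1c","target_dir":"."}]}'
)

# ASSUMPTION: mirrors are reachable at a GitHub-style URL. The real wrapper
# may resolve mirrors differently.
CLONE_URL_TEMPLATE = "https://github.com/{mirror}.git"


def clone_commands(manifest_text):
    """Return one `git clone --depth 1` argv list per manifest entry."""
    manifest = json.loads(manifest_text)
    workdir = manifest["workdir"]
    commands = []
    for repo in manifest["repos"]:
        # target_dir is taken relative to workdir; "." resolves to workdir.
        dest = posixpath.normpath(posixpath.join(workdir, repo["target_dir"]))
        url = CLONE_URL_TEMPLATE.format(mirror=repo["mirror"])
        commands.append(["git", "clone", "--depth", "1", url, dest])
    return commands


if __name__ == "__main__":
    for argv in clone_commands(MANIFEST_JSON):
        print(" ".join(argv))
```

The sketch only builds the command lines and runs nothing, so it can be checked without network access. A multi-repo manifest would yield one command per entry in `repos`.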
 
 **Key paths inside the container:**
 
 | Path | Contents |
 |---|---|
-| `/workspace/` | Truncated source (agent sees this) |
-| `/repo_full/` | Full repo backup (verifier restores from here) |
-| `/tests/` | Harbor-uploaded test harness (verifier scripts, ground truth) |
+| `/workspace/` (or `/app/`) | Empty or truncated source (agent sees this) |
+| `/tmp/.sg_only_clone_manifest.json` | Clone manifest — verifier clones mirrors from here |
 | `/tmp/.sg_only_mode` | Sentinel that activates verifier restoration |
+| `/tests/` | Harbor-uploaded test harness (verifier scripts, ground truth) |
 | `/logs/agent/` | Agent output (solution.md, patches) |
 
-**Write-only suites** (docgen, nlqa, onboarding, investigation, linuxflbench)
+**Write-only tasks** (docgen, nlqa, onboarding, investigation, linuxflbench)
 have verifiers that only check agent-written output files, not compiled code.
-Their `Dockerfile.sg_only` typically omits the repo clone entirely.
+Their `Dockerfile.sg_only` provides an empty workspace with no clone manifest.
 
-**Build-requiring suites** (largerepo, codereview, swebenchpro, pytorch,
+**Build-requiring tasks** (largerepo, codereview, swebenchpro, pytorch,
 enterprise, etc.) need the full repo for compilation/test execution.
-`sgonly_verifier_wrapper.sh` handles the restore-and-overlay cycle.
+`sgonly_verifier_wrapper.sh` reads the clone manifest, clones mirrors with
+`--depth 1`, and overlays agent changes before the verifier runs.
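
The overlay step can be pictured with a small Python sketch. The actual wrapper is a shell script whose logic this commit does not show, so everything here is illustrative: `overlay_agent_files` is a hypothetical helper. The rule that zero-byte files are truncation placeholders rather than agent output is an assumption, not documented behavior.

```python
import shutil
import tempfile
from pathlib import Path


def overlay_agent_files(agent_dir, clone_dir):
    """Copy every non-empty file under agent_dir onto clone_dir.

    ASSUMPTION: zero-byte files are leftover truncation placeholders, so
    they are skipped and the freshly cloned version is kept instead.
    """
    agent_dir, clone_dir = Path(agent_dir), Path(clone_dir)
    copied = []
    for src in sorted(agent_dir.rglob("*")):
        if not src.is_file() or src.stat().st_size == 0:
            continue
        rel = src.relative_to(agent_dir)
        dest = clone_dir / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)
        copied.append(rel.as_posix())
    return copied


def demo():
    """Build a tiny agent workspace and clone, then overlay one onto the other."""
    with tempfile.TemporaryDirectory() as tmp:
        agent, clone = Path(tmp, "agent"), Path(tmp, "clone")
        (agent / "pkg").mkdir(parents=True)
        (clone / "pkg").mkdir(parents=True)
        (agent / "pkg" / "a.py").write_text("")  # truncated placeholder
        (agent / "pkg" / "b.py").write_text("fixed = True\n")  # agent edit
        (clone / "pkg" / "a.py").write_text("original_a = 1\n")
        (clone / "pkg" / "b.py").write_text("original_b = 2\n")
        copied = overlay_agent_files(agent, clone)
        a_text = (clone / "pkg" / "a.py").read_text()
        b_text = (clone / "pkg" / "b.py").read_text()
        return copied, a_text, b_text
```

In `demo()`, the cloned `a.py` keeps its original content, while `b.py` is replaced by the agent's version. A real wrapper would also need a policy for files the agent intentionally emptied or deleted, which this sketch does not attempt.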
+
+### Build-requiring subcategories
+
+| Type | FROM base | Clone strategy |
+|---|---|---|
+| ccb-repo-* tasks | Underlying base (e.g. `golang:1.23-bookworm`) | Empty workspace + clone manifest |
+| SWE-bench tasks | `jefzda/sweap-images:*` (preserves test venv) | Truncate source + clone manifest (restores `.py` files) |
+| Code-review tasks | `ubuntu:22.04` | Empty workspace + manifest + `inject_defects` |
+| Multi-repo tasks | `ubuntu:22.04` or language base | Multiple repos in manifest with `target_dir` |
+| Inline-clone tasks | Various | Empty workspace + clone manifest |
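
For the "Truncate source + clone manifest" strategy in the table, a rough Python sketch of the build-time preparation might look like the following. The real work happens in `Dockerfile.sg_only`, which this diff does not include. The helper name `prepare_sg_only`, the `SOURCE_SUFFIXES` set, and the way paths are passed in are all assumptions; the file names and manifest keys match the docs.

```python
import json
import tempfile
from pathlib import Path

# ASSUMPTION: which extensions count as "source" is illustrative only.
SOURCE_SUFFIXES = {".py", ".go", ".js", ".ts", ".java", ".c", ".h"}


def prepare_sg_only(workspace, tmp_dir, mirror):
    """Truncate source files, then write the clone manifest and sentinel."""
    workspace, tmp_dir = Path(workspace), Path(tmp_dir)
    truncated = []
    for path in sorted(workspace.rglob("*")):
        if path.is_file() and path.suffix in SOURCE_SUFFIXES:
            # A zero-byte file keeps the directory layout but holds no code.
            path.write_bytes(b"")
            truncated.append(path.relative_to(workspace).as_posix())
    manifest = {
        "workdir": "/workspace",
        "repos": [{"mirror": mirror, "target_dir": "."}],
    }
    (tmp_dir / ".sg_only_clone_manifest.json").write_text(json.dumps(manifest))
    (tmp_dir / ".sg_only_mode").touch()  # sentinel checked by test.sh
    return truncated


def demo():
    """Run the preparation against a throwaway directory tree."""
    with tempfile.TemporaryDirectory() as tmp:
        ws, out = Path(tmp, "workspace"), Path(tmp, "tmp")
        (ws / "src").mkdir(parents=True)
        out.mkdir()
        (ws / "src" / "app.py").write_text("print('hi')\n")
        (ws / "README.md").write_text("docs\n")
        truncated = prepare_sg_only(ws, out, "sg-benchmarks/django--674eda1c")
        manifest = json.loads((out / ".sg_only_clone_manifest.json").read_text())
        return (
            truncated,
            (ws / "src" / "app.py").stat().st_size,
            (ws / "README.md").read_text(),
            (out / ".sg_only_mode").exists(),
            manifest["repos"][0]["mirror"],
        )
```

In a container the second argument would be `/tmp`. The empty-workspace strategies in the other table rows would skip the truncation loop and write only the manifest and sentinel.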
 
 ### Adding sg_only support to a new task
 
-1. Create `environment/Dockerfile.sg_only` — clone repo, `cp -a /workspace /repo_full`,
-truncate source, write sentinel.
-2. Create `tests/sgonly_verifier_wrapper.sh` (or use the standard template).
+Prefer using the generator: `python3 scripts/generate_sgonly_dockerfiles.py`.
+To add manually:
+
+1. Create `environment/Dockerfile.sg_only` — write sentinel, write clone
+manifest JSON, and leave workspace empty or truncated.
+2. The generator automatically copies `tests/sgonly_verifier_wrapper.sh`.
 3. Add the sg_only hook at the top of `tests/test.sh`:
 ```bash
 [ -f /tmp/.sg_only_mode ] && [ -f /tests/sgonly_verifier_wrapper.sh ] && source /tests/sgonly_verifier_wrapper.sh
@@ -149,7 +172,7 @@ enterprise, etc.) need the full repo for compilation/test execution.
 The configuration is controlled by the `BASELINE_MCP_TYPE` environment variable in `claude_baseline_agent.py`:
 
 - **Baseline (`none`):** No MCP config is loaded. Uses the task's regular `Dockerfile`. The system prompt contains only the evaluation context. No `--tools` or `--disallowedTools` flags are applied.
-- **MCP-Full (`sourcegraph_full`):** Uses `Dockerfile.sg_only` (truncated local source). The Sourcegraph MCP config is loaded (`.api/mcp/v1` endpoint). All local tools remain available but return empty results for source files. The system prompt instructs MCP-first usage with all 13 Sourcegraph MCP tools.
+- **MCP-Full (`sourcegraph_full`):** Uses `Dockerfile.sg_only` (empty or truncated local source). The Sourcegraph MCP config is loaded (`.api/mcp/v1` endpoint). All local tools remain available but return empty results for source files. The verifier clones mirrors at verification time via clone manifest. The system prompt instructs MCP-first usage with all 13 Sourcegraph MCP tools.
 
 Both configs use `--dangerously-skip-permissions` for autonomous operation and deliver evaluation context via `--append-system-prompt`.

templates/mcp_unique_task/Dockerfile.sg_only.j2

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 # $task_id — sg_only variant
 # No local repo clone — agent uses Sourcegraph MCP exclusively for code access.
-# The verifier restores the full repo from /repo_full/ before scoring.
+# The verifier clones mirror repos at verification time (no /repo_full/ backup).
 
 FROM ubuntu:22.04
