@@ -24,12 +24,23 @@ Config names encode three independent dimensions:
2424| ---| ---| ---| ---| ---| ---|
2525| ` baseline-local-direct ` | No MCP | Full source | Git changes | ` none ` | Original |
2626| ` mcp-remote-direct ` | MCP | Source deleted | Git changes | ` sourcegraph_full ` | ` Dockerfile.sg_only ` |
27+ | ` mcp-scip-remote-direct ` | MCP + SCIP | Source deleted | Git changes | ` sourcegraph_full ` | ` Dockerfile.sg_only ` |
2728| ` baseline-local-artifact ` | No MCP | Full source | ` review.json ` | ` none ` | ` Dockerfile.artifact_only ` |
2829| ` mcp-remote-artifact ` | MCP | Source deleted | ` review.json ` | ` artifact_full ` | ` Dockerfile.artifact_only ` |
30+ | ` mcp-scip-remote-artifact ` | MCP + SCIP | Source deleted | ` review.json ` | ` artifact_full ` | ` Dockerfile.artifact_only ` |
2931
30- ** Standard SDLC suites** use ` baseline-local-direct ` + ` mcp-remote-direct ` .
31- ** Artifact evaluation** uses ` baseline-local-artifact ` + ` mcp-remote-artifact `
32- (set via ` FULL_CONFIG=mcp-remote-artifact ` ).
32+ **Standard SDLC suites** (`ccb_build`, `ccb_debug`, etc.) use
33+ `baseline-local-direct` + `mcp-remote-direct`. The agent produces code changes
34+ and the verifier checks git diffs / test results.
35+
36+ **MCP-unique suites** (`ccb_mcp_*`) use `baseline-local-artifact` +
37+ `mcp-remote-artifact`. These are retrieval/analysis tasks: the agent produces
38+ `/workspace/answer.json` and the verifier scores it against an oracle. Do NOT
39+ run MCP-unique suites with `-direct` configs; the verifier expects an artifact,
40+ not code changes.
41+
42+ **SCIP ablation** uses `mcp-scip-remote-direct` or `mcp-scip-remote-artifact`
43+ (requires the branch-swap pre-flight; see the SCIP Precise Indexing Ablation section below).
3344
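The naming scheme in the table above can be read mechanically. The following is a minimal sketch, not part of the repository: `ConfigSpec` and `parse_config_name` are hypothetical names, and the pairing rule (baseline with `local`, MCP with `remote`) is an assumption inferred from the six rows listed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigSpec:
    mcp: bool      # MCP tools available to the agent
    scip: bool     # SCIP ablation variant
    source: str    # "local" (full source) or "remote" (source deleted)
    output: str    # "direct" (git changes) or "artifact" (review.json)
    mcp_type: str  # internal mcp_type value from the table


def parse_config_name(name: str) -> ConfigSpec:
    """Split a config name into tooling, source, and output dimensions."""
    parts = name.split("-")
    if len(parts) < 3:
        raise ValueError(f"not a config name: {name!r}")
    tooling = "-".join(parts[:-2])
    source, output = parts[-2], parts[-1]

    if tooling not in {"baseline", "mcp", "mcp-scip"}:
        raise ValueError(f"unknown tooling prefix: {tooling!r}")
    if output not in {"direct", "artifact"}:
        raise ValueError(f"unknown output mode: {output!r}")

    mcp = tooling != "baseline"
    # Assumption from the table rows: baseline runs keep the source locally,
    # MCP runs delete it and rely on remote access.
    expected_source = "remote" if mcp else "local"
    if source != expected_source:
        raise ValueError(f"{tooling!r} configs are only listed with {expected_source!r}")

    if not mcp:
        mcp_type = "none"
    elif output == "direct":
        mcp_type = "sourcegraph_full"
    else:
        mcp_type = "artifact_full"
    return ConfigSpec(mcp, tooling == "mcp-scip", source, output, mcp_type)
```

For example, `parse_config_name("mcp-scip-remote-direct")` yields a spec with `scip=True` and `mcp_type="sourcegraph_full"`, matching the corresponding table row.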
3445### Legacy Names
3546
@@ -247,11 +258,116 @@ This flag is only meaningful when used with `--selection-file`.
247258
248259| Feature | Standard suites | MCP-unique suites |
249260| ---------| ----------------| -------------------|
261+ | **Config pair** | ` baseline-local-direct ` + ` mcp-remote-direct ` | ` baseline-local-artifact ` + ` mcp-remote-artifact ` |
250262| Selection file | ` selected_benchmark_tasks.json ` | ` selected_mcp_unique_tasks.json ` |
251263| Suite prefix | ` ccb_<phase> ` | ` ccb_mcp_<category> ` |
264+ | Agent output | Code changes (git diff) | ` /workspace/answer.json ` |
252265| Verifier script | ` tests/test.sh ` | ` tests/eval.sh ` |
253266| Oracle format | task-specific | ` oracle_answer.json ` + ` oracle_checks.py ` |
254- | Local repo | full workspace | 1 local_checkout repo only |
255- | MCP-Full behavior | truncated source | no source clone |
267+ | Baseline Dockerfile | ` Dockerfile ` (full repo clone) | ` Dockerfile ` (full repo clone) |
268+ | MCP Dockerfile | ` Dockerfile.sg_only ` (truncated source) | ` Dockerfile.artifact_only ` (empty workspace) |
256269
257270See ` docs/MCP_UNIQUE_TASKS.md ` for full task authoring and evaluation details.
271+
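The pairing in the table above could be checked before launching a run. This is a hedged sketch only: `config_pair_for_suite` and `check_pairing` are hypothetical helpers, not functions in the repository, and they enforce only the documented rule that `ccb_mcp_*` suites must not run with `-direct` configs.

```python
def config_pair_for_suite(suite: str) -> tuple:
    """Return the (baseline, MCP) config pair documented for a suite name."""
    if not suite.startswith("ccb_"):
        raise ValueError(f"unexpected suite name: {suite!r}")
    if suite.startswith("ccb_mcp_"):
        # MCP-unique suites: the verifier expects an artifact.
        return ("baseline-local-artifact", "mcp-remote-artifact")
    # Standard SDLC suites: the verifier checks git diffs / test results.
    return ("baseline-local-direct", "mcp-remote-direct")


def check_pairing(suite: str, config: str) -> None:
    """Raise if an MCP-unique suite is paired with a -direct config."""
    if suite.startswith("ccb_mcp_") and config.endswith("-direct"):
        raise ValueError(
            f"{suite} is an MCP-unique suite; use an -artifact config, not {config}"
        )
```

Calling `check_pairing` at the top of a launcher script would fail fast instead of producing runs the verifier cannot score.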
272+ ## SCIP Precise Indexing Ablation
273+
274+ The ` mcp-scip-* ` configs measure the impact of SCIP precise code intelligence
275+ on MCP-enabled benchmark runs. SCIP provides compiler-accurate go-to-definition
276+ and find-references (vs search-based heuristics on the control branch).
277+
278+ ### How It Works
279+
280+ At the **agent/Harbor level**, `mcp-scip-remote-direct` is identical to
281+ `mcp-remote-direct`: same Dockerfile, same MCP tools, same internal
282+ `mcp_type=sourcegraph_full`. The difference is purely **server-side**: the
283+ Sourcegraph instance has SCIP auto-indexing enabled for the `scip-enabled`
284+ branch and disabled for `main`.
285+
286+ Two Sourcegraph configuration policies control indexing:
287+
288+ | Policy | Branch | ` indexingEnabled ` | ID |
289+ | --------| --------| -------------------| -----|
290+ | Benchmarks: Main (No SCIP) | ` main ` | ` false ` | ` ...MTA2Ng== ` |
291+ | Benchmarks: SCIP Enabled | ` scip-enabled ` | ` true ` | ` ...MTA2Nw== ` |
292+
293+ Both policies target ` github.com/sg-benchmarks/* ` with ` GIT_TREE ` type.
294+
295+ ### Deep Search Limitation
296+
297+ Deep Search only indexes the **default branch HEAD**. It cannot be pointed at a
298+ specific branch. To ensure Deep Search uses the SCIP-indexed code, the default
299+ branch must be swapped before running benchmarks.
300+
301+ ### Pre-Flight: Branch Swap
302+
303+ Before running SCIP-enabled benchmarks, swap the default branch on all
304+ sg-benchmarks repos:
305+
306+ ``` bash
307+ # Before SCIP runs (mcp-scip-remote-direct):
308+ ./scripts/swap_default_branch.sh scip-enabled
309+ # Wait for Sourcegraph to re-index (~30-60 min for full org)
310+
311+ # Before control runs (mcp-remote-direct) or to restore:
312+ ./scripts/swap_default_branch.sh main
313+ ```
314+
315+ The swap script:
316+ - Patches all 1,592 sg-benchmarks repos via GitHub API (` --parallel 10 ` )
317+ - Skips repos already set to the target branch
318+ - Skips empty repos without the target branch
319+ - Logs results to ` /tmp/scip_branch_swap/ `
320+ - Supports ` --dry-run ` for previewing
321+
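The skip rules listed above amount to a small per-repo decision. The sketch below illustrates that logic under assumptions: `swap_action` is a hypothetical function, and the real `swap_default_branch.sh` may order or word its checks differently.

```python
def swap_action(current_default: str, branches: set, target: str) -> str:
    """Decide what to do with one repo when swapping its default branch."""
    if current_default == target:
        # Already set to the target branch: nothing to patch.
        return "skip: already on target"
    if target not in branches:
        # Covers empty repos, which have no branches at all.
        return "skip: target branch missing"
    return "patch"
```

A `--dry-run` mode would report these decisions without issuing any GitHub API calls.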
322+ ### Running the Ablation
323+
324+ ``` bash
325+ # 1. Swap to SCIP-enabled
326+ ./scripts/swap_default_branch.sh scip-enabled
327+ # 2. Wait for indexing to complete
328+ # 3. Run SCIP config
329+ FULL_CONFIG=mcp-scip-remote-direct configs/run_selected_tasks.sh
330+
331+ # 4. Swap back to control
332+ ./scripts/swap_default_branch.sh main
333+ # 5. Wait for re-index
334+ # 6. Run standard MCP config
335+ FULL_CONFIG=mcp-remote-direct configs/run_selected_tasks.sh
336+ ```
337+
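The six steps above follow one pattern: swap, wait, run, once per branch. The sketch below only builds the command list for review and does not execute anything; `ablation_plan` is a hypothetical helper, and the wait steps are left as comments because no command for checking indexing status is given here.

```python
def ablation_plan(
    swap_script: str = "./scripts/swap_default_branch.sh",
    run_script: str = "configs/run_selected_tasks.sh",
) -> list:
    """Return the ordered shell lines for the SCIP run followed by the control run."""
    steps = []
    for branch, config in (
        ("scip-enabled", "mcp-scip-remote-direct"),  # SCIP run first
        ("main", "mcp-remote-direct"),               # then control, which also restores main
    ):
        steps.append(f"{swap_script} {branch}")
        steps.append(f"# wait for Sourcegraph to re-index ({branch})")
        steps.append(f"FULL_CONFIG={config} {run_script}")
    return steps
```

Printing the plan with `print("\n".join(ablation_plan()))` gives a checklist that ends with the default branch back on `main`.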
338+ ### Comparing Results
339+
340+ Use ` compare_configs.py ` with both config names to see where SCIP helps/hurts:
341+
342+ ``` bash
343+ python3 scripts/compare_configs.py --run <run_dir> \
344+ --configs mcp-remote-direct mcp-scip-remote-direct
345+ ```
346+
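As a rough illustration of what "helps/hurts" means per task, the sketch below computes score deltas between the two configs. It does not reproduce `compare_configs.py`; the function name `score_deltas`, the input shape (task ID mapped to a numeric score), and the task IDs are all assumptions made for the example.

```python
def score_deltas(control: dict, scip: dict) -> dict:
    """Return scip score minus control score for tasks present in both runs."""
    shared = sorted(control.keys() & scip.keys())
    return {task: scip[task] - control[task] for task in shared}


# Hypothetical scores: positive delta means SCIP helped, negative means it hurt.
control_scores = {"task-a": 0.5, "task-b": 1.0, "task-c": 0.0}
scip_scores = {"task-a": 1.0, "task-b": 0.5, "task-d": 1.0}
deltas = score_deltas(control_scores, scip_scores)
```

Tasks that appear in only one run are dropped rather than treated as zero, so a failed or skipped task does not masquerade as a regression.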
347+ ### SCIP Indexing Coverage
348+
349+ Sourcegraph auto-indexing detects languages and runs the appropriate SCIP
350+ indexer per repo:
351+
352+ | Language | Indexer | Example repos |
353+ | ----------| ---------| ---------------|
354+ | Python | ` scip-python ` | ansible, django, astropy |
355+ | Go | ` scip-go ` | cilium, autoscaler, argo-cd |
356+ | TypeScript/JS | ` scip-typescript ` | vscode, cal.com, copilot-arena |
357+ | Java | ` scip-java ` | camel |
358+ | C++ | ` scip-clang ` | bustub, curl, log4cxx |
359+ | C# | ` scip-dotnet ` | aspnetcore, CodeCoverageSummary |
360+
361+ Some repos may fail to index, for example because of complex build setups.
362+ Check indexing status in the Sourcegraph admin UI after swapping branches.
363+
364+ ### Branch Creation Script
365+
366+ If new repos are added to sg-benchmarks, create ` scip-enabled ` branches:
367+
368+ ``` bash
369+ ./scripts/create_scip_branches.sh [--dry-run] [--parallel N]
370+ ```
371+
372+ This creates a ` scip-enabled ` branch pointing to the same commit as ` main ` HEAD
373+ for all repos in the org. Empty repos are skipped.