Commit ed9c827

sjarmak and claude committed
feat: add 28 large-repo SR-QA tasks to break RepoQA ceiling saturation
Original 10 RepoQA SR-QA tasks scored 1.0/1.0 on both baseline and MCP configs (ceiling saturation). These 14 new function-search tasks target repos with 1M-35M LOC across 6 languages (Go, Java, Rust, C++, Python, TypeScript) to create genuine difficulty separation.

Tasks added to ccb_understand (SDLC paired) and ccb_mcp_onboarding (MCP-unique) categories. Total task count: 251 -> 279.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 2a1394e commit ed9c827

172 files changed, +9108 -6 lines changed

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+FROM python:3.11-slim
+RUN apt-get update && apt-get install -y git curl && rm -rf /var/lib/apt/lists/*
+RUN pip install --no-cache-dir numpy
+WORKDIR /app
+RUN mkdir -p /logs/agent /logs/verifier
Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
+# RepoQA: Semantic Retrieval (SR-QA)
+
+## Task: Find the Function
+
+You are searching a large codebase for a specific function based on its behavior.
+
+**Repository**: kubernetes/kubernetes
+**Language**: go
+
+## Function Description
+
+```
+1. **Purpose**: Identifies which cluster nodes satisfy all scheduling filter plugins for a given workload, enabling the scheduler to narrow down placement candidates.
+2. **Input**: Takes a context, a framework handle providing filter plugins and parallelism settings, cycle state, a pod specification, a diagnosis collector for recording filter failures, and a pre-fetched list of all node information objects.
+3. **Output**: Returns a slice of node-info objects representing feasible placement targets, plus an error if any filter plugin returned a fatal error. Also populates the diagnosis object with per-node failure reasons as a side effect.
+4. **Procedure**:
+   - Computes the target number of feasible nodes to find, reducing to 1 if there are no extender filters and no scoring plugins.
+   - If no filter plugins are registered, returns the first N nodes starting from a round-robin offset.
+   - Otherwise, defines an inner closure that runs all filter plugins against each node in parallel, starting from the last scheduling cycle's offset to ensure fairness.
+   - Uses atomic counters to track how many feasible nodes have been found; cancels the parallel search early once the target count is reached.
+   - Records non-feasible node statuses into a result array under the parallel check, then copies them into the diagnosis object after all parallel work completes.
+   - Measures and reports the total Filter extension point latency via deferred metrics emission.
+```
+
+## Search Strategy
+
+This function **cannot be found by searching for its name** because the name is not provided. You must:
+
+1. **Understand the behavior** described above
+2. **Search the codebase** to find functions matching this behavior
+3. **Explore the code** using call graphs and references
+4. **Narrow down** candidates until you find the exact function
+
+
+## Output Format
+
+You MUST provide your answer as valid JSON and **SAVE IT TO A FILE**:
+
+```json
+{
+  "function_path": "path/to/file.ext",
+  "function_name": "the_function_name",
+  "justification": "Why this function matches: describe the behavior you found"
+}
+```
+
+**CRITICAL**: You MUST save the JSON to `/app/solution.json`. This location is required for verification.
+
+**Your final step MUST be to run this exact bash command:**
+
+```bash
+cat > /app/solution.json << 'JSONEOF'
+{
+  "function_path": "ACTUAL_PATH",
+  "function_name": "ACTUAL_NAME",
+  "justification": "ACTUAL_JUSTIFICATION_TEXT"
+}
+JSONEOF
+```
+
+## Notes
+
+- The file path should be relative to repository root
+- Function names are case-sensitive
+- Provide your best match even if uncertain; explain your reasoning
+- The justification is scored on how well it explains the function's behavior
+
+## Scoring
+
+- **Perfect** (1.0): Correct path AND name
+- **Good** (0.7-0.9): Correct path, similar name OR vice versa
+- **Partial** (0.3-0.6): Close approximation
+- **Incorrect** (0.0): Wrong function entirely
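The scoring bands above can be illustrated with a small sketch. It assumes the thresholds used by the verifier added in this same commit, which emits the discrete values 1.0, 0.8, 0.3 and 0.0 rather than a value anywhere inside the documented ranges. The helper names `similarity` and `band` are hypothetical, and unlike the verifier this sketch does not lowercase names before comparing them.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Exact match scores 1.0; otherwise fall back to difflib's ratio."""
    return 1.0 if a == b else SequenceMatcher(None, a, b).ratio()


def band(path_score: float, name_score: float) -> float:
    """Map path/name similarity onto discrete bands (thresholds assumed from the verifier)."""
    if path_score == 1.0 and name_score == 1.0:
        return 1.0  # Perfect: both path and name are exact
    if path_score == 1.0 and name_score > 0.7:
        return 0.8  # Good: exact path, similar name
    if path_score > 0.8 and name_score == 1.0:
        return 0.8  # Good: similar path, exact name
    if path_score > 0.5 and name_score > 0.5:
        return 0.3  # Partial: close approximation
    return 0.0      # Incorrect


expected_path = "pkg/scheduler/schedule_one.go"
expected_name = "findNodesThatPassFilters"

exact = band(similarity(expected_path, expected_path),
             similarity(expected_name, expected_name))
# Same file, function name missing its final "s".
near_name = band(1.0, similarity("findNodesThatPassFilter", expected_name))
# A different file and an unrelated name.
unrelated = band(similarity("cmd/kubelet/app/server.go", expected_path),
                 similarity("Run", expected_name))
print(exact, near_name, unrelated)
```

Because the bands are discrete, an answer with the right file but a slightly misspelled name lands on 0.8 no matter how close the spelling is.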
Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
+version = "1.0"
+
+[metadata]
+name = "ccx-onboard-search-201"
+description = "Find a function in kubernetes/kubernetes (4M+ LOC) from a behavioral description"
+difficulty = "hard"
+category = "semantic-code-navigation"
+tags = ["ccb_mcp_onboarding", "go", "sr-qa", "large-repo", "repoqa"]
+language = "go"
+
+[task]
+id = "ccx-onboard-search-201"
+repo = "kubernetes/kubernetes"
+category = "ccb_mcp_onboarding"
+language = "go"
+difficulty = "hard"
+time_limit_sec = 1200
+
+[verification]
+type = "test"
+command = "bash /tests/test.sh"
+reward_type = "semantic_similarity"
+description = "Correct function retrieval similarity score"
+
+[environment]
+build_timeout_sec = 1800.0
+cpus = 2
+memory = "4G"
+storage = "10G"
+
+[environment.setup_scripts]
+mcp_config = """#!/bin/bash
+if [ -n "$SOURCEGRAPH_ACCESS_TOKEN" ] && [ -n "$SOURCEGRAPH_URL" ]; then
+  mkdir -p /root/.config/claude
+  cat > /root/.config/claude/mcp.json << 'MCPEOF'
+{
+  "mcpServers": {
+    "sourcegraph": {
+      "command": "npx",
+      "args": ["-y", "@sourcegraph/mcp-server"],
+      "env": {
+        "SRC_ACCESS_TOKEN": "$SOURCEGRAPH_ACCESS_TOKEN",
+        "SOURCEGRAPH_URL": "$SOURCEGRAPH_URL"
+      }
+    }
+  }
+}
+MCPEOF
+  echo "MCP configuration created"
+else
+  echo "No Sourcegraph credentials provided, MCP disabled"
+fi
+exit 0
+"""
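One detail of the setup script is worth checking. The heredoc delimiter is quoted (`'MCPEOF'`), and in bash a quoted delimiter disables parameter expansion inside the heredoc body. The file would therefore contain the literal strings `$SOURCEGRAPH_ACCESS_TOKEN` and `$SOURCEGRAPH_URL` unless whatever reads `mcp.json` expands them itself, which this commit does not show. Below is a minimal sketch, in Python rather than bash, of one way to write the same structure with the values substituted. The function name `write_mcp_config`, the demo credentials and the temporary directory are all hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Mapping, Optional


def write_mcp_config(target: Path, env: Mapping[str, str]) -> Optional[dict]:
    """Write an mcp.json with concrete values; return None when credentials are missing."""
    token = env.get("SOURCEGRAPH_ACCESS_TOKEN")
    url = env.get("SOURCEGRAPH_URL")
    if not token or not url:
        return None  # mirrors the "MCP disabled" branch of the script
    config = {
        "mcpServers": {
            "sourcegraph": {
                "command": "npx",
                "args": ["-y", "@sourcegraph/mcp-server"],
                "env": {"SRC_ACCESS_TOKEN": token, "SOURCEGRAPH_URL": url},
            }
        }
    }
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(config, indent=2))
    return config


# Hypothetical demo values; a real run would pass os.environ instead.
demo_env = {
    "SOURCEGRAPH_ACCESS_TOKEN": "example-token",
    "SOURCEGRAPH_URL": "https://sourcegraph.example.com",
}
with tempfile.TemporaryDirectory() as tmp:
    written = write_mcp_config(Path(tmp) / "claude" / "mcp.json", demo_env)
    skipped = write_mcp_config(Path(tmp) / "other" / "mcp.json", {})
    on_disk = json.loads((Path(tmp) / "claude" / "mcp.json").read_text())
```

Serializing with `json.dumps` also escapes any quotes or backslashes in the token, which plain shell interpolation into a JSON template would not.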
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+{
+  "function_id": "pkg/scheduler/schedule_one.go::findNodesThatPassFilters",
+  "canonical_path": "pkg/scheduler/schedule_one.go",
+  "canonical_name": "findNodesThatPassFilters",
+  "language": "go",
+  "nl_description": "1. **Purpose**: Identifies which cluster nodes satisfy all scheduling filter plugins for a given workload, enabling the scheduler to narrow down placement candidates.\n2. **Input**: Takes a context, a framework handle providing filter plugins and parallelism settings, cycle state, a pod specification, a diagnosis collector for recording filter failures, and a pre-fetched list of all node information objects.\n3. **Output**: Returns a slice of node-info objects representing feasible placement targets, plus an error if any filter plugin returned a fatal error. Also populates the diagnosis object with per-node failure reasons as a side effect.\n4. **Procedure**:\n - Computes the target number of feasible nodes to find, reducing to 1 if there are no extender filters and no scoring plugins.\n - If no filter plugins are registered, returns the first N nodes starting from a round-robin offset.\n - Otherwise, defines an inner closure that runs all filter plugins against each node in parallel, starting from the last scheduling cycle's offset to ensure fairness.\n - Uses atomic counters to track how many feasible nodes have been found; cancels the parallel search early once the target count is reached.\n - Records non-feasible node statuses into a result array under the parallel check, then copies them into the diagnosis object after all parallel work completes.\n - Measures and reports the total Filter extension point latency via deferred metrics emission.",
+  "task_variant": "sr-qa"
+}
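In this record the `function_id` looks like the canonical path and name joined by `::`. Taking that as an assumption (it is inferred from this single file, not from a documented schema), a short consistency check could look like the sketch below. The helper `split_function_id` is hypothetical.

```python
from typing import Tuple


def split_function_id(function_id: str) -> Tuple[str, str]:
    """Split 'path::name' on the last '::' (separator convention is an assumption)."""
    path, sep, name = function_id.rpartition("::")
    if not sep or not path or not name:
        raise ValueError(f"unexpected function_id format: {function_id!r}")
    return path, name


record = {
    "function_id": "pkg/scheduler/schedule_one.go::findNodesThatPassFilters",
    "canonical_path": "pkg/scheduler/schedule_one.go",
    "canonical_name": "findNodesThatPassFilters",
}

path, name = split_function_id(record["function_id"])
consistent = (path, name) == (record["canonical_path"], record["canonical_name"])
```

Running a check like this across all ground-truth files would catch a task whose three fields drifted apart during generation.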
Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
+#!/bin/bash
+# RepoQA SR-QA Verification Script
+echo "Starting RepoQA verifier..." 1>&2
+cd /app || { echo "ERROR: Cannot cd to /app"; exit 1; }
+mkdir -p /logs/verifier
+
+if [ ! -f /tests/ground_truth.json ]; then
+    echo "ERROR: No ground_truth.json found at /tests/ground_truth.json"
+    echo '{"score": 0.0}' > /logs/verifier/reward.json
+    echo "0.0" > /logs/verifier/reward.txt
+    exit 0
+fi
+
+SOLUTION_FILE="/app/solution.json"
+if [ ! -f "$SOLUTION_FILE" ]; then
+    echo "ERROR: Agent did not create solution.json in /app/"
+    echo '{"score": 0.0}' > /logs/verifier/reward.json
+    echo "0.0" > /logs/verifier/reward.txt
+    exit 0
+fi
+
+cat > /tmp/verify.py << 'PYEOF'
+import json, sys, re
+sys.path.insert(0, "/tests")
+from verifiers import SemanticRetrievalQAVerifier
+
+try:
+    with open("/tests/ground_truth.json") as f:
+        ground_truth = json.load(f)
+    with open("/app/solution.json") as f:
+        raw = f.read()
+    matches = re.findall(r"```(?:json)?\s*\n(.*?)```", raw, re.DOTALL)
+    if matches:
+        raw = matches[-1].strip()
+    agent_output = json.loads(raw)
+
+    verifier = SemanticRetrievalQAVerifier(ground_truth)
+    result = verifier.verify(agent_output)
+    reward = {"score": float(result.correct_function)}
+
+    print(f"Correct Function: {result.correct_function:.2f}")
+    print(f"Correct Path: {result.correct_path:.2f}")
+    print(f"Justification: {result.justification_score:.2f}")
+    print(f"Details: {result.reasoning}")
+
+    with open("/logs/verifier/reward.json", "w") as f:
+        json.dump(reward, f, indent=2)
+    with open("/logs/verifier/reward.txt", "w") as f:
+        f.write(str(reward["score"]))
+except Exception as e:
+    import traceback
+    print(f"ERROR: {e}")
+    traceback.print_exc()
+    with open("/logs/verifier/reward.json", "w") as f:
+        json.dump({"score": 0.0}, f)
+    with open("/logs/verifier/reward.txt", "w") as f:
+        f.write("0.0")
+PYEOF
+
+python3 /tmp/verify.py 2>&1 | tee /logs/verifier/verify-debug.log
+exit 0
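The embedded `verify.py` tolerates a solution file in which the agent wrapped its JSON in a Markdown code fence. A standalone sketch of that extraction step follows. The function name `parse_solution` is hypothetical, and the fence string is built programmatically only so the example can sit inside a fenced block itself.

```python
import json
import re

FENCE = "`" * 3  # three backticks, assembled to avoid closing this example's own fence
# Same pattern as the script: optional "json" tag, then lazily capture up to the closing fence.
FENCED_JSON = re.compile(FENCE + r"(?:json)?\s*\n(.*?)" + FENCE, re.DOTALL)


def parse_solution(raw: str) -> dict:
    """Parse JSON, preferring the last fenced block if any fences are present."""
    matches = FENCED_JSON.findall(raw)
    if matches:
        raw = matches[-1].strip()
    return json.loads(raw)


plain = '{"function_name": "f"}'
wrapped = "Here is my answer:\n" + FENCE + "json\n" + plain + "\n" + FENCE + "\n"

from_plain = parse_solution(plain)
from_wrapped = parse_solution(wrapped)
```

Taking the last fenced block means that when an agent writes several drafts into the file, the final one is the one scored.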
Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
+"""Verifiers for RepoQA SR-QA tasks. Scores agent function retrieval."""
+
+import json
+import re
+from dataclasses import dataclass
+from difflib import SequenceMatcher
+from pathlib import Path
+from typing import Any, Dict
+
+
+@dataclass
+class VerificationResult:
+    correct_function: float
+    correct_path: float
+    justification_score: float
+    reasoning: str = ""
+
+
+class SemanticRetrievalQAVerifier:
+    def __init__(self, ground_truth: Dict[str, Any]):
+        self.ground_truth = ground_truth
+
+    def verify(self, agent_output: Dict[str, Any]) -> VerificationResult:
+        try:
+            path = agent_output.get("function_path", "")
+            name = agent_output.get("function_name", "")
+            justification = agent_output.get("justification", "")
+        except (KeyError, TypeError) as e:
+            return VerificationResult(0.0, 0.0, 0.0, f"Invalid output: {e}")
+
+        canonical_path = self.ground_truth.get("canonical_path", "")
+        canonical_name = self.ground_truth.get("canonical_name", "")
+        nl_description = self.ground_truth.get("nl_description", "")
+
+        path_score = self._path_similarity(path, canonical_path)
+        name_score = self._name_similarity(name, canonical_name)
+
+        if path_score == 1.0 and name_score == 1.0:
+            function_score = 1.0
+        elif path_score == 1.0 and name_score > 0.7:
+            function_score = 0.8
+        elif path_score > 0.8 and name_score == 1.0:
+            function_score = 0.8
+        elif path_score > 0.5 and name_score > 0.5:
+            function_score = 0.3
+        else:
+            function_score = 0.0
+
+        justification_score = self._keyword_overlap(justification, nl_description)
+
+        reasoning = (
+            f"Path match: {path_score:.2f} (expected {canonical_path})\n"
+            f"Name match: {name_score:.2f} (expected {canonical_name})\n"
+            f"Justification keywords: {justification_score:.2f}"
+        )
+        return VerificationResult(function_score, path_score, justification_score, reasoning)
+
+    @staticmethod
+    def _path_similarity(p1: str, p2: str) -> float:
+        p1, p2 = Path(p1).as_posix(), Path(p2).as_posix()
+        return 1.0 if p1 == p2 else SequenceMatcher(None, p1, p2).ratio()
+
+    @staticmethod
+    def _name_similarity(n1: str, n2: str) -> float:
+        return 1.0 if n1 == n2 else SequenceMatcher(None, n1.lower(), n2.lower()).ratio()
+
+    @staticmethod
+    def _keyword_overlap(text1: str, text2: str) -> float:
+        if not text1 or not text2:
+            return 0.0
+        w1 = set(re.findall(r"\w+", text1.lower()))
+        w2 = set(re.findall(r"\w+", text2.lower()))
+        if not w1 or not w2:
+            return 0.0
+        return len(w1 & w2) / len(w1 | w2)
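The `_keyword_overlap` helper computes a Jaccard index over lowercase word sets: the size of the intersection divided by the size of the union. A standalone worked example, with hypothetical sample sentences, makes the arithmetic concrete.

```python
import re


def keyword_overlap(text1: str, text2: str) -> float:
    """Jaccard index of the two texts' lowercase word sets (0.0 if either is empty)."""
    if not text1 or not text2:
        return 0.0
    w1 = set(re.findall(r"\w+", text1.lower()))
    w2 = set(re.findall(r"\w+", text2.lower()))
    if not w1 or not w2:
        return 0.0
    return len(w1 & w2) / len(w1 | w2)


justification = "runs filter plugins in parallel"
description = "runs all filter plugins against each node in parallel"

# All 5 justification words appear among the description's 9 distinct words,
# so the intersection has 5 words and the union has 9.
score = keyword_overlap(justification, description)
empty = keyword_overlap("", description)
```

Because the union sits in the denominator, a justification padded with words that are absent from the description lowers the score even when it covers every keyword.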
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+FROM python:3.11-slim
+RUN apt-get update && apt-get install -y git curl && rm -rf /var/lib/apt/lists/*
+RUN pip install --no-cache-dir numpy
+WORKDIR /app
+RUN mkdir -p /logs/agent /logs/verifier
Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
+# RepoQA: Semantic Retrieval (SR-QA)
+
+## Task: Find the Function
+
+You are searching a large codebase for a specific function based on its behavior.
+
+**Repository**: kubernetes/kubernetes
+**Language**: go
+
+## Function Description
+
+```
+1. **Purpose**: Performs a single synchronization cycle of the node eviction manager, evaluating current resource usage against configured thresholds and, if necessary, selecting and terminating one workload to relieve resource pressure.
+2. **Input**: Operates as a method on the eviction manager, receiving a context, a list of active workloads (pods), a function to retrieve resource usage statistics, and a function to check if a pod has been cleaned up. It implicitly reads node summary statistics from the summary provider.
+3. **Output**: Returns a slice of pods that were evicted during this cycle (at most one) and an error. As side effects, it updates internal state: the set of met thresholds, node condition timestamps, and observation history.
+4. **Procedure**:
+   - Refreshes memory threshold notifiers from the latest statistics summary.
+   - Computes signal observations (e.g., memory available, disk available) and determines which thresholds are currently met, both ignoring and respecting grace periods.
+   - Tracks when each threshold was first observed and when each node condition was last observed, applying a transition period before declaring conditions active.
+   - Filters thresholds to only those whose grace periods are fully met and whose stats have been updated since the last sync.
+   - Checks for local storage eviction violations first (pod-level disk usage); if any pods are evicted there, returns early.
+   - Sorts remaining thresholds by eviction priority, identifies the highest-priority reclaimable resource, and first attempts node-level reclamation (e.g., garbage-collecting images or containers).
+   - If node-level reclamation is insufficient, ranks all active pods using a signal-specific ranking function, then iterates through ranked pods and evicts the first one that can be killed.
+```
+
+## Search Strategy
+
+This function **cannot be found by searching for its name** because the name is not provided. You must:
+
+1. **Understand the behavior** described above
+2. **Search the codebase** to find functions matching this behavior
+3. **Explore the code** using call graphs and references
+4. **Narrow down** candidates until you find the exact function
+
+
+## Output Format
+
+You MUST provide your answer as valid JSON and **SAVE IT TO A FILE**:
+
+```json
+{
+  "function_path": "path/to/file.ext",
+  "function_name": "the_function_name",
+  "justification": "Why this function matches: describe the behavior you found"
+}
+```
+
+**CRITICAL**: You MUST save the JSON to `/app/solution.json`. This location is required for verification.
+
+**Your final step MUST be to run this exact bash command:**
+
+```bash
+cat > /app/solution.json << 'JSONEOF'
+{
+  "function_path": "ACTUAL_PATH",
+  "function_name": "ACTUAL_NAME",
+  "justification": "ACTUAL_JUSTIFICATION_TEXT"
+}
+JSONEOF
+```
+
+## Notes
+
+- The file path should be relative to repository root
+- Function names are case-sensitive
+- Provide your best match even if uncertain; explain your reasoning
+- The justification is scored on how well it explains the function's behavior
+
+## Scoring
+
+- **Perfect** (1.0): Correct path AND name
+- **Good** (0.7-0.9): Correct path, similar name OR vice versa
+- **Partial** (0.3-0.6): Close approximation
+- **Incorrect** (0.0): Wrong function entirely
