Commit 7fb94b1

sjarmak and claude committed
feat: scaffold 8 new MCP-unique benchmark tasks (12 → 20 total)
Add tasks for 4 categories: Migration (C), Incident (D), Compliance (F), Platform (J). Each task has oracle-curated answer files verified via Sourcegraph queries (gold=1.0, empty=0.0 for all 8).

New tasks:
- CCX-migration-025: numpy.distutils deprecation across python-ml-stack
- CCX-migration-027: Express req.host deprecation in nodejs-web-stack
- CCX-incident-034: Loki client retry/timeout config in grafana-observability
- CCX-incident-037: etcd DialTimeout across kubernetes-ecosystem
- CCX-compliance-051: TLS enforcement across prometheus-monitoring
- CCX-compliance-057-ds: audit logging evidence bundle (Deep Search variant)
- CCX-platform-094: CODEOWNERS infrastructure in grafana-observability
- CCX-platform-100: deprecated API fields in kubernetes pkg/apis/

New suites: ccb_mcp_migration, ccb_mcp_compliance
New repo set: prometheus-monitoring (prometheus, alertmanager, client_golang)

Note: SG mirrors for version pinning still needed (tracked in beads).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent e0c046c commit 7fb94b1

75 files changed: 6950 additions, 8 deletions
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
FROM ubuntu:22.04

ENV DEBIAN_FRONTEND=noninteractive

# Base tools
RUN apt-get update && apt-get install -y --no-install-recommends \
    git \
    ca-certificates \
    curl \
    python3 \
    golang-go \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /workspace

# Clone fixture repo (baseline has full local access)
RUN git clone --depth 1 --branch v3.2.1 https://github.com/prometheus/prometheus /workspace/prometheus

# Initialize git identity for agent commits
RUN git config --global user.email "agent@example.com" && \
    git config --global user.name "Agent" && \
    git config --global safe.directory '*'

# Create log directories
RUN mkdir -p /logs/agent /logs/verifier

ENTRYPOINT []
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
# CCX-compliance-051 — sg_only variant
# No local repo clone — agent uses Sourcegraph MCP exclusively for code access.
# The verifier restores the full repo from /repo_full/ before scoring.

FROM ubuntu:22.04

ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update && apt-get install -y --no-install-recommends \
    git \
    ca-certificates \
    python3 \
    curl \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /workspace

# Empty workspace — agent discovers code via MCP tools only
RUN git init && \
    git config user.email "agent@example.com" && \
    git config user.name "Agent" && \
    git config --global safe.directory '*'

# Create log directories
RUN mkdir -p /logs/agent /logs/verifier

# Mark sg_only mode — verifiers and eval scripts check this flag
RUN touch /tmp/.sg_only_mode

ENTRYPOINT []
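The flag file created above is what downstream scripts are said to check. As a rough illustration only, a Python-side version of that check might look like the sketch below. The function name `should_restore_repo` is hypothetical, and the wrapper path default is taken from the evaluator script in this commit; nothing here is confirmed to exist in the repository's Python code.

```python
from pathlib import Path


def should_restore_repo(
    flag_path: str = "/tmp/.sg_only_mode",
    wrapper_path: str = "/tests/sgonly_verifier_wrapper.sh",
) -> bool:
    """Return True only when the sg_only flag AND the verifier wrapper both exist.

    Hypothetical helper: it mirrors the two-file guard used by the shell
    evaluator, but is not part of the commit itself.
    """
    return Path(flag_path).is_file() and Path(wrapper_path).is_file()
```

Requiring both files means a stray flag without a wrapper (or the reverse) leaves the evaluator on its default path instead of failing.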
Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
# Security Compliance Audit: TLS Configuration Across Prometheus Stack

## Your Task

For a security audit, prove that TLS is enforced on all external interfaces of the Prometheus monitoring stack. Find all Go source files in `prometheus/prometheus` that define, load, validate, or apply TLS configuration for: scrape targets, remote write/read endpoints, the web server, tracing exporters, and service discovery plugins.

**NOTE**: The canonical TLS config struct is defined in the `prometheus-common` library (available on Sourcegraph as `sourcegraph-testing/prometheus-common`). Include this definition file in your answer.

## Specific Files to Find

1. **TLS struct definition and factory function** (in prometheus-common)
2. **Config embedding** — where TLS is wired into scrape/remote/tracing configs
3. **Server-side TLS** — web server TLS setup
4. **Client-side TLS** — outbound connections: remote write, tracing, scrape, service discovery
5. **TLS validation** — promtool config validation

## Context

You are performing a compliance audit of the Prometheus monitoring stack. The goal is to verify that TLS is enforced on all external-facing interfaces. This requires tracing TLS configuration from its definition in the shared `prometheus-common` library through its embedding in Prometheus's own config, its application on the web server (server-side), its use in outbound connections (client-side), and its validation by the `promtool` CLI.

## Available Resources

The local `/workspace/` directory contains:
- `prometheus/prometheus` at v3.2.1 → `/workspace/prometheus`

## Output Format

Create a file at `/workspace/answer.json` with your findings in the following structure:

```json
{
  "files": [
    {"repo": "prometheus/prometheus", "path": "relative/path/to/file.go"},
    {"repo": "sourcegraph-testing/prometheus-common", "path": "relative/path/to/file.go"}
  ],
  "text": "Narrative explanation of the TLS architecture across the Prometheus stack."
}
```

**Important**: Use `"prometheus/prometheus"` or `"sourcegraph-testing/prometheus-common"` for repo names. Strip `github.com/` prefix.
**Note**: Sourcegraph MCP tools return repo names with a `github.com/` prefix (e.g., `github.com/prometheus/prometheus`). Strip this prefix in your answer.

Include both the `files` and `text` fields shown in the structure above; no other fields are needed. Your answer is evaluated against a closed-world oracle, so completeness matters.

## Evaluation

Your answer will be scored on:
- **File recall and precision**: Did you find all relevant TLS configuration files across both repos?
- **Keyword presence**: Does your answer reference key TLS identifiers (TLSConfig, NewTLSConfig, ServeMultiple)?
- **Provenance**: Does your answer cite the correct repos and key file paths?
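To make the repo-name rule concrete, here is a minimal sketch of assembling an answer in the structure the task describes. The helper names (`normalize_repo`, `build_answer`, `write_answer`) are hypothetical and not part of the benchmark. The sketch assumes search hits arrive as `(repo, path)` pairs, some of which may still carry the `github.com/` prefix.

```python
import json
from pathlib import Path

GITHUB_PREFIX = "github.com/"


def normalize_repo(name: str) -> str:
    """Strip a leading github.com/ so the repo name matches the required form."""
    if name.startswith(GITHUB_PREFIX):
        return name[len(GITHUB_PREFIX):]
    return name


def build_answer(hits, text):
    """Build the answer dict from (repo, path) pairs, dropping duplicates.

    Order of first appearance is preserved so the output stays stable.
    """
    seen = set()
    files = []
    for repo, path in hits:
        key = (normalize_repo(repo), path)
        if key not in seen:
            seen.add(key)
            files.append({"repo": key[0], "path": key[1]})
    return {"files": files, "text": text}


def write_answer(answer, dest="/workspace/answer.json"):
    """Serialize the answer as indented JSON at the path the task expects."""
    Path(dest).write_text(json.dumps(answer, indent=2) + "\n", encoding="utf-8")
```

De-duplicating on the normalized name matters because the same file can surface once from a local search and once from an MCP search with the prefixed repo name.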
Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
version = "1.0"

[metadata]
name = "CCX-compliance-051"
description = "Security Compliance Audit: TLS Configuration Across Prometheus Stack"
license = "Apache-2.0"

[task]
id = "CCX-compliance-051"
repo = "prometheus/prometheus"
category = "compliance-tls-enforcement"
language = "go"
difficulty = "hard"
time_limit_sec = 900
mcp_suite = "ccb_mcp_compliance"
use_case_id = 51
repo_set_id = "prometheus-monitoring"
mcp_unique = true

[verification]
type = "test"
command = "bash /tests/eval.sh"

reward_type = "score"
description = "Security Compliance Audit: TLS Configuration Across Prometheus Stack"

[environment]
build_timeout_sec = 600.0
Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
#!/bin/bash
# eval.sh — MCP-unique benchmark evaluator for CCX-compliance-051
# Exit-code-first (SWE-Factory pattern):
#   exit 0 — agent produced useful output (composite score > 0)
#   exit 1 — total failure (composite score == 0 or missing answer)
#
# Writes /logs/verifier/reward.txt with the composite score [0.0, 1.0]

set -euo pipefail

TASK_ID="CCX-compliance-051"
ANSWER_PATH="/workspace/answer.json"
TASK_SPEC_PATH="/tests/task_spec.json"
ORACLE_CHECKS="/tests/oracle_checks.py"
REWARD_PATH="/logs/verifier/reward.txt"

mkdir -p /logs/verifier

echo "=== $TASK_ID evaluator ==="
echo "Task spec: $TASK_SPEC_PATH"
echo "Answer: $ANSWER_PATH"
echo ""

# sg_only mode guard: restore full repo if verifier wrapper exists
if [ -f /tmp/.sg_only_mode ] && [ -f /tests/sgonly_verifier_wrapper.sh ]; then
    echo "sg_only mode: sourcing verifier wrapper..."
    source /tests/sgonly_verifier_wrapper.sh
fi

# Verify answer file exists
if [ ! -f "$ANSWER_PATH" ]; then
    echo "ERROR: answer.json not found at $ANSWER_PATH"
    echo "0.0" > "$REWARD_PATH"
    exit 1
fi

# Validate answer is valid JSON
if ! python3 -c "import json; json.load(open('$ANSWER_PATH'))" 2>/dev/null; then
    echo "ERROR: answer.json is not valid JSON"
    echo "0.0" > "$REWARD_PATH"
    exit 1
fi

echo "answer.json found and valid JSON"

# Run oracle checks
if [ ! -f "$ORACLE_CHECKS" ]; then
    echo "ERROR: oracle_checks.py not found at $ORACLE_CHECKS"
    echo "0.0" > "$REWARD_PATH"
    exit 1
fi

echo "Running oracle checks..."
SCORE=$(python3 "$ORACLE_CHECKS" --answer "$ANSWER_PATH" --spec "$TASK_SPEC_PATH" --verbose 2>&1 | tee /dev/stderr | tail -1) || true

# Validate score is a number
if ! echo "$SCORE" | python3 -c "import sys; float(sys.stdin.read().strip())" 2>/dev/null; then
    echo "ERROR: oracle_checks.py did not return a valid score: $SCORE"
    echo "0.0" > "$REWARD_PATH"
    exit 1
fi

echo ""
echo "Composite score: $SCORE"
echo "$SCORE" > "$REWARD_PATH"

# Exit based on score (SWE-Factory exit-code-first pattern)
python3 -c "import sys; sys.exit(0 if float('$SCORE') > 0 else 1)"
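The contents of `oracle_checks.py` are not shown on this page, so the following is only a guess at what a file-set check of this kind could compute. It is a plain F1 over `(repo, path)` pairs, chosen because it satisfies the property stated in the commit message (gold answer scores 1.0, empty answer scores 0.0). The function name `file_set_f1` is hypothetical, and the real composite score may weight keywords and provenance as well.

```python
def file_set_f1(answer_files, oracle_files):
    """F1 between two lists of {"repo": ..., "path": ...} entries.

    Assumed scoring rule, not the benchmark's confirmed implementation.
    Returns 0.0 when either side is empty or nothing overlaps.
    """
    got = {(f["repo"], f["path"]) for f in answer_files}
    want = {(f["repo"], f["path"]) for f in oracle_files}
    hits = len(got & want)
    if hits == 0:
        return 0.0
    precision = hits / len(got)   # share of submitted files that are correct
    recall = hits / len(want)     # share of oracle files that were found
    return 2 * precision * recall / (precision + recall)
```

Because the shell script takes the last line of the checker's output as the score, a checker built on this function would need to print the number on its final line, after any verbose diagnostics.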
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
{
  "files": [
    {"repo": "sourcegraph-testing/prometheus-common", "path": "config/http_config.go"},
    {"repo": "prometheus/prometheus", "path": "config/config.go"},
    {"repo": "prometheus/prometheus", "path": "cmd/prometheus/main.go"},
    {"repo": "prometheus/prometheus", "path": "cmd/promtool/main.go"},
    {"repo": "prometheus/prometheus", "path": "web/web.go"},
    {"repo": "prometheus/prometheus", "path": "tracing/tracing.go"},
    {"repo": "prometheus/prometheus", "path": "storage/remote/client.go"},
    {"repo": "prometheus/prometheus", "path": "scrape/scrape.go"},
    {"repo": "prometheus/prometheus", "path": "storage/remote/write.go"},
    {"repo": "prometheus/prometheus", "path": "storage/remote/storage.go"},
    {"repo": "prometheus/prometheus", "path": "discovery/triton/triton.go"},
    {"repo": "prometheus/prometheus", "path": "discovery/openstack/openstack.go"}
  ],
  "text": "The Prometheus TLS architecture spans 12 files across 2 repos, organized in 5 layers:\n\n1. **Definition** (prometheus-common): config/http_config.go defines the canonical TLSConfig struct and NewTLSConfig() factory function that creates tls.Config from user settings.\n\n2. **Configuration** (prometheus/prometheus): config/config.go embeds TLSConfig into ScrapeConfig, RemoteWriteConfig, RemoteReadConfig, and TracingConfig, making TLS configurable for all external interfaces.\n\n3. **Validation** (prometheus/prometheus): cmd/promtool/main.go validates TLS settings as part of config validation, ensuring certificates and keys are valid before deployment.\n\n4. **Server-side TLS** (prometheus/prometheus): web/web.go applies TLS via ServeMultiple() for the Prometheus HTTP server, and cmd/prometheus/main.go wires the TLS config into the web handler.\n\n5. **Client-side TLS** (prometheus/prometheus): storage/remote/client.go, storage/remote/write.go, and storage/remote/storage.go apply TLS for remote write/read connections. scrape/scrape.go uses TLS for scrape target connections. tracing/tracing.go configures TLS for OTLP tracing exporters. discovery/triton/triton.go and discovery/openstack/openstack.go apply TLS for service discovery HTTP clients.",
  "_metadata": {
    "oracle_type": "file_set_match",
    "discovery_method": "sourcegraph_keyword_search",
    "query": "repo:^github.com/prometheus/prometheus$ TLSConfig",
    "verified_at": "2026-02-21",
    "pinned_version": "v3.2.1"
  }
}
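An oracle file like this one can drift out of sync with itself, for example when a path is added to `files` but never mentioned in the narrative. A small self-consistency check, sketched here under assumptions, could catch that. The function name `check_oracle` is hypothetical, and the default keywords are the three identifiers named in the task's evaluation criteria.

```python
def check_oracle(oracle, keywords=("TLSConfig", "NewTLSConfig", "ServeMultiple")):
    """Return a list of human-readable problems; an empty list means consistent.

    Hypothetical lint, not part of the benchmark: it checks for duplicate
    entries, un-stripped repo prefixes, paths missing from the narrative,
    and expected keywords missing from the narrative.
    """
    problems = []
    text = oracle.get("text", "")
    pairs = [(f["repo"], f["path"]) for f in oracle.get("files", [])]

    if len(pairs) != len(set(pairs)):
        problems.append("duplicate file entries")

    for repo, path in pairs:
        if repo.startswith("github.com/"):
            problems.append(f"{repo}: repo name still carries the github.com/ prefix")
        if path not in text:
            problems.append(f"{path}: not mentioned in the narrative")

    for keyword in keywords:
        if keyword not in text:
            problems.append(f"{keyword}: keyword missing from the narrative")

    return problems
```

Loading the JSON above with `json.load` and passing the result to this function would be one way to run the check before committing a new oracle.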
