Commit 891644f

sjarmak and claude committed
feat: add Docker prebuild, config validation, and mandatory pre-flight gate
- Add validate_config_name() to _common.sh (strict whitelist, exit 1 on unknown)
- Add confirm_launch() shared pre-flight gate (Docker, disk, tokens, interactive)
- Remove --yes flag from run_selected_tasks.sh (agents must not bypass gate)
- Add --skip-prebuild flag and wire ensure_base_images/prebuild_images into runner
- Add check_dockerfile_variants() to block runs with missing Dockerfile.sg_only/artifact_only
- Replace minimal confirmation with rich pre-flight summary (config pair, Dockerfile readiness, Docker status, account tokens, disk space, prebuild status)
- Add Run Launch Policy to CLAUDE.md and AGENTS.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 3e78c12 commit 891644f

4 files changed: +269 -11 lines changed

AGENTS.md

Lines changed: 10 additions & 1 deletion
@@ -35,6 +35,15 @@ per-task details. See `docs/MCP_UNIQUE_TASKS.md` for the MCP-unique extension.
 - Never run `git checkout -b` or `git switch -c`.
 - Commit directly to `main`. This avoids cross-session branch confusion when multiple agents work on the repo.
 
+## Run Launch Policy
+- **Every `harbor run` invocation MUST be gated by interactive confirmation.**
+  The user must see a pre-flight summary and press Enter before any benchmark
+  task launches. There is no `--yes` or unattended mode.
+- Use `confirm_launch "description" "config" N` from `_common.sh` in one-off
+  scripts. `run_selected_tasks.sh` has its own built-in pre-flight gate.
+- **Never write a script that calls `harbor run` without a confirmation gate.**
+- **Never pass `--yes` to `run_selected_tasks.sh`** — the flag has been removed.
+
 ## Typical Skill Routing
 Use these defaults unless there is a task-specific reason not to.
 
@@ -109,7 +118,7 @@ python3 scripts/repo_health.py # repo health gate (before p
 ```
 
 ## Script Entrypoints
-- `configs/_common.sh` - shared run infra (parallelism, token refresh, validation hooks)
+- `configs/_common.sh` - shared run infra (parallelism, token refresh, validation hooks, `confirm_launch()`, `validate_config_name()`)
 - `configs/sdlc_suite_2config.sh` - generic SDLC runner (used by phase wrappers)
 - `configs/{build,debug,design,document,fix,secure,test}_2config.sh` - thin SDLC phase wrappers
 - `configs/run_selected_tasks.sh` - unified runner from `selected_benchmark_tasks.json`
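
The Run Launch Policy above asks one-off scripts to call `confirm_launch` before any `harbor run`. A minimal sketch of such a script might look like the following, assuming it is started from the repo root so that `configs/_common.sh` resolves. `gated_launch`, `run_tasks`, and the `COMMON_SH` variable are hypothetical names introduced only for illustration; `run_tasks` stands in for the real `harbor run` invocation, whose arguments this commit does not show.

```shell
#!/usr/bin/env bash
# Sketch of a one-off launcher that follows the Run Launch Policy.
# Assumption: the script is started from the repo root.
COMMON_SH="${COMMON_SH:-configs/_common.sh}"

# Load the shared helpers (confirm_launch, validate_config_name) when available.
if [ -f "$COMMON_SH" ]; then
    . "$COMMON_SH"
fi

# Hypothetical placeholder for the real `harbor run` call.
run_tasks() {
    echo "launching $2 task(s) with config $1"
}

# Refuse to launch unless the shared gate is loaded, then validate,
# ask for confirmation, and only afterwards start the run.
gated_launch() {
    local description="$1"
    local config="$2"
    local n_tasks="${3:-1}"
    if ! command -v confirm_launch >/dev/null 2>&1; then
        echo "ERROR: confirm_launch not loaded; refusing to launch" >&2
        return 1
    fi
    validate_config_name "$config"
    confirm_launch "$description" "$config" "$n_tasks"
    run_tasks "$config" "$n_tasks"
}

# Example call (left commented out so the sketch never launches by itself):
# gated_launch "MCP rerun: 3 SWE-Perf tasks" "mcp-remote-artifact" 3
```

The early `return 1` is a design choice of this sketch: if the helpers cannot be loaded, the script fails closed instead of starting an ungated run.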

CLAUDE.md

Lines changed: 10 additions & 1 deletion
@@ -35,6 +35,15 @@ per-task details. See `docs/MCP_UNIQUE_TASKS.md` for the MCP-unique extension.
 - Never run `git checkout -b` or `git switch -c`.
 - Commit directly to `main`. This avoids cross-session branch confusion when multiple agents work on the repo.
 
+## Run Launch Policy
+- **Every `harbor run` invocation MUST be gated by interactive confirmation.**
+  The user must see a pre-flight summary and press Enter before any benchmark
+  task launches. There is no `--yes` or unattended mode.
+- Use `confirm_launch "description" "config" N` from `_common.sh` in one-off
+  scripts. `run_selected_tasks.sh` has its own built-in pre-flight gate.
+- **Never write a script that calls `harbor run` without a confirmation gate.**
+- **Never pass `--yes` to `run_selected_tasks.sh`** — the flag has been removed.
+
 ## Typical Skill Routing
 Use these defaults unless there is a task-specific reason not to.
 
@@ -109,7 +118,7 @@ python3 scripts/repo_health.py # repo health gate (before p
 ```
 
 ## Script Entrypoints
-- `configs/_common.sh` - shared run infra (parallelism, token refresh, validation hooks)
+- `configs/_common.sh` - shared run infra (parallelism, token refresh, validation hooks, `confirm_launch()`, `validate_config_name()`)
 - `configs/sdlc_suite_2config.sh` - generic SDLC runner (used by phase wrappers)
 - `configs/{build,debug,design,document,fix,secure,test}_2config.sh` - thin SDLC phase wrappers
 - `configs/run_selected_tasks.sh` - unified runner from `selected_benchmark_tasks.json`

configs/_common.sh

Lines changed: 89 additions & 0 deletions
@@ -88,6 +88,95 @@ baseline_config_for() {
     esac
 }
 
+# Validate a config name against the known whitelist.
+# Exits 1 with error message if unknown. Call before config_to_mcp_type().
+validate_config_name() {
+    local config_name="$1"
+    case "$config_name" in
+        baseline-local-direct|mcp-remote-direct|\
+        baseline-local-artifact|mcp-remote-artifact|\
+        baseline|sourcegraph_full|artifact_full|none)
+            return 0 ;;
+        *)
+            echo "ERROR: Unknown config name: '$config_name'" >&2
+            echo " Valid: baseline-local-direct, mcp-remote-direct, baseline-local-artifact, mcp-remote-artifact" >&2
+            echo " Legacy: baseline, sourcegraph_full, artifact_full, none" >&2
+            exit 1 ;;
+    esac
+}
+
+# ============================================
+# PRE-FLIGHT CONFIRMATION GATE
+# ============================================
+# Shared pre-flight check for any script that launches harbor runs.
+# Shows config, Docker status, disk, and tokens, then requires interactive
+# confirmation. MUST be called before any harbor run invocation.
+#
+# Usage: confirm_launch "description" "config_name" [n_tasks]
+# $1 = short description (e.g., "MCP rerun: 3 SWE-Perf tasks")
+# $2 = config name (e.g., "mcp-remote-artifact")
+# $3 = number of tasks (default: 1)
+#
+# Exits 1 on Docker failure or low disk. Always requires Enter to proceed.
+confirm_launch() {
+    local description="${1:-Harbor run}"
+    local config_name="${2:-unknown}"
+    local n_tasks="${3:-1}"
+
+    echo "----------------------------------------------"
+    echo "PRE-FLIGHT: $description"
+    echo "----------------------------------------------"
+    echo "Config: $config_name"
+    echo "Tasks: $n_tasks"
+
+    # Docker daemon
+    if timeout 10 docker info >/dev/null 2>&1; then
+        echo "Docker: OK"
+    else
+        echo "Docker: FAIL — daemon not responding"
+        exit 1
+    fi
+
+    # Disk space
+    local _repo_root
+    _repo_root="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
+    local _disk_free
+    _disk_free=$(df -BG --output=avail "$_repo_root" 2>/dev/null | tail -1 | tr -d ' G')
+    if [ -n "$_disk_free" ] && [ "$_disk_free" -lt 5 ]; then
+        echo "Disk space: FAIL — only ${_disk_free}GB free"
+        exit 1
+    elif [ -n "$_disk_free" ] && [ "$_disk_free" -lt 20 ]; then
+        echo "Disk space: WARN — ${_disk_free}GB free"
+    else
+        echo "Disk space: OK (${_disk_free:-?}GB free)"
+    fi
+
+    # Token freshness (if multi-account is set up)
+    if [ "${#CLAUDE_HOMES[@]}" -gt 0 ] 2>/dev/null; then
+        echo "Accounts: ${#CLAUDE_HOMES[@]} active"
+        for _home_dir in "${CLAUDE_HOMES[@]}"; do
+            local _creds="${_home_dir}/.claude/.credentials.json"
+            if [ -f "$_creds" ]; then
+                local _remaining
+                _remaining=$(python3 -c "
+import json, time, sys
+try:
+    d = json.load(open(sys.argv[1]))
+    exp = d.get('claudeAiOauth',{}).get('expiresAt',0)
+    rem = int((exp - time.time()*1000) / 60000)
+    print(f'{rem} min remaining')
+except: print('unknown')
+" "$_creds" 2>/dev/null)
+                echo " $(basename "$_home_dir"): $_remaining"
+            fi
+        done
+    fi
+
+    echo "----------------------------------------------"
+    read -r -p "Press Enter to proceed, Ctrl+C to abort... " _
+    echo ""
+}
+
 # ============================================
 # VERIFIER DEBUG MODE
 # ============================================
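
Because `validate_config_name()` calls `exit 1` rather than `return 1`, an unknown name terminates the calling shell. The sketch below shows one way a script might probe a name without exiting, by running the check in a subshell. `is_known_config` is a hypothetical helper that is not part of this commit, and the function body is a trimmed copy of the one in the diff above (the two hint lines are omitted).

```shell
# Trimmed copy of the whitelist check from this commit.
validate_config_name() {
    local config_name="$1"
    case "$config_name" in
        baseline-local-direct|mcp-remote-direct|\
        baseline-local-artifact|mcp-remote-artifact|\
        baseline|sourcegraph_full|artifact_full|none)
            return 0 ;;
        *)
            echo "ERROR: Unknown config name: '$config_name'" >&2
            exit 1 ;;
    esac
}

# Hypothetical helper: the parentheses run the check in a subshell,
# so the exit 1 ends only that subshell and surfaces as a status code.
is_known_config() {
    ( validate_config_name "$1" ) 2>/dev/null
}

for name in mcp-remote-direct mcp-remote-typo; do
    if is_known_config "$name"; then
        echo "$name: known"
    else
        echo "$name: unknown"
    fi
done
```

Calling `validate_config_name` directly, as `run_selected_tasks.sh` does, remains the right choice when an unknown name should stop the run immediately.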

configs/run_selected_tasks.sh

Lines changed: 160 additions & 9 deletions
@@ -25,7 +25,7 @@
 # --category CATEGORY Run category (default: staging)
 # --skip-completed Skip tasks that already have result.json + task_metrics.json
 # --dry-run Print tasks without running
-# --yes Skip confirmation prompt (non-interactive mode)
+# --skip-prebuild Skip Docker image pre-build (use when images already cached)
 #
 # Prerequisites:
 # - configs/selected_benchmark_tasks.json in repo (or --selection-file path)
@@ -61,7 +61,7 @@ CATEGORY="${CATEGORY:-staging}"
 FULL_CONFIG="${FULL_CONFIG:-mcp-remote-direct}"
 DRY_RUN=false
 SKIP_COMPLETED=false
-YES=false
+SKIP_PREBUILD=false
 AGENT_PATH="agents.claude_baseline_agent:BaselineClaudeCodeAgent"
 
 while [[ $# -gt 0 ]]; do
@@ -114,8 +114,8 @@ while [[ $# -gt 0 ]]; do
             DRY_RUN=true
             shift
             ;;
-        --yes)
-            YES=true
+        --skip-prebuild)
+            SKIP_PREBUILD=true
             shift
             ;;
         *)
@@ -151,6 +151,10 @@ BASELINE_CONFIG=$(baseline_config_for "$FULL_CONFIG")
 BL_MCP_TYPE=$(config_to_mcp_type "$BASELINE_CONFIG")
 FULL_MCP_TYPE=$(config_to_mcp_type "$FULL_CONFIG")
 
+# Strict validation — exit immediately on unknown config names
+validate_config_name "$BASELINE_CONFIG"
+validate_config_name "$FULL_CONFIG"
+
 # ============================================
 # EXTRACT TASKS FROM SELECTION FILE
 # ============================================
@@ -250,17 +254,164 @@ if [ "$DRY_RUN" = true ]; then
             echo " ... and $(( count - 5 )) more"
         fi
     done
+    if [ "$SKIP_PREBUILD" = false ]; then
+        echo ""
+        echo "[DRY RUN] Would pre-build Docker images for: ${!BENCHMARK_COUNTS[*]}"
+    fi
     exit 0
 fi
 
 # ============================================
-# CONFIRMATION GATE
+# DOCKERFILE VARIANT CHECK
+# ============================================
+# Verify all tasks have the required Dockerfile variant for the chosen config
+# BEFORE asking the user to confirm.
+check_dockerfile_variants() {
+    DOCKERFILE_MISSING_COUNT=0
+    DOCKERFILE_READY_COUNT=0
+    DOCKERFILE_WARNINGS=""
+
+    local _is_artifact=false
+    [[ "$FULL_CONFIG" == *artifact* ]] && _is_artifact=true
+
+    for bm in $(echo "${!BENCHMARK_TASK_DIRS[@]}" | tr ' ' '\n' | sort); do
+        while IFS= read -r task_path; do
+            [ -z "$task_path" ] && continue
+            local abs_path="$REPO_ROOT/$task_path"
+            local task_id
+            task_id=$(basename "$task_path")
+
+            # Baseline: needs environment/Dockerfile
+            if [ "$RUN_BASELINE" = true ] && [ ! -f "${abs_path}/environment/Dockerfile" ]; then
+                DOCKERFILE_WARNINGS+=" MISSING: ${task_id} — Dockerfile (baseline)"$'\n'
+                DOCKERFILE_MISSING_COUNT=$(( DOCKERFILE_MISSING_COUNT + 1 ))
+            fi
+
+            # Full/MCP: needs the variant Dockerfile
+            if [ "$RUN_FULL" = true ]; then
+                if [ "$_is_artifact" = true ]; then
+                    if [ ! -f "${abs_path}/environment/Dockerfile.artifact_only" ]; then
+                        DOCKERFILE_WARNINGS+=" MISSING: ${task_id} — Dockerfile.artifact_only"$'\n'
+                        DOCKERFILE_MISSING_COUNT=$(( DOCKERFILE_MISSING_COUNT + 1 ))
+                    else
+                        DOCKERFILE_READY_COUNT=$(( DOCKERFILE_READY_COUNT + 1 ))
+                    fi
+                else
+                    if [ ! -f "${abs_path}/environment/Dockerfile.sg_only" ]; then
+                        DOCKERFILE_WARNINGS+=" MISSING: ${task_id} — Dockerfile.sg_only"$'\n'
+                        DOCKERFILE_MISSING_COUNT=$(( DOCKERFILE_MISSING_COUNT + 1 ))
+                    else
+                        DOCKERFILE_READY_COUNT=$(( DOCKERFILE_READY_COUNT + 1 ))
+                    fi
+                fi
+            fi
+        done <<< "$(echo "${BENCHMARK_TASK_DIRS[$bm]}" | grep -v '^$')"
+    done
+}
+
+# ============================================
+# PRE-FLIGHT VERIFICATION
 # ============================================
-if [ "$YES" != true ]; then
-    echo "----------------------------------------------"
-    echo "Ready to launch $TOTAL_AGENT_RUNS agent runs ($PARALLEL_TASKS parallel)."
+echo "----------------------------------------------"
+echo "PRE-FLIGHT VERIFICATION"
+echo "----------------------------------------------"
+echo ""
+
+# 1. Config pair
+echo "Config pair:"
+if [ "$RUN_BASELINE" = true ]; then
+    echo " Baseline: $BASELINE_CONFIG (mcp_type=$BL_MCP_TYPE)"
+fi
+if [ "$RUN_FULL" = true ]; then
+    echo " Full: $FULL_CONFIG (mcp_type=$FULL_MCP_TYPE)"
+fi
+echo ""
+
+# 2. Dockerfile variant readiness
+check_dockerfile_variants
+if [ "$RUN_FULL" = true ]; then
+    _variant_name="Dockerfile.sg_only"
+    [[ "$FULL_CONFIG" == *artifact* ]] && _variant_name="Dockerfile.artifact_only"
+    echo "Dockerfile variants ($_variant_name):"
+    echo " Ready: $DOCKERFILE_READY_COUNT / $TOTAL_TASKS"
+    if [ "$DOCKERFILE_MISSING_COUNT" -gt 0 ]; then
+        echo " MISSING: $DOCKERFILE_MISSING_COUNT"
+        echo -e "$DOCKERFILE_WARNINGS"
+    fi
     echo ""
-    read -r -p "Press Enter to proceed, Ctrl+C to abort... " _
+fi
+
+# 3. Docker daemon
+if timeout 10 docker info >/dev/null 2>&1; then
+    echo "Docker: OK"
+else
+    echo "Docker: FAIL — daemon not responding"
+    exit 1
+fi
+
+# 4. Account token freshness
+if [ "${#CLAUDE_HOMES[@]}" -gt 0 ]; then
+    echo "Accounts: ${#CLAUDE_HOMES[@]} active"
+    for _home_dir in "${CLAUDE_HOMES[@]}"; do
+        _creds="${_home_dir}/.claude/.credentials.json"
+        if [ -f "$_creds" ]; then
+            _remaining=$(python3 -c "
+import json, time, sys
+try:
+    d = json.load(open(sys.argv[1]))
+    exp = d.get('claudeAiOauth',{}).get('expiresAt',0)
+    rem = int((exp - time.time()*1000) / 60000)
+    print(f'{rem} min remaining')
+except: print('unknown')
+" "$_creds" 2>/dev/null)
+            echo " $(basename "$_home_dir"): $_remaining"
+        fi
+    done
+else
+    echo "Accounts: default (single account)"
+fi
+
+# 5. Disk space
+_disk_free=$(df -BG --output=avail "$REPO_ROOT" 2>/dev/null | tail -1 | tr -d ' G')
+if [ -n "$_disk_free" ] && [ "$_disk_free" -lt 5 ]; then
+    echo "Disk space: FAIL — only ${_disk_free}GB free"
+    exit 1
+elif [ -n "$_disk_free" ] && [ "$_disk_free" -lt 20 ]; then
+    echo "Disk space: WARN — ${_disk_free}GB free (may run low)"
+else
+    echo "Disk space: OK (${_disk_free:-?}GB free)"
+fi
+
+# 6. Prebuild status
+if [ "$SKIP_PREBUILD" = false ]; then
+    echo "Prebuild: enabled (${!BENCHMARK_COUNTS[*]})"
+else
+    echo "Prebuild: SKIPPED (--skip-prebuild)"
+fi
+echo ""
+
+# 7. Critical blockers — exit before confirmation
+if [ "$DOCKERFILE_MISSING_COUNT" -gt 0 ]; then
+    echo "BLOCKED: $DOCKERFILE_MISSING_COUNT task(s) missing required Dockerfile variant."
+    echo "Fix: Run python3 scripts/generate_sgonly_dockerfiles.py for affected tasks."
+    exit 1
+fi
+
+echo "----------------------------------------------"
+echo "Ready to launch $TOTAL_AGENT_RUNS agent runs ($PARALLEL_TASKS parallel)."
+echo ""
+read -r -p "Press Enter to proceed, Ctrl+C to abort... " _
+echo ""
+
+# ============================================
+# DOCKER IMAGE PRE-BUILD
+# ============================================
+if [ "$SKIP_PREBUILD" = false ]; then
+    echo "=== Pre-building Docker images ==="
+    ensure_base_images
+    for bm in $(echo "${!BENCHMARK_COUNTS[@]}" | tr ' ' '\n' | sort); do
+        prebuild_images "$bm"
+    done
     echo ""
 fi
 
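
The variant rule in `check_dockerfile_variants()` above comes down to a substring test on the config name: names containing `artifact` need `Dockerfile.artifact_only`, and every other full config needs `Dockerfile.sg_only`. A minimal standalone sketch of that rule follows, written with `case` instead of `[[ ... ]]`. `variant_for_config` and `report_missing_variants` are hypothetical names, the task directories are passed as arguments rather than read from `BENCHMARK_TASK_DIRS`, and the baseline `environment/Dockerfile` check is left out.

```shell
# Map a config name to the Dockerfile variant it requires.
variant_for_config() {
    case "$1" in
        *artifact*) echo "Dockerfile.artifact_only" ;;
        *)          echo "Dockerfile.sg_only" ;;
    esac
}

# Print one MISSING line per task directory that lacks the variant.
# Returns non-zero when at least one task is missing it.
report_missing_variants() {
    local config="$1"
    local variant
    local missing=0
    local task_dir
    variant=$(variant_for_config "$config")
    shift
    for task_dir in "$@"; do
        if [ ! -f "$task_dir/environment/$variant" ]; then
            echo "MISSING: $(basename "$task_dir") $variant"
            missing=$(( missing + 1 ))
        fi
    done
    [ "$missing" -eq 0 ]
}

# Demo on a throwaway tree: task_a has the sg_only variant, task_b does not.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/task_a/environment" "$demo_root/task_b/environment"
touch "$demo_root/task_a/environment/Dockerfile.sg_only"
report_missing_variants "mcp-remote-direct" "$demo_root/task_a" "$demo_root/task_b" \
    || echo "BLOCKED"
rm -rf "$demo_root"
```

Returning a status code instead of setting globals is a choice of this sketch; the runner in the diff keeps counters so it can print the readiness summary before it blocks.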
