Add GP earning benchmark (5-loop iterative) by classicrob · Pull Request #37 · MaxBittker/rs-sdk

classicrob · 2026-02-19T00:32:33Z

Summary

- New Harbor benchmark task `gp-10k-ticks`: agents earn as much gold as possible across 5 iterative loops
- Each loop spawns a fresh sub-agent with no memory; only `learnings.md` and `gp_results.json` carry forward between loops
- Each loop gets 5 bots (level 50 in all skills, starting in Lumbridge with 0 coins), with a 10,000-tick limit per script
- Adds the `gemini-flash` (gemini-3-flash-preview), `codex53` (gpt-5.3-codex), and `kimi` models to `run.sh`
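The loop structure described above can be sketched as follows. This is a minimal, hypothetical illustration, not the harness code from this PR: the names `GpBenchmarkConfig`, `runGpLoops`, `SubAgent`, and the shape of a `gp_results.json` entry are all assumptions. It also assumes each sub-agent returns the full updated contents of `learnings.md`. The point it shows is that every iteration starts a fresh agent whose only input is the handoff state.

```typescript
// Hypothetical parameters mirroring the PR summary; names are illustrative only.
interface GpBenchmarkConfig {
  loops: number;
  botsPerLoop: number;
  skillLevel: number;
  startLocation: string;
  startingCoins: number;
  tickLimitPerScript: number;
}

const GP_10K_TICKS: GpBenchmarkConfig = {
  loops: 5,
  botsPerLoop: 5,
  skillLevel: 50,
  startLocation: "Lumbridge",
  startingCoins: 0,
  tickLimitPerScript: 10_000,
};

// Assumed shape of one loop's entry in gp_results.json.
interface LoopResult {
  loop: number;
  gpPerBot: number[];
}

// The only state that survives between loops: learnings.md and gp_results.json.
interface Handoff {
  learnings: string;
  gpResults: LoopResult[];
}

// A sub-agent sees only the handoff and returns the updated learnings.md
// (assumed: the full new contents) plus the GP each of its bots ended with.
type SubAgent = (
  loop: number,
  handoff: Readonly<Handoff>,
) => { learnings: string; gpPerBot: number[] };

function runGpLoops(config: GpBenchmarkConfig, spawnSubAgent: SubAgent): Handoff {
  let handoff: Handoff = { learnings: "", gpResults: [] };
  for (let loop = 1; loop <= config.loops; loop++) {
    // Fresh sub-agent each iteration: nothing but the handoff is passed in.
    const out = spawnSubAgent(loop, handoff);
    if (out.gpPerBot.length !== config.botsPerLoop) {
      throw new Error(`loop ${loop}: expected ${config.botsPerLoop} bot results`);
    }
    handoff = {
      learnings: out.learnings,
      gpResults: [...handoff.gpResults, { loop, gpPerBot: out.gpPerBot }],
    };
  }
  return handoff;
}
```

Passing the handoff as `Readonly` makes the "no memory" rule explicit in the types: a sub-agent can read earlier learnings and results but can only influence later loops through what it returns.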

New files

- `benchmark/shared/gp_loop_instruction.md`: per-loop instruction for sub-agents
- `benchmark/shared/generate_gp_saves.ts`: generates 25 save files (5 bots × 5 loops)
- `benchmark/shared/check_gp.ts`: verifier that reads the per-loop GP results and checks them against bot inventories
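How the 25 saves and the verifier could fit together is sketched below. This is a hypothetical illustration, not the contents of `generate_gp_saves.ts` or `check_gp.ts`: the save-id scheme (`gp-loop{N}-bot{M}`), the `gp_results.json` entry shape, and the scoring rule (credit only the coins actually found in each bot's inventory) are all assumptions.

```typescript
// Hypothetical save-id scheme: one save per (loop, bot) pair.
function saveId(loop: number, bot: number): string {
  return `gp-loop${loop}-bot${bot}`;
}

// 5 loops x 5 bots = 25 save ids.
function allSaveIds(loops: number, botsPerLoop: number): string[] {
  const ids: string[] = [];
  for (let loop = 1; loop <= loops; loop++) {
    for (let bot = 1; bot <= botsPerLoop; bot++) {
      ids.push(saveId(loop, bot));
    }
  }
  return ids;
}

// Assumed shape of one gp_results.json entry.
interface LoopResult {
  loop: number;
  gpPerBot: number[];
}

interface VerifyReport {
  totalGp: number;
  mismatches: string[];
}

// Cross-check each reported GP value against the coins in that bot's inventory.
// Assumption: the score credits the inventory count, not the self-reported value.
function verifyGp(results: LoopResult[], inventoryCoins: Map<string, number>): VerifyReport {
  let totalGp = 0;
  const mismatches: string[] = [];
  for (const { loop, gpPerBot } of results) {
    gpPerBot.forEach((reported, i) => {
      const id = saveId(loop, i + 1);
      const actual = inventoryCoins.get(id) ?? 0; // missing save counts as 0 coins
      if (actual !== reported) {
        mismatches.push(`${id}: reported ${reported}, inventory has ${actual}`);
      }
      totalGp += actual;
    });
  }
  return { totalGp, mismatches };
}
```

Scoring from the inventory rather than from the agent's own report means a sub-agent cannot inflate its result by writing a larger number to `gp_results.json`; mismatches are still surfaced for inspection.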

Test plan

- `bun benchmark/generate-tasks.ts` generates `gp-10k-ticks/` with the correct instruction, Dockerfile, and verifier
- Run with `benchmark/run.sh -t gp-10k-ticks -m gemini-flash -m codex53`

🤖 Generated with Claude Code

New Harbor benchmark task where agents earn as much gold as possible. It runs 5 loops with a fresh sub-agent per loop, and learnings.md serves as the handoff document between loops. Each loop gets 5 bots (level 50 in all skills) and 10,000 game ticks per script. Pickpocketing is not allowed. Also adds the gemini-flash, gpt-5.3-codex, and kimi models to run.sh.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

classicrob wants to merge 1 commit into MaxBittker:main from classicrob:gp-benchmark