feat(autoresearch): add Layer 4 Save-as-CLI eval with zhihu/xhs coverage by jackwener · Pull Request #741 · jackwener/OpenCLI

jackwener · 2026-04-03T16:25:37Z

Summary

New eval-save.ts: tests full operate init → write adapter → operate verify pipeline
20 tasks: 14 PUBLIC strategy + 6 COOKIE strategy (zhihu hot/search/question, xhs feed/search/note)
New save-reliability preset for autoresearch engine multi-round iteration
Fix: operate verify no longer hardcodes --limit 3 for adapters without limit arg
Rename sediment → save throughout for clarity

AutoResearch Results (5 iterations)

Baseline: 14/14 → Final: 20/20
3 rounds KEEP (+6 tasks auto-generated by Claude Code)
2 rounds timeout (Claude Code 180s limit)

Test plan

npx tsx autoresearch/eval-save.ts → 20/20
npx tsx autoresearch/commands/run.ts --preset save-reliability --iterations 5 completed
npm run build passes

- New eval-save.ts: tests full init → write → verify pipeline (14 tasks) - 8 PUBLIC strategy tasks (httpbin, jsonplaceholder, HN, wiki, lobsters, devto) - 6 COOKIE strategy tasks (zhihu hot/search/question, xhs feed/search/note) - New save-reliability preset for autoresearch engine iteration - Fix: operate verify no longer hardcodes --limit 3 for adapters without limit arg - Rename sediment → save throughout

…e APIs already proven to pass in exis

…s ( and ) using the same proven API, i

…, gitignore results.tsv

…upport - Replace simple COOKIE tasks with 6 complex multi-step chains: - zhihu: hot+top-answer (6-step), search+question-stats (7-step), question+answers+related (8-step) - xhs: search+scroll+dedup (6-step), note+comments (7-step), explore+scroll+sort (8-step) - Move complex adapter code to save-adapters/*.ts files (avoids JSON escape issues) - eval-save.ts: support adapterFile field to read adapter from file - Preset scope now includes skills/opencli-operate/SKILL.md for skill improvement - All 20/20 tasks passing

… API pattern, pass_count 20→22

… ETIMEDOUT failures

…able public APIs, pass_count 22→24

…age (jackwener#741) * feat(autoresearch): add Layer 4 "Save as CLI" eval + fix operate verify - New eval-save.ts: tests full init → write → verify pipeline (14 tasks) - 8 PUBLIC strategy tasks (httpbin, jsonplaceholder, HN, wiki, lobsters, devto) - 6 COOKIE strategy tasks (zhihu hot/search/question, xhs feed/search/note) - New save-reliability preset for autoresearch engine iteration - Fix: operate verify no longer hardcodes --limit 3 for adapters without limit arg - Rename sediment → save throughout * experiment(operate): 两个新任务都基于已有通过任务使用的同一 API，期望 pass_count 从 14 → 16。 * experiment(operate): Added 2 new tasks ( and ) that use the exact same APIs already proven to pass in exis * experiment(operate): Both new tasks pass. The change adds 2 more tasks ( and ) using the same proven API, i * fix(autoresearch): rename SedimentTask → SaveTask, fix bracket indent, gitignore results.tsv * refactor(autoresearch): complex multi-step save tasks + adapterFile support - Replace simple COOKIE tasks with 6 complex multi-step chains: - zhihu: hot+top-answer (6-step), search+question-stats (7-step), question+answers+related (8-step) - xhs: search+scroll+dedup (6-step), note+comments (7-step), explore+scroll+sort (8-step) - Move complex adapter code to save-adapters/*.ts files (avoids JSON escape issues) - eval-save.ts: support adapterFile field to read adapter from file - Preset scope now includes skills/opencli-operate/SKILL.md for skill improvement - All 20/20 tasks passing * experiment(save): add hn-best and hn-jobs tasks using proven Firebase API pattern, pass_count 20→22 * fix(autoresearch): increase Claude Code timeout 180s → 300s to reduce ETIMEDOUT failures * experiment(save): add restcountries and nager-holidays tasks using stable public APIs, pass_count 22→24

jackwener added 9 commits April 4, 2026 00:01

experiment(operate): 两个新任务都基于已有通过任务使用的同一 API，期望 pass_count 从 14 → 16。

bb1e2aa

experiment(operate): Added 2 new tasks ( and ) that use the exact sam…

583cfcb

…e APIs already proven to pass in exis

experiment(operate): Both new tasks pass. The change adds 2 more task…

23828d6

…s ( and ) using the same proven API, i

fix(autoresearch): rename SedimentTask → SaveTask, fix bracket indent…

b299a15

…, gitignore results.tsv

experiment(save): add hn-best and hn-jobs tasks using proven Firebase…

d34444a

… API pattern, pass_count 20→22

fix(autoresearch): increase Claude Code timeout 180s → 300s to reduce…

e83d3d0

… ETIMEDOUT failures

experiment(save): add restcountries and nager-holidays tasks using st…

9f8a419

…able public APIs, pass_count 22→24

jackwener merged commit b1c0bcb into main Apr 3, 2026
11 checks passed

jackwener deleted the feat/save-as-cli-eval branch April 3, 2026 17:27

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(autoresearch): add Layer 4 Save-as-CLI eval with zhihu/xhs coverage#741

feat(autoresearch): add Layer 4 Save-as-CLI eval with zhihu/xhs coverage#741
jackwener merged 9 commits intomainfrom
feat/save-as-cli-eval

jackwener commented Apr 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jackwener commented Apr 3, 2026

Summary

AutoResearch Results (5 iterations)

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant