Skip to content

feat(autoresearch): add Layer 4 Save-as-CLI eval with zhihu/xhs coverage#741

Merged
jackwener merged 9 commits intomainfrom
feat/save-as-cli-eval
Apr 3, 2026
Merged

feat(autoresearch): add Layer 4 Save-as-CLI eval with zhihu/xhs coverage#741
jackwener merged 9 commits intomainfrom
feat/save-as-cli-eval

Conversation

@jackwener
Copy link
Copy Markdown
Owner

Summary

  • New eval-save.ts: tests full operate init → write adapter → operate verify pipeline
  • 20 tasks: 14 PUBLIC strategy + 6 COOKIE strategy (zhihu hot/search/question, xhs feed/search/note)
  • New save-reliability preset for autoresearch engine multi-round iteration
  • Fix: operate verify no longer hardcodes --limit 3 for adapters without limit arg
  • Rename sediment → save throughout for clarity

AutoResearch Results (5 iterations)

  • Baseline: 14/14 → Final: 20/20
  • 3 rounds KEEP (+6 tasks auto-generated by Claude Code)
  • 2 rounds timeout (Claude Code 180s limit)

Test plan

  • npx tsx autoresearch/eval-save.ts → 20/20
  • npx tsx autoresearch/commands/run.ts --preset save-reliability --iterations 5 completed
  • npm run build passes

- New eval-save.ts: tests full init → write → verify pipeline (14 tasks)
- 8 PUBLIC strategy tasks (httpbin, jsonplaceholder, HN, wiki, lobsters, devto)
- 6 COOKIE strategy tasks (zhihu hot/search/question, xhs feed/search/note)
- New save-reliability preset for autoresearch engine iteration
- Fix: operate verify no longer hardcodes --limit 3 for adapters without limit arg
- Rename sediment → save throughout
…upport

- Replace simple COOKIE tasks with 6 complex multi-step chains:
  - zhihu: hot+top-answer (6-step), search+question-stats (7-step), question+answers+related (8-step)
  - xhs: search+scroll+dedup (6-step), note+comments (7-step), explore+scroll+sort (8-step)
- Move complex adapter code to save-adapters/*.ts files (avoids JSON escape issues)
- eval-save.ts: support adapterFile field to read adapter from file
- Preset scope now includes skills/opencli-operate/SKILL.md for skill improvement
- All 20/20 tasks passing
@jackwener jackwener merged commit b1c0bcb into main Apr 3, 2026
11 checks passed
@jackwener jackwener deleted the feat/save-as-cli-eval branch April 3, 2026 17:27
just-buer pushed a commit to just-buer/opencli that referenced this pull request Apr 8, 2026
…age (jackwener#741)

* feat(autoresearch): add Layer 4 "Save as CLI" eval + fix operate verify

- New eval-save.ts: tests full init → write → verify pipeline (14 tasks)
- 8 PUBLIC strategy tasks (httpbin, jsonplaceholder, HN, wiki, lobsters, devto)
- 6 COOKIE strategy tasks (zhihu hot/search/question, xhs feed/search/note)
- New save-reliability preset for autoresearch engine iteration
- Fix: operate verify no longer hardcodes --limit 3 for adapters without limit arg
- Rename sediment → save throughout

* experiment(operate): 两个新任务都基于已有通过任务使用的同一 API,期望 pass_count 从 14 → 16。

* experiment(operate): Added 2 new tasks ( and ) that use the exact same APIs already proven to pass in exis

* experiment(operate): Both new tasks pass. The change adds 2 more  tasks ( and ) using the same proven API, i

* fix(autoresearch): rename SedimentTask → SaveTask, fix bracket indent, gitignore results.tsv

* refactor(autoresearch): complex multi-step save tasks + adapterFile support

- Replace simple COOKIE tasks with 6 complex multi-step chains:
  - zhihu: hot+top-answer (6-step), search+question-stats (7-step), question+answers+related (8-step)
  - xhs: search+scroll+dedup (6-step), note+comments (7-step), explore+scroll+sort (8-step)
- Move complex adapter code to save-adapters/*.ts files (avoids JSON escape issues)
- eval-save.ts: support adapterFile field to read adapter from file
- Preset scope now includes skills/opencli-operate/SKILL.md for skill improvement
- All 20/20 tasks passing

* experiment(save): add hn-best and hn-jobs tasks using proven Firebase API pattern, pass_count 20→22

* fix(autoresearch): increase Claude Code timeout 180s → 300s to reduce ETIMEDOUT failures

* experiment(save): add restcountries and nager-holidays tasks using stable public APIs, pass_count 22→24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant