
feat(autoresearch): improve operate success rate + complex publish chains#753

Merged
jackwener merged 3 commits into main from feat/save-as-cli-eval on Apr 4, 2026

@jackwener
Owner

Summary

  • Add Layer 5 Publish testing for twitter/zhihu with 15 tasks (fill-only + publish+delete)
  • Fix 8 broken browse task selectors (50/59 → 56-58/59)
  • Add 8 complex multi-step publish chains (thread compose, quote RT, cross-platform)
  • Improve skills/opencli-operate/SKILL.md: Common Pitfalls section, selector fallback guidance, save-as-CLI workflow docs

Test Results

| Layer | Before | After |
|---|---|---|
| Browse (Operate) | 50/59 (85%) | 56-58/59 (95-98%) |
| Save as CLI | 26/26 (100%) | 26/26 (100%) |
| Publish fill-only | 5/5 | 12-13/13 |

Key changes

  • autoresearch/eval-publish.ts: publish test harness (fill-only + publish+cleanup)
  • autoresearch/publish-tasks.json: 15 tasks covering twitter/zhihu/cross-platform
  • autoresearch/browse-tasks.json: fix 8 broken selectors
  • skills/opencli-operate/SKILL.md: Common Pitfalls, selector best practices, wait variants

Test plan

  • npx tsx autoresearch/eval-browse.ts → 56-58/59
  • npx tsx autoresearch/eval-save.ts → 26/26
  • npx tsx autoresearch/eval-publish.ts --type fill-only → 12-13/13

New eval-publish.ts tests end-to-end content creation via operate commands:
- 7 tasks: 5 fill-only (safe) + 2 publish (post + delete)
- Twitter: compose fill, reply fill, post+delete, cross-site HN→tweet
- Zhihu: answer fill, article fill (title+body), cross-site HN→answer
- Supports --type fill-only/publish and --platform twitter/zhihu filters
- Cleanup steps auto-delete published content after verification
- fill-only: 5/5 passing
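
The `--type` and `--platform` filters described above could be sketched roughly as follows. This is a minimal illustration, not the code in `eval-publish.ts`: the `PublishTask` shape, the `Filters` type, and the `parseFilters` / `selectTasks` helpers are all hypothetical names, and the real schema of `publish-tasks.json` may differ (for example, in how cross-site tasks are labeled).

```typescript
// Hypothetical task shape; the real schema in publish-tasks.json may differ.
type TaskType = "fill-only" | "publish";
type Platform = "twitter" | "zhihu";

interface PublishTask {
  name: string;
  type: TaskType;
  platform: Platform;
}

interface Filters {
  type?: TaskType;
  platform?: Platform;
}

// Parse `--type <value>` and `--platform <value>` from an argv-style array.
// Unknown flags and unrecognized values are ignored.
function parseFilters(argv: string[]): Filters {
  const filters: Filters = {};
  for (let i = 0; i < argv.length - 1; i++) {
    const value = argv[i + 1];
    if (argv[i] === "--type" && (value === "fill-only" || value === "publish")) {
      filters.type = value;
    } else if (argv[i] === "--platform" && (value === "twitter" || value === "zhihu")) {
      filters.platform = value;
    }
  }
  return filters;
}

// Keep only the tasks that match every filter that was actually set.
function selectTasks(tasks: PublishTask[], filters: Filters): PublishTask[] {
  return tasks.filter(
    (t) =>
      (filters.type === undefined || t.type === filters.type) &&
      (filters.platform === undefined || t.platform === filters.platform),
  );
}
```

Treating an unset filter as "match everything" keeps the default run (no flags) equivalent to running the full task list, while `--type fill-only` restricts a run to the safe tasks that never post.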
…ains

Iteration round 1 results:
- Browse: 50/59 → 58/59 (+8) — fixed 8 broken selectors, 1 remaining (DDG images anti-crawl)
- Publish fill-only: 5/5 → 12/13 → 13/13 — added 8 complex tasks, fixed selectors
- Save as CLI: 26/26 (maintained)

Changes:
- browse-tasks.json: fix 8 broken selectors (iana, github, quotes, trending, google, wiki, npm, httpbin)
- publish-tasks.json: add 8 complex multi-step tasks (thread compose, quote RT, search→reply, cross-platform)
- skills/opencli-operate/SKILL.md: add Common Pitfalls section, improve save-as-CLI guidance
- Fix twitter thread compose (use querySelectorAll for 2nd textarea)
- Fix zhihu editor selectors (WriteIndex-titleInput, contenteditable)
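
The two selector techniques mentioned above (picking the second match via `querySelectorAll`, and falling back across candidate selectors) can be illustrated with a small sketch. The helper names `nthMatch` and `firstMatch` and the `QueryRoot` interface are assumptions made for this example; the PR does not show the actual selectors or helper code.

```typescript
// Minimal structural type covering the one DOM method used here, so the
// helpers can also be exercised with a test double outside a browser.
interface QueryRoot<E> {
  querySelectorAll(selector: string): ArrayLike<E>;
}

// Return the index-th match (0-based) for a selector, or null when there
// are not enough matches. Index 1 targets the second matching element.
function nthMatch<E>(root: QueryRoot<E>, selector: string, index: number): E | null {
  const hit = root.querySelectorAll(selector)[index];
  return hit === undefined ? null : hit;
}

// Try candidate selectors in order and return the first element found.
function firstMatch<E>(root: QueryRoot<E>, selectors: string[]): E | null {
  for (const selector of selectors) {
    const hit = nthMatch(root, selector, 0);
    if (hit !== null) return hit;
  }
  return null;
}
```

Under these assumptions, a call such as `nthMatch(root, "textarea", 1)` would select the second textarea in a thread composer, and `firstMatch` would let a task list a preferred selector followed by looser fallbacks.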
@jackwener merged commit a39a858 into main on Apr 4, 2026
9 checks passed
just-buer pushed a commit to just-buer/opencli that referenced this pull request Apr 8, 2026
…ains (jackwener#753)

* chore(autoresearch): format save-tasks.json

* feat(autoresearch): add Layer 5 Publish testing for twitter/zhihu

* feat(autoresearch): improve operate success rate + complex publish chains
