docs: align command list with shipped adapters #5

Closed
erixyuan wants to merge 1 commit into jackwener:main from erixyuan:main

@erixyuan
Contributor

Summary

  • align README counts with the current shipped adapters (27 commands across 15 sites)
  • remove github command references that are not present in the current repo/runtime
  • update the public API examples in SKILL.md to use commands that actually exist

Verification

  • ran node dist/main.js list --json
  • confirmed the current runtime exposes 27 commands across 15 sites
  • confirmed there is no src/clis/github/ adapter directory in the repo

@jackwener
Owner

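Thank you for your contribution! This documentation update has already been included in the v0.2.0 refactor and consolidated documentation update. As thanks for your work, I've added you as Co-authored-by on the relevant commit. 🎉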

jackwener closed this on Mar 15, 2026
jackwener added a commit that referenced this pull request Mar 15, 2026
- #1 Fix URL injection in subtitle.ts via JSON.stringify
- #2 Remove debug console.error from production code
- #3 Delete stale test_subtitle.ts
- #4 Add --lang option for multi-language subtitle selection
- #5 Fix duplicate comment numbering (two '// 4.')
- #6 Add clickLabels targeted clicking + --click flag to explore
- #7 Move empty-value penalty into scoreEndpoint() (affects filtering)
- #8 Add cascade request code template to CLI-CREATOR.md
jackwener added a commit that referenced this pull request Mar 19, 2026
Bug fixes:
- #1 /logs?level=error returned 404 — use pathname for route matching
- #2 Duplicate initialization — added 'initialized' guard flag
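Both bug fixes above follow well-known patterns. The sketch below illustrates them under stated assumptions: the route table, the `route()` and `initialize()` names, and the `initCount` counter are all hypothetical, not the daemon's actual code. It matches routes on `URL.pathname` so a query string no longer causes a 404, and it uses an `initialized` flag to make repeated initialization a no-op.

```typescript
// Hypothetical route table; the daemon's real routes and handlers may differ.
type Reply = { status: number; body: string };
type Handler = (params: URLSearchParams) => Reply;

const routes = new Map<string, Handler>([
  ["/logs", (params) => ({ status: 200, body: `logs at level ${params.get("level") ?? "all"}` })],
  ["/status", () => ({ status: 200, body: "ok" })],
]);

// Match on the pathname only, so "/logs?level=error" reaches the "/logs" handler
// instead of falling through to 404 as an exact comparison on the raw URL would.
function route(rawUrl: string): Reply {
  // Request URLs are relative ("/logs?..."), so the URL constructor needs a base.
  const url = new URL(rawUrl, "http://localhost");
  const handler = routes.get(url.pathname);
  return handler ? handler(url.searchParams) : { status: 404, body: "not found" };
}

// Guard flag: a second initialize() call returns early instead of
// registering listeners or timers a second time.
let initialized = false;
let initCount = 0; // only here so the guard's effect is observable

function initialize(): void {
  if (initialized) return;
  initialized = true;
  initCount++;
}
```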

Should fix:
- #4 Added screenshot() to IPage interface
- #5 Graceful shutdown rejects pending requests before exit
- #6 Use process.execPath instead of 'npx tsx' for faster daemon spawn
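The graceful-shutdown item can be sketched as a registry of in-flight requests that is drained with rejections before exit, so callers get an error instead of hanging forever. The `pending` map, `sendCommand()` and `shutdown()` shapes here are assumptions for illustration only.

```typescript
// Hypothetical pending-request registry; names are illustrative, not the project's API.
interface Pending {
  resolve: (value: unknown) => void;
  reject: (reason: Error) => void;
}

const pending = new Map<number, Pending>();
let nextId = 1;

function sendCommand(_cmd: string): Promise<unknown> {
  const id = nextId++;
  return new Promise((resolve, reject) => {
    pending.set(id, { resolve, reject });
    // A real implementation would send { id, cmd } to the browser extension here
    // and resolve when the matching response arrives.
  });
}

// Reject everything still in flight, then clear the registry.
// Returns how many requests were rejected.
function shutdown(reason = "daemon shutting down"): number {
  const count = pending.size;
  for (const p of pending.values()) p.reject(new Error(reason));
  pending.clear();
  return count;
}
```

A caller awaiting `sendCommand()` during shutdown therefore sees a rejected promise it can report, rather than a process that exits with the promise still unsettled.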

Cleanup:
- #7 Removed duplicate 'browser' keyword in package.json
- #8 Removed unused normalizeEvaluateSource import from browser.ts
- #9 Changed dynamic import to static import in intercept.ts
- #10 Added explicit throw at end of sendCommand for clarity

61 tests pass (4 test files). Extension: 10.55KB.
jackwener added a commit that referenced this pull request Mar 30, 2026
…mpt, actions

Closes all high and medium priority gaps vs Browser Use:

Planning System (#1):
- PlanItem state machine (pending/current/done/skipped)
- LLM can output `plan` field to update/create plans
- Plan auto-advances on successful steps
- Replan nudge after 3 consecutive failures
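A minimal sketch of the plan state machine described above, assuming a flat list of items where exactly one is `current`. The status values and the 3-failure threshold come from the commit message; `PlanItem`'s fields, `advancePlan()` and `updatePlan()` are hypothetical.

```typescript
type PlanStatus = "pending" | "current" | "done" | "skipped";

// Hypothetical shape; the real PlanItem may carry more fields.
interface PlanItem {
  text: string;
  status: PlanStatus;
}

const REPLAN_AFTER_FAILURES = 3;

// Mark the current item done and promote the next pending item to current.
// Returns a new array; the input plan is not mutated.
function advancePlan(plan: PlanItem[]): PlanItem[] {
  const next = plan.map((item) => ({ ...item }));
  const cur = next.findIndex((i) => i.status === "current");
  if (cur !== -1) next[cur].status = "done";
  const upcoming = next.findIndex((i) => i.status === "pending");
  if (upcoming !== -1) next[upcoming].status = "current";
  return next;
}

// Called once per step, after the consecutive-error counter has been updated.
function updatePlan(
  plan: PlanItem[],
  stepSucceeded: boolean,
  consecutiveErrors: number,
): { plan: PlanItem[]; nudge?: string } {
  if (stepSucceeded) return { plan: advancePlan(plan) };
  if (consecutiveErrors >= REPLAN_AFTER_FAILURES) {
    return { plan, nudge: `${consecutiveErrors} consecutive failures: consider revising the plan.` };
  }
  return { plan };
}
```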

Self-Evaluation (#3):
- New `evaluationPreviousGoal` field in AgentResponse
- Pre-done verification rules in system prompt (5-step checklist)
- `success` field on DoneAction for explicit failure signaling

Action System (#4):
- New actions: select_dropdown, switch_tab, open_tab, close_tab, search_page
- Auto-detect <select> and redirect to select_dropdown
- Element scroll (scroll within a specific element by index)
- Wait capped at 10s

Loop Detection (#5):
- SHA-256 hashed sliding window (15 steps)
- 3 severity tiers: mild (4x), strong (7x), critical (10x)
- Page fingerprint stall detection (URL + element count + DOM hash)
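The loop detector can be sketched as a bounded window of SHA-256 action hashes, where the highest repeat count in the window selects a severity tier. The window size (15) and thresholds (4/7/10) come from the commit message. The `LoopDetector` class and the idea of hashing a string description of each action are assumptions, and page-fingerprint stall detection is omitted for brevity.

```typescript
import { createHash } from "node:crypto";

const WINDOW = 15;
type Severity = "none" | "mild" | "strong" | "critical";
// Checked from most to least severe.
const TIERS: Array<[number, Severity]> = [
  [10, "critical"],
  [7, "strong"],
  [4, "mild"],
];

class LoopDetector {
  private history: string[] = [];

  // Identical action descriptions produce identical hashes.
  private static hash(action: string): string {
    return createHash("sha256").update(action).digest("hex");
  }

  record(action: string): void {
    this.history.push(LoopDetector.hash(action));
    if (this.history.length > WINDOW) this.history.shift(); // slide the window
  }

  // Looks only at steps that have already been recorded.
  detect(): Severity {
    const counts = new Map<string, number>();
    let max = 0;
    for (const h of this.history) {
      const c = (counts.get(h) ?? 0) + 1;
      counts.set(h, c);
      max = Math.max(max, c);
    }
    for (const [threshold, level] of TIERS) {
      if (max >= threshold) return level;
    }
    return "none";
  }
}
```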

System Prompt (#6):
- Expanded from 65 to ~170 lines with structured sections
- Action chaining rules (page-changing vs safe)
- Reasoning pattern guidance
- Examples for evaluation, memory, planning

LLM Timeout (#7):
- Configurable `llmTimeout` (default 60s)
- Promise-based timeout wrapper
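A promise-based timeout wrapper is commonly written with `Promise.race`. The sketch below assumes a generic helper named `withTimeout` (hypothetical) whose default mirrors the documented 60 s `llmTimeout`.

```typescript
// Rejects if `promise` has not settled within `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms = 60_000): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`LLM call timed out after ${ms} ms`)), ms);
  });
  // Clear the timer either way so it does not keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Note that `Promise.race` only stops *waiting*: the underlying request keeps running in the background. Cancelling it requires passing an abort signal to the request itself.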

Message Compaction (#8):
- Builds structured summary of compacted messages
- Extracts URLs visited, goals achieved, past errors
- Maintains Anthropic API user/assistant alternation

AX Tree Enrichment (#9):
- Fetches accessibility role/name via CDP when available
- Enriches ElementInfo with axRole/axName
- Falls back to DOM attributes if CDP unavailable

Sensitive Data Masking (#10):
- Configurable sensitivePatterns map
- Applied to all user messages before LLM
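One plausible reading of "a configurable `sensitivePatterns` map" is a mapping from a placeholder name to a regular expression, where every match in outgoing user messages is replaced before the text reaches the LLM. The map shape, the `<secret:name>` placeholder format, and the function names below are assumptions.

```typescript
// Assumed shape: placeholder name -> pattern to redact.
type SensitivePatterns = Record<string, RegExp>;

function maskSensitive(text: string, patterns: SensitivePatterns): string {
  let out = text;
  for (const [name, pattern] of Object.entries(patterns)) {
    // Force the global flag so every occurrence is replaced, not just the first.
    const flags = pattern.flags.includes("g") ? pattern.flags : pattern.flags + "g";
    out = out.replace(new RegExp(pattern.source, flags), `<secret:${name}>`);
  }
  return out;
}

interface UserMessage {
  role: "user";
  content: string;
}

// Applied to every user message before it is sent to the model.
function maskMessages(messages: UserMessage[], patterns: SensitivePatterns): UserMessage[] {
  return messages.map((m) => ({ ...m, content: maskSensitive(m.content, patterns) }));
}
```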

Prompt Caching (#2):
- System prompt uses cache_control: ephemeral
- Last user message uses cache_control: ephemeral
- Token tracking includes cache_read and cache_creation

Screenshot Control (#11):
- Configurable maxScreenshotDim (default 1200px)
- Zero-size element filtering in DOM context
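Both screenshot-related items reduce to small pure functions: scale the capture so its longest side fits `maxScreenshotDim` while preserving aspect ratio, and drop elements with no visible area. The function names and the `rect` field below are hypothetical.

```typescript
// Target size so neither side exceeds maxDim, preserving aspect ratio.
// maxDim defaults to the documented 1200 px.
function fitWithin(width: number, height: number, maxDim = 1200): { width: number; height: number } {
  const longest = Math.max(width, height);
  if (longest <= maxDim) return { width, height }; // never upscale
  const scale = maxDim / longest;
  return { width: Math.round(width * scale), height: Math.round(height * scale) };
}

interface Rect {
  width: number;
  height: number;
}

// Drop elements with zero width or height before listing them in the DOM context.
function visibleOnly<T extends { rect: Rect }>(elements: T[]): T[] {
  return elements.filter((e) => e.rect.width > 0 && e.rect.height > 0);
}
```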
jackwener added a commit that referenced this pull request Mar 30, 2026
…tion, timeout

#1 AX tree: remove dead CDP calls (DOM.getDocument + Accessibility.getFullAXTree
   were called but axLookup never used). Replace with single batched evaluate()
   that reads ARIA attributes for up to 100 elements in one call.

#2 Loop detection: detectLoop() now uses only previously recorded state (no
   domContext param). Fixes off-by-one where current step wasn't yet recorded.

#3 Message compaction: prevent consecutive user messages by merging summary
   into preceding user message if roles collide, and skipping duplicate roles
   at the tail boundary.
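The compaction fix guarantees that the Anthropic Messages API sees strictly alternating roles. The sketch below is a simplified version under assumptions: messages carry plain-string content, the summary is injected as a user message, and any same-role neighbors are merged rather than skipped. The `Msg` type and the `compact()` helper are hypothetical.

```typescript
// Simplified message shape; real SDK messages can carry content-block arrays.
interface Msg {
  role: "user" | "assistant";
  content: string;
}

// Replace older history with a summary, keep the last `keepLast` messages,
// and merge adjacent same-role messages so roles strictly alternate.
function compact(messages: Msg[], keepLast: number, summary: string): Msg[] {
  // Guard keepLast <= 0: slice(-0) would return the whole array.
  const tail = keepLast > 0 ? messages.slice(-keepLast) : [];
  const candidate: Msg[] = [{ role: "user", content: summary }, ...tail];
  const out: Msg[] = [];
  for (const m of candidate) {
    const prev = out[out.length - 1];
    if (prev && prev.role === m.role) {
      prev.content += "\n\n" + m.content; // merge instead of emitting two in a row
    } else {
      out.push({ ...m }); // copy so the caller's messages are not mutated
    }
  }
  return out;
}
```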

#4 JS injection: all evaluate() calls now use JSON.stringify for user-controlled
   values (element indices, option text, scroll amounts) instead of template
   interpolation.
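The JSON.stringify fix (the same technique as the subtitle.ts URL fix in the earlier commit) works because `JSON.stringify` emits a correctly escaped JavaScript literal, so a quote in user input cannot terminate the string and inject code. The sketch assumes a hypothetical `selectOptionSource()` that builds source text for an `evaluate()`-style call; the selector scheme is illustrative only.

```typescript
// Builds browser-side source that selects a <select> option by visible text.
// Every user-controlled value goes through JSON.stringify, never raw interpolation.
function selectOptionSource(elementIndex: number, optionText: string): string {
  const idx = JSON.stringify(elementIndex);
  const text = JSON.stringify(optionText);
  return `(() => {
  const select = document.querySelectorAll("select")[${idx}];
  if (!select) return false;
  const option = Array.from(select.options).find((o) => o.text.trim() === ${text});
  if (!option) return false;
  select.value = option.value;
  select.dispatchEvent(new Event("change", { bubbles: true }));
  return true;
})()`;
}
```

With raw interpolation such as `=== "${optionText}"`, an input like `"); alert(1); ("` would close the string literal and run attacker code. The stringified version keeps it as inert data.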

#5 updatePlan: moved after consecutiveErrors update so plan advancement uses
   current step's error state, not the previous step's.

#6 LLM timeout: pass AbortController signal to Anthropic SDK so timed-out
   requests are actually cancelled instead of continuing in the background.
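Unlike a bare `Promise.race`, passing an `AbortSignal` lets the request itself be torn down when the deadline passes. The Anthropic TypeScript SDK accepts a `signal` in its per-request options. To keep the sketch runnable offline, the SDK call is replaced by a hypothetical `fakeLLM()` that honors the signal, and `withAbortTimeout()` is likewise an illustrative name.

```typescript
// Runs `run` with a signal that is aborted after `ms` milliseconds.
async function withAbortTimeout<T>(
  run: (signal: AbortSignal) => Promise<T>,
  ms = 60_000,
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  try {
    return await run(controller.signal);
  } finally {
    clearTimeout(timer); // no stray timer once the call settles
  }
}

// Hypothetical stand-in for an LLM call: resolves after `delay` ms unless aborted,
// in which case it cancels its own work and rejects.
function fakeLLM(delay: number, signal: AbortSignal): Promise<string> {
  return new Promise((resolve, reject) => {
    const t = setTimeout(() => resolve("response"), delay);
    signal.addEventListener(
      "abort",
      () => {
        clearTimeout(t);
        reject(new Error("aborted"));
      },
      { once: true },
    );
  });
}
```

With the real SDK, the `run` callback would forward the signal, for example `(signal) => client.messages.create(params, { signal })`.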
jackwener added a commit that referenced this pull request Mar 31, 2026
jackwener added a commit that referenced this pull request Mar 31, 2026
jackwener added a commit that referenced this pull request Apr 2, 2026
…e turns

- Add Rule #7: minimize total tool calls (3-5 per task, not 15-20)
- Strengthen Rule #5: chain aggressively with &&
- Add explicit good/bad chaining examples
- Add click+wait+state chaining pattern
- Add type+verify chaining pattern

Before: 21 turns for complex V2EX reply task
After: 12 turns for same task (-43% turns, -28% cost)
jackwener added a commit that referenced this pull request Apr 3, 2026
…timization) (#717)

* feat: AutoResearch framework + V2EX test suite (40 tasks)

AutoResearch framework (Karpathy-style autonomous iteration):
- engine.ts: 8-phase loop (review → modify → commit → verify → guard → decide → log)
- config.ts: typed config + CLI parser + metric extraction
- logger.ts: TSV append-only results log
- commands/run.ts: main loop spawning Claude Code per iteration
- commands/plan.ts: interactive config wizard
- commands/fix.ts: auto-detect broken state, iteratively fix
- commands/debug.ts: hypothesis-driven debugging for failing tasks
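An append-only TSV log like the one `logger.ts` describes can be kept safe by stripping field separators from values before joining. The column set, the `Row` type, and the `toTsvLine()`/`logResult()` names below are assumptions; the commit only states that results go to a TSV append-only log.

```typescript
import { appendFileSync, existsSync } from "node:fs";

// Assumed columns; the real log may record different fields.
const COLUMNS = ["iteration", "commit", "metric", "status", "description"] as const;
type Row = Record<(typeof COLUMNS)[number], string | number>;

// Tabs or newlines inside a field would corrupt the row, so collapse them to spaces.
function toTsvLine(row: Row): string {
  return COLUMNS.map((c) => String(row[c]).replace(/[\t\r\n]+/g, " ")).join("\t") + "\n";
}

// Append-only: write the header once when the file is created, then only append rows.
function logResult(path: string, row: Row): void {
  if (!existsSync(path)) appendFileSync(path, COLUMNS.join("\t") + "\n");
  appendFileSync(path, toTsvLine(row));
}
```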

V2EX test suite (5 layers, 40 tasks):
- L1 Atomic (10): open, state, click, scroll, eval, back, wait
- L2 Single Page (10): hot topics, node list, topic meta, pagination
- L3 Multi-Step (10): click-read, navigate-node, tab-then-topic, pagination
- L4 Write Ops (5): reply typing, favorite detection, form detection
- L5 Complex Chain (5): cross-page collect, multi-node compare, full workflow

Presets: operate-reliability, skill-quality, v2ex-reliability

* test: V2EX test suite 60/60 — fix selectors, add harder tasks

- Fix v2ex-collect-hot-authors selector (pathname-based member link detection)
- Fix v2ex-wait-text judge (accept "appeared")
- Fix trailing commas in eval step strings
- Add 20 harder tasks: state+click interaction + long chain workflows
- Baseline: 60/60 across all layers

* docs: optimize SKILL.md for efficiency — aggressive chaining, minimize turns

- Add Rule #7: minimize total tool calls (3-5 per task, not 15-20)
- Strengthen Rule #5: chain aggressively with &&
- Add explicit good/bad chaining examples
- Add click+wait+state chaining pattern
- Add type+verify chaining pattern

Before: 21 turns for complex V2EX reply task
After: 12 turns for same task (-43% turns, -28% cost)
just-buer pushed a commit to just-buer/opencli that referenced this pull request Apr 8, 2026