fix: SVG className crash + viewport expansion + test suites by jackwener · Pull Request #733 · jackwener/OpenCLI

jackwener · 2026-04-03T10:49:42Z

Summary

Critical bug fix + AutoResearch test infrastructure.

Critical Fix: SVG className crash in dom-snapshot

isSearchElement() called el.className.toLowerCase(), which throws on SVG elements, where className is an SVGAnimatedString (not a string). This made the entire DOM snapshot fail and fall back to the basic accessibility tree on any page with SVG icons (Zhihu, Twitter, most React sites).

Fix: typeof el.className === 'string' ? el.className : el.className?.baseVal || ''
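
A minimal sketch of the guard above, for illustration only. The helper name classNameOf and the ElementLike and AnimatedStringLike types are hypothetical stand-ins so the snippet runs without a DOM; the actual change in dom-snapshot may be inlined differently.

```typescript
// Structural stand-ins for the DOM types (hypothetical, so this runs in plain Node).
interface AnimatedStringLike { baseVal: string }
interface ElementLike { className?: string | AnimatedStringLike | null }

// Returns the class attribute as a plain string for both HTML and SVG elements.
function classNameOf(el: ElementLike): string {
  const cn = el.className;
  if (typeof cn === 'string') return cn; // HTML elements: className is a string
  return cn?.baseVal || '';              // SVG elements: SVGAnimatedString-like object
}

const htmlInput: ElementLike = { className: 'Search-Input' };
const svgIcon: ElementLike = { className: { baseVal: 'Icon Icon-search' } };

// Calling .toLowerCase() is now safe for all three shapes.
console.log(classNameOf(htmlInput).toLowerCase()); // "search-input"
console.log(classNameOf(svgIcon).toLowerCase());   // "icon icon-search"
console.log(classNameOf({}).toLowerCase());        // ""
```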

Impact on Zhihu hot page:

Before: 50 interactive elements (accessibility tree fallback), 1/30 hot links indexed
After: 597 interactive elements (proper DOM snapshot), 19/30 hot links indexed
E2E task: 25 turns → 14 turns (-44%), $0.75 → $0.37 (-51%)

Other Changes

viewportExpand: 800 → 2000 (covers ~3 screens instead of ~1)
DEBUG_SNAPSHOT env var for snapshot failure debugging
extractVerdict: rewrite with brace-counting JSON.parse (handles escaped quotes)
runCommand: include stderr in error output
V2EX test suite: 70 tasks (7 layers + edge cases + agent-style)
Zhihu test suite: 65 tasks (8 layers)
Combined eval runner + presets
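
To illustrate what raising viewportExpand changes, here is a rough sketch. It assumes viewportExpand is a pixel margin added above and below the visible viewport when deciding which elements to include in the snapshot; that interpretation, the Rect type, and the inExpandedViewport helper are assumptions for illustration, not the project's actual code.

```typescript
// Hypothetical bounding box, relative to the top of the visible viewport (in px).
interface Rect { top: number; bottom: number }

// True if the element overlaps the viewport after extending it by `expand` px
// both above and below (assumed semantics of viewportExpand).
function inExpandedViewport(rect: Rect, viewportHeight: number, expand: number): boolean {
  return rect.bottom >= -expand && rect.top <= viewportHeight + expand;
}

// An element 2500px down the page, seen from a 900px-tall viewport.
const farLink: Rect = { top: 2500, bottom: 2530 };
console.log(inExpandedViewport(farLink, 900, 800));  // false
console.log(inExpandedViewport(farLink, 900, 2000)); // true
```

Under this assumption, a larger margin trades a bigger snapshot for fewer scroll steps before an element becomes indexable.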

Test plan

npm run build passes
Zhihu E2E: 14 turns (was 25)
V2EX E2E: 9-10 turns
10/10 complex E2E tasks pass

AutoResearch framework (Karpathy-style autonomous iteration):
- engine.ts: 8-phase loop (review → modify → commit → verify → guard → decide → log)
- config.ts: typed config + CLI parser + metric extraction
- logger.ts: TSV append-only results log
- commands/run.ts: main loop spawning Claude Code per iteration
- commands/plan.ts: interactive config wizard
- commands/fix.ts: auto-detect broken state, iteratively fix
- commands/debug.ts: hypothesis-driven debugging for failing tasks

V2EX test suite (5 layers, 40 tasks):
- L1 Atomic (10): open, state, click, scroll, eval, back, wait
- L2 Single Page (10): hot topics, node list, topic meta, pagination
- L3 Multi-Step (10): click-read, navigate-node, tab-then-topic, pagination
- L4 Write Ops (5): reply typing, favorite detection, form detection
- L5 Complex Chain (5): cross-page collect, multi-node compare, full workflow

Presets: operate-reliability, skill-quality, v2ex-reliability
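
As a rough illustration of a TSV append-only results log, the sketch below formats one row per iteration. The ResultRow fields, the status values, and the function names are hypothetical; the PR does not show the actual columns used by logger.ts. The log is kept in memory here so the snippet is self-contained; a real logger would presumably append each line to a file.

```typescript
// Hypothetical row shape; the real logger.ts columns are not shown in this PR.
interface ResultRow {
  iteration: number;
  metric: number;
  status: 'keep' | 'discard';
  note: string;
}

// Tabs and newlines inside a field would corrupt the row, so collapse them to spaces.
function sanitize(value: string): string {
  return value.replace(/[\t\r\n]+/g, ' ');
}

// One row becomes one newline-terminated line; existing lines are never rewritten.
function toTsvLine(row: ResultRow): string {
  return [String(row.iteration), String(row.metric), row.status, sanitize(row.note)].join('\t') + '\n';
}

let log = 'iteration\tmetric\tstatus\tnote\n';
log += toTsvLine({ iteration: 1, metric: 0.95, status: 'keep', note: 'broader selector' });
log += toTsvLine({ iteration: 2, metric: 0.9, status: 'discard', note: 'regressed\ton wait' });
console.log(log);
```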

- Fix v2ex-collect-hot-authors selector (pathname-based member link detection)
- Fix v2ex-wait-text judge (accept "appeared")
- Fix trailing commas in eval step strings
- Add 20 harder tasks: state+click interaction + long chain workflows
- Baseline: 60/60 across all layers
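
The pathname-based member link detection mentioned above might look roughly like this. It is a sketch under assumptions: the helper name memberNameFromHref and the default base URL are illustrative, and the actual task selector in the test suite may differ. The idea is to match on the parsed URL path rather than on class names or raw href strings.

```typescript
// Returns the member name if href points at a /member/<name> page, else null.
// `base` resolves relative hrefs; the default value here is an assumption.
function memberNameFromHref(href: string, base = 'https://www.v2ex.com/'): string | null {
  let url: URL;
  try {
    url = new URL(href, base);
  } catch {
    return null; // unparseable href
  }
  const match = /^\/member\/([^/]+)\/?$/.exec(url.pathname);
  return match ? match[1] : null;
}

console.log(memberNameFromHref('/member/alice'));                        // "alice"
console.log(memberNameFromHref('https://www.v2ex.com/member/bob?tab=1')); // "bob"
console.log(memberNameFromHref('/t/12345'));                             // null
```

Matching on pathname ignores query strings and fragments, so the check keeps working when links carry extra parameters.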

Knowledge-intensive Chinese Q&A site (React SPA, lazy loading, complex DOM):
- L1 Atomic (10): open, state, title, url, scroll, tab, back, wait, keys, screenshot
- L2 Feed (8): feed titles, hot list, metrics, tabs, authors, content types, avatar, search
- L3 Question (8): title, meta, answer, votes, buttons, descriptions, answer count
- L4 Navigation (8): hot→question, feed→question, author profile, search, topic, user, back
- L5 Write (6): upvote/follow/comment/bookmark/write-answer/share button detection
- L6 Chain (8): read-answer-author, author-profile, multi-hot, search-then-read, scroll-answers
- L7 Search (6): basic, people, topic, click-result, filter, back
- L8 Complex (6): full workflow, deep author chain, cross-question, search-read, 3-page, scroll-deep

Key fixes during development:
- Zhihu search page needs 5s+ wait (SPA lazy loading)
- Back navigation goes to about:blank (daemon init page), fixed with direct navigate
- User profile answers page needs 4s wait for content
- Broader selectors needed (h2 a instead of specific class names)

…mple

Round 1: Fix 2 remaining browse-tasks failures:
- extract-npm-description: use generic <p> selector instead of class-based
- nav-click-link-example: include URL in output (title is 'Example Domains', not 'IANA')

…r year/rating

Round 2: IMDB page selectors were too specific (data-testid changed). Use generic h1 for title, link text match for year, broader class match for rating.

Round 3: Add 10 edge case tasks (5 V2EX + 5 Zhihu):
- rapid-navigate: 3 consecutive opens
- eval-after-click: verify URL changes after SPA click
- scroll-and-extract: extract after deep scroll
- structured extraction: multi-field JSON from dynamic content
- lazy-load answers: scroll triggers more content

Key finding: Zhihu SPA click() doesn't update location.pathname immediately. Use window.location.href = a.href for reliable navigation.

V2EX: 65/65, Zhihu: 65/65, Browse: 59/59 = 189/189

… eval for interaction)

Round 4-5: Add 5 tasks that test the actual agent workflow:
- agent-click-first-topic: find topic index via data-opencli-ref
- agent-type-search: type into search using state index
- agent-click-navigate-back: click by ref, verify navigation
- agent-state-has-interactive: verify state output format
- agent-state-after-scroll: verify scroll position in state

V2EX: 70/70 tasks

- eval-skill.ts: remove dead TASKS_FILE variable (skill-tasks.yaml never existed)
- eval-skill.ts: rewrite extractVerdict to use brace-counting JSON.parse instead of regex (handles escaped quotes in explanation)
- eval-browse.ts: include stderr in runCommand error output for debuggability
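
For readers unfamiliar with the brace-counting approach named in the extractVerdict item above, here is a minimal sketch. The function name extractFirstJsonObject and the sample verdict fields are hypothetical; the real extractVerdict in eval-skill.ts may differ. It scans from each opening brace, tracks string and escape state so that braces and quotes inside string values are ignored, and hands the balanced slice to JSON.parse.

```typescript
// Finds the first balanced {...} span in free-form text and parses it as JSON.
// Returns null if no span parses.
function extractFirstJsonObject(text: string): unknown {
  for (let start = text.indexOf('{'); start !== -1; start = text.indexOf('{', start + 1)) {
    let depth = 0;
    let inString = false;
    let escaped = false;
    for (let i = start; i < text.length; i++) {
      const ch = text[i];
      if (inString) {
        if (escaped) escaped = false;          // character after a backslash is literal
        else if (ch === '\\') escaped = true;
        else if (ch === '"') inString = false; // unescaped quote closes the string
        continue;
      }
      if (ch === '"') inString = true;
      else if (ch === '{') depth++;
      else if (ch === '}' && --depth === 0) {
        try {
          return JSON.parse(text.slice(start, i + 1));
        } catch {
          break; // balanced but not valid JSON; retry from the next '{'
        }
      }
    }
  }
  return null;
}

const reply = 'Judge says: {"pass": true, "explanation": "saw \\"ok\\" and a } brace"} done';
console.log(extractFirstJsonObject(reply));
```

A plain regex tends to stop at the first closing brace or the first escaped quote; counting depth outside of strings avoids both failure modes.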

Critical bug: isSearchElement() called el.className.toLowerCase() which crashes on SVG elements where className is SVGAnimatedString (not a string). This caused the entire DOM snapshot to fail and fall back to the basic accessibility tree, losing ALL interactive element indices.

Fix: use typeof check + baseVal fallback for SVG className.

Also:
- Increase viewportExpand from 800 to 2000 (covers ~3 screens)
- Add DEBUG_SNAPSHOT env var for snapshot failure debugging

Impact on Zhihu hot page:
- Before: 50 interactive elements (accessibility tree fallback), 1/30 hot links indexed
- After: 597 interactive elements (proper DOM snapshot), 19/30 hot links indexed

…r#733)

* feat: AutoResearch framework + V2EX test suite (40 tasks)
* test: V2EX test suite 60/60 — fix selectors, add harder tasks
* feat: Zhihu test suite — 60 tasks across 8 layers, 60/60 passing
* feat: combined eval-all runner + combined-reliability preset
* experiment(operate): fix extract-npm-description + nav-click-link-example
* experiment(operate): fix bench-imdb-matrix — use broader selectors for year/rating
* experiment(operate): add edge cases + fix SPA navigation timing
* experiment(operate): add agent-style tasks using state+click+type (no eval for interaction)
* fix: review fixes — extractVerdict, stderr, dead code
* fix: SVG className crash in dom-snapshot + viewport expansion

jackwener added 10 commits April 3, 2026 17:09

feat: combined eval-all runner + combined-reliability preset

f200b80

experiment(operate): fix bench-imdb-matrix — use broader selectors fo…

45459cc


jackwener merged commit ff84d19 into main Apr 3, 2026
13 checks passed

jackwener deleted the feat/v2ex-autoresearch branch April 3, 2026 11:01

jackwener merged 10 commits into main from feat/v2ex-autoresearch
