fix(extract): parse .vue SFC <script> with the right grammar #1468
papinto wants to merge 1 commit into
.vue files were routed to extract_js, which selects a grammar by file
suffix; .vue is neither .ts nor .tsx, so the whole SFC (template + script
+ style) was fed to the JavaScript grammar. Markup is not valid JS, so the
parse produced a top-level ERROR node and nothing was extracted.
On a 300-file sample of a real Vue 3 + Vite + TS app, 40% of .vue files
extracted nothing and 69% had no import edges.
Add extract_vue, which masks every region outside a <script> body to
whitespace (newlines preserved, so line numbers stay accurate) and parses
the surviving script with the grammar implied by its lang (ts/unset ->
TS, tsx -> TSX, js/jsx -> JS; TS is a superset of JS, a safe default). A
dynamic-import regex pass recovers import('...') lazy imports, mirroring
extract_svelte/extract_astro.
This goes beyond the Svelte/Astro import-rescue: it recovers the full
symbol and type-reference graph, not just imports. .vue files also join the
JS cross-file symbol-resolution pass (the prior pass excluded them because
the whole-file parse failed). On the same sample: empties 40% -> 6%,
no-imports 69% -> 8%, total edges 2,834 -> 8,882.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The same issue was similarly addressed in #1141. Sadly, the author didn't respond for 3 weeks. While yours seems to have better test coverage, @safishamsi suggested that reusing the svelte function is the right call, even asking not to create a distinct vue wrapper function! Consider reading their response on that PR to evaluate whether to change this here. Hoping to see this merged soon 👍
.vue files were dispatched to extract_js, which picks a tree-sitter grammar by suffix. .vue is neither .ts nor .tsx, so the whole SFC -- <template> markup, <script>, and <style> -- was fed to the JavaScript grammar, producing a top-level ERROR node and recovering no imports, symbols, or type references.
A dedicated extract_vue masks everything outside <script> (replacing it with spaces so symbol line numbers stay accurate) and parses just the script with the grammar named by `lang` (ts default; tsx/js/jsx honored). .vue also joins the cross-file symbol-resolution pass now that it parses cleanly.
Ported from PR #1468 by @papinto. Maintainer fix on top: the <script> open-tag scan now skips over quoted attribute values, so a `>` inside one (Vue 3.3+ generic components, e.g. generic="T extends Record<string, unknown>") no longer ends the tag early and swallows the body. Added a regression test for that case.
(CHANGELOG also records #1470, committed just prior.)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
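For readers following the open-tag fix, here is a minimal sketch of what a quote-aware scan could look like. The helper name `find_tag_end` is hypothetical and is not taken from the PR; it only illustrates the idea of ignoring `>` while inside a quoted attribute value.

```python
def find_tag_end(source: str, start: int) -> int:
    """Return the index of the '>' that closes the tag opening at `start`.

    Hypothetical helper: a '>' inside a single- or double-quoted attribute
    value is skipped. Returns -1 if the tag is never closed.
    """
    quote = None  # the quote character we are currently inside, if any
    for i in range(start, len(source)):
        ch = source[i]
        if quote is not None:
            if ch == quote:  # closing quote ends the attribute value
                quote = None
        elif ch in "\"'":  # opening quote starts an attribute value
            quote = ch
        elif ch == ">":  # first unquoted '>' ends the open tag
            return i
    return -1


tag = '<script setup lang="ts" generic="T extends Record<string, unknown>">'
sfc = tag + "\nconst x = 1\n</script>"
end = find_tag_end(sfc, 0)
```

A naive `sfc.index(">")` would stop at the `>` inside the `generic` attribute, which is the failure mode described above.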
Thanks @papinto, great catch and the right fix. Feeding the whole SFC to the JS grammar produced a top-level ERROR and dropped everything; masking the non- Landed on
Dates the 0.8.50 CHANGELOG and bumps the version. Highlights: WPF/XAML extraction + ViewModel/binding links (#1460/#1473), Objective-C relationship fixes (#1475), .vue SFC grammar fix (#1468), Metal shader indexing (#1480), Java field/annotation references (#1485/#1487), portable wiki links (#1444), *_BASE_URL backend overrides (#1458), non-streaming OpenAI-compatible calls (#1223), reflect --if-stale sidecar freshness (#1470), label --missing-only (#1481), canvas grid + case-fold dedup (#1452/#1453), the Read|Glob hook extension fix (#1463), and the no-API-key skill clarification (#1461). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
`.vue` Single File Components were extracting almost nothing because the whole file was being parsed with the wrong grammar.

`.vue` is dispatched to `extract_js`, which picks a tree-sitter grammar by file suffix. Since `.vue` is neither `.ts` nor `.tsx`, it fell through to the JavaScript grammar, and the entire SFC (`<template>` markup, `<script>`, and `<style>`) was handed to that grammar. Markup isn't valid JS, so the parse produced a top-level ERROR node and no imports, symbols, or type references were recovered.

This adds a dedicated `extract_vue` that parses only the `<script>` block, with the grammar its `lang` implies.

Impact
Measured on a 300-file random sample from a production Vue 3 + Vite + TypeScript app:

| Metric | Before | After |
| --- | --- | --- |
| `.vue` files extracting nothing | 40% | 6% |
| `.vue` files with no import edges | 69% | 8% |
| Total edges | 2,834 | 8,882 |

The residual ~6% are genuinely template-only components (no `<script>`), which correctly produce just a file node.

How it works
1. Everything outside a `<script>` body (template, style, and the `<script …>`/`</script>` tags themselves) is replaced with spaces, while newlines are preserved. The surviving script therefore sits at its original line numbers, and the blanked regions parse as empty. No line-offset bookkeeping is required.
2. The grammar is chosen from `lang`: `tsx` → TSX, `js`/`jsx` → JS, and `ts` or unspecified → TS. TypeScript is a superset of JavaScript, so defaulting an unannotated block to the TS grammar is a safe default and matches the dominant Vue 3 + `<script setup>` convention.
3. A dynamic-import regex pass recovers `import('...')` calls (e.g. `defineAsyncComponent(() => import('./X.vue'))`) that the AST pass does not emit as edges, the same rescue `extract_svelte`/`extract_astro` already do.

This goes a step further than the Svelte/Astro extractors, which regex-scrape import strings only: because Vue's component logic lives in `<script>`, parsing it with a real grammar recovers the full symbol and type-reference graph (classes, functions, typed props referencing imported types), not just imports.

`.vue` files also now join the JS cross-file symbol-resolution pass. They were previously excluded from it precisely because the whole-file parse failed; with a working per-file parse, an SFC's calls resolve to definitions in other files like any `.ts` file's would.

Implementation notes
- `_extract_generic` gains an opt-in keyword-only `source_override: bytes | None = None`. When provided, it parses those bytes instead of reading the file, while still keying every node/edge off the real path. Default-off, so no existing caller changes behavior.
- The `and path.suffix != ".vue"` guard in `_collect_js_symbol_resolution_facts` is removed. `.vue` stays in the JS-family suffix set used elsewhere for cache-bypass.

Tests
`tests/test_vue_extraction.py` (11 cases): masking + line-number preservation, `<script setup lang="ts">` static imports, symbol extraction with correct lines, typed props referencing an imported type, dual `<script>` + `<script setup>` blocks, dynamic-import recovery, plain-JS blocks, template-only files, and end-to-end cross-file call resolution.

The JS/TS import-resolution, Svelte/Astro, and symbol-resolution suites remain green.