fix(extract): parse .vue SFC <script> with the right grammar #1468

Closed
papinto wants to merge 1 commit into Graphify-Labs:v8 from papinto:fix/vue-sfc-script-slicing

Conversation

@papinto papinto commented Jun 25, 2026

Contributor

Summary

.vue Single File Components were extracting almost nothing because the whole file was being parsed with the wrong grammar.

.vue is dispatched to extract_js, which picks a tree-sitter grammar by file suffix. Since .vue is neither .ts nor .tsx, it fell through to the JavaScript grammar, and the entire SFC — <template> markup, <script>, and <style> — was handed to that grammar. Markup isn't valid JS, so the parse produced a top-level ERROR node and no imports, symbols, or type references were recovered.

This adds a dedicated extract_vue that parses only the <script> block, with the grammar its lang implies.

Impact

Measured on a 300-file random sample from a production Vue 3 + Vite + TypeScript app:

| Metric | Before | After |
|---|---|---|
| Files extracting nothing (0 edges) | 40% | 6% |
| Files with no import edges | 69% | 8% |
| Total edges | 2,834 | 8,882 (3.1×) |

The residual ~6% are genuinely template-only components (no <script>), which correctly produce just a file node.

How it works

  1. Mask, don't strip. Every character outside a <script> body — template, style, and the <script …>/</script> tags themselves — is replaced with a space, while newlines are preserved. The surviving script therefore sits at its original line numbers, and the blanked regions parse as empty. No line-offset bookkeeping required.
  2. Pick the grammar from lang. tsx → TSX, js/jsx → JS, and ts or unspecified → TS. TypeScript is, for practical purposes, a syntactic superset of JavaScript, so defaulting an unannotated block to the TS grammar loses essentially nothing and matches the dominant Vue 3 + <script setup> convention.
  3. Recover dynamic imports. A small regex pass picks up import('...') calls (e.g. defineAsyncComponent(() => import('./X.vue'))) that the AST pass does not emit as edges — the same rescue extract_svelte/extract_astro already do.
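
As a rough illustration of steps 1 and 2, here is a minimal sketch of masking and grammar selection. It is not the code from this PR: `mask_outside_script`, `grammar_for`, and both regexes are hypothetical names chosen for the example, and the open-tag pattern is deliberately naive (it does not handle a `>` inside a quoted attribute value).

```python
import re

# Hypothetical helpers for illustration only; not the PR's implementation.
# Naive open-tag match: assumes no '>' appears inside an attribute value.
_SCRIPT_RE = re.compile(
    r"<script\b(?P<attrs>[^>]*)>(?P<body>.*?)</script\s*>",
    re.DOTALL | re.IGNORECASE,
)
_LANG_RE = re.compile(r"""\blang\s*=\s*["']?([\w-]+)""", re.IGNORECASE)


def mask_outside_script(source: str) -> tuple[str, list[str]]:
    """Blank everything except <script> bodies, keeping newlines in place.

    Returns the masked text and the lang value of each block ("" if unset).
    """
    # Start from an all-blank copy that keeps only the newlines.
    out = [ch if ch == "\n" else " " for ch in source]
    langs: list[str] = []
    for m in _SCRIPT_RE.finditer(source):
        start, end = m.span("body")
        # Copy the script body back at its original offsets.
        out[start:end] = source[start:end]
        lang = _LANG_RE.search(m.group("attrs"))
        langs.append(lang.group(1).lower() if lang else "")
    return "".join(out), langs


def grammar_for(lang: str) -> str:
    """Map a lang value to a grammar name: ts or unset falls back to TS."""
    if lang == "tsx":
        return "tsx"
    if lang in ("js", "jsx"):
        return "javascript"
    return "typescript"
```

Because the masked text has the same length and the same newlines as the input, any line number reported by the parser can be used against the original file directly. This sketch works on `str`; an extractor that hands bytes to the parser would need to mask at the byte level to keep byte offsets aligned as well.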

This goes a step further than the Svelte/Astro extractors, which regex-scrape import strings only: because Vue's component logic lives in <script>, parsing it with a real grammar recovers the full symbol and type-reference graph (classes, functions, typed props referencing imported types), not just imports.
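
The dynamic-import rescue from step 3 can be pictured as a small regex pass like the one below. This is a sketch under assumptions, not the project's code: `find_dynamic_imports` and the pattern are hypothetical, only string-literal arguments are matched (template literals and computed paths are ignored), and an `import(...)` inside a comment would also match.

```python
import re

# Hypothetical pattern: import('<specifier>') or import("<specifier>").
_DYNAMIC_IMPORT_RE = re.compile(r"""\bimport\s*\(\s*(['"])([^'"\n]+)\1\s*\)""")


def find_dynamic_imports(script: str) -> list[tuple[str, int]]:
    """Return (specifier, 1-based line number) for each literal dynamic import."""
    results = []
    for m in _DYNAMIC_IMPORT_RE.finditer(script):
        # Count newlines before the match to get its line number.
        line = script.count("\n", 0, m.start()) + 1
        results.append((m.group(2), line))
    return results
```

Run against masked source, the reported line numbers match the original .vue file, since masking keeps every newline in place.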

.vue files also now join the JS cross-file symbol-resolution pass. They were previously excluded from it precisely because the whole-file parse failed; with a working per-file parse, an SFC's calls resolve to definitions in other files like any .ts file's would.

Implementation notes

  • _extract_generic gains an opt-in keyword-only source_override: bytes | None = None. When provided, it parses those bytes instead of reading the file, while still keying every node/edge off the real path. Default-off, so no existing caller changes behavior.
  • The single-site guard `and path.suffix != ".vue"` in _collect_js_symbol_resolution_facts is removed. .vue stays in the JS-family suffix set used elsewhere for cache bypass.

Tests

tests/test_vue_extraction.py (11 cases): masking + line-number preservation, <script setup lang="ts"> static imports, symbol extraction with correct lines, typed props referencing an imported type, dual <script> + <script setup> blocks, dynamic-import recovery, plain-JS blocks, template-only files, and end-to-end cross-file call resolution.

The JS/TS import-resolution, Svelte/Astro, and symbol-resolution suites remain green.

.vue files were routed to extract_js, which selects a grammar by file
suffix; .vue is neither .ts nor .tsx, so the whole SFC (template + script
+ style) was fed to the JavaScript grammar. Markup is not valid JS, so the
parse produced a top-level ERROR node and nothing was extracted.

On a 300-file sample of a real Vue 3 + Vite + TS app, 40% of .vue files
extracted nothing and 69% had no import edges.

Add extract_vue, which masks every region outside a <script> body to
whitespace (newlines preserved, so line numbers stay accurate) and parses
the surviving script with the grammar implied by its lang (ts/unset ->
TS, tsx -> TSX, js/jsx -> JS; TS is a superset of JS, a safe default). A
dynamic-import regex pass recovers import('...') lazy imports, mirroring
extract_svelte/extract_astro.

This goes beyond the Svelte/Astro import-rescue: it recovers the full
symbol and type-reference graph, not just imports. .vue files also join the
JS cross-file symbol-resolution pass (the prior pass excluded them because
the whole-file parse failed). On the same sample: empties 40% -> 6%,
no-imports 69% -> 8%, total edges 2,834 -> 8,882.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

MentorFilou commented Jun 26, 2026


The same issue was addressed in a similar way in #1141. Sadly, the author didn't respond for 3 weeks.

While yours seems to have better test coverage, @safishamsi suggested that reusing the svelte function is the right call, even asking to not create a distinct vue wrapper function! Consider reading their response on that PR to evaluate possibly changing this here.

Hoping to see this merged soon 👍

safishamsi pushed a commit that referenced this pull request Jun 26, 2026
.vue files were dispatched to extract_js, which picks a tree-sitter grammar by
suffix. .vue is neither .ts nor .tsx, so the whole SFC -- <template> markup,
<script>, and <style> -- was fed to the JavaScript grammar, producing a top-level
ERROR node and recovering no imports, symbols, or type references.

A dedicated extract_vue masks everything outside <script> (replacing it with
spaces so symbol line numbers stay accurate) and parses just the script with the
grammar named by `lang` (ts default; tsx/js/jsx honored). .vue also joins the
cross-file symbol-resolution pass now that it parses cleanly.

Ported from PR #1468 by @papinto. Maintainer fix on top: the <script> open-tag
scan now skips over quoted attribute values, so a `>` inside one (Vue 3.3+ generic
components, e.g. generic="T extends Record<string, unknown>") no longer ends the
tag early and swallows the body. Added a regression test for that case.

(CHANGELOG also records #1470, committed just prior.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@safishamsi

Collaborator

Thanks @papinto — great catch and the right fix. Feeding the whole SFC to the JS grammar produced a top-level ERROR and dropped everything; masking the non-<script> regions with spaces (preserving newlines so line numbers stay accurate) and parsing the script with the lang-named grammar is exactly the correct approach, and wiring .vue into cross-file symbol resolution is a nice bonus.

Landed on v8 in 349465b with your authorship preserved. One maintainer fix on top: the <script> open-tag scan now skips over quoted attribute values, so a > inside one — Vue 3.3+ generic components like generic="T extends Record<string, unknown>" — no longer ends the tag early and swallows the body. Added a regression test for it. Verified end-to-end: a <script setup lang="ts"> SFC now recovers its imports and symbols with no ERROR node. Closing as merged-by-port — thanks for the contribution!
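
For readers curious about the quoted-attribute fix, the idea can be sketched as a scan that tracks whether it is inside a quoted value. `find_tag_end` is a hypothetical name, and this is an illustration of the technique, not the code that landed in 349465b.

```python
def find_tag_end(source: str, start: int) -> int:
    """Return the index of the '>' that closes the tag opening at `start`.

    A '>' inside a single- or double-quoted attribute value is skipped.
    Returns -1 if the tag is never closed.
    """
    quote = None  # the quote character we are currently inside, if any
    for i in range(start, len(source)):
        ch = source[i]
        if quote is not None:
            if ch == quote:
                quote = None  # closing quote ends the attribute value
        elif ch in "\"'":
            quote = ch
        elif ch == ">":
            return i
    return -1
```

With a tag such as `<script setup lang="ts" generic="T extends Record<string, unknown>">`, a plain search for the first `>` stops inside the `generic` value, whereas this scan returns the final `>` of the tag.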

@safishamsi safishamsi closed this Jun 26, 2026
safishamsi added a commit that referenced this pull request Jun 27, 2026
Dates the 0.8.50 CHANGELOG and bumps the version. Highlights: WPF/XAML extraction
+ ViewModel/binding links (#1460/#1473), Objective-C relationship fixes (#1475),
.vue SFC grammar fix (#1468), Metal shader indexing (#1480), Java field/annotation
references (#1485/#1487), portable wiki links (#1444), *_BASE_URL backend overrides
(#1458), non-streaming OpenAI-compatible calls (#1223), reflect --if-stale sidecar
freshness (#1470), label --missing-only (#1481), canvas grid + case-fold dedup
(#1452/#1453), the Read|Glob hook extension fix (#1463), and the no-API-key skill
clarification (#1461).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>