feat: add `pascal` optional extra for tree-sitter-pascal by vinicius-l-machado · Pull Request #1616 · Graphify-Labs/graphify

vinicius-l-machado · 2026-07-02T19:06:25Z

extract_pascal() already imports tree-sitter-pascal for AST-quality extraction and falls back to a regex extractor when it is absent (#781), but the grammar was not declared anywhere in the package metadata, so it was never installed and the AST path never ran out of the box.

Declare a pascal extra (and add it to all) so users can opt into the AST extractor with uv tool install "graphifyy[pascal]". tree-sitter-pascal publishes prebuilt wheels for every platform (win/macOS/Linux), so unlike the dm extra it needs no C toolchain.

On a mid-size Delphi codebase the AST path yields notably more accurate relationship edges than the regex fallback (calls and inherits both up ~25%). README extras table and uv.lock updated accordingly.
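The optional-grammar behaviour described above can be sketched roughly as follows. This is an illustration, not the body of `extract_pascal()`: the import name `tree_sitter_pascal`, the helper names, and the simplified regex are all assumptions made for the example.

```python
import importlib.util
import re

# Assumption: the grammar package is importable as `tree_sitter_pascal`.
GRAMMAR_MODULE = "tree_sitter_pascal"

# Hypothetical, deliberately simple stand-in for a regex fallback:
# it only picks up procedure/function names at the start of a line.
_ROUTINE_RE = re.compile(
    r"^\s*(?:procedure|function)\s+([\w.]+)",
    re.IGNORECASE | re.MULTILINE,
)


def grammar_available(module_name: str = GRAMMAR_MODULE) -> bool:
    """Return True if the optional grammar package can be imported."""
    return importlib.util.find_spec(module_name) is not None


def extract_names_regex(source: str) -> list[str]:
    """Fallback path: collect routine names with a regex."""
    return _ROUTINE_RE.findall(source)


def pick_extractor(module_name: str = GRAMMAR_MODULE) -> str:
    """Choose the AST path only when the grammar is installed."""
    return "ast" if grammar_available(module_name) else "regex"
```

Without the extra installed, `pick_extractor()` would report `"regex"`, which matches the situation the PR describes: the AST path exists but never runs until the grammar is declared and installed.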


…iling (Graphify-Labs#1616)

`graphify explain "<phrase>"` treats its whole argument as one string that must match, prefix, or be a substring of a single node's label as a whole. A genuine natural-language phrase (e.g. "critic score aggregation") therefore returns "No node matching found" even when every individual word exists on a real, relevant node, because no node label ever literally contains the entire multi-word phrase. This silently dead-ends on exactly the query shape `explain` is otherwise suggested for, with no fallback and no signal that anything went wrong (worse than noise: a hard, silent zero).

When the tiered lookup finds nothing and the phrase has more than one token, `explain` now falls back to the same per-token bag-of-words scoring `query` already uses (`_score_nodes`) and lists the top candidates by term overlap, in the same numbered-candidate format the existing ambiguity guard (Graphify-Labs#1613) uses, instead of a bare dead end. A genuine single-word miss is unaffected: the fallback is gated on token count, since a one-word probe would score identically to the substring tier already tried and has nothing new to find.

Regression tests: a multi-word phrase with real term overlap surfaces candidates and excludes unrelated nodes; a multi-word phrase with zero overlap still gets the honest original message; a single-word miss is byte-identical to prior behavior.

Full suite (2766 tests, 1 pre-existing unrelated failure) and ruff pass. Verified live against a real repo's graph.json: both previously-zero `explain` queries now surface their real target (`ratingsAggregation.ts`, `backdrops.handler.ts`) instead of nothing.
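A minimal sketch of a token-count-gated term-overlap fallback of the kind this commit message describes. It does not reproduce `_score_nodes`; the function names, the tokenizer, and the plain overlap-count scoring are assumptions chosen for illustration.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-word characters (assumed tokenizer)."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


def score_by_term_overlap(phrase: str, labels: list[str]) -> list[tuple[int, str]]:
    """Score each label by how many query tokens it shares."""
    query_tokens = set(tokenize(phrase))
    scored = []
    for label in labels:
        overlap = len(query_tokens & set(tokenize(label)))
        if overlap:  # labels with no shared token are excluded
            scored.append((overlap, label))
    # Highest overlap first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored


def explain_fallback(phrase: str, labels: list[str], top_n: int = 10) -> list[str]:
    """Return candidate labels, but only for multi-token phrases."""
    if len(tokenize(phrase)) <= 1:
        # A one-word probe has nothing new to find beyond the substring tier.
        return []
    return [label for _, label in score_by_term_overlap(phrase, labels)[:top_n]]
```

With labels such as `"critic score"` and `"backdrop image"`, the phrase "critic score aggregation" would surface the first and exclude the second, while a single-word query returns no fallback candidates.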

…ify-Labs#1618)

Graphify-Labs#1616's term-overlap fallback (this same session) fixed `explain` hard-failing to zero on multi-word natural-language phrases, but it has its own failure mode: when a query's only shared vocabulary with the corpus is one generic word, every node containing that word ties at the weakest possible bonus tier, and the fallback presents an arbitrary top-10 slice of that tie as though it were a considered answer.

Live repro: "server startup error handling" matched 1,765 of this repo's 3,491 nodes (51%), since "server" is also this repo's top-level backend directory name. The real target was buried past rank 800, tied with 1,627 other nodes at the exact same floor score. That is not a useful answer; it is close to a coin flip dressed up as one.

Fix: after scoring, if the candidate count exceeds both an absolute floor (50) and 15% of the graph's total node count, treat it as a noise flood and fall back to the same honest zero-match message a genuine miss gets, instead of printing a misleadingly specific candidate list. The floor keeps this from firing on small graphs and fixtures, where even "most of the graph matched" can be a small, legitimate list. Genuine large-but-real candidate lists (e.g. 31 candidates on this repo's ~3,491-node graph, an earlier fix's verified-good case) stay well under the threshold and are unaffected.

Regression tests: a 60-of-61-node noise flood on one generic token now gets the honest no-match message; a 20-of-21-node case (below this graph size's threshold) still shows its candidate list normally, confirming the guard is for degenerate floods specifically, not just "more than 10 results."

Full suite (2769 tests, all passing this run; the one known pre-existing test-order flake did not trigger) and ruff pass. Verified live: the exact 1,765-candidate flood from earlier now returns the honest no-match message; smaller legitimate fallbacks (critic score aggregation, backdrop image selection) are unaffected.
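The threshold rule stated in this commit message can be written as a small predicate. The function and parameter names are hypothetical; only the two thresholds (an absolute floor of 50 and 15% of total nodes) and the sample figures come from the message.

```python
def is_noise_flood(
    candidate_count: int,
    total_nodes: int,
    floor: int = 50,
    max_fraction: float = 0.15,
) -> bool:
    """True when candidates exceed BOTH the absolute floor and the graph fraction."""
    return candidate_count > floor and candidate_count > max_fraction * total_nodes


# Figures quoted in the commit message:
flood = is_noise_flood(1765, 3491)        # generic-word flood on the large graph
legit = is_noise_flood(31, 3491)          # large but real candidate list
small_flood = is_noise_flood(60, 61)      # regression fixture above the floor
small_ok = is_noise_flood(20, 21)         # regression fixture below the floor
```

Requiring both conditions is what keeps small fixtures safe: 20 of 21 nodes is far above 15% of the graph, but it never clears the floor of 50, so the candidate list is still shown.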

vinicius-l-machado wants to merge 1 commit into Graphify-Labs:v8 from vinicius-l-machado:feat/pascal-delphi-extractor
