Recover missing C function usages + faster LSP enrichment#223
Merged
Conversation
The function-definition query only matched a bare function_declarator, so a pointer-return signature (robj *streamDup(), static inline list *f(), robj **g()) — where tree-sitter nests the function_declarator under one or two pointer_declarators — produced no node at all. Call sites inside such a function's body then had no enclosing function and their call edges were dropped too. Match function_definition broadly and derive the name by peeling the declarator chain (reusing cDeclName), and add pointer-declarator alternations to the C prototype and C++ free-function queries. Fix C++ extractFuncName to descend pointer/reference/parenthesized declarator wrappers so pointer-return methods are no longer dropped.
…erences A generated C fragment such as redis's src/commands.def — 12k lines of MAKE_CMD(...) rows #include'd into a translation unit — was never parsed (.def had no extractor), so the only reference to a command function lived in an unindexed file and find_usages reported the function as unused. Parse .def as C (uncontested extension) and content-route a clearly-C .inc fragment to the C extractor without disturbing the PHP/Pascal/asm claimants. Capture a bare function identifier used in a table value position — a macro or call argument, an aggregate initializer element, a designated .field initializer — as an ungated function-value candidate attributed to the file node, so the resolver gate binds it to the uniquely-named command function in its defining translation unit. Flood is held down by value-position gating plus dropping every same-file declaration, parameter, and local.
A function used by address in another translation unit — a function-pointer identity check (c->cmd->proc != execCommand), a function-pointer assignment (*slot = execCommand), a return of a function, or &fn — returned zero usages because same-file value capture drops any name the file does not declare, and C addresses functions across a flat extern namespace. Capture these in-function value positions as ungated function-value candidates and let the resolver gate bind them repo-globally, guarded so a file-local static function is never the target of a cross-module reference and a same-file definition wins over a same-named extern elsewhere.
… for the add phase The reference-confirm pass ran sequentially, one open+references+close round trip per ambiguous edge (~7 edges/s), and under the full per-repo deadline. On a medium repo it consumed the entire window before the hover / hierarchy add phase ran even once, so edges_added stayed 0 and the lsp-tier answer set was a strict subset of an already-sparse graph. Fan the reference sweep out across maxParallel, grouped by the referent file so each file is opened once and serves every target sharing it, and order the groups highest-yield-first so a deadline cut leaves the most confirmations landed. Reserve a fraction of the per-repo deadline for the post-confirm sweep so a slow confirm pass can no longer starve it. The definition-rebind fallback, which opens arbitrary call-site files, runs serially after the parallel sweep so document open/close never overlaps across goroutines.
On a repo with a broad .clang-tidy, clangd runs lint matchers during semantic enrichment and can crash mid-pass; Gortex reconnects and repeats the work, turning it into a crash -> reconnect -> reindex loop that pins clangd at high CPU for the entire pass (observed ~113-minute runs). Gortex consumes semantic graph signal, not clang-tidy diagnostics, so run clangd with --clang-tidy=false. --background-index is kept — the reference-confirm pass depends on it for cross-file results.
Enrichment opened the referent file of every ambiguous edge (and every interface / hover / rebind target) without checking that the file's extension is one the language server can compile. An ambiguous edge whose referent lives in an asset (.png/.svg) or an unrelated script (.sh) was opened on clangd, which then tried to build an AST with an inferred C++ compile command and churned on invalid ASTs for zero graph signal. Add a servesFile guard keyed on the ServerSpec's extension coverage and apply it at every enrichment document-open site: the implementation pass, the reference-confirm grouping, the hover/hierarchy sweep, and the definition rebind fallback. Providers without a spec (unit fakes) are unaffected.
reconnectWithBackoff caps connection retries within one reconnect cycle, but a server that connects cleanly and then exits again on the next request (a clangd crashing repeatedly in a lint matcher) would reconnect without bound — repeating work and pinning the process at high CPU for the whole pass. Cap the number of reconnect cycles per enrichment pass. Past the cap, abandon the provider's enrichment for the repo: flip the existing abort path so the pass returns with everything already flushed intact, and log the provider, repo, and reconnect count. A transient one-off exit still reconnects and continues.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Four independent fixes that recover C
find_usagesresults clangd resolves but Gortex was silently missing (measured against clangd ground truth on redis), plus a cross-language enrichment-scheduling fix.1. Pointer-return function definitions and prototypes are now extracted
The function-definition query only matched a bare
function_declarator, so a pointer-return signature —robj *streamDup(),static inline list *f(),robj **g()— where tree-sitter nests thefunction_declaratorunder one ortwo
pointer_declarators, produced no node at all. Call sites inside such a function's body then had no enclosing function, so their call edges were dropped too.The definition is now matched broadly and the name derived by peeling the declarator chain; the C prototype and C++ free-function queries gain pointer-declarator alternations, and C++
extractFuncNamedescends pointer/reference/parenthesized wrappers so pointer-return methods are no longer dropped.2. Generated
.defcommand tables are indexed and their references recoveredA generated C fragment such as redis's
src/commands.def— thousands ofMAKE_CMD(...)rows#included into a translation unit — was never parsed (.defhad no extractor), so the only reference to a command function lived in an unindexed file andfind_usagesreported the function as unused and safe to remove..defis now parsed as C (an uncontested extension) and a clearly-C.incfragment is content-routed to the C extractor without disturbing the PHP/Pascal/asm claimants. A bare function identifier used in a table value position (a macro/call argument, an aggregate initializer element, a designated.fieldinitializer) is captured as a function-value reference attributed to the file node and bound by the resolver to the command function in its defining translation unit.3. Cross-file function-address value references bind
A function used by address in another translation unit — a function-pointer identity check (
c->cmd->proc != execCommand), a function-pointer assignment, a return of a function, or&fn— returned zero usages because same-file value capture drops any name the file does not declare, and C addresses functions across a flat extern namespace.These in-function value positions are now captured and bound repo-globally, guarded so a file-local
staticfunction is never the target of a cross-module reference and a same-file definition wins over a same-named extern elsewhere.4. LSP enrichment: parallel confirm pass + reserved add-phase budget
The reference-confirm pass ran sequentially — one open+references+close round trip per ambiguous edge (~7 edges/s) — under the full per-repo deadline, so on a medium repo it consumed the whole window before the hover / hierarchy add phase ran even once.
edges_addedstayed 0 and the lsp-tier answer set was a strict subset of an already-sparse graph. This reproduced across clangd, gopls, and rust-analyzer, so it is not a C quirk.The reference sweep now fans out across
maxParallel, grouped by referent file so each file is opened once and serves every target sharing it, ordered highest-yield-first so a deadline cut leaves the most confirmations landed. Afraction of the per-repo deadline is reserved for the post-confirm sweep so a slow confirm pass can no longer starve it. The definition-rebind fallback (which opens arbitrary call-site files) runs serially after the parallel sweep so document open/close never overlaps across goroutines.
5. LSP enrichment reliability
Making enrichment faster and more concurrent amplifies crash storms and wasted work, so this PR also lands three adjacent reliability fixes for the same code path:
--clang-tidy=false. Gortex consumes semantic graph signal, not lint diagnostics, and a broad repo.clang-tidycould crash clangd into a crash → reconnect → reindex loop that pinned it at high CPU for the whole pass (observed ~113-minute runs).--background-indexis kept — the reference-confirm pass depends on it for cross-file results. Closes clangd enrichment can trigger clang-tidy crashes and high CPU #220ServerSpeccoverage. AservesFileguard is applied at every document-open site (implementation pass, confirm grouping, hover / hierarchy sweep, definition rebind fallback), so an asset (.png/.svg) or unrelated script (.sh) is no longer opened on clangd only to churn on an invalid AST. Closes LSP enrichment sends non-C/C++ files to clangd #221reconnectWithBackoffalready caps connection retries within one cycle; this caps the number of cycles, so a server that connects cleanly but keeps exiting mid-request is abandoned for the repo after the cap — with everything already flushed left intact and the provider / repo / reconnect count logged — instead of reconnecting without bound. A transient one-off exit still reconnects. Closes Add crash-loop guard for LSP semantic enrichment #222Testing
.def/.incdetection, the parallel-confirm reserved-budget path, the provider-coverage guard (asset files never reach clangd), and the crash-loop guard (a repeatedly-crashing server is abandoned, not looped).go test ./...green — the only failures were two wall-clock perf tests that are load-flaky under full-suite contention and pass in isolation.-race; the concurrent enrichment path is validated under-raceand non-flaky.Notes
No changes to graph node/edge schemas. The end-to-end index benchmark on redis (and the Go regression benchmark) is the recommended pre-merge validation.