Recover missing C function usages + faster LSP enrichment #223

Merged
zzet merged 7 commits into main from feat/c-usages-parity
Jul 2, 2026

@zzet zzet commented Jul 2, 2026

Summary

Four independent fixes that recover C find_usages results that clangd resolves but Gortex silently missed (measured against clangd ground truth on redis), plus a cross-language enrichment-scheduling fix.

1. Pointer-return function definitions and prototypes are now extracted

The function-definition query only matched a bare function_declarator, so a pointer-return signature (robj *streamDup(), static inline list *f(), robj **g()), where tree-sitter nests the function_declarator under one or two pointer_declarators, produced no node at all. Call sites inside such a function's body then had no enclosing function, so their call edges were dropped too.

The definition is now matched broadly and the name derived by peeling the declarator chain; the C prototype and C++ free-function queries gain pointer-declarator alternations, and C++ extractFuncName descends pointer/reference/parenthesized wrappers so pointer-return methods are no longer dropped.
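A minimal sketch of the declarator-peeling idea, assuming a simplified stand-in for a tree-sitter node. The `Node` type and `peelDeclName` are hypothetical names for illustration; they are not Gortex's `cDeclName` or `extractFuncName`.

```go
package main

import "fmt"

// Node is a hypothetical, simplified stand-in for a tree-sitter syntax node.
type Node struct {
	Kind  string // e.g. "pointer_declarator", "function_declarator", "identifier"
	Text  string // source text; only set on identifiers in this sketch
	Inner *Node  // the nested declarator child, if any
}

// peelDeclName descends through wrapper declarators until it reaches the
// identifier that names the function. It reports false for shapes it does
// not recognise instead of guessing.
func peelDeclName(n *Node) (string, bool) {
	for n != nil {
		switch n.Kind {
		case "identifier":
			return n.Text, true
		case "pointer_declarator", "reference_declarator",
			"parenthesized_declarator", "function_declarator":
			n = n.Inner // peel one wrapper and keep descending
		default:
			return "", false
		}
	}
	return "", false
}

func main() {
	// robj **g(): two pointer_declarators wrap the function_declarator.
	g := &Node{Kind: "pointer_declarator", Inner: &Node{
		Kind: "pointer_declarator", Inner: &Node{
			Kind:  "function_declarator",
			Inner: &Node{Kind: "identifier", Text: "g"},
		},
	}}
	name, ok := peelDeclName(g)
	fmt.Println(name, ok) // prints "g true"
}
```

Peeling in a loop rather than matching a fixed nesting depth means any number of pointer levels resolves to the same name.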

2. Generated .def command tables are indexed and their references recovered

A generated C fragment such as redis's src/commands.def — thousands of MAKE_CMD(...) rows #included into a translation unit — was never parsed (.def had no extractor), so the only reference to a command function lived in an unindexed file and find_usages reported the function as unused and safe to remove.

.def is now parsed as C (an uncontested extension) and a clearly-C .inc fragment is content-routed to the C extractor without disturbing the PHP/Pascal/asm claimants. A bare function identifier used in a table value position (a macro/call argument, an aggregate initializer element, a designated .field initializer) is captured as a function-value reference attributed to the file node and bound by the resolver to the command function in its defining translation unit.
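The routing rule could look roughly like the sketch below. The extension switch follows the description above, but `looksLikeC` is a deliberately naive, hypothetical heuristic; the PR does not say how Gortex decides that a .inc fragment is "clearly C".

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// languageFor returns "c" when a file should go to the C extractor and ""
// when it should be left to whichever other extractor claims the extension.
func languageFor(path, content string) string {
	switch strings.ToLower(filepath.Ext(path)) {
	case ".c", ".h", ".def": // .def is uncontested, so it always routes to C
		return "c"
	case ".inc": // contested: only claim it when the content looks like C
		if looksLikeC(content) {
			return "c"
		}
	}
	return ""
}

// looksLikeC is an assumed heuristic: reject PHP openers, then accept the
// fragment if any line starts with a common C preprocessor directive.
func looksLikeC(content string) bool {
	if strings.HasPrefix(strings.TrimSpace(content), "<?php") {
		return false
	}
	for _, line := range strings.Split(content, "\n") {
		t := strings.TrimSpace(line)
		if strings.HasPrefix(t, "#include") || strings.HasPrefix(t, "#define") ||
			strings.HasPrefix(t, "#ifdef") || strings.HasPrefix(t, "#ifndef") {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(languageFor("src/commands.def", "MAKE_CMD(...)") == "c")
	fmt.Println(languageFor("page.inc", "<?php echo 1; ?>") == "")
}
```

Returning an empty string for an unclaimed .inc file is what keeps the PHP/Pascal/asm extractors undisturbed in this sketch.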

3. Cross-file function-address value references bind

A function used by address in another translation unit — a function-pointer identity check (c->cmd->proc != execCommand), a function-pointer assignment, a return of a function, or &fn — returned zero usages because same-file value capture drops any name the file does not declare, and C addresses functions across a flat extern namespace.

These in-function value positions are now captured and bound repo-globally, guarded so a file-local static function is never the target of a cross-module reference and a same-file definition wins over a same-named extern elsewhere.
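The two guards can be sketched as a small resolver step. `FuncDef` and `bindFunctionValue` are hypothetical names, and requiring exactly one non-static candidate is an assumption based on the "uniquely-named" wording in the commit message.

```go
package main

import "fmt"

// FuncDef is a hypothetical record of one function definition in the repo.
type FuncDef struct {
	Name   string
	File   string
	Static bool // declared with file-local (static) linkage
}

// bindFunctionValue chooses the definition that a by-address use of name in
// refFile should bind to, or reports false when no safe target exists.
func bindFunctionValue(name, refFile string, defs []FuncDef) (FuncDef, bool) {
	var externs []FuncDef
	for _, d := range defs {
		if d.Name != name {
			continue
		}
		if d.File == refFile {
			return d, true // a same-file definition always wins
		}
		if d.Static {
			continue // a static function is invisible outside its own file
		}
		externs = append(externs, d)
	}
	if len(externs) == 1 {
		return externs[0], true
	}
	return FuncDef{}, false // none, or ambiguous: leave the reference unbound
}

func main() {
	defs := []FuncDef{
		{Name: "execCommand", File: "multi.c"},
		{Name: "helper", File: "util.c", Static: true},
	}
	d, ok := bindFunctionValue("execCommand", "networking.c", defs)
	fmt.Println(d.File, ok) // prints "multi.c true"
	_, ok = bindFunctionValue("helper", "networking.c", defs)
	fmt.Println(ok) // prints "false"
}
```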

4. LSP enrichment: parallel confirm pass + reserved add-phase budget

The reference-confirm pass ran sequentially — one open+references+close round trip per ambiguous edge (~7 edges/s) — under the full per-repo deadline, so on a medium repo it consumed the whole window before the hover / hierarchy add phase ran even once. edges_added stayed 0 and the lsp-tier answer set was a strict subset of an already-sparse graph. This reproduced across clangd, gopls, and rust-analyzer, so it is not a C quirk.

The reference sweep now fans out across maxParallel, grouped by referent file so each file is opened once and serves every target sharing it, ordered highest-yield-first so a deadline cut leaves the most confirmations landed. A fraction of the per-repo deadline is reserved for the post-confirm sweep so a slow confirm pass can no longer starve it. The definition-rebind fallback (which opens arbitrary call-site files) runs serially after the parallel sweep so document open/close never overlaps across goroutines.
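A rough sketch of the scheduling shape described above, not the actual implementation. `Edge`, `Group`, `groupByYield`, `confirmBudget`, and `confirmAll` are hypothetical names; treating "yield" as the number of edges that share a referent file, and expressing the reserve as a percentage, are both assumptions.

```go
package main

import (
	"context"
	"fmt"
	"sort"
	"sync"
	"time"
)

// Edge is a hypothetical ambiguous edge awaiting confirmation.
type Edge struct {
	ReferentFile string
	Target       string
}

// Group holds every edge whose referent lives in one file.
type Group struct {
	File  string
	Edges []Edge
}

// groupByYield groups edges by referent file and orders the groups so the
// file serving the most edges comes first (ties broken by file name).
func groupByYield(edges []Edge) []Group {
	byFile := make(map[string][]Edge)
	for _, e := range edges {
		byFile[e.ReferentFile] = append(byFile[e.ReferentFile], e)
	}
	groups := make([]Group, 0, len(byFile))
	for file, es := range byFile {
		groups = append(groups, Group{File: file, Edges: es})
	}
	sort.Slice(groups, func(i, j int) bool {
		if len(groups[i].Edges) != len(groups[j].Edges) {
			return len(groups[i].Edges) > len(groups[j].Edges)
		}
		return groups[i].File < groups[j].File
	})
	return groups
}

// confirmBudget returns the share of the deadline the confirm pass may use,
// leaving reservedPercent of it for the later add phase.
func confirmBudget(total time.Duration, reservedPercent int) time.Duration {
	return total * time.Duration(100-reservedPercent) / 100
}

// confirmAll runs confirm on each group with at most maxParallel in flight
// and stops dispatching new groups once ctx is done.
func confirmAll(ctx context.Context, groups []Group, maxParallel int,
	confirm func(context.Context, Group) int) int {
	sem := make(chan struct{}, maxParallel)
	var (
		wg        sync.WaitGroup
		mu        sync.Mutex
		confirmed int
	)
dispatch:
	for _, g := range groups {
		select {
		case <-ctx.Done():
			break dispatch // deadline cut: highest-yield groups already ran
		case sem <- struct{}{}:
		}
		wg.Add(1)
		go func(g Group) {
			defer wg.Done()
			defer func() { <-sem }()
			n := confirm(ctx, g)
			mu.Lock()
			confirmed += n
			mu.Unlock()
		}(g)
	}
	wg.Wait()
	return confirmed
}

func main() {
	edges := []Edge{{"a.c", "f"}, {"b.c", "g"}, {"a.c", "h"}}
	ctx, cancel := context.WithTimeout(context.Background(),
		confirmBudget(2*time.Second, 25))
	defer cancel()
	n := confirmAll(ctx, groupByYield(edges), 4,
		func(_ context.Context, g Group) int { return len(g.Edges) })
	fmt.Println("confirmed:", n) // prints "confirmed: 3"
}
```

Because each group corresponds to exactly one file, no two goroutines in this sketch ever open or close the same document.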

5. LSP enrichment reliability

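Making enrichment faster and more concurrent amplifies crash storms and wasted work, so this PR also lands three adjacent reliability fixes for the same code path:

  • clangd now runs with --clang-tidy=false, so lint matchers can no longer crash it mid-pass; --background-index is kept because the reference-confirm pass depends on it for cross-file results.
  • A servesFile guard, keyed on the ServerSpec's extension coverage, stops enrichment from opening files the language server cannot compile (assets such as .png/.svg, unrelated scripts such as .sh).
  • Reconnect cycles are capped per enrichment pass; past the cap, the provider's enrichment for the repo is abandoned with everything already flushed left intact.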

Testing

  • New unit / end-to-end tests for each change: pointer-return extraction, command-table + designated-initializer binding, cross-file / static-guard binding, .def / .inc detection, the parallel-confirm reserved-budget path, the provider-coverage guard (asset files never reach clangd), and the crash-loop guard (a repeatedly-crashing server is abandoned, not looped).
  • go test ./... is green apart from two wall-clock perf tests, which are load-flaky under full-suite contention and pass in isolation.
  • Touched packages pass under -race; the concurrent enrichment path is validated under -race and non-flaky.
  • golangci-lint clean; wire-contract fingerprint unchanged (no graph node/edge schema changes).

Notes

No changes to graph node/edge schemas. The end-to-end index benchmark on redis (and the Go regression benchmark) is the recommended pre-merge validation.

zzet added 7 commits July 2, 2026 23:27
The function-definition query only matched a bare function_declarator, so a
pointer-return signature (robj *streamDup(), static inline list *f(),
robj **g()) — where tree-sitter nests the function_declarator under one or two
pointer_declarators — produced no node at all. Call sites inside such a
function's body then had no enclosing function and their call edges were
dropped too.

Match function_definition broadly and derive the name by peeling the
declarator chain (reusing cDeclName), and add pointer-declarator alternations
to the C prototype and C++ free-function queries. Fix C++ extractFuncName to
descend pointer/reference/parenthesized declarator wrappers so pointer-return
methods are no longer dropped.

…erences

A generated C fragment such as redis's src/commands.def — 12k lines of
MAKE_CMD(...) rows #include'd into a translation unit — was never parsed
(.def had no extractor), so the only reference to a command function lived in
an unindexed file and find_usages reported the function as unused.

Parse .def as C (uncontested extension) and content-route a clearly-C .inc
fragment to the C extractor without disturbing the PHP/Pascal/asm claimants.
Capture a bare function identifier used in a table value position — a macro or
call argument, an aggregate initializer element, a designated .field
initializer — as an ungated function-value candidate attributed to the file
node, so the resolver gate binds it to the uniquely-named command function in
its defining translation unit. Flood is held down by value-position gating plus
dropping every same-file declaration, parameter, and local.

A function used by address in another translation unit (a function-pointer
identity check such as c->cmd->proc != execCommand, a function-pointer
assignment such as *slot = execCommand, a return of a function, or &fn)
returned zero usages because same-file value capture drops any name the file
does not declare, and C addresses functions across a flat extern namespace.

Capture these in-function value positions as ungated function-value candidates
and let the resolver gate bind them repo-globally, guarded so a file-local
static function is never the target of a cross-module reference and a same-file
definition wins over a same-named extern elsewhere.

… for the add phase

The reference-confirm pass ran sequentially, one open+references+close round
trip per ambiguous edge (~7 edges/s), and under the full per-repo deadline. On
a medium repo it consumed the entire window before the hover / hierarchy add
phase ran even once, so edges_added stayed 0 and the lsp-tier answer set was a
strict subset of an already-sparse graph.

Fan the reference sweep out across maxParallel, grouped by the referent file so
each file is opened once and serves every target sharing it, and order the
groups highest-yield-first so a deadline cut leaves the most confirmations
landed. Reserve a fraction of the per-repo deadline for the post-confirm sweep
so a slow confirm pass can no longer starve it. The definition-rebind fallback,
which opens arbitrary call-site files, runs serially after the parallel sweep
so document open/close never overlaps across goroutines.

On a repo with a broad .clang-tidy, clangd runs lint matchers during semantic
enrichment and can crash mid-pass; Gortex reconnects and repeats the work,
turning it into a crash -> reconnect -> reindex loop that pins clangd at high
CPU for the entire pass (observed ~113-minute runs). Gortex consumes semantic
graph signal, not clang-tidy diagnostics, so run clangd with --clang-tidy=false.
--background-index is kept — the reference-confirm pass depends on it for
cross-file results.

Enrichment opened the referent file of every ambiguous edge (and every
interface / hover / rebind target) without checking that the file's extension
is one the language server can compile. An ambiguous edge whose referent lives
in an asset (.png/.svg) or an unrelated script (.sh) was opened on clangd,
which then tried to build an AST with an inferred C++ compile command and
churned on invalid ASTs for zero graph signal.

Add a servesFile guard keyed on the ServerSpec's extension coverage and apply
it at every enrichment document-open site: the implementation pass, the
reference-confirm grouping, the hover/hierarchy sweep, and the definition
rebind fallback. Providers without a spec (unit fakes) are unaffected.
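
A minimal sketch of such a guard, assuming the spec carries a plain list of extensions. The `ServerSpec` fields shown here are hypothetical; only the names servesFile and ServerSpec come from the commit message.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// ServerSpec is a hypothetical shape for a language-server description.
type ServerSpec struct {
	Name       string
	Extensions []string // lower-case, with leading dot, e.g. ".c"
}

// servesFile reports whether the server described by spec should be asked
// to open path. A nil spec (for example a unit-test fake) serves everything.
func servesFile(spec *ServerSpec, path string) bool {
	if spec == nil {
		return true
	}
	ext := strings.ToLower(filepath.Ext(path))
	for _, e := range spec.Extensions {
		if e == ext {
			return true
		}
	}
	return false
}

func main() {
	clangd := &ServerSpec{Name: "clangd", Extensions: []string{".c", ".h", ".cc", ".cpp"}}
	fmt.Println(servesFile(clangd, "src/server.c")) // prints "true"
	fmt.Println(servesFile(clangd, "docs/logo.svg")) // prints "false"
	fmt.Println(servesFile(nil, "docs/logo.svg"))    // prints "true"
}
```

Calling the guard at every document-open site, rather than once when edges are collected, covers the passes that compute their targets late.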
reconnectWithBackoff caps connection retries within one reconnect cycle, but a
server that connects cleanly and then exits again on the next request (a clangd
crashing repeatedly in a lint matcher) would reconnect without bound — repeating
work and pinning the process at high CPU for the whole pass.

Cap the number of reconnect cycles per enrichment pass. Past the cap, abandon
the provider's enrichment for the repo: flip the existing abort path so the pass
returns with everything already flushed intact, and log the provider, repo, and
reconnect count. A transient one-off exit still reconnects and continues.
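
The cap can be sketched as a loop that counts reconnects across the whole pass. `enrichWithCap` and `errServerExited` are hypothetical names, and skipping an item on an ordinary (non-exit) error is an assumption made to keep the example small.

```go
package main

import (
	"errors"
	"fmt"
)

// errServerExited is a hypothetical sentinel for "the server process died".
var errServerExited = errors.New("language server exited")

// enrichWithCap issues one request per item. When the server exits it
// reconnects and retries the same item, but only maxReconnects times per
// pass; after that it abandons the pass and keeps what was processed.
func enrichWithCap(items []string, maxReconnects int,
	request func(string) error, reconnect func() error,
) (processed, reconnects int, abandoned bool) {
	for i := 0; i < len(items); {
		err := request(items[i])
		switch {
		case err == nil:
			processed++
			i++
		case !errors.Is(err, errServerExited):
			i++ // ordinary failure: skip this item and move on
		case reconnects >= maxReconnects:
			return processed, reconnects, true // crash loop: give up
		default:
			if reconnect() != nil {
				return processed, reconnects, true
			}
			reconnects++ // retry the same item on the fresh connection
		}
	}
	return processed, reconnects, false
}

func main() {
	calls := 0
	flaky := func(string) error { // exits once, then behaves
		calls++
		if calls == 1 {
			return errServerExited
		}
		return nil
	}
	p, r, a := enrichWithCap([]string{"a.c", "b.c"}, 3, flaky,
		func() error { return nil })
	fmt.Println(p, r, a) // prints "2 1 false"
}
```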
@zzet zzet merged commit 5d915d7 into main Jul 2, 2026
9 checks passed
@zzet zzet deleted the feat/c-usages-parity branch July 2, 2026 22:52