Skip to content

feat(extract): capture indirect dispatch as indirect_call edges#1565

Closed
sheik-hiiobd wants to merge 1 commit into
Graphify-Labs:v8from
sheik-hiiobd:indirect-call-blast-radius
Closed

feat(extract): capture indirect dispatch as indirect_call edges#1565
sheik-hiiobd wants to merge 1 commit into
Graphify-Labs:v8from
sheik-hiiobd:indirect-call-blast-radius

Conversation

@sheik-hiiobd

Copy link
Copy Markdown
Contributor

Problem

affected (blast radius) is sound for direct calls but blind to indirect dispatch. A function passed by name as a call argument — executor.submit(fn), Thread(target=fn), map(fn, xs) — is a real dependency, but the callee-only scan in walk_calls never records it, so those callers are dropped from affected.

Minimal repro (0.9.3):

def handler(x): return x * 2
def direct():         return handler(1)         # caught
def via_submit(pool): pool.submit(handler, 1)   # MISSED

graphify affected handler lists direct() but omits via_submit(). An under-counting blast radius reads complete while dropping exactly the callers where regressions hide.

Change

  • In walk_calls (Python), scan a call's arguments for resolvable function-name identifiers (positional + keyword_argument values) and emit a distinct indirect_call relation. Keeping it separate leaves strict calls queries precise while affected/blast-radius picks up the edge.
  • Add indirect_call to DEFAULT_AFFECTED_RELATIONS.
  • Dedup-safe (won't duplicate an existing direct calls edge); confidence="INFERRED", context="argument".

After: affected handler -> direct, via_submit, via_thread.

Scope / limits (intentional)

  • Python only for now; the same pattern extends to the other tree-sitter grammars.
  • Covers the call-argument case (executors, threads, callbacks, map/filter). Dispatch via dict literals ({k: fn}), getattr-by-string, and decorators lives in other AST nodes and is left for a follow-up.
  • Same-file resolution via the in-file label_to_nid; cross-file indirect refs can later reuse the existing raw_calls path.

Tests

tests/test_indirect_dispatch.py — the indirect callers get indirect_call edges, the direct caller stays a calls edge (no relation regression), and affected(handler) now includes the dispatchers. Full suite green; ruff clean.

A function passed by name as a call argument -- executor.submit(fn),
Thread(target=fn), map(fn, xs) -- is a real dependency, but the
callee-only call scan never recorded it, so `affected` (blast radius)
silently dropped those callers.

Emit a distinct `indirect_call` relation for resolvable function-name
arguments in Python calls, and add it to DEFAULT_AFFECTED_RELATIONS. A
separate relation leaves strict `calls` queries unchanged while
affected/blast-radius picks up the edge.

Python only for now; dispatch via dict literals, getattr and decorators
lives in other AST nodes and is left for a follow-up.
safishamsi pushed a commit that referenced this pull request Jun 30, 2026
`graphify affected` (blast radius) was blind to a function passed BY NAME as
a call argument — `executor.submit(fn)`, `Thread(target=fn)`, `map(fn, xs)`,
callbacks — so those callers were silently dropped from the affected set (an
under-counting blast radius reads complete while missing exactly where
regressions hide). Capture them as a distinct `indirect_call` relation
(INFERRED, context "argument"), kept separate from `calls` so strict
call-graph queries stay precise, and add it to DEFAULT_AFFECTED_RELATIONS.

Hardened over the original PR against the name-collision false edge (the
Doxygen #3748 trap): emit only when the argument identifier resolves to a
callable definition AND is not shadowed by a parameter or local binding in
the enclosing function. So `def via(pool, handler): pool.submit(handler)`
(handler is the param, not the module function) and `process(config)` where
`config` is local data emit no edge, while a genuine module function passed
by name still resolves. Dedup-safe against existing direct `calls` edges.
Python only; dict-literal / getattr-by-string / decorator dispatch deferred.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@safishamsi

Copy link
Copy Markdown
Collaborator

Merged into v8 (with your authorship) — thanks, this is a genuinely useful gap to close: affected/blast-radius silently dropping callers that pass a function by name (submit(fn), Thread(target=fn), map(fn, xs)). The design is right — a distinct indirect_call INFERRED relation kept separate from calls, added to the affected set.

One hardening I added before merge: the guard was purely name-based (it emitted whenever an argument identifier matched any function label in the file), which produces a false edge for the idiomatic local/param-shadow case — def via(pool, handler): pool.submit(handler) (here handler is the parameter, not the module function), and for a data variable sharing a function's name (process(config)). I added: (1) emit only when the name resolves to a callable definition, and (2) reject when the name is shadowed by a parameter or local binding in the enclosing function. Added negative tests for param-shadow, local-shadow, and data-var alongside your positives. Verified: genuine submit(handler) still emits; the shadow cases emit nothing; full suite green. Ships next release.

@safishamsi safishamsi closed this Jun 30, 2026
safishamsi added a commit that referenced this pull request Jun 30, 2026
#1565 captured a function passed by name as a call argument
(executor.submit(fn), Thread(target=fn), map(fn, xs)) only when the
callback was defined in the SAME file — it resolved through the in-file
label map. But the dominant real-world shape is cross-module: the callback
is imported (`from .handlers import on_event; pool.submit(on_event)`), so
the same-file map can't see it and the edge was dropped — exactly the
caller a blast-radius query must not miss.

When the argument identifier isn't defined in-file (and isn't shadowed by
a param/local — that guard already ran), emit an `indirect` raw_call and
let the existing cross-file resolution pass handle it, branching to a
distinct indirect_call/INFERRED edge instead of calls. It rides the same
single-definition god-node guard and import-evidence disambiguation as
direct calls (parity: when a direct call resolves, so does the indirect
one; when the name is ambiguous, both bail). Two added safeguards: the
target must be a real callable def (per-file callable_nids unioned across
the corpus) so an imported data constant can never become a dispatch
target, and an existing direct `calls` edge for the pair pre-empts it.

Smoke-verified: imported single-def callback resolves; ambiguous name
bails (same as direct); imported data constant rejected; direct+indirect
to the same target keeps only the direct edge; cross-file keyword
target=cb resolves. Full suite 2710 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
safishamsi added a commit that referenced this pull request Jun 30, 2026
Slice 1 of #1566. A function referenced as a VALUE in a dict/list/set/
tuple literal — the registry/route-table idiom (`ROUTES = {"create":
create_user}`, `HOOKS = [on_start, on_stop]`) — is an indirect dependency
that blast-radius must see, but the call-argument capture (#1565) didn't
cover it. Emit an indirect_call edge for each callable value:

- module-level tables attribute to the file node; function-scoped tables
  attribute to the enclosing function. Same-file and cross-file (an
  imported handler in a table routes through the cross-file resolver).
- dict KEYS are excluded (only values are references); non-callable values
  (a number, a string) never resolve; a name shadowed by a param/local or
  rebound at module scope is the local value, not the function.

Refactors the resolve-and-emit logic shared by the argument and table
paths into one guarded helper (_emit_python_indirect_ref) and threads the
edge `context` ("argument" | "collection") through the cross-file pass.
Adds _python_module_bound_names for the module-scope shadow set.

7 new tests (module dict/list, function-scoped, dict-keys-excluded,
non-callable-value, module-reassign shadow, cross-file table). Full suite
2717 passed. Python only; assignment/return refs + other languages remain
on #1566.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
safishamsi pushed a commit that referenced this pull request Jun 30, 2026
…lice 2)

A function bound to a name (cb = handler) or returned from a factory
(def make(): return handler) is a real reference, but indirect_call only
covered call arguments and dispatch tables, so `affected` still dropped these
callers.

Emit indirect_call (context "assignment"/"return", INFERRED) for the value-side
identifiers of a Python assignment RHS and a return, at function scope (owner =
enclosing function) and module scope (owner = file node). Reuses the shared
_emit_indirect_ref guard. Scans the VALUE side only -- the assignment target is
a new local binding, not a reference -- so the existing param/local shadow guard
still rejects the false edges #1565 fixed.

Negatives covered: param-shadow, local-shadow, non-callable emit nothing.
Full suite green; ruff clean.
safishamsi pushed a commit that referenced this pull request Jul 1, 2026
…slice 3)

Reflective dispatch by string literal — getattr(obj, "handler") — resolves the
attribute name to a callable def and emits it under indirect_call (context
"getattr", INFERRED) at both function and module scope, so `graphify affected
handler` now covers getattr call sites.

The name is a STRING, not an identifier: it names an attribute and is never
shadowed by a param/local, so it resolves without the identifier shadow guard —
the inverse of the #1565/#1566 identifier paths. A dynamic name (a variable,
f-string, concatenation, or any expression) is not statically resolvable and emits
nothing; obj.getattr(...) (a method, not the builtin) and the 1-arg form are ignored.

Refactors the shared resolve-and-emit core out of _emit_indirect_ref into
_emit_indirect_by_name so the getattr path reuses it (callable-target-only,
cross-file deferral, dedup) without duplicating the guard; the identifier wrapper
is behavior-preserving. Full suite green.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants