feat(extract): capture indirect dispatch as indirect_call edges#1565
feat(extract): capture indirect dispatch as indirect_call edges#1565sheik-hiiobd wants to merge 1 commit into
Conversation
A function passed by name as a call argument -- executor.submit(fn), Thread(target=fn), map(fn, xs) -- is a real dependency, but the callee-only call scan never recorded it, so `affected` (blast radius) silently dropped those callers. Emit a distinct `indirect_call` relation for resolvable function-name arguments in Python calls, and add it to DEFAULT_AFFECTED_RELATIONS. A separate relation leaves strict `calls` queries unchanged while affected/blast-radius picks up the edge. Python only for now; dispatch via dict literals, getattr and decorators lives in other AST nodes and is left for a follow-up.
`graphify affected` (blast radius) was blind to a function passed BY NAME as a call argument — `executor.submit(fn)`, `Thread(target=fn)`, `map(fn, xs)`, callbacks — so those callers were silently dropped from the affected set (an under-counting blast radius reads complete while missing exactly where regressions hide). Capture them as a distinct `indirect_call` relation (INFERRED, context "argument"), kept separate from `calls` so strict call-graph queries stay precise, and add it to DEFAULT_AFFECTED_RELATIONS. Hardened over the original PR against the name-collision false edge (the Doxygen #3748 trap): emit only when the argument identifier resolves to a callable definition AND is not shadowed by a parameter or local binding in the enclosing function. So `def via(pool, handler): pool.submit(handler)` (handler is the param, not the module function) and `process(config)` where `config` is local data emit no edge, while a genuine module function passed by name still resolves. Dedup-safe against existing direct `calls` edges. Python only; dict-literal / getattr-by-string / decorator dispatch deferred. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Merged into One hardening I added before merge: the guard was purely name-based (it emitted whenever an argument identifier matched any function label in the file), which produces a false edge for the idiomatic local/param-shadow case — |
#1565 captured a function passed by name as a call argument (executor.submit(fn), Thread(target=fn), map(fn, xs)) only when the callback was defined in the SAME file — it resolved through the in-file label map. But the dominant real-world shape is cross-module: the callback is imported (`from .handlers import on_event; pool.submit(on_event)`), so the same-file map can't see it and the edge was dropped — exactly the caller a blast-radius query must not miss. When the argument identifier isn't defined in-file (and isn't shadowed by a param/local — that guard already ran), emit an `indirect` raw_call and let the existing cross-file resolution pass handle it, branching to a distinct indirect_call/INFERRED edge instead of calls. It rides the same single-definition god-node guard and import-evidence disambiguation as direct calls (parity: when a direct call resolves, so does the indirect one; when the name is ambiguous, both bail). Two added safeguards: the target must be a real callable def (per-file callable_nids unioned across the corpus) so an imported data constant can never become a dispatch target, and an existing direct `calls` edge for the pair pre-empts it. Smoke-verified: imported single-def callback resolves; ambiguous name bails (same as direct); imported data constant rejected; direct+indirect to the same target keeps only the direct edge; cross-file keyword target=cb resolves. Full suite 2710 passed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Slice 1 of #1566. A function referenced as a VALUE in a dict/list/set/ tuple literal — the registry/route-table idiom (`ROUTES = {"create": create_user}`, `HOOKS = [on_start, on_stop]`) — is an indirect dependency that blast-radius must see, but the call-argument capture (#1565) didn't cover it. Emit an indirect_call edge for each callable value: - module-level tables attribute to the file node; function-scoped tables attribute to the enclosing function. Same-file and cross-file (an imported handler in a table routes through the cross-file resolver). - dict KEYS are excluded (only values are references); non-callable values (a number, a string) never resolve; a name shadowed by a param/local or rebound at module scope is the local value, not the function. Refactors the resolve-and-emit logic shared by the argument and table paths into one guarded helper (_emit_python_indirect_ref) and threads the edge `context` ("argument" | "collection") through the cross-file pass. Adds _python_module_bound_names for the module-scope shadow set. 7 new tests (module dict/list, function-scoped, dict-keys-excluded, non-callable-value, module-reassign shadow, cross-file table). Full suite 2717 passed. Python only; assignment/return refs + other languages remain on #1566. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…lice 2) A function bound to a name (cb = handler) or returned from a factory (def make(): return handler) is a real reference, but indirect_call only covered call arguments and dispatch tables, so `affected` still dropped these callers. Emit indirect_call (context "assignment"/"return", INFERRED) for the value-side identifiers of a Python assignment RHS and a return, at function scope (owner = enclosing function) and module scope (owner = file node). Reuses the shared _emit_indirect_ref guard. Scans the VALUE side only -- the assignment target is a new local binding, not a reference -- so the existing param/local shadow guard still rejects the false edges #1565 fixed. Negatives covered: param-shadow, local-shadow, non-callable emit nothing. Full suite green; ruff clean.
…slice 3) Reflective dispatch by string literal — getattr(obj, "handler") — resolves the attribute name to a callable def and emits it under indirect_call (context "getattr", INFERRED) at both function and module scope, so `graphify affected handler` now covers getattr call sites. The name is a STRING, not an identifier: it names an attribute and is never shadowed by a param/local, so it resolves without the identifier shadow guard — the inverse of the #1565/#1566 identifier paths. A dynamic name (a variable, f-string, concatenation, or any expression) is not statically resolvable and emits nothing; obj.getattr(...) (a method, not the builtin) and the 1-arg form are ignored. Refactors the shared resolve-and-emit core out of _emit_indirect_ref into _emit_indirect_by_name so the getattr path reuses it (callable-target-only, cross-file deferral, dedup) without duplicating the guard; the identifier wrapper is behavior-preserving. Full suite green.
Problem
affected(blast radius) is sound for direct calls but blind to indirect dispatch. A function passed by name as a call argument —executor.submit(fn),Thread(target=fn),map(fn, xs)— is a real dependency, but the callee-only scan inwalk_callsnever records it, so those callers are dropped fromaffected.Minimal repro (0.9.3):
graphify affected handlerlistsdirect()but omitsvia_submit(). An under-counting blast radius reads complete while dropping exactly the callers where regressions hide.Change
walk_calls(Python), scan a call'sargumentsfor resolvable function-name identifiers (positional +keyword_argumentvalues) and emit a distinctindirect_callrelation. Keeping it separate leaves strictcallsqueries precise whileaffected/blast-radius picks up the edge.indirect_calltoDEFAULT_AFFECTED_RELATIONS.callsedge);confidence="INFERRED",context="argument".After:
affected handler->direct,via_submit,via_thread.Scope / limits (intentional)
map/filter). Dispatch via dict literals ({k: fn}),getattr-by-string, and decorators lives in other AST nodes and is left for a follow-up.label_to_nid; cross-file indirect refs can later reuse the existingraw_callspath.Tests
tests/test_indirect_dispatch.py— the indirect callers getindirect_calledges, the direct caller stays acallsedge (no relation regression), andaffected(handler)now includes the dispatchers. Full suite green;ruffclean.