A Go type defined once but referenced via parameter/return/field types in N other
files produces 1+N nodes — the extras carry the referencing file's path
(extension and all) baked into the id (pkg_a_go_thing). This is the same
phantom-duplicate class #1402 handled for the other extractors; the Go copy of
ensure_named_node still uses the older sourced-stub fallback, so its cross-file
references don't get picked up by _rewire_unique_stub_nodes.
Surfaced while running graphify over two production Go codebases — sql.NullTime
referenced across files showed up as 15 separate nodes and PublicKey as 33.
Minimal repro

Three Go files in one package; Thing is defined in thing.go and only referenced in the other two:

pkg/thing.go: package pkg; type Thing struct{}; func (t Thing) Run() int { return 1 }
pkg/a.go: package pkg; func UseA(obj Thing) Thing { return obj }
pkg/b.go: package pkg; func UseB(obj Thing) Thing { return obj }

Run the extractor over all three and list the ids of every node labelled Thing:

from graphify.extract import extract
from pathlib import Path
r = extract([Path('pkg/thing.go'), Path('pkg/a.go'), Path('pkg/b.go')], cache_root=Path('.'))
print(sorted(n['id'] for n in r['nodes'] if n['label'] == 'Thing'))
# actual: ['pkg_a_go_thing', 'pkg_b_go_thing', 'pkg_thing']
# expected: ['pkg_thing']
Fix
Make the Go ensure_named_node cross-file fallback emit a sourceless stub like the
other extractors, so the references resolve to the single canonical definition. With
that change the repro yields a single pkg_thing node, and across the two corpora the
phantom type-ref nodes drop from 116→7 and 158→7. The residual nodes are external types
(sql.NullTime, no local definition) and same-named types defined in two packages,
which the other extractors leave alone by design.
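
To illustrate why the stub kind matters, here is a minimal toy model of the behaviour described above. It is a sketch under stated assumptions, not graphify's actual code: make_id, stub and rewire_unique_stubs are hypothetical stand-ins, the id scheme is simplified to match the ids in the repro, and edges are omitted. It assumes the rewire pass only folds stubs that carry no source file, and only when exactly one definition has the same label.

```python
import re

def make_id(*parts):
    # Hypothetical id scheme: join parts, lowercase, squash non-alphanumerics to "_".
    joined = "_".join(parts).lower()
    return re.sub(r"[^a-z0-9]+", "_", joined).strip("_")

def stub(name, ref_file=None):
    # Sourced stub (older fallback): keyed by the referencing file, so one node per file.
    # Sourceless stub (newer fallback): keyed by the name alone, no source file attached.
    id_parts = (ref_file, name) if ref_file else (name,)
    return {"id": make_id(*id_parts), "label": name, "source_file": ref_file, "stub": True}

def rewire_unique_stubs(nodes):
    # Toy stand-in for the rewire pass: drop a sourceless stub when exactly one
    # definition shares its label. Sourced stubs are never touched.
    definitions = {}
    for n in nodes:
        if not n["stub"]:
            definitions.setdefault(n["label"], []).append(n)
    kept = []
    for n in nodes:
        is_sourceless_stub = n["stub"] and n["source_file"] is None
        if is_sourceless_stub and len(definitions.get(n["label"], [])) == 1:
            continue  # the reference resolves to the single canonical definition
        kept.append(n)
    return kept

definition = {"id": "pkg_thing", "label": "Thing",
              "source_file": "pkg/thing.go", "stub": False}
ref_files = ["pkg/a.go", "pkg/b.go"]

# Older fallback: sourced stubs survive the pass as phantom duplicates.
before = rewire_unique_stubs([definition] + [stub("Thing", f) for f in ref_files])
# Newer fallback: sourceless stubs collapse into the one definition.
after = rewire_unique_stubs([definition] + [stub("Thing") for _ in ref_files])
# External type with no local definition: the stub is left alone.
external = rewire_unique_stubs([stub("NullTime")])

print(sorted(n["id"] for n in before))  # ['pkg_a_go_thing', 'pkg_b_go_thing', 'pkg_thing']
print(sorted(n["id"] for n in after))   # ['pkg_thing']
print(len(external))                    # 1
```

In this model the residual cases fall out of the same rule: a stub with zero matching definitions (an external type) or with two or more (same-named types in different packages) is kept rather than guessed at.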