Go AST extractor emits phantom duplicate nodes for cross-file type references #1500

Description

@TPAteeq

A Go type defined once but referenced via parameter/return/field types in N other
files produces 1+N nodes: the canonical definition plus N extras, each with the
referencing file's path (extension and all) baked into its id (e.g. pkg_a_go_thing).
This is the same phantom-duplicate class that #1402 handled for the other
extractors. The Go copy of ensure_named_node still uses the older sourced-stub
fallback, so its cross-file references don't get picked up by
_rewire_unique_stub_nodes.

Surfaced while running graphify over two production Go codebases — sql.NullTime
referenced across files showed up as 15 separate nodes and PublicKey as 33.

Minimal repro

pkg/thing.go:  package pkg; type Thing struct{}; func (t Thing) Run() int { return 1 }
pkg/a.go:      package pkg; func UseA(obj Thing) Thing { return obj }
pkg/b.go:      package pkg; func UseB(obj Thing) Thing { return obj }

from graphify.extract import extract
from pathlib import Path
r = extract([Path('pkg/thing.go'), Path('pkg/a.go'), Path('pkg/b.go')], cache_root=Path('.'))
print(sorted(n['id'] for n in r['nodes'] if n['label'] == 'Thing'))
# actual:   ['pkg_a_go_thing', 'pkg_b_go_thing', 'pkg_thing']
# expected: ['pkg_thing']
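To look for the same symptom across a larger corpus, a small helper along these lines could group nodes by label and report any label that maps to more than one id. This is only a sketch: it assumes nothing about graphify beyond the `id` and `label` keys used in the repro, and `duplicate_labels` is a hypothetical name, not part of the library. Same-named types defined in two packages will also show up here, so the result is a list of candidates rather than confirmed phantoms.

```python
from collections import defaultdict


def duplicate_labels(nodes):
    """Map each label that has more than one node to its sorted node ids.

    Hypothetical helper: relies only on the 'id' and 'label' keys
    that appear in the repro above.
    """
    ids_by_label = defaultdict(set)
    for node in nodes:
        ids_by_label[node["label"]].add(node["id"])
    return {
        label: sorted(ids)
        for label, ids in ids_by_label.items()
        if len(ids) > 1
    }


# Node list shaped like the "actual" output of the repro.
nodes = [
    {"id": "pkg_thing", "label": "Thing"},
    {"id": "pkg_a_go_thing", "label": "Thing"},
    {"id": "pkg_b_go_thing", "label": "Thing"},
]
print(duplicate_labels(nodes))
```

With the real extractor, `duplicate_labels(r['nodes'])` would take the place of the hand-written list.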

Fix

Make the Go ensure_named_node cross-file fallback emit a sourceless stub like the
other extractors, so the references resolve to the single canonical definition. With
that change the repro yields a single pkg_thing, and across the two corpora the
phantom type-ref nodes drop from 116→7 and 158→7. The residual consists of external
types (sql.NullTime, no local definition) and same-named types defined in two
packages, which the other extractors leave alone by design.
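The difference between a sourced and a sourceless stub can be illustrated with a toy model. This is not graphify's implementation: the `source_file` field, the stub ids, and the `rewire_unique_stubs` function are all hypothetical, and the real _rewire_unique_stub_nodes may work differently. The sketch only assumes that a stub without a source is merged into a definition when exactly one definition shares its label.

```python
def rewire_unique_stubs(nodes, edges):
    """Toy model: merge sourceless stubs into a uniquely named definition.

    A node with source_file=None is treated as a stub. A stub is merged
    only when exactly one sourced node carries the same label.
    """
    defs_by_label = {}
    for n in nodes:
        if n.get("source_file") is not None:
            defs_by_label.setdefault(n["label"], []).append(n["id"])

    remap = {}
    for n in nodes:
        if n.get("source_file") is None:
            candidates = defs_by_label.get(n["label"], [])
            if len(candidates) == 1:
                remap[n["id"]] = candidates[0]

    kept = [n for n in nodes if n["id"] not in remap]
    rewired = [(remap.get(s, s), remap.get(t, t)) for s, t in edges]
    return kept, rewired


definition = {"id": "pkg_thing", "label": "Thing", "source_file": "pkg/thing.go"}
user = {"id": "pkg_a_usea", "label": "UseA", "source_file": "pkg/a.go"}

# Sourceless stub: merged into the single definition.
stub = {"id": "thing", "label": "Thing", "source_file": None}
kept, rewired = rewire_unique_stubs(
    [definition, user, stub], [("pkg_a_usea", "thing")]
)
print([n["id"] for n in kept], rewired)

# Sourced stub (the older fallback): looks like a second definition, so it stays.
sourced = {"id": "pkg_a_go_thing", "label": "Thing", "source_file": "pkg/a.go"}
kept2, rewired2 = rewire_unique_stubs(
    [definition, user, sourced], [("pkg_a_usea", "pkg_a_go_thing")]
)
print([n["id"] for n in kept2], rewired2)
```

In this model a stub with no matching local definition (the external-type case) or with two matching definitions (the same name in two packages) is left untouched, which mirrors the residual described above.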
