fix(extract): add origin_file to cross-file stubs in the six dedicated extractors #1515

Closed
TPAteeq wants to merge 1 commit into Graphify-Labs:v8 from TPAteeq:fix-stub-origin-file-langs

TPAteeq commented Jun 28, 2026

Contributor

Follow-up to #1462. ensure_named_node's cross-file fallback emits a sourceless
stub (source_file="") for a type referenced from another file. #1462 added an
internal origin_file field to that stub. When there is no single project
definition to rewire onto, two files referencing the same bare type name now get
distinct stubs (disambiguated by origin_file) instead of conflating into one
shared bare-id node. source_file stays "" so the #1402 rewire still collapses
the stubs onto a real definition when one does exist.

That change landed only in the generic extractor. The six dedicated extractors
(extract_go, extract_rust, extract_julia, extract_fortran,
extract_powershell, extract_objc) still lacked origin_file, so they kept
conflating same-named stubs.

Change

Add "origin_file": str_path to the sourceless stub in those six extractors (one line
each), matching the generic one. Only the ensure_named_node imported-type-stub path
is touched — the same path #1462 changed.
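
For illustration, a minimal sketch of what the one-line change amounts to. The helper name make_imported_type_stub and the node-dict shape (id, label) are assumptions made for this example, not the extractors' actual code; only source_file, origin_file, and str_path come from the description above.

```python
def make_imported_type_stub(name: str, str_path: str) -> dict:
    """Build a sourceless stub for a type referenced from another file.

    Hypothetical helper: the real extractors build this dict inline inside
    ensure_named_node, and their exact node shape may differ.
    """
    return {
        "id": name.lower(),       # bare id, shared by every file that references the type
        "label": name,
        "source_file": "",        # stays empty so the rewire can still collapse the stub
        "origin_file": str_path,  # the added line: which file referenced the type
    }
```

Because source_file is left empty, the stub is still recognizable as sourceless; origin_file only adds the information needed to tell two referencing files apart.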

Result

Two files referencing the same bare type with no local definition:

| extractor | before | after |
| --- | --- | --- |
| Python (generic) | 2 distinct stubs | 2 (unchanged) |
| Go / Rust / … (dedicated) | 1 conflated widget | 2 distinct (a_use_a_go_widget, b_use_b_go_widget) |

The #1402 rewire is unaffected (source_file stays empty): when a single real
definition exists, the stubs still collapse onto it.
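
The interaction between the two behaviours can be sketched as follows. This is a simplified model under stated assumptions, not Graphify's implementation: resolve_ids, path_slug, the file paths a/use_a.go and b/use_b.go, and the way the path is slugged into the id are all hypothetical, chosen so the output matches the ids quoted in the table.

```python
import re


def path_slug(path: str) -> str:
    """Turn a file path into an id-safe prefix (hypothetical slug rule)."""
    return re.sub(r"[^a-z0-9]+", "_", path.lower()).strip("_")


def resolve_ids(nodes: list[dict]) -> list[str]:
    """Return the final id for each node, in input order.

    Sourceless stubs collapse onto a real definition when exactly one node
    with the same label has a non-empty source_file; otherwise they are kept
    apart by prefixing the id with a slug of origin_file.
    """
    definitions: dict[str, list[dict]] = {}
    for node in nodes:
        if node["source_file"]:
            definitions.setdefault(node["label"], []).append(node)

    resolved = []
    for node in nodes:
        if node["source_file"]:
            resolved.append(node["id"])  # real definitions keep their id
            continue
        candidates = definitions.get(node["label"], [])
        if len(candidates) == 1:
            resolved.append(candidates[0]["id"])  # rewire onto the single definition
        else:
            prefix = path_slug(node.get("origin_file", ""))
            # Without origin_file there is no prefix, so stubs share the bare id.
            resolved.append(f"{prefix}_{node['id']}" if prefix else node["id"])
    return resolved
```

In this model, two stubs with origin_file set to a/use_a.go and b/use_b.go resolve to a_use_a_go_widget and b_use_b_go_widget; the same two stubs without origin_file both resolve to widget; and adding one real Widget definition collapses both stubs onto that definition's id.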

Tests

Out of scope

The generic extractor has other sourceless-stub sites (inheritance bases) that also
lack origin_file. #1462 didn't touch those either, and a shared base class is more
plausibly the same type (collapsing is more often correct there), so they are left for a separate PR.

…d extractors

Graphify-Labs#1462 added an internal origin_file field to ensure_named_node's sourceless
cross-file stub so two files referencing the same bare type name (with no single
project definition to rewire onto) get distinct stubs instead of conflating into
one bare-id node — while keeping source_file="" so the Graphify-Labs#1402 rewire still
collapses them onto a real definition when one exists. That landed in only the
generic extractor; the six dedicated ones (Go, Rust, Julia, Fortran, PowerShell,
ObjC) still conflated.

Add origin_file to the sourceless stub in those six, matching the generic
extractor. Without it, e.g. two Go files referencing ext.Widget collapsed onto a
single shared `widget` node (a false cross-package link); now they stay distinct
(a_use_a_go_widget / b_use_b_go_widget), like the Python case Graphify-Labs#1462 fixed. The
Graphify-Labs#1402 rewire is untouched (source_file stays empty). Adds a Go regression test
mirroring test_imported_type_stubs_do_not_collide_across_source_files; it fails
pre-fix (one conflated node) and passes post-fix.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
safishamsi pushed a commit that referenced this pull request Jun 28, 2026
…d extractors (#1515)

The #1462 same-label cross-file stub disambiguation (the origin_file key) only
existed in the generic extractor, so the six dedicated extractors — Julia,
Fortran, Go, Rust, PowerShell, ObjC — still collapsed same-named imported-type
stubs from different files into one conflated bare-id node (a false cross-package
link). Each now sets origin_file on its sourceless stub, identical to the generic
extractor; the generic _node_disambiguation_source_key consumes it, so two files
importing the same type stay distinct while source_file stays empty (the #1402
rewire onto a real definition is unchanged).

Ported from PR #1515 by @TPAteeq. Must ship with #1516: this widens origin_file to
six more languages, and #1516 is what strips it from graph.json. Verified: a 2-file
Go corpus now yields 2 distinct Widget stubs AND graph.json carries no origin_file.
Resolved a test-insertion conflict with #1516 by keeping both tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
safishamsi added a commit that referenced this pull request Jun 28, 2026
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@safishamsi

Collaborator

Thanks @TPAteeq — this completes the #1462 disambiguation across the six dedicated extractors (Julia/Fortran/Go/Rust/PowerShell/ObjC), which previously only the generic extractor had. Landed on v8 in d177f04 with your authorship, alongside #1516 (which strips the field from output — they had to ship together since this widens origin_file to more languages). Verified: a 2-file Go corpus now yields 2 distinct Widget stubs AND graph.json carries no origin_file; the #1402 rewire onto a real definition is unchanged. Closing as merged-by-port — ships in 0.9.1.
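
The reason the two changes had to ship together can be illustrated with a small sketch of stripping an internal field at export time. This is an assumption about the general approach, not the code from #1516: INTERNAL_KEYS, export_nodes, dump_graph, and the {"nodes": [...]} layout are hypothetical names and shapes used only for this example.

```python
import json

# Hypothetical set of keys that exist only for internal disambiguation.
INTERNAL_KEYS = frozenset({"origin_file"})


def export_nodes(nodes: list[dict]) -> list[dict]:
    """Return copies of the nodes with internal-only keys removed."""
    return [
        {key: value for key, value in node.items() if key not in INTERNAL_KEYS}
        for node in nodes
    ]


def dump_graph(nodes: list[dict]) -> str:
    """Serialize nodes to a graph.json-style string (assumed layout)."""
    return json.dumps({"nodes": export_nodes(nodes)}, indent=2)
```

With a filter like this in place, setting origin_file in six more extractors changes which stubs stay distinct but adds nothing to the serialized output.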

safishamsi closed this Jun 28, 2026
safishamsi added a commit that referenced this pull request Jun 28, 2026
Patch over 0.9.0: completes the node-ID work (fully closes #1504 via injective
salt #1522), stops origin_file leaking into graph.json (#1516), extends cross-file
stub disambiguation to the six dedicated extractors (#1515), Java type-param skip
(#1518) + record component refs (#1519), prunes a deleted import's edge on update
(#1521), and retries rate-limited (429) requests instead of dropping chunks (#1523).
All non-breaking.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>