fix(extract): add origin_file to cross-file stubs in the six dedicated extractors #1515
Closed
TPAteeq wants to merge 1 commit into
…d extractors

Graphify-Labs#1462 added an internal origin_file field to ensure_named_node's sourceless cross-file stub so two files referencing the same bare type name (with no single project definition to rewire onto) get distinct stubs instead of conflating into one bare-id node, while keeping source_file="" so the Graphify-Labs#1402 rewire still collapses them onto a real definition when one exists. That landed in only the generic extractor; the six dedicated ones (Go, Rust, Julia, Fortran, PowerShell, ObjC) still conflated.

Add origin_file to the sourceless stub in those six, matching the generic extractor. Without it, e.g. two Go files referencing ext.Widget collapsed onto a single shared `widget` node (a false cross-package link); now they stay distinct (a_use_a_go_widget / b_use_b_go_widget), like the Python case Graphify-Labs#1462 fixed. The Graphify-Labs#1402 rewire is untouched (source_file stays empty).

Adds a Go regression test mirroring test_imported_type_stubs_do_not_collide_across_source_files; it fails pre-fix (one conflated node) and passes post-fix.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
safishamsi pushed a commit that referenced this pull request Jun 28, 2026

…d extractors (#1515)

The #1462 same-label cross-file stub disambiguation (the origin_file key) only existed in the generic extractor, so the six dedicated extractors (Julia, Fortran, Go, Rust, PowerShell, ObjC) still collapsed same-named imported-type stubs from different files into one conflated bare-id node (a false cross-package link). Each now sets origin_file on its sourceless stub, identical to the generic extractor; the generic _node_disambiguation_source_key consumes it, so two files importing the same type stay distinct while source_file stays empty (the #1402 rewire onto a real definition is unchanged).

Ported from PR #1515 by @TPAteeq. Must ship with #1516: this widens origin_file to six more languages, and #1516 is what strips it from graph.json.

Verified: a 2-file Go corpus now yields 2 distinct Widget stubs AND graph.json carries no origin_file. Resolved a test-insertion conflict with #1516 by keeping both tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
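The coupling with #1516 can be pictured with a minimal sketch: origin_file is kept on in-memory nodes for disambiguation and dropped when the graph is serialized. This is not the project's actual export code; `INTERNAL_KEYS`, `export_nodes`, and the node layout are hypothetical names chosen for illustration.

```python
import json

# Hypothetical: keys that exist only for internal bookkeeping.
INTERNAL_KEYS = {"origin_file"}


def export_nodes(nodes):
    """Return copies of the nodes without internal-only keys."""
    return [
        {key: value for key, value in node.items() if key not in INTERNAL_KEYS}
        for node in nodes
    ]


# Two sourceless stubs that stay distinct in memory thanks to origin_file.
nodes = [
    {"id": "a_use_a_go_widget", "label": "Widget", "source_file": "", "origin_file": "a/use_a.go"},
    {"id": "b_use_b_go_widget", "label": "Widget", "source_file": "", "origin_file": "b/use_b.go"},
]

graph_json = json.dumps({"nodes": export_nodes(nodes)})
```

In this sketch the exported document still has two Widget nodes, but neither carries origin_file.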
safishamsi added a commit that referenced this pull request Jun 28, 2026
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Collaborator
Thanks @TPAteeq — this completes the #1462 disambiguation across the six dedicated extractors (Julia/Fortran/Go/Rust/PowerShell/ObjC), which previously only the generic extractor had. Landed on |
safishamsi added a commit that referenced this pull request Jun 28, 2026
Patch over 0.9.0: completes the node-ID work (fully closes #1504 via injective salt #1522), stops origin_file leaking into graph.json (#1516), extends cross-file stub disambiguation to the six dedicated extractors (#1515), Java type-param skip (#1518) + record component refs (#1519), prunes a deleted import's edge on update (#1521), and retries rate-limited (429) requests instead of dropping chunks (#1523). All non-breaking. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Follow-up to #1462.
`ensure_named_node`'s cross-file fallback emits a sourceless stub (`source_file=""`) for a type referenced from another file. #1462 added an internal `origin_file` field so that when there's no single project definition to rewire onto, two files referencing the same bare type name get distinct stubs (disambiguated by `origin_file`) instead of conflating into one shared bare-id node, while keeping `source_file=""` so the #1402 rewire still collapses them onto a real definition when one does exist.
That landed in only the generic extractor. The six dedicated extractors (`extract_go`, `extract_rust`, `extract_julia`, `extract_fortran`, `extract_powershell`, `extract_objc`) still lacked `origin_file`, so they kept conflating.
Change
Add `"origin_file": str_path` to the sourceless stub in those six extractors (one line each), matching the generic one. Only the `ensure_named_node` imported-type-stub path is touched, the same path #1462 changed.
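A minimal sketch of why that one line matters, assuming node IDs are derived from the label plus whichever file key is available. The helpers `slug`, `make_stub`, and `node_id`, and the paths `a/use_a.go` and `b/use_b.go`, are hypothetical; the real extractors and `_node_disambiguation_source_key` may work differently.

```python
import re


def slug(text: str) -> str:
    """Lowercase and turn every run of non-alphanumerics into one underscore."""
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")


def make_stub(label: str, str_path: str, with_origin: bool = True) -> dict:
    """Build a sourceless cross-file stub for a type referenced from str_path."""
    stub = {"label": label, "source_file": ""}  # empty so a rewire can still claim it
    if with_origin:
        stub["origin_file"] = str_path  # the one-line addition
    return stub


def node_id(node: dict) -> str:
    """Derive an ID from the file key (source_file, else origin_file) and the label."""
    key = node["source_file"] or node.get("origin_file", "")
    return "_".join(part for part in (slug(key), slug(node["label"])) if part)


paths = ("a/use_a.go", "b/use_b.go")
before = {node_id(make_stub("Widget", p, with_origin=False)) for p in paths}
after = {node_id(make_stub("Widget", p)) for p in paths}
```

Under these assumptions `before` holds the single conflated ID `widget`, while `after` holds `a_use_a_go_widget` and `b_use_b_go_widget`.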
Result
Two files referencing the same bare type with no local definition:
- Before: one shared `widget` node (a false cross-package link).
- After: two distinct stubs (`a_use_a_go_widget`, `b_use_b_go_widget`).

The #1402 rewire is unaffected (`source_file` stays empty): when a single real definition exists, the stubs still collapse onto it.
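The interaction with the rewire can be sketched as follows. This is a simplified stand-in, not the #1402 implementation: the hypothetical `rewire_stubs` only drops stubs that have exactly one real definition, whereas a real rewire would also redirect their edges.

```python
from collections import Counter


def rewire_stubs(nodes: list[dict]) -> list[dict]:
    """Drop sourceless stubs whose label has exactly one real definition."""
    real_defs = Counter(n["label"] for n in nodes if n["source_file"])
    kept = []
    for node in nodes:
        is_stub = not node["source_file"]
        if is_stub and real_defs[node["label"]] == 1:
            continue  # collapsed onto the single real definition
        kept.append(node)
    return kept


stub_a = {"label": "Widget", "source_file": "", "origin_file": "a/use_a.go"}
stub_b = {"label": "Widget", "source_file": "", "origin_file": "b/use_b.go"}
real = {"label": "Widget", "source_file": "ext/widget.go"}  # hypothetical definition

no_definition = rewire_stubs([stub_a, stub_b])         # both stubs stay distinct
one_definition = rewire_stubs([stub_a, stub_b, real])  # stubs collapse onto `real`
```

Because the stubs keep `source_file=""`, the stub check is unchanged by the presence of `origin_file`; the new field only matters when no single definition exists.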
Tests
- `test_go_imported_type_stubs_do_not_collide_across_source_files` (mirrors the Python test from #1462, "Bug: AST extractor produces duplicate node IDs for same-name imported symbols (collision, not fragmentation)"). Fails pre-fix (one conflated node), passes post-fix.
- `tests/test_extract.py`, `tests/test_languages.py`, `tests/test_multilang.py`: 420 pass.

Out of scope
The generic extractor has other sourceless-stub sites (inheritance bases) that also lack `origin_file`. #1462 didn't touch those either, and a shared base class is more plausibly the same type (collapsing is more often correct there), so they're left for a separate PR.