Contain symlinked extraction inputs#1613
Closed
Tok6Flow0 wants to merge 1 commit into
Closed
Conversation
safishamsi
added a commit
that referenced
this pull request
Jul 2, 2026
Collaborator
|
Merged into |
nokternol
added a commit
to nokternol/graphify
that referenced
this pull request
Jul 4, 2026
…ify-Labs#1613) `explain` resolved to `_find_node`'s first match unconditionally, discarding every other candidate with no signal — the exact failure mode Graphify-Labs#1445 fixed for `query`'s BFS seeding, present all along in `explain`'s own resolution step but never addressed. Confirmed live: `explain "genres"` silently returned an unrelated, degree-1 Storybook literal while `GenresResponse` (the actually relevant node per this repo's own Trial 1 ground truth) sat one precedence tier down, discarded — not a same-tier tie but a lone weak exact match masking a real prefix match, since `_find_node`'s tiers apply strict precedence (exact beats prefix beats substring) with nothing to catch a weak winner. `_find_node`'s internal tiers are now exposed via `_find_node_tiers` (source_exact, exact, prefix, substring, plus whether the existing file-level-node heuristic already resolved a multi-entry source_exact tier). `explain` uses this to catch two ambiguity shapes instead of picking `matches[0]` blindly: - same-tier ties: more than one candidate in the top precedence tier. - precedence collapse: exactly one candidate in the top tier, but it is weakly connected (degree <= 1) and a lower tier has candidates that are fully hidden by tier precedence alone. Either case prints a numbered candidate list (label, source, degree) instead of guessing, and exits cleanly; `--force` reproduces the old behavior for scripts that want it. A degree-dominance escape hatch (mirrors `path`'s existing top-vs-runner-up gap check, applied to degree) keeps same-tier ties from over-firing: when one candidate's degree dominates the runner-up, it resolves directly with no prompt, found necessary during live verification against a real repo where a DI container type name legitimately appears as 3 distinct exact-tier nodes (the real definition plus 2 unrelated per-file type-annotation references) — a real, dominant winner, not a genuine tie. Full suite (2764 tests, same 1 pre-existing unrelated failure as prior commits) and ruff pass. Verified live against a real repo's graph.json: `explain "genres"` now surfaces both candidates; `explain "Cradle"` and `explain "identityResolutionJob"` (previously-clean cases) are unaffected; `explain "identity linking"` (a genuine no-match) is unaffected.
nokternol
added a commit
to nokternol/graphify
that referenced
this pull request
Jul 4, 2026
…raphify-Labs#1614) Port of Graphify-Labs#1613's `explain` fix into `path`'s endpoint resolution. `path` already detected a close top-vs-runner-up score gap on either endpoint, but only printed a stderr warning and then silently proceeded with the top-scored candidate regardless — the same silently-wrong-with-no-actionable-signal failure Graphify-Labs#1445 and Graphify-Labs#1613 fixed elsewhere. Confirmed live against the real repro: `path "filterRegistry" "useMediaLookups"` has `filterRegistry.ts` (degree 21, the real module) and `filterRegistry.test.ts` (degree 8) tied at the exact same score — a real ambiguity a stable sort tiebreak happened to resolve "correctly" this time, with nothing to say it might not next time. Both endpoints were individually resolving fine in isolation; the misleading result came from silently accepting a coin-flip on one of them. Endpoint resolution now uses the same two-part gate as Graphify-Labs#1613: a close score gap is treated as ambiguous and prints a numbered candidate list (label, source, degree, score) instead of guessing, unless one candidate's degree dominates the runner-up (mirrors Graphify-Labs#1613's degree-dominance escape hatch, applied here to the pre-existing score-gap check rather than tier membership). `--force` bypasses the guard entirely, reproducing pre-Graphify-Labs#1614 behavior. Full suite (2768 tests, same 1 pre-existing unrelated failure as prior commits) and ruff pass. Verified live: the repro now lists both tied candidates instead of silently picking one; `--force` reproduces the old misleading result for comparison; a genuinely unambiguous path (identityResolutionJob -> plexProvider) resolves directly with no false-positive prompt. Scope: this fixes endpoint disambiguation, not the separate issue that a correctly-resolved shortest path can still route through irrelevant test-infrastructure edges when that happens to be graph-theoretically shortest — that's a distinct, unaddressed problem in path's traversal itself, not its endpoint resolution.
nokternol
added a commit
to nokternol/graphify
that referenced
this pull request
Jul 4, 2026
…iling (Graphify-Labs#1616) `graphify explain "<phrase>"` treats its whole argument as one string that must match/prefix/substring a single node's label as a whole — so a genuine natural-language phrase (e.g. "critic score aggregation") returns "No node matching found" even when every individual word exists on a real, relevant node, because no node label ever literally contains the entire multi-word phrase. This silently dead-ends on exactly the query shape `explain` is otherwise suggested for, with no fallback and no signal that anything went wrong (worse than noise: a hard, silent zero). When the tiered lookup finds nothing and the phrase has more than one token, `explain` now falls back to the same per-token bag-of-words scoring `query` already uses (`_score_nodes`) and lists the top candidates by term overlap, in the same numbered-candidate format the existing ambiguity guard (Graphify-Labs#1613) uses, instead of a bare dead end. A genuine single-word miss is unaffected — gated on token count, since a one-word probe would score identically to the substring tier already tried and has nothing new to find. Regression tests: multi-word phrase with real term overlap surfaces candidates and excludes unrelated nodes; multi-word phrase with zero overlap still gets the honest original message; single-word miss is byte-identical to prior behavior. Full suite (2766 tests, 1 pre-existing unrelated failure) and ruff pass. Verified live against a real repo's graph.json: both previously-zero `explain` queries now surface their real target (`ratingsAggregation.ts`, `backdrops.handler.ts`) instead of nothing.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Security impact
A corpus can contain symlinks that resolve outside the selected scan root. Previously, direct symlinked directories could be followed automatically and symlinked files could be read by default. If semantic extraction is enabled, those outside-root file contents can enter the extraction prompt or image payload. This patch treats the scan root as a containment boundary before reading or sending file contents.
Tests
python -m pytest tests/test_detect.py tests/test_image_vision.py tests/test_extract.py -q