Contain symlinked extraction inputs by Tok6Flow0 · Pull Request #1613 · Graphify-Labs/graphify

Tok6Flow0 · 2026-07-02T18:02:10Z

Summary

make symlink directory following explicit opt-in instead of auto-enabled
require resolved corpus paths to stay under the scan root before detection, AST collection, and LLM/image extraction reads
skip out-of-root symlink targets while preserving in-root symlink support

Security impact

A corpus can contain symlinks that resolve outside the selected scan root. Previously, direct symlinked directories could be followed automatically and symlinked files could be read by default. If semantic extraction is enabled, those outside-root file contents can enter the extraction prompt or image payload. This patch treats the scan root as a containment boundary before reading or sending file contents.

Tests

python -m pytest tests/test_detect.py tests/test_image_vision.py tests/test_extract.py -q
- 256 passed

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

safishamsi · 2026-07-02T21:54:44Z

Merged into v8 as 009a98b (your authorship). Verified the containment: an in-corpus symlink whose target resolves outside the scan root is skipped at detection (and the AST/LLM read boundaries), while in-root files extract normally and symlink-following stays explicit opt-in. Good security hardening. Full suite 2841. Ships next release.

…ify-Labs#1613) `explain` resolved to `_find_node`'s first match unconditionally, discarding every other candidate with no signal — the exact failure mode Graphify-Labs#1445 fixed for `query`'s BFS seeding, present all along in `explain`'s own resolution step but never addressed. Confirmed live: `explain "genres"` silently returned an unrelated, degree-1 Storybook literal while `GenresResponse` (the actually relevant node per this repo's own Trial 1 ground truth) sat one precedence tier down, discarded — not a same-tier tie but a lone weak exact match masking a real prefix match, since `_find_node`'s tiers apply strict precedence (exact beats prefix beats substring) with nothing to catch a weak winner. `_find_node`'s internal tiers are now exposed via `_find_node_tiers` (source_exact, exact, prefix, substring, plus whether the existing file-level-node heuristic already resolved a multi-entry source_exact tier). `explain` uses this to catch two ambiguity shapes instead of picking `matches[0]` blindly: - same-tier ties: more than one candidate in the top precedence tier. - precedence collapse: exactly one candidate in the top tier, but it is weakly connected (degree <= 1) and a lower tier has candidates that are fully hidden by tier precedence alone. Either case prints a numbered candidate list (label, source, degree) instead of guessing, and exits cleanly; `--force` reproduces the old behavior for scripts that want it. A degree-dominance escape hatch (mirrors `path`'s existing top-vs-runner-up gap check, applied to degree) keeps same-tier ties from over-firing: when one candidate's degree dominates the runner-up, it resolves directly with no prompt, found necessary during live verification against a real repo where a DI container type name legitimately appears as 3 distinct exact-tier nodes (the real definition plus 2 unrelated per-file type-annotation references) — a real, dominant winner, not a genuine tie. Full suite (2764 tests, same 1 pre-existing unrelated failure as prior commits) and ruff pass. Verified live against a real repo's graph.json: `explain "genres"` now surfaces both candidates; `explain "Cradle"` and `explain "identityResolutionJob"` (previously-clean cases) are unaffected; `explain "identity linking"` (a genuine no-match) is unaffected.

…raphify-Labs#1614) Port of Graphify-Labs#1613's `explain` fix into `path`'s endpoint resolution. `path` already detected a close top-vs-runner-up score gap on either endpoint, but only printed a stderr warning and then silently proceeded with the top-scored candidate regardless — the same silently-wrong-with-no-actionable-signal failure Graphify-Labs#1445 and Graphify-Labs#1613 fixed elsewhere. Confirmed live against the real repro: `path "filterRegistry" "useMediaLookups"` has `filterRegistry.ts` (degree 21, the real module) and `filterRegistry.test.ts` (degree 8) tied at the exact same score — a real ambiguity a stable sort tiebreak happened to resolve "correctly" this time, with nothing to say it might not next time. Both endpoints were individually resolving fine in isolation; the misleading result came from silently accepting a coin-flip on one of them. Endpoint resolution now uses the same two-part gate as Graphify-Labs#1613: a close score gap is treated as ambiguous and prints a numbered candidate list (label, source, degree, score) instead of guessing, unless one candidate's degree dominates the runner-up (mirrors Graphify-Labs#1613's degree-dominance escape hatch, applied here to the pre-existing score-gap check rather than tier membership). `--force` bypasses the guard entirely, reproducing pre-Graphify-Labs#1614 behavior. Full suite (2768 tests, same 1 pre-existing unrelated failure as prior commits) and ruff pass. Verified live: the repro now lists both tied candidates instead of silently picking one; `--force` reproduces the old misleading result for comparison; a genuinely unambiguous path (identityResolutionJob -> plexProvider) resolves directly with no false-positive prompt. Scope: this fixes endpoint disambiguation, not the separate issue that a correctly-resolved shortest path can still route through irrelevant test-infrastructure edges when that happens to be graph-theoretically shortest — that's a distinct, unaddressed problem in path's traversal itself, not its endpoint resolution.

…iling (Graphify-Labs#1616) `graphify explain "<phrase>"` treats its whole argument as one string that must match/prefix/substring a single node's label as a whole — so a genuine natural-language phrase (e.g. "critic score aggregation") returns "No node matching found" even when every individual word exists on a real, relevant node, because no node label ever literally contains the entire multi-word phrase. This silently dead-ends on exactly the query shape `explain` is otherwise suggested for, with no fallback and no signal that anything went wrong (worse than noise: a hard, silent zero). When the tiered lookup finds nothing and the phrase has more than one token, `explain` now falls back to the same per-token bag-of-words scoring `query` already uses (`_score_nodes`) and lists the top candidates by term overlap, in the same numbered-candidate format the existing ambiguity guard (Graphify-Labs#1613) uses, instead of a bare dead end. A genuine single-word miss is unaffected — gated on token count, since a one-word probe would score identically to the substring tier already tried and has nothing new to find. Regression tests: multi-word phrase with real term overlap surfaces candidates and excludes unrelated nodes; multi-word phrase with zero overlap still gets the honest original message; single-word miss is byte-identical to prior behavior. Full suite (2766 tests, 1 pre-existing unrelated failure) and ruff pass. Verified live against a real repo's graph.json: both previously-zero `explain` queries now surface their real target (`ratingsAggregation.ts`, `backdrops.handler.ts`) instead of nothing.

Contain symlinked extraction inputs

d8e3ba0

safishamsi added a commit that referenced this pull request Jul 2, 2026

docs(changelog): note the #1617/#1607/#1615/#1613 fixes

41ce87e

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

safishamsi closed this Jul 2, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Contain symlinked extraction inputs#1613

Contain symlinked extraction inputs#1613
Tok6Flow0 wants to merge 1 commit into
Graphify-Labs:v8from
Tok6Flow0:codex/contain-symlink-extraction

Tok6Flow0 commented Jul 2, 2026

Uh oh!

safishamsi commented Jul 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Uh oh!

Conversation

Tok6Flow0 commented Jul 2, 2026

Summary

Security impact

Tests

Uh oh!

safishamsi commented Jul 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants