fix(cluster): consolidate stderr suppression into _suppress_output()#1564

Open
OmerFDaskin wants to merge 898 commits into Graphify-Labs:main from OmerFDaskin:fix/cluster-suppress-stderr

Conversation

@OmerFDaskin

_suppress_output() was documented as suppressing both stdout and stderr,
but it redirected only stdout. Stderr was handled by a manual
sys.stderr swap in the caller, and in the exception case there was no
guarantee that it would be restored before the finally block.

With contextlib.ExitStack + redirect_stderr, both streams are now
managed cleanly by the context manager, the caller is simplified, and
the now-unused sys import is removed.
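
A minimal sketch of the pattern this description refers to, not the code in the PR itself. The name `suppress_output_sketch` is hypothetical; `contextlib.ExitStack`, `redirect_stdout`, and `redirect_stderr` are standard-library calls. Both streams are restored when the block exits, including when it exits through an exception.

```python
import contextlib
import os


@contextlib.contextmanager
def suppress_output_sketch():
    """Silence stdout and stderr for the duration of the with-block.

    Hypothetical stand-in for _suppress_output(); the real function may differ.
    """
    with contextlib.ExitStack() as stack:
        # Everything entered on the stack is unwound in reverse order,
        # even if the body of the with-block raises.
        devnull = stack.enter_context(open(os.devnull, "w"))
        stack.enter_context(contextlib.redirect_stdout(devnull))
        stack.enter_context(contextlib.redirect_stderr(devnull))
        yield
```

Because the redirection lives entirely inside the context manager, a caller only needs `with suppress_output_sketch(): ...` and no manual swap of `sys.stderr`.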

safishamsi and others added 30 commits June 6, 2026 09:40
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…bs#1143)

python -m graphify.serve graph.json --transport http --port 8080 serves
the same MCP tools over the Streamable HTTP transport (spec 2025-03-26)
so a single shared process can serve the graph for a whole team.

- _build_server() refactors server registration into a shared factory
  (stdio behavior is byte-for-byte unchanged — all 52 existing tests pass)
- _ApiKeyMiddleware: raw ASGI (not BaseHTTPMiddleware) preserves SSE
  streaming; constant-time compare; RFC-6750 case-insensitive Bearer;
  blank-key normalized to no-auth
- DNS-rebinding protection via TransportSecuritySettings; wildcard binds
  disable it and print an exposure warning when no api-key is set
- session_idle_timeout reaps idle stateful sessions (default 3600s) so a
  long-running shared server does not leak memory on client disconnect
- Dockerfile + .dockerignore for containerized team deployment
- 16 new tests via in-process ASGI test client (importorskip-guarded)
- stdio remains the default; no change for existing setups
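
The key check listed above (constant-time compare, case-insensitive Bearer scheme, blank key treated as no-auth) could look roughly like the following. This is an illustrative sketch only: `bearer_token_ok` is a hypothetical helper, and the real `_ApiKeyMiddleware` is a raw ASGI middleware whose internals are not shown in this commit message.

```python
import hmac


def bearer_token_ok(authorization_header, expected_key):
    """Return True when the request may proceed.

    Hypothetical helper; only the comparison logic is sketched here.
    """
    expected = (expected_key or "").strip()
    if not expected:
        # A blank configured key is normalized to "no authentication".
        return True
    scheme, _, token = (authorization_header or "").partition(" ")
    if scheme.lower() != "bearer":  # scheme match is case-insensitive
        return False
    # hmac.compare_digest avoids leaking the mismatch position via timing.
    return hmac.compare_digest(token.strip().encode(), expected.encode())
```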

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…1155)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…s#1159 Graphify-Labs#1107 Graphify-Labs#1103 (graph quality + new features)

Graphify-Labs#1118 — prune stale AST nodes on full re-extraction (Graphify-Labs#1116)
Stamps every AST-extracted node with _origin="ast" in extract(). On a
full rebuild _rebuild_code drops any AST-marked node absent from the
fresh output even when its source file survives, fixing stale symbols.
Backward-compat: marker-less nodes from pre-1118 graphs survive one
cycle then self-heal.

Graphify-Labs#1110 — stop reading images and PDFs as garbage in headless extract
Images route through per-backend vision payloads (base64/data-URI/bytes
for claude/openai/bedrock); non-vision backends get _strip_pixels for
graceful degradation. PDFs reuse pypdf. 5MB cap, 20-image chunk limit.

Graphify-Labs#1159 — Salesforce Apex extractor (.cls, .trigger)
Regex-based extractor: classes, interfaces, enums, methods, triggers,
SOQL/DML edges. No new dependency. Dispatched as .cls and .trigger.

Graphify-Labs#1107 — Azure OpenAI Service backend (--backend azure)
Uses AzureOpenAI SDK client (from existing openai package). Auto-detects
when AZURE_OPENAI_API_KEY + AZURE_OPENAI_ENDPOINT both set. Uses
max_completion_tokens (not deprecated max_tokens).

Graphify-Labs#1103 — live PostgreSQL introspection (--postgres DSN)
graphify extract --postgres "postgresql://..." introspects tables, views,
routines, and FK relations via information_schema (SERIALIZABLE READ ONLY).
Credentials sanitized on error. New graphify[postgres] extra (psycopg3).

Union-resolved llm.py conflict: Azure functions + bedrock images= param.
Fixed test_image_vision.py mock to accept timeout= kwarg (our Graphify-Labs#1112).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…hify-Labs#1160)

Graphify-Labs#1154: scope numpy>=2.0 constraint to python_version>='3.13' only.
numpy 1.26.4 ships no cp313 wheel so uv sync falls back to a source
build requiring a C compiler. The marker avoids forcing numpy 2.x on
3.10-3.12 users who have working 1.x environments.

Graphify-Labs#1160: codex platform skill now installs to .codex/skills/graphify/
instead of .agents/skills/graphify/. The hook already wrote to .codex/
so the skill destination was inconsistent. Propagates automatically
through install/uninstall (both read _PLATFORM_CONFIG dynamically).
Updated all codex-specific test assertions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…(hooks, sensitive filter, score_nodes)

Graphify-Labs#1170 — replace nohup with cross-platform Python detach in git hooks.
Git for Windows MSYS has no nohup so post-commit/post-checkout hooks
silently failed. Now uses subprocess.Popen with DETACHED_PROCESS |
CREATE_NEW_PROCESS_GROUP on Windows, start_new_session=True on POSIX.
Quoting-safe (argv list). Fixes Graphify-Labs#1161.
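
As a rough illustration of the detach strategy described above (not the hook code itself), the platform-specific `subprocess.Popen` arguments can be chosen like this. The helper names are hypothetical; the numeric fallbacks are the documented Windows values for the two flags, used only so the sketch can be imported on any platform.

```python
import subprocess
import sys


def detached_popen_kwargs(platform=sys.platform):
    """Pick Popen keyword arguments that detach the child process."""
    if platform == "win32":
        # These constants are only defined on Windows builds of Python.
        flags = getattr(subprocess, "DETACHED_PROCESS", 0x00000008)
        flags |= getattr(subprocess, "CREATE_NEW_PROCESS_GROUP", 0x00000200)
        return {"creationflags": flags}
    # POSIX: run the child in a new session so it outlives the hook.
    return {"start_new_session": True}


def spawn_detached(argv):
    """Start argv (a list, so no shell quoting is involved) in the background."""
    return subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        **detached_popen_kwargs(),
    )
```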

Graphify-Labs#1169 — fix _is_sensitive false positives on topic-mentioning filenames.
token-economics-of-recall.md and password-policy-discussion.md were
silently dropped as secrets. Generic keywords (token/secret/password)
now only fire when the keyword ends the filename stem or the stem is
≤2 words. Specific patterns (.env/.pem/id_rsa etc.) remain unconditional.
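
The rule above can be sketched as follows. This is an approximation under stated assumptions, not the real `_is_sensitive`: the keyword and pattern lists are abbreviated to the ones named in this message, and splitting the stem on non-alphanumeric characters is an assumed way to count "words".

```python
import re
from pathlib import PurePath

GENERIC_KEYWORDS = {"token", "secret", "password"}
SPECIFIC_NAMES = {".env", "id_rsa"}        # abbreviated, illustrative list
SPECIFIC_SUFFIXES = (".env", ".pem")       # abbreviated, illustrative list


def looks_sensitive(filename):
    """Hypothetical re-statement of the filename heuristic."""
    name = PurePath(filename).name.lower()
    # Specific patterns fire unconditionally.
    if name in SPECIFIC_NAMES or name.endswith(SPECIFIC_SUFFIXES):
        return True
    words = [w for w in re.split(r"[^a-z0-9]+", PurePath(name).stem) if w]
    if not any(w in GENERIC_KEYWORDS for w in words):
        return False
    # Generic keywords fire only at the end of the stem or in short stems.
    return words[-1] in GENERIC_KEYWORDS or len(words) <= 2
```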

Graphify-Labs#1165 — fix multi-word endpoint resolution in _score_nodes.
graphify path "AuthService" "UserRepo" never fired the exact-match bonus
because per-token comparison never equalled the full label. Now joins
normalized tokens and compares against the full label and its tokenized
form. O(1) per node, affects query_graph and shortest_path uniformly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…st drift (Graphify-Labs#1174 Graphify-Labs#1173 Graphify-Labs#1172 Graphify-Labs#1163)

Graphify-Labs#1174: affected.py load_graph now forces directed=True before
node_link_graph, matching the identical fix in serve.py and __main__.py.
Undirected graphs (directed:false in graph.json) were causing in_edges
to fall back to a direction-blind scan, missing true callers and
reporting false positives. Regression test added.

Graphify-Labs#1173: post-commit and post-checkout hook bodies now read
graphify-out/.graphify_root before calling _rebuild_code, falling back
to Path('.') if absent. A scoped build (graphify src/) no longer gets
silently expanded to the full repo on the next commit. Tests added.

Graphify-Labs#1172: Step 9 cleanup split into rm -f for fixed files and
find -maxdepth 1 -delete for the chunk glob. Under fish/zsh an
unmatched glob aborts the entire rm -f line, leaving temp files on disk.
Fixed in the three skillgen source fragments and regenerated.

Graphify-Labs#1163: detect_incremental type guard on stored mtime — if the manifest
contains a dict-valued mtime (schema drift from older versions), coerce
to None rather than propagating a non-numeric into comparisons.
Regression test added.
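
A small sketch of the kind of type guard described for the stored mtime. Both function names are hypothetical, and excluding `bool` is an extra assumption of this sketch rather than something the commit states.

```python
def coerce_stored_mtime(value):
    """Return the stored mtime as a float, or None if it is not numeric."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return float(value)
    # Dicts, strings, and other schema drift are treated as "unknown".
    return None


def is_changed(stored, current_mtime):
    """Treat an unknown stored mtime as changed instead of comparing it."""
    stored = coerce_stored_mtime(stored)
    return stored is None or current_mtime > stored
```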

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds FalkorDB as a sibling option to the existing Neo4j sink, selected via
`graphify export falkordb [--push redis://localhost:6379]`.

- New push_to_falkordb() in graphify/export.py mirrors push_to_neo4j; FalkorDB
  is OpenCypher-compatible so the MERGE/SET upsert queries are identical.
- export falkordb subcommand wired in graphify/__main__.py (cypher.txt when no
  --push, direct push otherwise). Auth is optional; target graph defaults to
  "graphify".
- falkordb optional extra in pyproject.toml (and in the all extra).
- Tests: CLI cypher generation (CI-safe) + real-FalkorDB integration tests that
  skip when no instance is reachable.
- README extras table + command reference and CHANGELOG updated.
…on --update (Graphify-Labs#1178)

Three-part fix:

dedup.py: Pass 1 exact-merge now skips nodes with an empty source_file.
Previously all no-source_file nodes with the same label landed in one
bucket and were merged, destroying distinct symbols (third-party deps,
standalone functions) that happened to share a short name.

update.md (skillgen + all 13 host variants): the --update merge now
passes both deleted AND changed files to prune_sources, mirroring what
watch._rebuild_code already does correctly. Old nodes for re-extracted
files are pruned before fresh AST is inserted — no fuzzy reconciliation
needed, no cross-file collapse possible.

export.py: anti-shrink guard message now names fuzzy dedup as a
possible cause (not only "missing chunk files"), and advises a full
rebuild as the safe recovery path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds graphify skill installation for CodeBuddy (https://www.codebuddy.ai/).
CodeBuddy uses the same agent+hook mechanism as Claude Code.

- graphify codebuddy install — writes ~/.codebuddy/skills/graphify/SKILL.md
  and a CODEBUDDY.md always-on section
- graphify codebuddy uninstall — removes both cleanly
- graphify install --platform codebuddy — same as above
- Registers Bash + Read|Glob PreToolUse hooks in .codebuddy/settings.json
- Full install/uninstall roundtrip tests (35 tests)

Co-authored-by: studyzy <studyzy@gmail.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…x README hook description

graphify codebuddy install was writing CODEBUDDY.md and settings.json
but not copying the SKILL.md. Added _copy_skill_file("codebuddy") call
to match the --platform codebuddy path. README hook description updated
from "Glob and Grep" to "Bash search and file reads" to match actual
hook matchers (Bash + Read|Glob).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…Labs#1180)

The Agent Skills spec only defines name, description, license,
compatibility, metadata, and allowed-tools as valid frontmatter fields.
The trigger: /graphify line was non-spec, silently ignored by spec-
following hosts, and flagged by agentskills validate CI checks.

- gen.py: removed trigger emission from _render_frontmatter; added
  _is_trigger_line() helper for roundtrip allow-list
- fragments/core/aider.md: removed hardcoded trigger: /graphify
- platforms.toml: removed trigger doc comment and trigger="" entries
- test_skillgen.py: replaced trigger-assertion tests with a single
  test asserting no host has trigger: in frontmatter
- Regenerated all 125 skill artifacts

Routing intent is preserved: the description field already contains
"treated as a graphify query first" and "graphify-out/ exists".

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds `graphify-mcp` as a named console script pointing to `graphify.serve:_main`, making the MCP stdio server directly invocable as a first-class CLI command from uv tool / pipx installs. MCP client configs can now use `"command": "graphify-mcp"` instead of `python -m graphify.serve`.

Co-authored-by: jr2804 <jr2804@users.noreply.github.com>
Adds support for the XML-based `.slnx` solution format (VS 2022 17.13+ replacement for `.sln`). Extracts project references as `contains` edges and build dependencies as `imports` edges. XXE-protected XML parsing with size cap. Wired into `_DISPATCH` and `CODE_EXTENSIONS`. 6 new tests passing.

Co-authored-by: bakgaard <bakgaard@users.noreply.github.com>
…/ URI

Makes the FalkorDB option a first-class sibling of Neo4j in the agent skill,
not just the export CLI:

- --falkordb / --falkordb-push shorthands documented in core.md + the shared
  exports.md reference, so they render into all modular platform skills and
  read exactly like --neo4j / --neo4j-push. (The aider/devin monoliths are
  diff-frozen vs v8 by skillgen's roundtrip guard, so they are left untouched.)
- README command reference switched to the /graphify ./raw --falkordb-push form.
- Documented URI scheme is now falkordb://localhost:6379; the scheme is only
  informational (host/port are parsed out), so redis:// or a bare host:port
  remain equivalent. Regenerated skill artifacts + expected/ snapshots.
The no-push 'graphify export falkordb' path advertised
'redis-cli -x GRAPH.QUERY graphify < cypher.txt', but FalkorDB rejects that
with 'query with more than one statement is not supported' - cypher.txt is a
multi-statement Neo4j script. The individual statements ARE valid OpenCypher
(verified by loading them one at a time), only bulk script import is unsupported.

Message + skill docs now say so and point to --push (the verified load path).
…rdb)

The --push/--user/--password export flags feed both the neo4j and falkordb
dispatch branches, so the neo4j_ prefix was misleading - a neo4j_password that
reads FALKORDB_PASSWORD made no sense. Renamed to push_uri/push_user/
push_password, and the password env lookup now reads the backend-specific var
(FALKORDB_PASSWORD for falkordb, NEO4J_PASSWORD otherwise) instead of OR-ing both.
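
The backend-specific lookup might be sketched like this. `resolve_push_password` is a hypothetical name, and letting an explicit `--password` value win over the environment is an assumption of the sketch.

```python
import os


def resolve_push_password(backend, explicit=None, environ=os.environ):
    """Pick the password for a push, reading only the matching env var."""
    if explicit:
        return explicit
    # One variable per backend, instead of OR-ing both together.
    var = "FALKORDB_PASSWORD" if backend == "falkordb" else "NEO4J_PASSWORD"
    return environ.get(var)
```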
…Graphify-Labs#1197)

Adds extra_body parameter support for custom/OpenAI-compat providers so users can pass provider-specific params (e.g. thinking budget for Claude via Bedrock compat). Adds multi-batch label_communities for 16k-context models — batches multiple community descriptions into a single LLM call instead of one per community. Partial batch failures are handled gracefully.

Co-authored-by: EirikWolf <EirikWolf@users.noreply.github.com>
…Labs#1195)

Guards _norm, _norm_label, and _strip_diacritics against None node labels that cause TypeError in unicodedata.normalize(). Fixes Graphify-Labs#1194. Consistent with existing security.py:270 precedent.

Co-authored-by: freiit <freiit@users.noreply.github.com>
…edup prefix merge

- analyze.py: pass length_bound=max_cycle_length to nx.simple_cycles() so
  networkx prunes during enumeration instead of post-filtering; drops report
  generation from never-returns to ~0.1s on dense graphs (Graphify-Labs#1196)

- llm.py: replace hardcoded min(40+16*n,4096) label_communities token budget
  with _resolve_max_tokens(min(64+24*n,8192)) — 24 tok/community covers 5-word
  JSON entries; 8192 cap fits 16k-context models; env var now honoured (Graphify-Labs#1200)

- dedup.py: add prefix-extension guard in Pass 2 and _llm_tiebreak — skip merge
  when one normalised label is a strict prefix of the other (getActiveSession /
  getActiveSessions, parseConfig / parseConfigFile). Option (a) rejected: dropping
  the >=12 early-out from _short_label_blocked breaks test_typo_merged (Graphify-Labs#1201)

- tests/test_dedup.py: two new regression tests verifying prefix guard fires for
  extension pairs and does not fire for same-length typo pairs
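
For illustration, the prefix-extension guard from the dedup.py item above can be expressed as a small predicate. The function name is hypothetical, and case-folding stands in for whatever label normalisation dedup.py actually applies.

```python
def is_prefix_extension(label_a, label_b):
    """True when one normalised label is a strict prefix of the other."""
    a, b = label_a.casefold(), label_b.casefold()
    if not a or not b or a == b:
        return False
    return a.startswith(b) or b.startswith(a)
```

A merge pass would skip any candidate pair for which this returns True, while same-length typo pairs are unaffected.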

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
All 27 tree-sitter-* deps were unversioned in pyproject.toml. Users
installing via 'pip install graphifyy' (the README's primary install
path) bypass uv.lock entirely and resolve whatever tree-sitter-*
versions PyPI happens to serve. A breaking minor bump in any grammar
package can land in user installs without notice.

Add explicit lower bounds (matching uv.lock) and upper bounds one
minor above (or one major above for 0.x packages with frequent breaks).
Ranges chosen to allow patch updates without re-pinning while blocking
incompatible major/minor jumps.
1. graphify merge-chunks dumped the entire node list to the terminal
   instead of the node count. __main__.py concatenated merged['nodes']
   (a list of dicts) into an f-string where it clearly meant
   len(merged['nodes']) -- the other two values in the same line use
   len() correctly.

2. global_graph._load_manifest silently returned a fresh empty manifest
   on any JSON parse error. That is reachable through normal interrupted
   writes (the manifest is rewritten in full on every global_add /
   global_remove, with no fsync or atomic rename), and the failure mode
   is total data loss: every tracked repo disappears from
   ~/.graphify/global-manifest.json on the next read.

   Back the corrupt file up to <path>.corrupt.<unix_ts> and print to
   stderr before returning the empty default. Users can then recover
   manually and the failure is visible rather than silent.
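
A sketch of the recovery behaviour described in item 2, under assumptions: the function name and the empty-manifest shape (`{"repos": {}}`) are hypothetical, and copying the file (rather than renaming it) is one possible way to "back up".

```python
import json
import shutil
import sys
import time
from pathlib import Path


def load_manifest_sketch(path):
    """Load a JSON manifest, backing up the file if it cannot be parsed."""
    path = Path(path)
    empty = {"repos": {}}  # assumed shape of a fresh manifest
    if not path.exists():
        return empty
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except ValueError as exc:  # JSONDecodeError and UnicodeDecodeError
        backup = path.with_name(f"{path.name}.corrupt.{int(time.time())}")
        shutil.copy2(path, backup)
        print(f"manifest unreadable ({exc}); backed up to {backup}", file=sys.stderr)
        return empty
```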
bandit, pip-audit, and safety are already declared in the dev dependency
group but nothing in CI invokes them, so a new HIGH-severity finding or
a newly-disclosed CVE in a pinned dep can land without anyone noticing
until the next manual audit.

Add a security-scan job that runs bandit (-ll, HIGH-severity only) and
pip-audit (--strict) on every push and PR. Marked continue-on-error so
this doesn't block PRs on pre-existing findings -- a follow-up should
do the cleanup pass and flip the flag.

safety intentionally omitted: it requires a free-tier API key for the
new commercial backend, which is a setup burden for forks. pip-audit
covers the same ground using the PyPI JSON advisory feed and OSV.
- security.py: replace global socket.getaddrinfo monkey-patch with per-connection
  _SSRFGuardedHTTPConnection/HTTPSConnection subclasses (thread-safe, closes TOCTOU)
- security.py: add GRAPHIFY_MAX_GRAPH_BYTES env var override for 512MB cap (MB/GB suffix
  supported); improve cap error message to cite the env var
- llm.py: wrap untrusted source files in XML delimiters with sha256 fingerprint;
  neutralise jailbreak sentinel tokens to mitigate prompt injection
- dedup.py: skip code nodes in label-based dedup passes; code symbols now deduplicated
  by ID only, preventing distinct same-named symbols from merging
- extract.py: cross-file calls resolution now consults import evidence before bailing
  on ambiguous callee names; emits EXTRACTED edges when named import is unambiguous
- analyze.py: extend _BUILTIN_NOISE_LABELS with stdlib types and modules
- __main__.py: CLAUDE.md template uses MANDATORY language for graphify-first rule;
  PreToolUse hook message hardened to imperative; graphify export html auto-falls
  back to community-aggregation view when graph.json exceeds size cap
- tests/test_pg_introspect.py: add importorskip guard for tree_sitter_sql

Closes Graphify-Labs#1211, Graphify-Labs#1210, Graphify-Labs#1205, Graphify-Labs#1219, Graphify-Labs#1227; resolves discussion Graphify-Labs#1019

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
safishamsi and others added 30 commits June 28, 2026 20:13
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Patch over 0.9.0: completes the node-ID work (fully closes Graphify-Labs#1504 via injective
salt Graphify-Labs#1522), stops origin_file leaking into graph.json (Graphify-Labs#1516), extends cross-file
stub disambiguation to the six dedicated extractors (Graphify-Labs#1515), Java type-param skip
(Graphify-Labs#1518) + record component refs (Graphify-Labs#1519), prunes a deleted import's edge on update
(Graphify-Labs#1521), and retries rate-limited (429) requests instead of dropping chunks (Graphify-Labs#1523).
All non-breaking.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
These were committed before .gitignore included the .DS_Store rule, so
gitignore never removed them from tracking. Untrack them (they remain
on local disk, just leave git) — the existing .gitignore rule keeps
them out going forward.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Graphify-Labs#1499)

Resolve Ruby `obj.method()` calls by the inferred type of the receiver
instead of by globally-unique method name. `p = Processor.new; p.run`
now emits a `calls` edge to `Processor#run` and survives name collisions
with unrelated `Worker#run` definitions, where the old name-based match
either resolved by luck or dropped the edge as ambiguous.

Introduces graphify/resolver_registry.py, a behavior-identical
formalization of the existing tail-of-extract() language resolution
passes (Swift Graphify-Labs#1356, Python Graphify-Labs#1446 become registered entries), and
graphify/ruby_resolution.py, its first new consumer. Receiver type is
inferred only from unambiguous local `var = ClassName.new` bindings;
ambiguous or unknown receivers resolve to nothing (no false positives).

Note: Ruby member calls are now excluded from name-based cross-file
resolution and resolved by inferred type only. This is an intentional
precision-over-recall change scoped to Ruby: a cross-file `var.method`
whose receiver type cannot be proven from a local `X.new` binding no
longer resolves by name-luck (it produces no edge rather than a possibly
wrong one), matching the project's confidence model.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…phify-Labs#1442)

_call_llm (used by the dedup LLM tiebreaker) built its Anthropic and
OpenAI-compatible clients with max_retries but no timeout, so requests
on this path silently ignored GRAPHIFY_API_TIMEOUT — unlike the primary
extraction paths (_call_openai_compat / _call_claude) which already pass
both. Add timeout=_resolve_api_timeout() to both constructors.

The PR branch self-neutralized: a v8 merge resolved the conflict in
favor of the max_retries-bearing line and dropped the original one-line
fix, so it is re-applied here on top of current v8 with max_retries
preserved. Adds regression coverage for both _call_llm branches, which
were previously untested.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…le (Graphify-Labs#1502)

Two cross-platform fixes salvaged from Graphify-Labs#1502:

- to_graphml: nx.write_graphml raises ValueError on None attribute
  values, so a node/edge carrying a null field crashed the export.
  Coerce None -> "" for node and edge attributes before writing.

- save-result: add --answer-file as an alternative to --answer so long
  or multiline answers can be passed via a file instead of a fragile
  inline shell arg (notably Windows/PowerShell quoting). Exactly one of
  --answer / --answer-file is required.
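
The None-to-empty-string coercion in the to_graphml item can be illustrated on plain attribute dicts. `coerce_none_attrs` is a hypothetical helper; in the real export it would presumably be applied to each node and edge attribute dict before `nx.write_graphml` is called.

```python
def coerce_none_attrs(attrs):
    """Return a copy of attrs with None values replaced by empty strings."""
    return {key: ("" if value is None else value) for key, value in attrs.items()}


node = {"label": "Processor", "source_file": None, "line": 12}
print(coerce_none_attrs(node))
```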

The rest of Graphify-Labs#1502 (a version downgrade and a hand-edited generated
skill-windows.md that fails skillgen --check, plus duplicated
windows-scripts) is left for rework on the PR.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…#1530)

Generated install/skill guidance told agents to invoke a literal `skill`
tool with `skill: "graphify"`, which is host-specific and not valid in
every environment. The always-on AGENTS fragment, packaged artifact,
expected snapshot, and _skill_registration() output now use host-generic
wording: "use the installed graphify skill or instructions". Also decodes
skillgen git blob reads as UTF-8 for Windows and replaces stale English
code-block examples in the translated READMEs.

The always-on roundtrip guard deliberately freezes the v8 baseline, so an
intentional wording change would otherwise fail it. Rather than only
patching the pytest mirror (which left the blocking CLI guard
--always-on-roundtrip red, as the original PR did), this adds an explicit,
reviewable ALWAYS_ON_SANCTIONED_EDITS registry: the guard applies the
approved old->new substitution to the baseline before the byte-for-byte
compare, so this exact sentence is allowed while any other drift still
fails. CLI guard and pytest test now agree and CI passes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…bot)

Resolve two Dependabot alerts in transitive deps:

- msgpack 1.1.2 -> 1.2.1 (HIGH, GHSA-6v7p-g79w-8964): out-of-bounds
  read / crash on Unpacker reuse after a caught error. Pulled only via
  cachecontrol -> pip-audit (dev group), so not in the published wheel's
  closure, but a fix is available so we take it.
- pydantic-settings 2.14.1 -> 2.14.2 (MEDIUM, GHSA-4xgf-cpjx-pc3j):
  NestedSecretsSettingsSource follows symlinks outside secrets_dir.
  Pulled via mcp (the [mcp]/[all] extra); graphify does not use the
  affected secrets-dir source, but the fix is free.

Lockfile-only; both are transitive. Full suite green (2537 passed),
MCP/serve tests pass on the bumped versions.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`safety` was declared in the dev group but never invoked — the CI
security-scan job only runs bandit and pip-audit, and pip-audit already
provides the same dependency-CVE scanning. Its only practical effect was
pulling in nltk, which carries an unpatched HIGH path-traversal advisory
(GHSA-p4gq-832x-fm9v) with no fix available.

Removing safety drops nltk (and safety-schemas/typer/tenacity/tomlkit)
from the lockfile entirely, closing the alert with no loss of coverage.
Updated the stale CI comment that referenced safety. Full suite green
(2537 passed); pip-audit and bandit unaffected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…fallbacks (Graphify-Labs#1529, Graphify-Labs#1531)

Graphify-Labs#1529 (regression from the 0.9.0 full-repo-relative node-ID migration):
relative JS/TS imports resolve to repo-relative paths and ride the
extract() id-remap to canonical node IDs, but tsconfig path-alias and
workspace-package imports resolve to ABSOLUTE paths (their bases are
.resolve()'d), so the import-target ID baked in the on-disk prefix and
never matched the repo-relative definition node — the edge was dropped at
build (common on Next.js/SvelteKit `@/`-alias codebases). The id-remap
post-pass now also registers the absolute-resolved form of each input
path (file-level edges) and both the input-form and absolute-form symbol
prefixes (named-import edges), so alias/workspace import targets remap to
the canonical ID. Verified the built graph has no orphan nodes or
dangling edges.

Graphify-Labs#1531: tsconfig `paths` values are ordered fallback lists (tsc tries each
target until one resolves), but only targets[0] was kept. The alias map
now stores all targets in order, and a single _resolve_tsconfig_alias
helper (replacing six duplicated inline loops) returns the first target
whose candidate exists on disk, falling back to the first candidate when
none exist (no false edge). Wildcards, baseUrl, and array `extends` are
preserved.
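
The ordered-fallback rule can be sketched as below. This is not `_resolve_tsconfig_alias` itself: the function name and signature are hypothetical, and extension probing, wildcards, and baseUrl handling are left out.

```python
from pathlib import Path


def first_existing_target(base_dir, targets):
    """Return the first target that exists on disk, else the first candidate."""
    candidates = [Path(base_dir) / target for target in targets]
    if not candidates:
        return None
    for candidate in candidates:
        if candidate.exists():
            return candidate
    # Nothing exists: keep the first candidate, as the commit describes.
    return candidates[0]
```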

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…aphify-Labs#1527)

The AST cache is version-swept but the semantic/LLM cache had no pruning,
so it grew unbounded: it is content-hash-keyed, so every content change
or file deletion leaves a permanent orphan entry (reporter saw 152
entries for 124 live docs). This matters for the committed-cache workflow
where the semantic cache is published for warm CI rebuilds.

Adds prune_semantic_cache(root, live_hashes) and wires it into the end of
the extract path, sweeping cache/semantic/*.json entries whose hash is not
in the live set. The live set is computed from the FULL detected document
set (not the incremental changed-subset, which would delete valid
entries), using the same file_hash recipe save_semantic_cache uses.
Best-effort (unlink guarded), only touches cache/semantic/ (.tmp and
cache/ast/** untouched), and keeps the semantic cache unversioned so
releases never re-bill LLM extraction.
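
A minimal sketch of the sweep described above. The signature differs from the real `prune_semantic_cache(root, live_hashes)`, and treating the file stem as the content hash is an assumption about the cache layout.

```python
from pathlib import Path


def prune_semantic_cache_sketch(cache_dir, live_hashes):
    """Delete semantic/*.json entries whose hash is not in live_hashes."""
    removed = 0
    for entry in (Path(cache_dir) / "semantic").glob("*.json"):
        if entry.stem in live_hashes:
            continue
        try:
            entry.unlink()  # best-effort: a failed delete is not fatal
            removed += 1
        except OSError:
            pass
    return removed
```

Only `*.json` files directly under `semantic/` are considered, so `.tmp` files and anything under `ast/` are left alone.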

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…orts map (Graphify-Labs#1308)

Workspace imports with subpath exports (e.g.
`import { x } from "@scope/pkg/browser"`) now resolve through the
package's `exports` map instead of falling back to a bare path. Supports
string values, condition objects, nested conditions, and single-`*`
wildcard patterns (`"./*": "./src/*.js"`), falling back to the existing
bare-path/index resolution when there is no exports map or no match.

Adapted from Graphify-Labs#1541, taking only the exports-map resolver and not that
PR's competing import-node-ID normalization (current v8 already resolves
the node-ID mismatch via the Graphify-Labs#1529 id-remap post-pass, and the PR's
_file_stem approach regressed the relative-input alias case). Two
hardening changes over the original:
- `default` is consulted LAST in the condition priority (it is Node's
  catch-all), so a matching `import`/`module`/`svelte` condition wins.
- Export targets that escape the package directory are rejected
  (`_contained_in_package`), so a malicious `exports` value like
  `"./x": "../../../etc/..."` cannot resolve to a file outside the package.
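
The containment check in the second hardening item might look roughly like this. It is a sketch, not the actual `_contained_in_package`; the name and signature below are hypothetical.

```python
from pathlib import Path


def contained_in_package(package_dir, export_target):
    """True when export_target resolves to a path inside package_dir."""
    package = Path(package_dir).resolve()
    # resolve() collapses ".." segments, so escapes become visible.
    resolved = (package / export_target).resolve()
    return resolved == package or package in resolved.parents
```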

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…c/init refs (Graphify-Labs#1475)

Three residual ObjC extractor bugs from the Graphify-Labs#1475 thread, each reproduced
against the real tree-sitter-objc grammar:

1. NS_ASSUME_NONNULL_BEGIN before @interface made the parser fail to emit
   a class_interface node at all (the whole interface was swallowed into
   ERROR nodes), so headers using the macro produced no class node. Blank
   the two argument-less annotation macros to equal-length spaces before
   parsing (offset-preserving; macro-free files are byte-identical). The
   reporter's "@class breaks it" hypothesis was wrong; only the macro does.

2. Quoted `#import "X.h"` edges dangled once a `.h`/`.m` pair existed: the
   target used the bare stem, which the post-pass canonicalizes and then
   _disambiguate_colliding_node_ids salts apart by path, so the import
   target no longer matched. Resolve the include to a real file (mirroring
   _import_c), and repoint imports/imports_from edges to the header variant
   in _disambiguate_colliding_node_ids — taking precedence over the
   same-source-file salt so a `.m` importing its own `.h` resolves to the
   header instead of self-looping. Also repairs the equivalent latent
   C-include dangling bug.

3. `[[Foo alloc] init]` produced no edge — walk_calls only reconstructed
   selectors and skipped the receiver. Emit a `references` edge from the
   allocating method to the class, resolved via the unique-class stub guard
   (ensure_named_node + _rewire_unique_stub_nodes) so unknown/ambiguous
   names produce no false edge. The calls-to-init edge is deliberately
   deferred (init selectors are ambiguous across classes).
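
The offset-preserving blanking from item 1 can be sketched as follows. The commit names only NS_ASSUME_NONNULL_BEGIN; pairing it with NS_ASSUME_NONNULL_END as the second macro is an assumption, as is the helper name.

```python
import re

# Assumed pair of argument-less annotation macros.
_ANNOTATION_MACROS = re.compile(rb"\b(?:NS_ASSUME_NONNULL_BEGIN|NS_ASSUME_NONNULL_END)\b")


def blank_annotation_macros(source: bytes) -> bytes:
    """Replace each macro with spaces of equal length.

    Byte offsets of everything else are unchanged, so positions reported
    by a parser still map onto the original file.
    """
    return _ANNOTATION_MACROS.sub(lambda m: b" " * len(m.group(0)), source)
```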

Reported by JabberYQ with a precise repro and test repo. Adds regression
tests incl. a self-loop guard on the import edges. Still open on Graphify-Labs#1475:
dot-syntax property accesses (Bug 5) and @selector target-action (Bug 6b).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ERRED (Graphify-Labs#1533)

A type-qualified Swift call (`Type.staticMethod()`, `Singleton.shared.method()`)
names the receiver type explicitly in source, so the resolved edge is an exact
reference — now emitted as EXTRACTED (1.0), matching the Python
qualified-class-method pass (_resolve_python_member_calls). Instance calls whose
receiver type comes from local inference (`obj.method()`) stay INFERRED (0.8).
Resolution and the single-definition god-node guard are unchanged.

This addresses the actionable part of Graphify-Labs#1533's "static calls" report: the edge
was always produced (graphify models calls as method->method), it was just
under-confident. Updated the confidence test to assert the instance/type-qualified
split.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Covers Graphify-Labs#1499 (Ruby type-aware resolution), Graphify-Labs#1308/Graphify-Labs#1541 (workspace
exports map), Graphify-Labs#1529 (alias/workspace import-edge regression), Graphify-Labs#1531
(tsconfig paths fallbacks), Graphify-Labs#1527 (semantic cache pruning), Graphify-Labs#1475 (three
ObjC fixes), Graphify-Labs#1533 (Swift static-call confidence), Graphify-Labs#1442 (secondary LLM
timeout), Graphify-Labs#1502 (GraphML null coercion + save-result --answer-file),
Graphify-Labs#1530 (host-generic skill wording), and the Dependabot dep bumps.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Ruby type-aware member-call resolution and workspace exports-map
resolution, the Graphify-Labs#1529 alias/workspace import-edge regression fix, tsconfig
paths fallbacks, semantic-cache pruning, three ObjC extractor fixes, Swift
static-call confidence, the secondary LLM timeout, GraphML null coercion,
host-generic install wording, and Dependabot dep bumps. See CHANGELOG.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Extends the tsconfig path-alias resolver (Graphify-Labs#1531) with single-`*` wildcard
capture and substitution: a pattern like `@app/*` or `@*/interfaces`
captures the matched segment and substitutes it into each target in
declared order, honoring baseUrl and tsc's longest-prefix / exact-wins
specificity rules, and preserving Graphify-Labs#1531's first-existing-target-wins
fallback (no false edge when nothing resolves). Builds on the
_resolve_tsconfig_alias helper rather than reintroducing inline loops;
multi-star patterns remain out of scope.
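
For illustration, the single-`*` capture and substitution described above could look roughly like the sketch below. It is not the code of _resolve_tsconfig_alias: the names `match_alias` and `candidate_targets` and the sample `paths` map are hypothetical, and baseUrl joining plus the first-existing-target check are left to the caller.

```python
from typing import Dict, List, Optional


def match_alias(pattern: str, specifier: str) -> Optional[str]:
    """Return the text captured by '*', '' for an exact match, None for no match."""
    if pattern.count("*") > 1:
        return None  # multi-star patterns are out of scope
    if "*" not in pattern:
        return "" if specifier == pattern else None
    prefix, suffix = pattern.split("*")
    long_enough = len(specifier) >= len(prefix) + len(suffix)
    if long_enough and specifier.startswith(prefix) and specifier.endswith(suffix):
        return specifier[len(prefix):len(specifier) - len(suffix)]
    return None


def candidate_targets(paths: Dict[str, List[str]], specifier: str) -> List[str]:
    """Substituted targets in declared order for the most specific pattern."""
    best = None  # (prefix length, captured text, targets)
    for pattern, targets in paths.items():
        captured = match_alias(pattern, specifier)
        if captured is None:
            continue
        if "*" not in pattern:
            return list(targets)  # an exact pattern always wins
        prefix_len = len(pattern.split("*")[0])
        if best is None or prefix_len > best[0]:
            best = (prefix_len, captured, targets)  # longest prefix wins
    if best is None:
        return []  # nothing matched, so no edge should be emitted
    _, captured, targets = best
    return [t.replace("*", captured, 1) for t in targets]


# Hypothetical compilerOptions.paths content.
paths = {
    "@app/*": ["src/app/*", "generated/app/*"],
    "@app/core/*": ["packages/core/*"],
    "@*/interfaces": ["libs/*/types"],
    "@config": ["src/config/index.ts"],
}
```

A caller would then join each candidate with baseUrl and keep the first one that exists on disk, emitting no edge when none does.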

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…raphify-Labs#1552)

`export * as ns from './mod'` now creates a real symbol node for the
namespace binding `ns`, registers it as a named export (so a downstream
`import { ns }` resolves to it), and emits a file-level `re_exports` edge
to the target module. The binding is treated as a single opaque symbol —
`ns.member` accesses are deliberately NOT expanded into per-member
name-matching, avoiding the over-linking that would fan false edges.
Includes re-export cycle and deep-chain recursion guards.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… types (Graphify-Labs#1316)

A member call through a constructor-injected dependency
(`constructor(private db: Database)` ... `this.db.query()`) now produces
a calls edge to the field type's method. The field->type map is captured
from constructor parameter-properties, and resolution reuses the existing
single-definition god-node guard (like the Swift/Python/Ruby member-call
resolvers): the edge is emitted only when the field's type name resolves
to exactly one class definition that owns the method, so an ambiguous or
unknown/untyped field produces no edge — no global name-match fan-out.
Edges are EXTRACTED (the type is explicit from the annotation). TS/JS-only
and additive; scope is constructor parameter-property injection.

Adds the decisive regression tests the implementation needed: two classes
defining the same method name where the injected field is typed to one of
them (must resolve to that one only), and an ambiguous type-name case
(must emit no edge).
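
A minimal sketch of the guard those tests pin down, assuming a simplified data model: `field_types` stands in for the field->type map, `class_defs` for the known class definitions, and `resolve_injected_call` is a hypothetical name, not the extractor's actual function.

```python
from typing import Dict, List, Optional


def resolve_injected_call(
    field_types: Dict[str, str],
    class_defs: List[dict],
    field: str,
    method: str,
) -> Optional[dict]:
    """Resolve this.<field>.<method>() or return None when unsure."""
    type_name = field_types.get(field)
    if type_name is None:
        return None  # unknown or untyped field: no edge
    owners = [c for c in class_defs if c["name"] == type_name]
    if len(owners) != 1 or method not in owners[0]["methods"]:
        return None  # ambiguous type name or missing method: no edge
    return {
        "relation": "calls",
        "target": f'{owners[0]["id"]}.{method}',
        "confidence": "EXTRACTED",
    }


# Two classes define query(); the injected field is typed to one of them.
classes = [
    {"id": "db.ts:Database", "name": "Database", "methods": {"query"}},
    {"id": "cache.ts:Cache", "name": "Cache", "methods": {"query"}},
]
fields = {"db": "Database"}  # from constructor(private db: Database)

edge = resolve_injected_call(fields, classes, "db", "query")

# A second class named Database makes the type name ambiguous.
duplicate = {"id": "legacy.ts:Database", "name": "Database", "methods": {"query"}}
ambiguous = resolve_injected_call(fields, classes + [duplicate], "db", "query")
untyped = resolve_injected_call(fields, classes, "logger", "info")
```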

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…raphify-Labs#1475, Graphify-Labs#1543)

`self.product.name` dot-syntax now emits an `accesses` edge and
`@selector(method)` emits a `calls` edge, both resolved only to an
unambiguous in-scope definition (a sibling method of the same class for
dot-syntax; exactly one method by exact selector name for @selector) so
no false-edge fan-out occurs when multiple classes share a name.

Hardened over the original PR: resolution now matches the method node id
EXACTLY (a method id is _make_id(container, name)) rather than by
`endswith` suffix. The substring match would mis-resolve `self.name` to a
sibling `-surname` (false positive) and, when a substring-colliding
sibling existed, suppress the correct edge (false negative); exact
matching fixes both. Adds substring-collision regression tests
(`-name`/`-surname`, `-doThing`/`-reallyDoThing`).
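
The two failure modes of the suffix match can be reproduced with a small sketch. `make_id` here is a hypothetical stand-in for _make_id (the real id format may differ), and both resolver functions are illustrative only.

```python
from typing import List, Optional


def make_id(container: str, name: str) -> str:
    """Hypothetical stand-in for _make_id; the real format may differ."""
    return f"{container}_{name}".lower()


def resolve_by_suffix(method_ids: List[str], name: str) -> Optional[str]:
    """Old behaviour: accept a single id that merely ends with the name."""
    hits = [m for m in method_ids if m.endswith(name.lower())]
    return hits[0] if len(hits) == 1 else None


def resolve_exact(method_ids: List[str], container: str, name: str) -> Optional[str]:
    """New behaviour: the id must equal make_id(container, name)."""
    target = make_id(container, name)
    return target if target in method_ids else None


only_surname = [make_id("Person", "surname")]
both = [make_id("Person", "name"), make_id("Person", "surname")]

# False positive: self.name lands on -surname because of the shared suffix.
false_positive = resolve_by_suffix(only_surname, "name")
# False negative: two suffix hits, so the correct edge is suppressed.
false_negative = resolve_by_suffix(both, "name")

exact_missing = resolve_exact(only_surname, "Person", "name")
exact_found = resolve_exact(both, "Person", "name")
```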

Completes the Graphify-Labs#1475 ObjC follow-ups (Bug 5 dot-syntax accesses, Bug 6b
@selector target-action).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…aph sidecar (Graphify-Labs#1441)

Projects the verdicts `graphify reflect` already distills (preferred /
tentative / contested, exponential time-decayed) into a derived
experiential layer the read surfaces consume, so accumulated agent
experience actually shows up where you look — without polluting the
structural graph.

Design (grounded in agent-memory + provenance literature; a redesign of
the Graphify-Labs#1542 approach):
- SIDECAR, not graph.json stamping. `reflect` writes `.graphify_learning.json`
  next to graph.json (an additional output, so the git hooks produce it
  automatically). graph.json stays purely structural; nothing leaks into
  GraphML; no graph.json churn. Mirrors the named-graph / event-sourcing
  separation of durable truth from a derived layer.
- Reuses the existing reflect aggregate (its `_decay` is the
  recency-weighted exponential model; `_finalize_sources` the
  classification) — no new scoring.
- PROVENANCE: each verdict carries the source questions/dates that produced
  it (cap 5, most-recent first).
- STALENESS: each verdict stores the node's file fingerprint; on read, a
  changed source file flags the verdict stale ("code changed since —
  re-verify") rather than presenting a confident lesson on rewritten code.
- CONTESTED surfaced distinctly (useful N / dead-end M), not averaged away.
- DEAD-ENDS stay QUERY-SCOPED — never a node-level status; they appear only
  in the report as question -> nodes.
- Read surfaces (explain / query+MCP / GRAPH_REPORT / graph.html) merge the
  overlay at read time, sanitized; un-annotated graphs are byte-identical.

Deferred (logged): letting verdicts influence query/seed traversal — the
recommender feedback-loop / Matthew-effect risk means that needs
propensity correction + exploration, not naive biasing.

Builds on the idea in Graphify-Labs#1441/Graphify-Labs#1542 (thanks @TPAteeq).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… paths

The overlay fingerprint resolved a node's source_file against
graph_path.parent (the graphify-out/ dir), but source_file is stored
relative to the PROJECT root — so graphify-out/auth.py never existed and
_is_stale flagged EVERY verdict "code changed since — re-verify" the
moment it was written. (The original staleness test used an absolute
source_file, which masked it.)

Fix: resolve the file by trying the likely roots in order (.graphify_root
marker, graphify-out's parent, graph.json's own dir, cwd) and use the
first that exists — the same search at write and read — and fingerprint
file CONTENT only (sha256 of bytes, no path mixed in) so the hash is
root-independent and a committed sidecar stays valid across checkouts.
Drops the brittle directory-name-based root guess.
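
A rough sketch of the two ideas in this fix, under stated assumptions: `resolve_source` and `content_fingerprint` are hypothetical names, and the candidate roots are passed in by the caller rather than discovered from a .graphify_root marker.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Iterable, Optional


def resolve_source(source_file: str, roots: Iterable[Path]) -> Optional[Path]:
    """Return the first candidate that exists, trying roots in order."""
    path = Path(source_file)
    if path.is_absolute():
        return path if path.is_file() else None
    for root in roots:
        candidate = root / path
        if candidate.is_file():
            return candidate
    return None


def content_fingerprint(path: Path) -> str:
    """Hash file bytes only, so the value does not depend on the root."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "project"
    out_dir = project / "graphify-out"
    out_dir.mkdir(parents=True)
    (project / "auth.py").write_text("def login(): ...\n")

    # Same search order at write time and at read time.
    roots = [project, out_dir]
    found = resolve_source("auth.py", roots)
    resolved_name = found.relative_to(project).as_posix()

    written = content_fingerprint(found)
    fresh = content_fingerprint(resolve_source("auth.py", roots)) == written

    found.write_text("def login(): return True\n")
    stale_after_edit = content_fingerprint(found) != written
```

Because the path is not mixed into the hash, the same file checked out under a different directory yields the same fingerprint.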

Adds a regression test with a relative source_file under the graphify-out
layout (stale=False right after reflect, True after an edit).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
README: document that `reflect --graph` writes the .graphify_learning.json
overlay and that explain/query surface a Lesson hint (with the
code-changed staleness flag).

CHANGELOG: add an Unreleased section for the post-0.9.2 work — the
work-memory overlay (Graphify-Labs#1441/Graphify-Labs#1542), this.field.method() injected-field
resolution (Graphify-Labs#1316), TS wildcard path aliases (Graphify-Labs#1544), JS namespace
re-exports (Graphify-Labs#1552), and the ObjC dot-syntax/@selector edges (Graphify-Labs#1475/Graphify-Labs#1543).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…raphify-Labs#1558)

Refines the staleness file resolution (00e00a0) by folding in the two
genuine merits of @TPAteeq's parallel fix (Graphify-Labs#1558), which independently
and correctly diagnosed the same root-mismatch bug:

- Layout-ordered candidates: try the layout-appropriate root FIRST (the
  graphify-out parent for the standard layout, graph.json's own dir for a
  flat layout) before the other. The prior order tried the grandparent
  first unconditionally, which in a flat layout (graph.json at the project
  root) could fingerprint a same-named file one directory up. Existence
  checking is kept on top, so a defeated name heuristic or a stale
  .graphify_root marker still falls through to the real file.
- Adds @TPAteeq's .graphify_root-marker-driven regression test, plus a
  flat-layout test that pins the ordering (editing the real file flips
  stale; editing the same-named decoy one dir up does not).

Co-Authored-By: tpateeq <mohammedateequddin399@gmail.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Graphify-Labs#1561)

A hyperedge's member list is canonically keyed `nodes`, but producers
(LLM/subagent drift, externally-supplied graph.json) sometimes emit
`members` or `node_ids` — graphify only read `nodes`, so those hyperedges
silently lost their members, and semantic_cleanup's prune dropped them
entirely. Normalize the member key to `nodes` at one ingest chokepoint in
build_from_json (and in semantic_cleanup, which runs pre-build), deduping
and warning, so every downstream consumer sees the canonical key. Mirrors
the existing from/to edge-endpoint aliasing.
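
As a sketch of the normalization step (not the code in build_from_json), assuming hyperedges are plain dicts with string node ids; `normalize_hyperedge` and `MEMBER_KEY_ALIASES` are hypothetical names:

```python
import warnings

# Alias keys named in the commit message; the canonical key is "nodes".
MEMBER_KEY_ALIASES = ("members", "node_ids")


def normalize_hyperedge(hyperedge: dict) -> dict:
    """Return a copy whose member list lives under the canonical key."""
    out = dict(hyperedge)
    members = list(out.get("nodes") or [])
    for alias in MEMBER_KEY_ALIASES:
        if alias in out:
            warnings.warn(f"hyperedge uses '{alias}'; normalizing to 'nodes'")
            members.extend(out.pop(alias) or [])
    # dict.fromkeys dedupes while preserving first-seen order.
    out["nodes"] = list(dict.fromkeys(members))
    return out


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    fixed = normalize_hyperedge({"id": "h1", "members": ["a", "b", "a"]})
    merged = normalize_hyperedge({"id": "h2", "nodes": ["a"], "node_ids": ["a", "c"]})
    untouched = normalize_hyperedge({"id": "h3", "nodes": ["x", "y"]})
```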

Reported by @askalot-io.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ph (Graphify-Labs#1553)

The cross-file call resolver bailed (Graphify-Labs#543/Graphify-Labs#1219 god-node guard) whenever a
bare callee name had 2+ definitions without unique import evidence — so a
single same-named test mock (or any same-named symbol) dropped the real
`calls` edge, erasing the call graph wherever a mock existed (the reporter
saw a 76-stub Pester suite wipe everything).

Replace the blunt bail with a smarter guard: when a name is ambiguous and
import evidence doesn't resolve it, apply tie-breakers — non-test
preference (a shared, segment-aware _is_test_path classifier) then path
proximity — and emit an INFERRED edge ONLY if exactly one candidate
survives, else keep bailing. A real def + a test mock resolves to the real
def; two genuine non-test defs still bail (god-node guard intact, no
fan-out). Wired into both the extract.py pass and the symbol_resolution.py
copy via the shared classifier.
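
The tie-breaker order can be sketched as follows. This is an illustration only: the segment list, the filename patterns, and the proximity metric (shared leading directories) are assumptions, and `is_test_path` / `pick_callee` are hypothetical stand-ins for _is_test_path and the resolver.

```python
from pathlib import PurePosixPath
from typing import Optional, Sequence

# Assumed directory names that mark test code; matched per path segment.
TEST_SEGMENTS = {"test", "tests", "__tests__", "spec", "specs", "mocks"}


def is_test_path(path: str) -> bool:
    """Segment-aware check: 'contest/' must not count as 'test'."""
    parts = PurePosixPath(path).parts
    if any(part.lower() in TEST_SEGMENTS for part in parts[:-1]):
        return True
    name = parts[-1].lower() if parts else ""
    return name.startswith("test_") or ".test." in name or ".tests." in name


def shared_dirs(a: str, b: str) -> int:
    """Number of leading directories two paths have in common."""
    count = 0
    for x, y in zip(PurePosixPath(a).parts[:-1], PurePosixPath(b).parts[:-1]):
        if x != y:
            break
        count += 1
    return count


def pick_callee(caller: str, candidates: Sequence[str]) -> Optional[str]:
    """Return the single surviving definition, or None to keep bailing."""
    non_test = [p for p in candidates if not is_test_path(p)]
    pool = non_test or list(candidates)  # tie-breaker 1: prefer non-test
    if len(pool) == 1:
        return pool[0]
    best = max(shared_dirs(caller, p) for p in pool)  # tie-breaker 2: proximity
    closest = [p for p in pool if shared_dirs(caller, p) == best]
    return closest[0] if len(closest) == 1 else None


real_vs_mock = pick_callee("src/app/main.py", ["src/app/db.py", "tests/mocks/db.py"])
two_real_defs = pick_callee("src/app/main.py", ["src/a/util.py", "src/b/util.py"])
```

An edge built from a non-None result would be emitted as INFERRED, since the choice rests on heuristics rather than import evidence.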

Reported by @Schweinehund.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… routing (Graphify-Labs#1556, Graphify-Labs#1547)

A class declared in a header (Foo.h/@interface) and defined in its impl
(Foo.cpp/Foo.m/@implementation) fragmented into two nodes: _file_stem
drops the extension so Foo.h and Foo.cpp share a node id, which
_disambiguate_colliding_node_ids then split apart by path — and the two
"defs" tripped every resolver's single-definition god-node guard,
cascading into missing .h<->.m/.cpp linkage and cross-file/cross-language
edges.

- Routing: a `.h` using `#import` now routes to extract_objc (Graphify-Labs#1556 bridging
  headers — extract_c drops `#import` as a preproc_call), and a `.h` with
  C++-only signals (class/namespace/template/::/access-specifiers) routes
  to extract_cpp (Graphify-Labs#1547 — the C grammar has no class_specifier, so a C++
  header previously yielded a junk node and lost every method). ObjC sniff
  keeps priority; a plain C header still routes to extract_c.
- Merge: a new _merge_decl_def_classes post-pass collapses the header/impl
  id-collision onto the header (declaration) variant, modeled on
  _merge_swift_extensions, gated so it fires ONLY for a clean sibling
  header/impl pair (same dir, same base stem, exactly one header) — two
  same-named classes in different directories have different stems and
  never collide, so they are never merged (god-node guard verified). C++
  method definitions retain their `Foo::` qualifier so a `Foo::bar` def
  keys onto the header declaration (one method node, not two); free
  functions keep their bare-name ids.

Result: one canonical class node per .h/.m or .h/.cpp pair with methods
unified, which unblocks the existing member-call resolvers (verified
Swift->ObjC calls and Swift `extension` folding now resolve). Strict
improvement over v8 (which produced junk/fragmented nodes here, verified).
Still open as follow-ups: cross-file C++ #include edge resolution and a
C++/ObjC cross-file member-call resolver (a pre-existing gap, not a
regression).

Reported by @JabberYQ (Graphify-Labs#1556) and @c0dezer019 (Graphify-Labs#1547).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…aphify-Labs#1547, Graphify-Labs#1556)

Connects paired classes across files: Main.cpp's `Foo f; f.bar()` now resolves
to Foo::bar, and ObjC `Foo *f = [[Foo alloc] init]; [f doThing]` to Foo's
doThing — the "connect with other classes" goal of Graphify-Labs#1547/Graphify-Labs#1556.

Design grounded in prior-art research (ctags qualified-name matching, Doxygen's
name-keyed false-edge failure modes, PAIGE's receiver-type approach, Clang USR):
resolve by RECEIVER TYPE, never bare name, and skip when the type can't be
inferred rather than guess (a false call edge / god-node is worse than a missing
one). Mirrors the existing Swift/Python/Ruby/TS member-call resolvers.

- C++ extractor now captures the member-call receiver (field_expression /
  qualified_identifier / pointer access) and builds a per-file type table from
  local declarations (`Foo f;`, `Foo* f;`, `Foo *f = ...;`); emits raw_calls.
- ObjC extractor emits raw_calls for message sends with the receiver + selector
  and a type table from `Foo *f = ...;` locals (existing in-file selector /
  alloc-init / dot-syntax / @selector matching preserved).
- New _resolve_cpp_member_calls / _resolve_objc_member_calls, registered for
  their suffixes. Receiver tiers: `Foo::bar()` / capitalized ObjC receiver and
  this/self/super (enclosing class) -> EXTRACTED; local-var-typed -> INFERRED.
  Single-definition god-node guard (skip unless exactly one type def matches);
  the just-shipped decl/def class merge makes a paired class one def so the
  guard resolves it. Verified: a.run() -> A::run only (not a same-named B::run);
  an uninferable receiver with run() in two classes emits zero edges (no
  fan-out); ObjC [f doThing] -> Foo only.
- build.py: the cross-language INFERRED-call prune treated .h/.cpp/.m as
  different families and dropped header/impl interop calls; unified the C family
  (.c .h .cc .cpp .hpp .cxx .hh .hxx .cu .cuh .metal .m .mm) so a .cpp/.m call to
  a .h-declared method survives.

Still open (tracked on Graphify-Labs#1547/Graphify-Labs#1556): the file-level `#include` edge can stay
uncanonicalized when the project root isn't symlink-resolved (the extract()
id-remap `continue`s on a /var-vs-/private/var mismatch) — the class connection
above is robust to it; include-reachability candidate narrowing and ObjC
dynamic-dispatch/id-typed receivers also deferred (expected low ObjC recall, per
the research).

Reported by @c0dezer019 (Graphify-Labs#1547) and @JabberYQ (Graphify-Labs#1556).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
_suppress_output() was documented as suppressing both stdout and stderr
but redirected only stdout. Stderr was handled by a manual sys.stderr
swap in the caller, which is less safe: an exception raised before the
try/finally was entered could leave stderr unrestored. Use
contextlib.ExitStack + redirect_stderr so both streams are handled by the
context manager and the caller is simplified. Removes the now-unused sys
import.
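
A minimal sketch of the pattern, assuming an in-memory sink; the actual _suppress_output() body may differ, and `sys` is imported here only to demonstrate that both streams are restored:

```python
import contextlib
import io
import sys


@contextlib.contextmanager
def suppress_output():
    """Silence stdout and stderr for the duration of the with-block."""
    with contextlib.ExitStack() as stack:
        sink = stack.enter_context(io.StringIO())
        stack.enter_context(contextlib.redirect_stdout(sink))
        stack.enter_context(contextlib.redirect_stderr(sink))
        # The stack unwinds both redirects on exit, even if the body raises.
        yield


def noisy():
    print("to stdout")
    print("to stderr", file=sys.stderr)
    raise RuntimeError("boom")


stdout_before, stderr_before = sys.stdout, sys.stderr
raised = False
try:
    with suppress_output():
        noisy()
except RuntimeError:
    raised = True

restored = sys.stdout is stdout_before and sys.stderr is stderr_before
```

The caller no longer touches sys.stderr directly, so there is no window in which an exception can leave it swapped.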

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>