fix: persist cluster-only analysis sidecar #1617
Closed
sanmaxdev wants to merge 1 commit into
safishamsi added a commit that referenced this pull request Jul 2, 2026
Collaborator
|
Merged into |
nokternol added a commit to nokternol/graphify that referenced this pull request Jul 4, 2026
…hify-Labs#1617)

_score_nodes' "joined" full-query tier exists so a multi-word query that equals (or prefixes) a whole multi-word label wins outright, since no single token in a bag-of-words sum could otherwise equal that label.

For a single-token probe, this degenerates: `joined` equals the lone term, and any node whose *tokenized* label (punctuation stripped) happens to reduce to exactly that one word - e.g. a bare method call like `.search()`, whose only word character content is "search" - gets promoted to the EXACT tier via the `label_tokens` comparison, even though the same node correctly fails the per-token loop's own raw `t == norm_label or t == bare_label` exact check a few lines below (raw ".search" != "search").

This matters most inside `_pick_seeds`' per-term seed-diversity guarantee (Graphify-Labs#1445), which probes each distinct query term in isolation via `_score_nodes(G, [term])`: a short, same-named method repeated across several unrelated files (three metadata providers each define their own `.search()`) can win that single-term probe's EXACT tier outright and starve out the actually-relevant multi-word file, which only reaches the PREFIX tier for the same bare word.

Reproduced live: `graphify query "how does a change in provider settings affect what shows up in search results"` seeded on one provider's unrelated `.search()` method and never surfaced `search.handler.ts` at all, despite that file scoring far higher (494 vs 7) under the query's full multi-word sentence - the bug is specific to the single-term isolation probe, not the combined-query scoring path.

Fix: gate the joined-tier block on `len(norm_terms) > 1`. A single-token probe has no "multi-word phrase vs per-token bag-of-words" distinction to make in the first place - the per-token loop directly below already fully and correctly handles single-term exact/prefix/substring matching via raw, non-tokenized label comparison, so the bonus is both redundant and (as shown) actively harmful when only one token is being scored. The combined multi-word query path is unchanged, since `len(norm_terms) > 1` there.

Regression tests: an isolated single-token probe now ranks the real multi-word file above the same-named bare method; `_pick_seeds`' per-term diversity guarantee no longer seeds the bare method over the relevant file end-to-end.

Full suite (2766 tests, 1 pre-existing unrelated failure) and ruff pass. Verified live: `search.handler.ts` and its exported symbols now appear in the traversal for the exact query that previously missed them entirely.
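The gating described in this commit message can be illustrated with a small, self-contained sketch. This is not graphify's actual `_score_nodes`: the function name `score_label`, the `gate_joined` switch, the tier weights, and the way `bare_label` is derived are all assumptions made for illustration. Only the overall shape (a joined full-query tier followed by a raw per-token loop, with the joined tier gated on `len(norm_terms) > 1`) comes from the message above.

```python
import re

# Hypothetical tier weights; the real values are not stated in the commit message.
EXACT, PREFIX, SUBSTRING = 100, 10, 1


def _tokens(label: str) -> list[str]:
    """Lowercased word tokens with punctuation stripped ('.search()' -> ['search'])."""
    return re.findall(r"\w+", label.lower())


def score_label(label: str, terms: list[str], gate_joined: bool = True) -> int:
    """Score one node label against the query terms (illustrative only)."""
    norm_terms = [t.lower() for t in terms]
    norm_label = label.lower()
    # Assumption: the "bare" label only drops a trailing call suffix such as "()".
    bare_label = norm_label.rstrip("()")
    score = 0

    # Joined full-query tier: compares the whole query against the tokenized label.
    # With gate_joined=True it is skipped for single-token probes (the fix).
    if len(norm_terms) > 1 or not gate_joined:
        joined = " ".join(norm_terms)
        label_joined = " ".join(_tokens(label))
        if joined == label_joined:
            score += EXACT
        elif label_joined.startswith(joined):
            score += PREFIX

    # Per-token loop: raw, non-tokenized comparison against the label.
    for t in norm_terms:
        if t == norm_label or t == bare_label:
            score += EXACT
        elif norm_label.startswith(t) or bare_label.startswith(t):
            score += PREFIX
        elif t in norm_label:
            score += SUBSTRING
    return score


# Single-token probe without the gate: the bare method outranks the real file.
ungated_method = score_label(".search()", ["search"], gate_joined=False)
ungated_file = score_label("search.handler.ts", ["search"], gate_joined=False)

# Single-token probe with the gate: the file's raw prefix match wins.
gated_method = score_label(".search()", ["search"])
gated_file = score_label("search.handler.ts", ["search"])
```

Under these assumed weights, the ungated single-token probe ranks `.search()` above `search.handler.ts`, while the gated probe reverses that order. A multi-word query such as `["search", "handler", "ts"]` scores the same either way, which mirrors the claim that the combined-query path is unchanged.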
Summary
- Persist `.graphify_analysis.json` during `graphify cluster-only` / `graphify label` runs.
- … `graph.json`.

Testing
- `uv run --frozen pytest tests/test_cli_export.py -q --tb=short`
- `uv run --frozen pytest tests/test_cli_export.py tests/test_serve.py -q --tb=short`
- `uv run --frozen pytest tests/ -q --tb=short`
- `uv run --frozen ruff check graphify/__main__.py tests/test_cli_export.py`
- `uv run --frozen python -m tools.skillgen --check`
- `uv run --frozen python -m tools.skillgen --audit-coverage`
- `uv run --frozen python -m tools.skillgen --schema-singleton`
- `uv run --frozen python -m tools.skillgen --monolith-roundtrip`
- `uv run --frozen python -m tools.skillgen --always-on-roundtrip`
- `uv run --frozen graphify --help`
- `uv run --frozen graphify install`
- `git diff --check`

Closes #1610