fix: guard label/text normalizers against None node labels (#1194) #1195
Merged
safishamsi merged 1 commit on Jun 8, 2026
Nodes can carry label=None (OpenAI-compatible LLM backends emit null labels
during semantic extraction). Callers use dict.get("label", fallback), which
returns None for an explicit null value (the fallback only applies when the key
is absent). That None reaches helpers calling unicodedata.normalize(...),
crashing the whole extract pipeline with:
TypeError: normalize() argument 2 must be str, not None
at whichever normalizer runs first (dedup -> build -> export):
- dedup._norm
- build._norm_label
- export._strip_diacritics
- serve._strip_diacritics
Extraction is cached before the build step, so the crash recurs on every
re-run until the cache is wiped, with no --no-dedup escape hatch. Coerce
non-str input to "" at each chokepoint; a null label then normalizes to ""
(already skipped by surrounding 'if key:' guards). Same class as Graphify-Labs#454.
Fixes Graphify-Labs#1194
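The failure mode described in the commit message can be reproduced in isolation. The snippet below is a minimal sketch, not the project's code: the node dict is hypothetical, and the "NFKD" form is an assumption, since the source does not say which normalization form the helpers use.

```python
import unicodedata

# Hypothetical node, shaped like one a backend might emit with a null label.
node = {"id": "n1", "label": None}

# The "label" key is present, so the fallback is ignored and None comes back.
label = node.get("label", node.get("id", ""))
print(label)  # None

# The fallback only applies when the key is missing entirely.
print({"id": "n2"}.get("label", "fallback"))  # fallback

# Passing that None on to unicodedata.normalize raises TypeError.
error = None
try:
    unicodedata.normalize("NFKD", label)  # "NFKD" is an assumed form
except TypeError as exc:
    error = exc
    print(f"TypeError: {exc}")
```

The `try`/`except` is only there to show the exception. In the pipeline nothing catches it, which is why the whole run aborts.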
What & why
Graph nodes can carry label: None. OpenAI-compatible LLM backends occasionally emit a null label during semantic extraction. The callers fetch the label with node.get("label", node.get("id", "")) / node.get("label", ""), but dict.get(key, default) returns the default only when the key is absent; for an explicit "label": None it returns None. That None flows into helpers that call unicodedata.normalize(...), crashing the entire extract pipeline with TypeError: normalize() argument 2 must be str, not None at whichever normalizer runs first (dedup → build → export). The four affected helpers:
- dedup._norm
- build._norm_label
- export._strip_diacritics
- serve._strip_diacritics
Because semantic extraction results are cached before the build step, the crash recurs on every subsequent extract until the cache is wiped, and there's no --no-dedup escape hatch. Same bug class as #454 (sanitize_label crashing on a None source_file).
The other unicodedata.normalize call sites (extract._make_id, build._normalize_id, symbol_resolution._bash_make_id, mcp_ingest) build their input via "_".join(p for p in parts if p), so they're always str and are not affected.
Fix
Coerce non-str input to "" at each chokepoint (and widen the type hint to str | None). A null/empty label then normalizes to "", which the surrounding if key: guards already skip, so the offending node simply isn't considered for merging (it stays in the graph) instead of aborting the run.
Verification
Reproduced on a 6,253-file Markdown corpus via a vLLM / gpt-oss-120b OpenAI-compatible backend: every extract crashed, first at dedup._norm, then (after guarding that) at export._strip_diacritics. With all four guarded, the same corpus builds cleanly. py_compile clean. Unit check: _norm(None) == "", _norm_label(None) == "", both _strip_diacritics(None) == ""; normal strings unchanged.
Fixes #1194
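The coercion described under Fix could look roughly like the sketch below. The names `strip_diacritics` and `label_keys` are hypothetical stand-ins, not the project's helpers, and the NFKD form and `casefold()` step are assumptions made for illustration.

```python
from __future__ import annotations

import unicodedata


def strip_diacritics(text: str | None) -> str:
    """Illustrative stand-in for the guarded helpers; not the project's code."""
    if not isinstance(text, str):
        return ""  # coerce None (or any non-str) to the empty string
    decomposed = unicodedata.normalize("NFKD", text)  # assumed form
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def label_keys(nodes: list[dict]) -> dict[str, list[str]]:
    """Group node ids by normalized label, skipping nodes whose key is empty."""
    groups: dict[str, list[str]] = {}
    for node in nodes:
        key = strip_diacritics(node.get("label", "")).casefold()
        if key:  # a null label normalizes to "" and is skipped here
            groups.setdefault(key, []).append(node["id"])
    return groups


nodes = [
    {"id": "a", "label": "Café"},
    {"id": "b", "label": "cafe"},
    {"id": "c", "label": None},
]
groups = label_keys(nodes)
print(groups)  # {'cafe': ['a', 'b']}
```

Node "c" is left out of every merge group but is never removed from `nodes`, which mirrors the behavior described above: the null-labelled node stays in the graph and the run continues.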