Summary
detect.detect_incremental() crashes with TypeError: '>' not supported between instances of 'float' and 'dict' when the on-disk manifest.json contains dict-valued entries, because the reader assumes a flat {path: mtime_float} schema and compares an mtime float directly against the stored value. This aborts the entire --update flow.
Traceback
File ".../graphify/detect.py", line 456, in detect_incremental
if stored_mtime is None or current_mtime > stored_mtime:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: '>' not supported between instances of 'float' and 'dict'
(seen on graphifyy==0.3.23, Python 3.12)
Root cause
save_manifest() (detect.py:416) writes manifest[f] = Path(f).stat().st_mtime -> flat {path: float}. load_manifest() (detect.py:408) does a bare json.loads with no schema validation. If a manifest.json written by a different schema version (dict-valued, e.g. {path: {"mtime": ..., "hash": ...}}) is on disk, stored_mtime = manifest.get(f) returns a dict, and current_mtime > stored_mtime at line 456 raises TypeError. There is no schema-version stamp and no type guard, so format drift between writer and reader is a hard crash mid-comparison rather than a graceful "treat as new."
load_manifest's try/except -> {} only catches invalid JSON (which safely degrades to "everything new"); it does not catch valid-JSON-wrong-value-shape, which slips through to the comparison.
Proposed fix
- Type-guard the stored value at detect.py:451-456:
stored_mtime = manifest.get(f)
if isinstance(stored_mtime, dict): # tolerate richer/legacy schema
stored_mtime = stored_mtime.get("mtime")
if not isinstance(stored_mtime, (int, float)):
stored_mtime = None # unknown shape -> treat as new
- Stamp a
schema_version in save_manifest and check it in load_manifest, so reader/writer drift fails loud-and-graceful (full re-extract) instead of crashing.
Secondary issue (manifest staleness -> over-reporting)
Even with a valid flat manifest, detect_incremental over-reports when the manifest is incomplete relative to what detect() currently classifies: every file missing from the manifest re-trips stored_mtime is None and is flagged "new" each run. This happens to any consumer that runs the incremental detect/extract/merge pipeline without calling save_manifest() at the end of the pass (only the full pipeline's final step re-saves it). Consider documenting that incremental consumers must call save_manifest(detect_result["files"]) after a successful merge, or have detect_incremental optionally refresh the manifest itself.
Happy to send a PR for (1)+(2) if useful.
Summary
detect.detect_incremental()crashes withTypeError: '>' not supported between instances of 'float' and 'dict'when the on-diskmanifest.jsoncontains dict-valued entries, because the reader assumes a flat{path: mtime_float}schema and compares an mtime float directly against the stored value. This aborts the entire--updateflow.Traceback
(seen on
graphifyy==0.3.23, Python 3.12)Root cause
save_manifest()(detect.py:416) writesmanifest[f] = Path(f).stat().st_mtime-> flat{path: float}.load_manifest()(detect.py:408) does a barejson.loadswith no schema validation. If amanifest.jsonwritten by a different schema version (dict-valued, e.g.{path: {"mtime": ..., "hash": ...}}) is on disk,stored_mtime = manifest.get(f)returns a dict, andcurrent_mtime > stored_mtimeat line 456 raisesTypeError. There is no schema-version stamp and no type guard, so format drift between writer and reader is a hard crash mid-comparison rather than a graceful "treat as new."load_manifest'stry/except -> {}only catches invalid JSON (which safely degrades to "everything new"); it does not catch valid-JSON-wrong-value-shape, which slips through to the comparison.Proposed fix
schema_versioninsave_manifestand check it inload_manifest, so reader/writer drift fails loud-and-graceful (full re-extract) instead of crashing.Secondary issue (manifest staleness -> over-reporting)
Even with a valid flat manifest,
detect_incrementalover-reports when the manifest is incomplete relative to whatdetect()currently classifies: every file missing from the manifest re-tripsstored_mtime is Noneand is flagged "new" each run. This happens to any consumer that runs the incremental detect/extract/merge pipeline without callingsave_manifest()at the end of the pass (only the full pipeline's final step re-saves it). Consider documenting that incremental consumers must callsave_manifest(detect_result["files"])after a successful merge, or havedetect_incrementaloptionally refresh the manifest itself.Happy to send a PR for (1)+(2) if useful.