Skip to content

detect_incremental crashes with TypeError ('float' > 'dict') on schema-drifted manifest.json; over-reports on stale manifest #1163

Description

@edgyezy

Summary

detect.detect_incremental() crashes with TypeError: '>' not supported between instances of 'float' and 'dict' when the on-disk manifest.json contains dict-valued entries, because the reader assumes a flat {path: mtime_float} schema and compares an mtime float directly against the stored value. This aborts the entire --update flow.

Traceback

File ".../graphify/detect.py", line 456, in detect_incremental
    if stored_mtime is None or current_mtime > stored_mtime:
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: '>' not supported between instances of 'float' and 'dict'

(seen on graphifyy==0.3.23, Python 3.12)

Root cause

save_manifest() (detect.py:416) writes manifest[f] = Path(f).stat().st_mtime -> flat {path: float}. load_manifest() (detect.py:408) does a bare json.loads with no schema validation. If a manifest.json written by a different schema version (dict-valued, e.g. {path: {"mtime": ..., "hash": ...}}) is on disk, stored_mtime = manifest.get(f) returns a dict, and current_mtime > stored_mtime at line 456 raises TypeError. There is no schema-version stamp and no type guard, so format drift between writer and reader is a hard crash mid-comparison rather than a graceful "treat as new."

load_manifest's try/except -> {} only catches invalid JSON (which safely degrades to "everything new"); it does not catch valid-JSON-wrong-value-shape, which slips through to the comparison.

Proposed fix

  1. Type-guard the stored value at detect.py:451-456:
    stored_mtime = manifest.get(f)
    if isinstance(stored_mtime, dict):       # tolerate richer/legacy schema
        stored_mtime = stored_mtime.get("mtime")
    if not isinstance(stored_mtime, (int, float)):
        stored_mtime = None                  # unknown shape -> treat as new
  2. Stamp a schema_version in save_manifest and check it in load_manifest, so reader/writer drift fails loud-and-graceful (full re-extract) instead of crashing.

Secondary issue (manifest staleness -> over-reporting)

Even with a valid flat manifest, detect_incremental over-reports when the manifest is incomplete relative to what detect() currently classifies: every file missing from the manifest re-trips stored_mtime is None and is flagged "new" each run. This happens to any consumer that runs the incremental detect/extract/merge pipeline without calling save_manifest() at the end of the pass (only the full pipeline's final step re-saves it). Consider documenting that incremental consumers must call save_manifest(detect_result["files"]) after a successful merge, or have detect_incremental optionally refresh the manifest itself.

Happy to send a PR for (1)+(2) if useful.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Fields

    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions