Improve collection handling and hashing consistency #203

Merged: m1rl0k merged 5 commits into test from bubble-bullshit on Jan 26, 2026

m1rl0k (Collaborator) commented Jan 26, 2026

Added robust parsing for SCHEMA_CACHE_TTL_SECS in indexing_admin.py to handle invalid environment values. Updated upload_service.py to better support demo mode and fall back to Qdrant collections when authentication is disabled. Modified watch_index_core/processor.py to use xxhash for file hashing, for consistency with the pipeline, with a fallback to hashlib if xxhash is unavailable.
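
For readers skimming the thread, the TTL hardening described here can be sketched roughly as below. This is a minimal illustration, not the code in indexing_admin.py: the function name `parse_schema_cache_ttl` and the constant name are assumptions, and the 300-second default is taken from the summary and commit message later in this conversation.

```python
import math
import os

# Default taken from the PR summary (300s); the constant name is hypothetical.
DEFAULT_SCHEMA_CACHE_TTL_SECS = 300.0


def parse_schema_cache_ttl(raw, default=DEFAULT_SCHEMA_CACHE_TTL_SECS):
    """Return a finite, positive TTL in seconds, or `default` if `raw` is unusable."""
    if raw is None:
        return default
    try:
        value = float(str(raw).strip())
    except ValueError:
        # Covers empty strings and non-numeric garbage such as "abc".
        return default
    # float() happily accepts "nan" and "inf", so reject those explicitly,
    # along with zero and negative values.
    if not math.isfinite(value) or value <= 0:
        return default
    return value


# Hypothetical module-level usage: read the environment once at import time.
SCHEMA_CACHE_TTL_SECS = parse_schema_cache_ttl(os.environ.get("SCHEMA_CACHE_TTL_SECS"))
```

The explicit `math.isfinite` check matters because `float("nan")` and `float("inf")` parse without raising, so a `try/except` alone would let them through.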

augmentcode bot commented Jan 26, 2026

🤖 Augment PR Summary

Summary: Improves robustness around schema-cache TTL parsing, admin collection handling in demo mode, and watcher file hashing consistency.

Changes:

  • Hardened SCHEMA_CACHE_TTL_SECS parsing by defaulting invalid/non-finite/non-positive values to 300s.
  • Updated admin collection status/stream endpoints to fall back to direct Qdrant collection listing when auth is disabled (demo mode) or reports itself as disabled.
  • Aligned watcher file hashing with the ingest pipeline by using xxhash, with a hashlib fallback when unavailable/erroring.
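
The collection-listing fallback in the second bullet might look something like the following sketch. Everything here is hypothetical: `AuthDisabledError`, `list_collections`, and the two injected callables are illustrative names, not identifiers from upload_service.py. With qdrant-client, the Qdrant-side callable could wrap `[c.name for c in client.get_collections().collections]`.

```python
from typing import Callable, List


class AuthDisabledError(RuntimeError):
    """Hypothetical error raised when the auth layer reports it is disabled."""


def list_collections(
    auth_enabled: bool,
    list_from_auth_registry: Callable[[], List[str]],
    list_from_qdrant: Callable[[], List[str]],
) -> List[str]:
    """Prefer the auth-backed registry; fall back to listing Qdrant directly."""
    if auth_enabled:
        try:
            return sorted(list_from_auth_registry())
        except AuthDisabledError:
            # The auth layer itself says it is off: treat this like demo mode.
            pass
    # Demo mode (auth disabled): ask Qdrant for its collections directly.
    return sorted(list_from_qdrant())
```

Injecting the two listing functions keeps the fallback decision testable without a running Qdrant instance or auth backend.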

augmentcode bot left a review

Review completed. 2 suggestions posted.

Added validation for SCHEMA_CACHE_TTL_SECS to ensure it is finite and positive, defaulting to 300 seconds if invalid. Enhanced file hash computation to fall back to hashlib.sha1 on any xxhash runtime error, improving robustness.
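
As a rough sketch of the hashing behaviour this commit describes (xxhash first, hashlib.sha1 on import failure or any runtime error), under the assumption of a hypothetical helper name `file_content_hash`; the actual function in watch_index_core/processor.py is not shown in this thread:

```python
import hashlib

try:
    import xxhash  # optional third-party dependency
except ImportError:
    xxhash = None


def file_content_hash(data: bytes) -> str:
    """Hash file bytes with xxhash64 when possible, else fall back to SHA-1."""
    if xxhash is not None:
        try:
            return xxhash.xxh64(data).hexdigest()
        except Exception:
            # Any xxhash runtime error falls through to the hashlib path.
            pass
    return hashlib.sha1(data).hexdigest()
```

One consequence of this design is that the two paths yield digests of different lengths (16 hex characters for xxhash64, 40 for SHA-1), so hashes produced under the fallback never match those produced by an xxhash-based pipeline.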

m1rl0k (Collaborator, Author) commented Jan 26, 2026

augment review

augmentcode bot left a review

Review completed. 1 suggestion posted.

Removed the hashlib fallback; xxhash is now required for hashing file contents in _read_text_and_sha1. This ensures consistency with other scripts and simplifies the code by eliminating error handling for a missing xxhash.

Renamed _read_text_and_sha1 to _read_text_and_hash and updated its implementation and usage to compute xxhash64 instead of SHA1, for consistency with the pipeline. Updated the related test to check for the correct hash length.

Introduced optional LZ4 compression for JSON data stored in Redis, reducing memory usage with minimal CPU overhead. Updated requirements.txt to include lz4 as a dependency and modified workspace_state.py to compress data before storing and decompress on retrieval, falling back gracefully if lz4 is unavailable.
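
A minimal sketch of optional LZ4 compression with a graceful fallback, assuming hypothetical helper names (`encode_state`, `decode_state`) and a made-up `LZ4:` prefix to tag compressed values; the real workspace_state.py may detect compressed payloads differently. The returned bytes are what would be written to and read from Redis.

```python
import json

try:
    import lz4.frame as lz4_frame  # optional dependency
except ImportError:
    lz4_frame = None

# Hypothetical marker distinguishing compressed values from plain JSON.
# Valid JSON text can never start with "L", so the prefix is unambiguous.
_LZ4_PREFIX = b"LZ4:"


def encode_state(obj) -> bytes:
    """Serialize to JSON bytes, LZ4-compressing when lz4 is installed."""
    raw = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    if lz4_frame is None:
        return raw  # graceful fallback: store uncompressed JSON
    return _LZ4_PREFIX + lz4_frame.compress(raw)


def decode_state(blob: bytes):
    """Inverse of encode_state; also reads values stored before compression."""
    if blob.startswith(_LZ4_PREFIX):
        if lz4_frame is None:
            raise RuntimeError("value is LZ4-compressed but lz4 is not installed")
        blob = lz4_frame.decompress(blob[len(_LZ4_PREFIX):])
    return json.loads(blob.decode("utf-8"))
```

Tagging compressed values lets old uncompressed entries and new compressed ones coexist in the same store, which is useful during a rolling upgrade.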

m1rl0k merged commit 5053491 into test on Jan 26, 2026. 1 check passed.