Fix: Prevent Qdrant 404 errors in CI by handling missing collections gracefully by PrakharJain1509 · Pull Request #9 · rexdivakar/HippocampAI

PrakharJain1509 · 2025-10-27T17:44:00Z

Summary

Fixes #7 - Resolves Qdrant 404 errors in CI tests when collections don't exist.

Problem

The CI pipeline fails on tests like test_batch_add_memories and test_batch_delete_memories due to UnexpectedResponse: 404 (Not Found) when running inside GitHub Actions. This happens because tests call query_points() or scroll() before creating the collection.

Solution

Added collection existence checks to all Qdrant operations following Command-Query Separation principle:

Read Operations (no side effects)

search() - returns empty list if collection doesn't exist
scroll() - returns empty list if collection doesn't exist
get() - returns None if collection doesn't exist
delete() - gracefully skips if collection doesn't exist
update() - returns False if collection doesn't exist

Write Operations (can have side effects)

upsert() - creates collection if it doesn't exist

Benefits

✅ Prevents 404 errors in CI
✅ Tests pass with clean Qdrant state
✅ Follows Command-Query Separation principle
✅ Graceful handling of missing collections
✅ Better error detection - doesn't mask bugs

Testing

Resolves failing CI tests
Handles empty Qdrant state gracefully
No breaking changes to existing functionality

Summary by Sourcery

Prevent Qdrant 404 errors in CI by gracefully handling missing collections in QdrantStore

Bug Fixes:

Prevent 404 errors during CI tests by handling missing Qdrant collections

Enhancements:

Add collection existence checks to search, scroll, get, delete, and update methods to return defaults when collections don’t exist
Update upsert to lazily create collections if missing
Parameterize ensure_collections to support targeting a single collection

sourcery-ai · 2025-10-27T17:44:06Z

Reviewer's Guide

Adds graceful handling for missing Qdrant collections by refactoring the collection-initialization logic and injecting existence checks across all vector operations to eliminate 404 failures in CI.

Sequence diagram for Qdrant operation with collection existence check

sequenceDiagram
    participant Client
    participant QdrantStore
    participant Qdrant
    Client->>QdrantStore: upsert(collection_name, ...)
    QdrantStore->>QdrantStore: check if collection exists
    alt Collection does not exist
        QdrantStore->>QdrantStore: _ensure_collections(collection_name)
        QdrantStore->>Qdrant: create_collection
        QdrantStore->>Qdrant: create_payload_index (user_id, type, tags)
    end
    QdrantStore->>Qdrant: upsert

    Client->>QdrantStore: search(collection_name, ...)
    QdrantStore->>QdrantStore: check if collection exists
    alt Collection does not exist
        QdrantStore->>Client: return []
    else Collection exists
        QdrantStore->>Qdrant: search
        Qdrant->>QdrantStore: results
        QdrantStore->>Client: results
    end

Class diagram for updated QdrantStore methods

classDiagram
    class QdrantStore {
        - client
        - collection_facts
        - collection_prefs
        - dimension
        - hnsw_m
        - ef_construction
        + _ensure_collections(collection_name: Optional[str] = None)
        + upsert(collection_name: str, id: str, vector: np.ndarray, payload: Dict[str, Any])
        + search(collection_name: str, ...): List[Dict[str, Any]]
        + scroll(collection_name: str, ...): List[Dict[str, Any]]
        + delete(collection_name: str, ids: List[str])
        + get(collection_name: str, id: str): Optional[Dict[str, Any]]
        + update(collection_name: str, id: str, payload: Dict[str, Any]): bool
    }
    QdrantStore --> QdrantClient : uses
    QdrantStore --> VectorParams
    QdrantStore --> HnswConfigDiff
    QdrantStore --> OptimizersConfigDiff

Flow diagram for QdrantStore operation handling missing collections

flowchart TD
    A["Operation called (search, upsert, scroll, get, delete, update)"] --> B{Does collection exist?}
    B -- No --> C["For upsert: create collection via _ensure_collections"]
    B -- No --> D["For read ops: return empty/None/False"]
    B -- Yes --> E["Proceed with Qdrant operation"]

File-Level Changes

Change	Details	Files
Refactor collection initialization to support optional, on-demand creation	Add optional `collection_name` parameter to `_ensure_collections` Construct `collections_to_ensure` dynamically based on input Use `coll_name` instead of `collection_name` within creation loops Adjust logging to report the actual collection created	`src/hippocampai/vector/qdrant_store.py`
Pre-check and auto-create missing collections in upsert	Check `collection_exists` at start of `upsert` Log a warning and call `_ensure_collections` when the collection is absent	`src/hippocampai/vector/qdrant_store.py`
Gracefully handle absent collections in search, scroll, delete, get, and update	Return empty lists in `search` and `scroll` if the collection is missing Skip and return early in `delete` when collection doesn’t exist Return `None` in `get` for missing collections Return `False` in `update` when the collection is absent	`src/hippocampai/vector/qdrant_store.py`

Assessment against linked issues

Issue	Objective	Addressed	Explanation
#7	Prevent Qdrant 404 errors in CI tests by ensuring that collection existence is checked before performing operations.	✅
#7	Ensure that tests pass in CI with a clean Qdrant state (no pre-existing collections).	✅

Tips and commands

Interacting with Sourcery

Trigger a new review: Comment @sourcery-ai review on the pull request.
Continue discussions: Reply directly to Sourcery's review comments.
Generate a GitHub issue from a review comment: Ask Sourcery to create an
issue from a review comment by replying to it. You can also reply to a
review comment with @sourcery-ai issue to create an issue from it.
Generate a pull request title: Write @sourcery-ai anywhere in the pull
request title to generate a title at any time. You can also comment
@sourcery-ai title on the pull request to (re-)generate the title at any time.
Generate a pull request summary: Write @sourcery-ai summary anywhere in
the pull request body to generate a PR summary at any time exactly where you
want it. You can also comment @sourcery-ai summary on the pull request to
(re-)generate the summary at any time.
Generate reviewer's guide: Comment @sourcery-ai guide on the pull
request to (re-)generate the reviewer's guide at any time.
Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
pull request to resolve all Sourcery comments. Useful if you've already
addressed all the comments and don't want to see them anymore.
Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
request to dismiss all existing Sourcery reviews. Especially useful if you
want to start fresh with a new review - don't forget to comment
@sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

Enable or disable review features such as the Sourcery-generated pull request
summary, the reviewer's guide, and others.
Change the review language.
Add, remove or edit custom review instructions.
Adjust other review settings.

Getting Help

Contact our support team for questions or feedback.
Visit our documentation for detailed guides and information.
Keep in touch with the Sourcery team by following us on X/Twitter, LinkedIn or GitHub.

PrakharJain1509 · 2025-10-27T17:44:30Z

@rexdivakar Check if its fine now?

qodo-code-review · 2025-10-27T17:44:31Z

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance

🟢

No security concerns identified

No security vulnerabilities detected by AI analysis. Human verification advised for critical code.

Ticket Compliance

🟡

🎫 #7

🟢	Prevent Qdrant 404 errors in CI caused by querying non-existent collections.
	Ensure tests like test_batch_add_memories and test_batch_delete_memories pass in a clean CI environment.
	Handle operations like query_points/query (search) and scroll safely when collections are missing.
	Do not rely on pre-existing local collections; create when needed or return empty results appropriately.
⚪	Keep CI unblocked by removing the root cause instead of disabling pytest.

Codebase Duplication Compliance

⚪

Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance

🟢

Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status: Passed

⚪

Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
Missing audit logs: New branches that change behavior on missing collections emit warnings but do not produce
structured audit entries capturing actor, action, and outcome for critical
read/write/delete operations.

Referred Code

    # Ensure collection exists before upserting
    if not self.client.collection_exists(collection_name):
        logger.warning(f"Collection {collection_name} does not exist. Creating it.")
        self._ensure_collections(collection_name)

    self.client.upsert(
        collection_name=collection_name,
        points=[
            PointStruct(
                id=id,
                vector=vector.tolist() if isinstance(vector, np.ndarray) else vector,
                payload=payload,
            )
        ],
    )

@get_qdrant_retry_decorator(max_attempts=3, min_wait=1, max_wait=5)
def search(
    self,
    collection_name: str,
    vector: np.ndarray,


 ... (clipped 126 lines)

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Generic error log: The catch-all exception in get() logs only a generic message without operation context
(collection name, id) and returns None, which may hinder debugging of edge cases.

Referred Code

    return None
except Exception as e:
    logger.error(f"Failed to get point {id}: {e}")
    return None

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Unvalidated inputs: New paths accept external parameters like collection names and IDs without explicit
validation or sanitization before using them in datastore operations.

Referred Code

@get_qdrant_retry_decorator(max_attempts=3, min_wait=1, max_wait=5)
def upsert(self, collection_name: str, id: str, vector: np.ndarray, payload: Dict[str, Any]):
    """Insert or update a point (with automatic retry on transient failures)."""
    # Ensure collection exists before upserting
    if not self.client.collection_exists(collection_name):
        logger.warning(f"Collection {collection_name} does not exist. Creating it.")
        self._ensure_collections(collection_name)

    self.client.upsert(
        collection_name=collection_name,
        points=[
            PointStruct(
                id=id,
                vector=vector.tolist() if isinstance(vector, np.ndarray) else vector,
                payload=payload,
            )
        ],
    )

@get_qdrant_retry_decorator(max_attempts=3, min_wait=1, max_wait=5)
def search(


 ... (clipped 98 lines)

Update

Compliance status legend

🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

sourcery-ai

Hey there - I've reviewed your changes - here's some feedback:

Consider consolidating the repeated collection-exists checks into a decorator or shared helper to reduce code duplication across methods.
Logging missing collections as warnings for expected control flow may clutter logs—consider using debug-level logging or adding a metric instead.
Multiple client.collection_exists calls per operation could introduce latency; you might cache collection existence state or batch-check to improve performance.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- Consider consolidating the repeated collection-exists checks into a decorator or shared helper to reduce code duplication across methods.
- Logging missing collections as warnings for expected control flow may clutter logs—consider using debug-level logging or adding a metric instead.
- Multiple client.collection_exists calls per operation could introduce latency; you might cache collection existence state or batch-check to improve performance.

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

qodo-code-review · 2025-10-27T17:45:31Z

PR Code Suggestions ✨

Explore these optional code suggestions:

Category	Suggestion	Impact
High-level	Handle exceptions instead of preemptive checks Instead of checking if a collection exists before each operation, attempt the operation directly and handle the `404` exception if the collection is not found. This avoids an extra network call in the common case where the collection exists. Examples: src/hippocampai/vector/qdrant_store.py [119-122] # If collection doesn't exist, return empty results without creating it. if not self.client.collection_exists(collection_name): logger.warning(f"Collection {collection_name} does not exist. Returning empty list.") return [] src/hippocampai/vector/qdrant_store.py [165-168] # If collection doesn't exist, return empty results without creating it. if not self.client.collection_exists(collection_name): logger.warning(f"Collection {collection_name} does not exist. Returning empty list.") return [] Solution Walkthrough: Before: # Using 'search' as an example of the current "Look Before You Leap" (LBYL) approach def search(self, collection_name, vector, limit=100, ...): # Preemptive check introduces an extra network call if not self.client.collection_exists(collection_name): logger.warning(f"Collection {collection_name} does not exist. Returning empty list.") return [] # Operation is performed only after the check results = self.client.search( collection_name=collection_name, ... ) return [...] After: # Using 'search' as an example of the suggested "Easier to Ask Forgiveness than Permission" (EAFP) approach from qdrant_client.http.exceptions import UnexpectedResponse def search(self, collection_name, vector, limit=100, ...): try: # Optimistically attempt the operation directly results = self.client.search( collection_name=collection_name, ... ) return [...] except UnexpectedResponse as e: # Handle the specific case where the collection does not exist if e.status_code == 404: logger.warning(f"Collection {collection_name} does not exist. Returning empty list.") return [] # Re-raise other unexpected errors raise Suggestion importance[1-10]: 9 __ Why: This is a significant architectural suggestion that correctly identifies a performance issue (extra network calls) and offers a more robust and Pythonic "EAFP" pattern, which also avoids potential race conditions.	High
Possible issue	✅ ~~Avoid race condition during creation~~ Suggestion Impact: The commit removed the collection_exists check and used create_collection directly inside a try block to avoid race conditions. It then created payload indices upon successful creation and adjusted logging accordingly. This aligns with the suggestion's intent, though it used exception handling rather than a returned "created" flag. code diff: collections_to_ensure = [collection_name] if collection_name else [self.collection_facts, self.collection_prefs] for coll_name in collections_to_ensure: - if not self.client.collection_exists(coll_name): + # Use idempotent create_collection to avoid race conditions + # If collection exists with same params, this is a no-op + try: self.client.create_collection( collection_name=coll_name, vectors_config=VectorParams(size=self.dimension, distance=Distance.COSINE), @@ -67,8 +70,7 @@ optimizers_config=OptimizersConfigDiff(indexing_threshold=20000), wal_config=WalConfigDiff(wal_capacity_mb=32), ) - - # Create payload indices + # Collection was created, set up indices self.client.create_payload_index( collection_name=coll_name, field_name="user_id", @@ -84,8 +86,10 @@ field_name="tags", field_schema=PayloadSchemaType.KEYWORD, ) - - logger.info(f"Created collection: {coll_name}") + logger.info(f"Created collection and indices: {coll_name}") + except Exception as e: + # Collection already exists or other error occurred + logger.debug(f"Collection {coll_name} creation skipped: {e}") To prevent a race condition in `_ensure_collections`, remove the `collection_exists` check and rely on the idempotency of `create_collection`. Conditionally create payload indices based on the return value of `create_collection`. src/hippocampai/vector/qdrant_store.py [59-88] collections_to_ensure = [collection_name] if collection_name else [self.collection_facts, self.collection_prefs] for coll_name in collections_to_ensure: - if not self.client.collection_exists(coll_name): - self.client.create_collection( - collection_name=coll_name, - vectors_config=VectorParams(size=self.dimension, distance=Distance.COSINE), - hnsw_config=HnswConfigDiff(m=self.hnsw_m, ef_construct=self.ef_construction), - optimizers_config=OptimizersConfigDiff(indexing_threshold=20000), - wal_config=WalConfigDiff(wal_capacity_mb=32), - ) + # The create_collection call is idempotent. If the collection already exists + # with the same parameters, this call will do nothing and not raise an error. + # This avoids a race condition where multiple processes might check for + # existence and then all try to create the collection. + created = self.client.create_collection( + collection_name=coll_name, + vectors_config=VectorParams(size=self.dimension, distance=Distance.COSINE), + hnsw_config=HnswConfigDiff(m=self.hnsw_m, ef_construct=self.ef_construction), + optimizers_config=OptimizersConfigDiff(indexing_threshold=20000), + wal_config=WalConfigDiff(wal_capacity_mb=32), + ) - # Create payload indices + if created: + # Create payload indices only when the collection is newly created self.client.create_payload_index( collection_name=coll_name, field_name="user_id", field_schema=PayloadSchemaType.KEYWORD, ) self.client.create_payload_index( collection_name=coll_name, field_name="type", field_schema=PayloadSchemaType.KEYWORD, ) self.client.create_payload_index( collection_name=coll_name, field_name="tags", field_schema=PayloadSchemaType.KEYWORD, ) + logger.info(f"Created collection and indices: {coll_name}") - logger.info(f"Created collection: {coll_name}") - `[Suggestion processed]` Suggestion importance[1-10]: 8 __ Why: The suggestion correctly identifies a potential race condition in the `_ensure_collections` method and proposes a robust solution by removing the `collection_exists` check and leveraging the idempotent nature of `create_collection`, which improves the code's reliability in a concurrent environment.	Medium
Update

PrakharJain1509 · 2025-10-27T17:51:29Z

@rexdivakar Let me know if anything else is required

- Use exception handling (EAFP) instead of preemptive checks in search() and scroll() - Eliminates extra network call (collection_exists) in common case - Avoid race conditions in _ensure_collections using try-except - More Pythonic and efficient approach - Improves performance by removing unnecessary checks

Copilot

Pull Request Overview

This PR fixes Qdrant 404 errors in CI by adding collection existence checks to all database operations. The changes implement Command-Query Separation where read operations return empty/default values when collections don't exist, while write operations (upsert) create collections lazily.

Key Changes:

Added collection existence checks to read operations (search, scroll, get, delete, update) to handle missing collections gracefully
Modified upsert to create collections lazily if they don't exist
Parameterized _ensure_collections to support creating individual collections on-demand

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

src/hippocampai/vector/qdrant_store.py

rexdivakar · 2025-10-27T17:59:52Z

@rexdivakar Let me know if anything else is required

Will do a full validation and merge the changes.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

PrakharJain1509 · 2025-10-27T18:02:18Z

@rexdivakar Let me know if anything else is required

Will do a full validation and merge the changes.

Sure..

PrakharJain1509 added 2 commits October 27, 2025 22:54

Check-1

8033789

Check-2

ac131e9

qodo-code-review bot added the Review effort 2/5 label Oct 27, 2025

sourcery-ai bot reviewed Oct 27, 2025

View reviewed changes

rexdivakar requested a review from Copilot October 27, 2025 17:58

Copilot AI reviewed Oct 27, 2025

View reviewed changes

src/hippocampai/vector/qdrant_store.py Show resolved Hide resolved

src/hippocampai/vector/qdrant_store.py Show resolved Hide resolved

src/hippocampai/vector/qdrant_store.py Outdated Show resolved Hide resolved

PrakharJain1509 and others added 3 commits October 27, 2025 23:30

Update src/hippocampai/vector/qdrant_store.py

3c728f4

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update src/hippocampai/vector/qdrant_store.py

a7925c2

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Merge branch 'patch_1' into patch_1

9df5e95

rexdivakar merged commit 1534c2e into rexdivakar:patch_1 Oct 27, 2025

Comments

Conversation

PrakharJain1509 commented Oct 27, 2025 • edited by sourcery-ai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Problem

Solution

Read Operations (no side effects)

Write Operations (can have side effects)

Benefits

Testing

Summary by Sourcery

Uh oh!

sourcery-ai bot commented Oct 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviewer's Guide

Sequence diagram for Qdrant operation with collection existence check

Class diagram for updated QdrantStore methods

Flow diagram for QdrantStore operation handling missing collections

File-Level Changes

Assessment against linked issues

Interacting with Sourcery

Customizing Your Experience

Getting Help

Uh oh!

PrakharJain1509 commented Oct 27, 2025

Uh oh!

qodo-code-review bot commented Oct 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Compliance Guide 🔍

Uh oh!

sourcery-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

qodo-code-review bot commented Oct 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Suggestions ✨

Examples:

Solution Walkthrough:

Before:

After:

Uh oh!

PrakharJain1509 commented Oct 27, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

rexdivakar commented Oct 27, 2025

Uh oh!

PrakharJain1509 commented Oct 27, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

PrakharJain1509 commented Oct 27, 2025 •

edited by sourcery-ai bot

Loading

sourcery-ai bot commented Oct 27, 2025 •

edited

Loading

qodo-code-review bot commented Oct 27, 2025 •

edited

Loading

qodo-code-review bot commented Oct 27, 2025 •

edited

Loading