Skip to content

Comments

Fix: Prevent Qdrant 404 errors in CI by handling missing collections gracefully#9

Merged
rexdivakar merged 6 commits intorexdivakar:patch_1from
PrakharJain1509:patch_1
Oct 27, 2025
Merged

Fix: Prevent Qdrant 404 errors in CI by handling missing collections gracefully#9
rexdivakar merged 6 commits intorexdivakar:patch_1from
PrakharJain1509:patch_1

Conversation

@PrakharJain1509
Copy link
Contributor

@PrakharJain1509 PrakharJain1509 commented Oct 27, 2025

Summary

Fixes #7 - Resolves Qdrant 404 errors in CI tests when collections don't exist.

Problem

The CI pipeline fails on tests like test_batch_add_memories and test_batch_delete_memories due to UnexpectedResponse: 404 (Not Found) when running inside GitHub Actions. This happens because tests call query_points() or scroll() before creating the collection.

Solution

Added collection existence checks to all Qdrant operations following Command-Query Separation principle:

Read Operations (no side effects)

  • search() - returns empty list if collection doesn't exist
  • scroll() - returns empty list if collection doesn't exist
  • get() - returns None if collection doesn't exist
  • delete() - gracefully skips if collection doesn't exist
  • update() - returns False if collection doesn't exist

Write Operations (can have side effects)

  • upsert() - creates collection if it doesn't exist

Benefits

  1. ✅ Prevents 404 errors in CI
  2. ✅ Tests pass with clean Qdrant state
  3. ✅ Follows Command-Query Separation principle
  4. ✅ Graceful handling of missing collections
  5. ✅ Better error detection - doesn't mask bugs

Testing

  • Resolves failing CI tests
  • Handles empty Qdrant state gracefully
  • No breaking changes to existing functionality

Summary by Sourcery

Prevent Qdrant 404 errors in CI by gracefully handling missing collections in QdrantStore

Bug Fixes:

  • Prevent 404 errors during CI tests by handling missing Qdrant collections

Enhancements:

  • Add collection existence checks to search, scroll, get, delete, and update methods to return defaults when collections don’t exist
  • Update upsert to lazily create collections if missing
  • Parameterize ensure_collections to support targeting a single collection

@sourcery-ai
Copy link

sourcery-ai bot commented Oct 27, 2025

Reviewer's Guide

Adds graceful handling for missing Qdrant collections by refactoring the collection-initialization logic and injecting existence checks across all vector operations to eliminate 404 failures in CI.

Sequence diagram for Qdrant operation with collection existence check

sequenceDiagram
    participant Client
    participant QdrantStore
    participant Qdrant
    Client->>QdrantStore: upsert(collection_name, ...)
    QdrantStore->>QdrantStore: check if collection exists
    alt Collection does not exist
        QdrantStore->>QdrantStore: _ensure_collections(collection_name)
        QdrantStore->>Qdrant: create_collection
        QdrantStore->>Qdrant: create_payload_index (user_id, type, tags)
    end
    QdrantStore->>Qdrant: upsert

    Client->>QdrantStore: search(collection_name, ...)
    QdrantStore->>QdrantStore: check if collection exists
    alt Collection does not exist
        QdrantStore->>Client: return []
    else Collection exists
        QdrantStore->>Qdrant: search
        Qdrant->>QdrantStore: results
        QdrantStore->>Client: results
    end
Loading

Class diagram for updated QdrantStore methods

classDiagram
    class QdrantStore {
        - client
        - collection_facts
        - collection_prefs
        - dimension
        - hnsw_m
        - ef_construction
        + _ensure_collections(collection_name: Optional[str] = None)
        + upsert(collection_name: str, id: str, vector: np.ndarray, payload: Dict[str, Any])
        + search(collection_name: str, ...): List[Dict[str, Any]]
        + scroll(collection_name: str, ...): List[Dict[str, Any]]
        + delete(collection_name: str, ids: List[str])
        + get(collection_name: str, id: str): Optional[Dict[str, Any]]
        + update(collection_name: str, id: str, payload: Dict[str, Any]): bool
    }
    QdrantStore --> QdrantClient : uses
    QdrantStore --> VectorParams
    QdrantStore --> HnswConfigDiff
    QdrantStore --> OptimizersConfigDiff
Loading

Flow diagram for QdrantStore operation handling missing collections

flowchart TD
    A["Operation called (search, upsert, scroll, get, delete, update)"] --> B{Does collection exist?}
    B -- No --> C["For upsert: create collection via _ensure_collections"]
    B -- No --> D["For read ops: return empty/None/False"]
    B -- Yes --> E["Proceed with Qdrant operation"]
Loading

File-Level Changes

Change Details Files
Refactor collection initialization to support optional, on-demand creation
  • Add optional collection_name parameter to _ensure_collections
  • Construct collections_to_ensure dynamically based on input
  • Use coll_name instead of collection_name within creation loops
  • Adjust logging to report the actual collection created
src/hippocampai/vector/qdrant_store.py
Pre-check and auto-create missing collections in upsert
  • Check collection_exists at start of upsert
  • Log a warning and call _ensure_collections when the collection is absent
src/hippocampai/vector/qdrant_store.py
Gracefully handle absent collections in search, scroll, delete, get, and update
  • Return empty lists in search and scroll if the collection is missing
  • Skip and return early in delete when collection doesn’t exist
  • Return None in get for missing collections
  • Return False in update when the collection is absent
src/hippocampai/vector/qdrant_store.py

Assessment against linked issues

Issue Objective Addressed Explanation
#7 Prevent Qdrant 404 errors in CI tests by ensuring that collection existence is checked before performing operations.
#7 Ensure that tests pass in CI with a clean Qdrant state (no pre-existing collections).

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@PrakharJain1509
Copy link
Contributor Author

@rexdivakar Check if its fine now?

@qodo-code-review
Copy link
Contributor

qodo-code-review bot commented Oct 27, 2025

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🟢
No security concerns identified No security vulnerabilities detected by AI analysis. Human verification advised for critical code.
Ticket Compliance
🟡
🎫 #7
🟢 Prevent Qdrant 404 errors in CI caused by querying non-existent collections.
Ensure tests like test_batch_add_memories and test_batch_delete_memories pass in a clean
CI environment.
Handle operations like query_points/query (search) and scroll safely when collections are
missing.
Do not rely on pre-existing local collections; create when needed or return empty results
appropriately.
Keep CI unblocked by removing the root cause instead of disabling pytest.
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status: Passed

Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
Missing audit logs: New branches that change behavior on missing collections emit warnings but do not produce
structured audit entries capturing actor, action, and outcome for critical
read/write/delete operations.

Referred Code
    # Ensure collection exists before upserting
    if not self.client.collection_exists(collection_name):
        logger.warning(f"Collection {collection_name} does not exist. Creating it.")
        self._ensure_collections(collection_name)

    self.client.upsert(
        collection_name=collection_name,
        points=[
            PointStruct(
                id=id,
                vector=vector.tolist() if isinstance(vector, np.ndarray) else vector,
                payload=payload,
            )
        ],
    )

@get_qdrant_retry_decorator(max_attempts=3, min_wait=1, max_wait=5)
def search(
    self,
    collection_name: str,
    vector: np.ndarray,


 ... (clipped 126 lines)
Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Generic error log: The catch-all exception in get() logs only a generic message without operation context
(collection name, id) and returns None, which may hinder debugging of edge cases.

Referred Code
    return None
except Exception as e:
    logger.error(f"Failed to get point {id}: {e}")
    return None
Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Unvalidated inputs: New paths accept external parameters like collection names and IDs without explicit
validation or sanitization before using them in datastore operations.

Referred Code
@get_qdrant_retry_decorator(max_attempts=3, min_wait=1, max_wait=5)
def upsert(self, collection_name: str, id: str, vector: np.ndarray, payload: Dict[str, Any]):
    """Insert or update a point (with automatic retry on transient failures)."""
    # Ensure collection exists before upserting
    if not self.client.collection_exists(collection_name):
        logger.warning(f"Collection {collection_name} does not exist. Creating it.")
        self._ensure_collections(collection_name)

    self.client.upsert(
        collection_name=collection_name,
        points=[
            PointStruct(
                id=id,
                vector=vector.tolist() if isinstance(vector, np.ndarray) else vector,
                payload=payload,
            )
        ],
    )

@get_qdrant_retry_decorator(max_attempts=3, min_wait=1, max_wait=5)
def search(


 ... (clipped 98 lines)
  • Update
Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes - here's some feedback:

  • Consider consolidating the repeated collection-exists checks into a decorator or shared helper to reduce code duplication across methods.
  • Logging missing collections as warnings for expected control flow may clutter logs—consider using debug-level logging or adding a metric instead.
  • Multiple client.collection_exists calls per operation could introduce latency; you might cache collection existence state or batch-check to improve performance.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Consider consolidating the repeated collection-exists checks into a decorator or shared helper to reduce code duplication across methods.
- Logging missing collections as warnings for expected control flow may clutter logs—consider using debug-level logging or adding a metric instead.
- Multiple client.collection_exists calls per operation could introduce latency; you might cache collection existence state or batch-check to improve performance.

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@qodo-code-review
Copy link
Contributor

qodo-code-review bot commented Oct 27, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
High-level
Handle exceptions instead of preemptive checks

Instead of checking if a collection exists before each operation, attempt the
operation directly and handle the 404 exception if the collection is not found.
This avoids an extra network call in the common case where the collection
exists.

Examples:

src/hippocampai/vector/qdrant_store.py [119-122]
        # If collection doesn't exist, return empty results without creating it.
        if not self.client.collection_exists(collection_name):
            logger.warning(f"Collection {collection_name} does not exist. Returning empty list.")
            return []
src/hippocampai/vector/qdrant_store.py [165-168]
        # If collection doesn't exist, return empty results without creating it.
        if not self.client.collection_exists(collection_name):
            logger.warning(f"Collection {collection_name} does not exist. Returning empty list.")
            return []

Solution Walkthrough:

Before:

# Using 'search' as an example of the current "Look Before You Leap" (LBYL) approach
def search(self, collection_name, vector, limit=100, ...):
    # Preemptive check introduces an extra network call
    if not self.client.collection_exists(collection_name):
        logger.warning(f"Collection {collection_name} does not exist. Returning empty list.")
        return []

    # Operation is performed only after the check
    results = self.client.search(
        collection_name=collection_name,
        ...
    )
    return [...]

After:

# Using 'search' as an example of the suggested "Easier to Ask Forgiveness than Permission" (EAFP) approach
from qdrant_client.http.exceptions import UnexpectedResponse

def search(self, collection_name, vector, limit=100, ...):
    try:
        # Optimistically attempt the operation directly
        results = self.client.search(
            collection_name=collection_name,
            ...
        )
        return [...]
    except UnexpectedResponse as e:
        # Handle the specific case where the collection does not exist
        if e.status_code == 404:
            logger.warning(f"Collection {collection_name} does not exist. Returning empty list.")
            return []
        # Re-raise other unexpected errors
        raise
Suggestion importance[1-10]: 9

__

Why: This is a significant architectural suggestion that correctly identifies a performance issue (extra network calls) and offers a more robust and Pythonic "EAFP" pattern, which also avoids potential race conditions.

High
Possible issue
Avoid race condition during creation
Suggestion Impact:The commit removed the collection_exists check and used create_collection directly inside a try block to avoid race conditions. It then created payload indices upon successful creation and adjusted logging accordingly. This aligns with the suggestion's intent, though it used exception handling rather than a returned "created" flag.

code diff:

         collections_to_ensure = [collection_name] if collection_name else [self.collection_facts, self.collection_prefs]
         
         for coll_name in collections_to_ensure:
-            if not self.client.collection_exists(coll_name):
+            # Use idempotent create_collection to avoid race conditions
+            # If collection exists with same params, this is a no-op
+            try:
                 self.client.create_collection(
                     collection_name=coll_name,
                     vectors_config=VectorParams(size=self.dimension, distance=Distance.COSINE),
@@ -67,8 +70,7 @@
                     optimizers_config=OptimizersConfigDiff(indexing_threshold=20000),
                     wal_config=WalConfigDiff(wal_capacity_mb=32),
                 )
-
-                # Create payload indices
+                # Collection was created, set up indices
                 self.client.create_payload_index(
                     collection_name=coll_name,
                     field_name="user_id",
@@ -84,8 +86,10 @@
                     field_name="tags",
                     field_schema=PayloadSchemaType.KEYWORD,
                 )
-
-                logger.info(f"Created collection: {coll_name}")
+                logger.info(f"Created collection and indices: {coll_name}")
+            except Exception as e:
+                # Collection already exists or other error occurred
+                logger.debug(f"Collection {coll_name} creation skipped: {e}")

To prevent a race condition in _ensure_collections, remove the collection_exists
check and rely on the idempotency of create_collection. Conditionally create
payload indices based on the return value of create_collection.

src/hippocampai/vector/qdrant_store.py [59-88]

 collections_to_ensure = [collection_name] if collection_name else [self.collection_facts, self.collection_prefs]
 
 for coll_name in collections_to_ensure:
-    if not self.client.collection_exists(coll_name):
-        self.client.create_collection(
-            collection_name=coll_name,
-            vectors_config=VectorParams(size=self.dimension, distance=Distance.COSINE),
-            hnsw_config=HnswConfigDiff(m=self.hnsw_m, ef_construct=self.ef_construction),
-            optimizers_config=OptimizersConfigDiff(indexing_threshold=20000),
-            wal_config=WalConfigDiff(wal_capacity_mb=32),
-        )
+    # The create_collection call is idempotent. If the collection already exists
+    # with the same parameters, this call will do nothing and not raise an error.
+    # This avoids a race condition where multiple processes might check for
+    # existence and then all try to create the collection.
+    created = self.client.create_collection(
+        collection_name=coll_name,
+        vectors_config=VectorParams(size=self.dimension, distance=Distance.COSINE),
+        hnsw_config=HnswConfigDiff(m=self.hnsw_m, ef_construct=self.ef_construction),
+        optimizers_config=OptimizersConfigDiff(indexing_threshold=20000),
+        wal_config=WalConfigDiff(wal_capacity_mb=32),
+    )
 
-        # Create payload indices
+    if created:
+        # Create payload indices only when the collection is newly created
         self.client.create_payload_index(
             collection_name=coll_name,
             field_name="user_id",
             field_schema=PayloadSchemaType.KEYWORD,
         )
         self.client.create_payload_index(
             collection_name=coll_name,
             field_name="type",
             field_schema=PayloadSchemaType.KEYWORD,
         )
         self.client.create_payload_index(
             collection_name=coll_name,
             field_name="tags",
             field_schema=PayloadSchemaType.KEYWORD,
         )
+        logger.info(f"Created collection and indices: {coll_name}")
 
-        logger.info(f"Created collection: {coll_name}")
-

[Suggestion processed]

Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies a potential race condition in the _ensure_collections method and proposes a robust solution by removing the collection_exists check and leveraging the idempotent nature of create_collection, which improves the code's reliability in a concurrent environment.

Medium
  • Update

@PrakharJain1509
Copy link
Contributor Author

@rexdivakar Let me know if anything else is required

- Use exception handling (EAFP) instead of preemptive checks in search() and scroll()
- Eliminates extra network call (collection_exists) in common case
- Avoid race conditions in _ensure_collections using try-except
- More Pythonic and efficient approach
- Improves performance by removing unnecessary checks
@rexdivakar rexdivakar requested a review from Copilot October 27, 2025 17:58
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes Qdrant 404 errors in CI by adding collection existence checks to all database operations. The changes implement Command-Query Separation where read operations return empty/default values when collections don't exist, while write operations (upsert) create collections lazily.

Key Changes:

  • Added collection existence checks to read operations (search, scroll, get, delete, update) to handle missing collections gracefully
  • Modified upsert to create collections lazily if they don't exist
  • Parameterized _ensure_collections to support creating individual collections on-demand

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@rexdivakar
Copy link
Owner

@rexdivakar Let me know if anything else is required

Will do a full validation and merge the changes.

PrakharJain1509 and others added 3 commits October 27, 2025 23:30
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@PrakharJain1509
Copy link
Contributor Author

@rexdivakar Let me know if anything else is required

Will do a full validation and merge the changes.

Sure..

@rexdivakar rexdivakar merged commit 1534c2e into rexdivakar:patch_1 Oct 27, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants