Skip to content

feat(vectordb): adding qdrant vector db support #137

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

vanhtuan0409
Copy link

Adding support for Qdrant as vector db backend

Copy link

prophet-code-review-bot bot commented May 6, 2025

Bug Report

Name Severity Example test case Description
Typo in get_chunks_by_id function in QdrantVectorStore High Call get_chunks_by_id with a list of valid chunk identifiers when using Qdrant as the vector store. The function will fail to retrieve the correct chunks. There is a typo in the get_chunks_by_id function in core/vector_store/qdrant_store.py. The loop variable chunk_numder is misspelled, leading to incorrect point ID generation and failure to retrieve the correct chunks.

Comments? Email us.

Copy link

cubic-dev-ai bot commented May 6, 2025

Your mrge subscription is currently inactive. Please reactivate your subscription to receive AI reviews and use mrge.

Copy link

jazzberry-ai bot commented May 6, 2025

Bug Report

Name: Inadequate validation of QDRANT_PORT in configuration
Severity: Medium
Example test case: Set QDRANT_PORT to a non-integer value in the morphik.toml or as an environment variable.
Description: The application uses pydantic to validate the QDRANT_PORT setting. While this ensures that the port is an integer, the resulting validation error is not very descriptive. It would be better to explicitly check if QDRANT_PORT is an integer and raise a more informative error message if it is not.

Comments? Email us.

@CLAassistant
Copy link

CLAassistant commented May 6, 2025

CLA assistant check
All committers have signed the CLA.

@vanhtuan0409 vanhtuan0409 force-pushed the feat/qdrant-vectordb branch from 8e12225 to 44470d1 Compare May 6, 2025 05:21
Copy link

prophet-code-review-bot bot commented May 6, 2025

Bug Report

Name Severity Example test case Description
Missing Graceful Shutdown Medium Start and stop the application multiple times. The AsyncQdrantClient is not explicitly closed on application shutdown, potentially leading to resource leaks.
Redundant environment variable loading in config Low Set different values for the same config in .env and morphik.toml The code first loads environment variables from .env using load_dotenv(override=True) and then overrides them with values from the TOML config file.
No Healthcheck for Qdrant in original docker-compose file Low Run docker-compose up and check if qdrant restarts on failure. The original docker-compose.minimal.yml didn't have a healthcheck defined. This could prevent docker from restarting the service automatically.

Comments? Email us.

Copy link

jazzberry-ai bot commented May 6, 2025

Bug Report

Name Severity Example test case Description
Inconsistent Vector Size in Qdrant Collection Critical 1. Configure VECTOR_DIMENSIONS to 1024. 2. Create a Qdrant collection with vector size 128. 3. Run the application. If the vector size in the existing Qdrant collection does not match the configured VECTOR_DIMENSIONS, the code logs an error but doesn't recreate the collection or raise an exception. This can lead to data corruption.

Comments? Email us.

@vanhtuan0409 vanhtuan0409 force-pushed the feat/qdrant-vectordb branch from 44470d1 to 7970d12 Compare May 6, 2025 09:18
Copy link

prophet-code-review-bot bot commented May 6, 2025

Bug Report

Name Severity Example test case Description
Missing Qdrant configuration check during settings loading High Set VECTOR_STORE_PROVIDER to "qdrant" in morphik.toml but omit qdrant_host. Start the application. The application should raise an error during settings loading if VECTOR_STORE_PROVIDER is set to "qdrant" but QDRANT_HOST is not provided. Currently, the check only exists in core/api.py, leading to a delayed failure.

Comments? Email us.

Copy link

jazzberry-ai bot commented May 6, 2025

Bug Report

Name Severity Example test case Description
Missing Input Validation in query_similar Medium Call query_similar with k=0 or k=-1. The query_similar method doesn't validate the k parameter, which can lead to errors if k is not a positive integer.
Potential Injection Vulnerability in Metadata Handling Low Store metadata containing unsanitized user input. The json.dumps used for metadata could potentially lead to injection vulnerabilities if the metadata contains unsanitized user-provided data.
Unclosed Qdrant Client Medium Run the application for an extended period. The AsyncQdrantClient is not explicitly closed, which can lead to resource leaks.
Lack of Asynchronous Context Management Low Cause an exception during a Qdrant operation. The QdrantVectorStore methods don't use async with for interacting with AsyncQdrantClient, potentially leading to less robust error handling.

Comments? Email us.

@vanhtuan0409 vanhtuan0409 force-pushed the feat/qdrant-vectordb branch from 7970d12 to 386c9e6 Compare May 6, 2025 09:45
Copy link

prophet-code-review-bot bot commented May 6, 2025

Bug Report

Name Severity Example test case Description
Docker check Medium Run start_server.py on a system without Docker installed. The check_and_start_redis() function in start_server.py doesn't check if Docker is installed before attempting to run Docker commands, leading to a crash if Docker is missing.
Missing data migration Low Switch the vector store from PGVector to Qdrant There is no data migration mechanism when switching vector store providers, leading to data loss.
Undocumented argument Low Run start_server.py --help The --skip-redis-setup flag is not documented in the README.

Comments? Email us.

Copy link

jazzberry-ai bot commented May 6, 2025

Bug Report

No bugs found.

Comments? Email us.

Copy link

prophet-code-review-bot bot commented May 6, 2025

Bug Report

Name Severity Example test case Description
AttributeError in _check_collection_vector_size Medium Call _check_collection_vector_size on a Qdrant collection where the vectors attribute in the collection configuration is None. The _check_collection_vector_size method in QdrantVectorStore assumes that the vectors attribute of the collection configuration is always present. If this attribute is None, the code throws an AttributeError. This can happen if the collection was created outside of this code or if there's an issue with the Qdrant setup. A check for None should be added before accessing the size attribute.

Comments? Email us.

Copy link

jazzberry-ai bot commented May 6, 2025

Bug Report

Name Severity Example test case Description
Incorrect Document ID Filtering in Qdrant Queries High Call query_similar with a list of document IDs. The query_similar function in QdrantVectorStore uses models.MatchAny with raw document IDs, but the document_id field in Qdrant is defined as UUID. This type mismatch causes incorrect filtering or query failures.

Comments? Email us.

@vanhtuan0409 vanhtuan0409 force-pushed the feat/qdrant-vectordb branch from 143d24a to f725450 Compare May 7, 2025 05:07
Copy link

prophet-code-review-bot bot commented May 7, 2025

Bug Report

Name Severity Example test case Description
Qdrant query_similar with empty doc_ids Medium Call query_similar with doc_ids=[] The query_similar method in QdrantVectorStore might fail if called with an empty list of doc_ids, as the MatchAny filter in Qdrant expects a non-empty list. This can be addressed by adding a check for an empty doc_ids list and skipping the filter in that case.

Comments? Email us.

Copy link

jazzberry-ai bot commented May 7, 2025

Bug Report

Name Severity Example test case Description
Missing Qdrant Cloud Authentication High Use Qdrant Cloud without setting an API key The AsyncQdrantClient is initialized without an API key, which is required for Qdrant Cloud.
Missing try-except block in _check_collection_vector_size Medium If the collection does not exist, this can cause an uncaught exception The _check_collection_vector_size method does not have a try except block.
Incorrect UUID Namespace Medium Use a document_id that contains a domain name The _to_point_id function uses uuid.NAMESPACE_DNS which is designed for domain names and could lead to collisions.

Comments? Email us.

# Conflicts:
#    core/api.py
# Conflicts:
#	core/api.py
# Conflicts:
#    core/api.py
# Conflicts:
#    core/api.py
@vanhtuan0409 vanhtuan0409 force-pushed the feat/qdrant-vectordb branch from 062e751 to 4b151a2 Compare May 12, 2025 03:22
@@ -106,13 +106,16 @@ storage_path = "./storage"

[vector_store]
provider = "pgvector"
# provider = "qdrant"
qdrant_host = "localhost"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these should also be commented out

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the provider is not set to Qdrant, those values will not be use. So I think there is no harm just to leave it there.

Otherwise the config struct must be changed to optional

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants