AWS DocumentDB similar_search returns error 'Invalid type for vector' #27417

Open
5 tasks done
YoninL opened this issue Oct 17, 2024 · 0 comments
Labels
investigate Ɑ: vector store Related to vector store module

YoninL commented Oct 17, 2024

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

I've set up an AWS DocumentDB cluster. Vector search works when I use plain pymongo code, but fails when I go through LangChain. I have no idea why. Could you help me?

Working code:

import pymongo

client = pymongo.MongoClient(
    "mongodb://docdb-2024-10-17-02-26-57.cluster-xxx.us-west-2.docdb.amazonaws.com:27017/",
    port=27017,
    username="testadmin",
    password="testpass",
    retryWrites=False,
    tls='true',
    tlsCAFile="/global-bundle.pem",
)
db = client.testdb
collection = db['testcollection']

embedding=[
    -0.06659489870071411,
    0.021658368408679962,
    ...
    0.027635671198368073,
    -0.018259789794683456
  ] # 1024 dimensions

docs = collection.aggregate([{'$search': {"vectorSearch" : {"vector" : embedding, "path": "embedding", "similarity": "cosine", "k": 2}}}])
result = [doc['text'] for doc in docs]

print(result)
# related doc is returned without errors

Non-working code:

from langchain_community.vectorstores import DocumentDBVectorSearch

from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
                model_name="models/BAAI/bge-m3",
                model_kwargs={'device': 'cpu'},
                encode_kwargs={'normalize_embeddings': True}
            )

vectorstore = DocumentDBVectorSearch.from_connection_string(
                connection_string="mongodb://testadmin:testpass@docdb-2024-10-17-02-26-57.cluster-xxx.us-west-2.docdb.amazonaws.com:27017/?tls=true&tlsCAFile=/global-bundle.pem",
                namespace="testdb.testcollection",
                embedding=embeddings,
                index_name='vector_index'
                )

# sample_embedding = embeddings.embed_query("Test query")
# print(type(sample_embedding), sample_embedding)

docs = vectorstore.similarity_search("Test query")

print(docs)

The error I got (see full stack trace below):

pymongo.errors.OperationFailure: Invalid type for vector, full error: {'ok': 0.0, 'operationTime': Timestamp(1729159273, 1), 'code': 9, 'errmsg': 'Invalid type for vector'}
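
Since the server complains about the vector's type, one thing that may be worth ruling out is the Python type of the query vector and of its elements. The following is a minimal diagnostic sketch, not part of LangChain or pymongo: `summarize_types` and `to_plain_floats` are hypothetical helper names, and the assumption is that the server wants a flat list of plain numbers, as in the working pymongo example.

```python
def summarize_types(vector):
    """Return the sorted names of the element types found in the vector."""
    return sorted({type(x).__name__ for x in vector})


def to_plain_floats(vector):
    """Convert each element to a built-in float, rejecting non-numeric items.

    Hypothetical helper: a nested list, string, bool, or None raises
    TypeError, which would point at a malformed query vector.
    """
    plain = []
    for i, x in enumerate(vector):
        if isinstance(x, bool) or not hasattr(x, "__float__"):
            raise TypeError(
                f"element {i} has unsupported type {type(x).__name__}"
            )
        plain.append(float(x))
    return plain


# Usage with the embedding model from the non-working example:
#   sample_embedding = embeddings.embed_query("Test query")
#   print(type(sample_embedding).__name__, len(sample_embedding))
#   print(summarize_types(sample_embedding))
#   vector = to_plain_floats(sample_embedding)
```

If the summary shows anything other than `float` (or `to_plain_floats` raises), the embedding output itself is the likely suspect; if it is clean, the problem is more likely in how the query is assembled.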

Error Message and Stack Trace (if applicable)

  client: MongoClient = MongoClient(connection_string)
Traceback (most recent call last):
  File "/app/test.py", line 22, in <module>
    docs = vectorstore.similarity_search("Test queryl")
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/langchain_community/vectorstores/documentdb.py", line 369, in similarity_search
    docs = self._similarity_search_without_score(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/langchain_community/vectorstores/documentdb.py", line 349, in _similarity_search_without_score
    cursor = self._collection.aggregate(pipeline)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pymongo/collection.py", line 2696, in aggregate
    return self._aggregate(
           ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pymongo/_csot.py", line 108, in csot_wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pymongo/collection.py", line 2604, in _aggregate
    return self.__database.client._retryable_read(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pymongo/mongo_client.py", line 1534, in _retryable_read
    return self._retry_internal(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pymongo/_csot.py", line 108, in csot_wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pymongo/mongo_client.py", line 1501, in _retry_internal
    ).run()
      ^^^^^
  File "/usr/local/lib/python3.12/site-packages/pymongo/mongo_client.py", line 2347, in run
    return self._read() if self._is_read else self._write()
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pymongo/mongo_client.py", line 2485, in _read
    return self._func(self._session, self._server, conn, read_pref)  # type: ignore
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pymongo/aggregation.py", line 162, in get_cursor
    result = conn.command(
             ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pymongo/helpers.py", line 327, in inner
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pymongo/pool.py", line 985, in command
    return command(
           ^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pymongo/network.py", line 212, in command
    helpers._check_command_response(
  File "/usr/local/lib/python3.12/site-packages/pymongo/helpers.py", line 233, in _check_command_response
    raise OperationFailure(errmsg, code, response, max_wire_version)
pymongo.errors.OperationFailure: Invalid type for vector, full error: {'ok': 0.0, 'operationTime': Timestamp(1729159273, 1), 'code': 9, 'errmsg': 'Invalid type for vector'}

Description

See the "Example Code" section for what I'm doing. I expected the LangChain version to work as well. I've verified that the data, the vector search index, and the embedding generated by bge-m3 are all correct. Please suggest what else I should check. Thanks.
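
One way to narrow this down might be to send the LangChain-generated embedding through the same raw pymongo query that already works. The sketch below only rebuilds the `$search` stage from the working example; `build_vector_search_stage` is a hypothetical helper name, and the field name `embedding` is taken from that example rather than from the LangChain wrapper's configuration.

```python
def build_vector_search_stage(vector, path, similarity="cosine", k=2):
    """Build the $search stage used in the working pymongo example."""
    return {
        "$search": {
            "vectorSearch": {
                "vector": list(vector),  # force a plain Python list
                "path": path,
                "similarity": similarity,
                "k": k,
            }
        }
    }


# Usage, reusing `collection` and `embeddings` from the examples above:
#   query_vector = embeddings.embed_query("Test query")
#   stage = build_vector_search_stage(query_vector, path="embedding")
#   docs = collection.aggregate([stage])
#   print([doc["text"] for doc in docs])
```

If this succeeds, the embedding is fine and the difference lies in the pipeline the wrapper builds (for example the field it searches or extra parameters it sends); if it fails with the same error, the embedding output is the place to look.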

System Info

System Information

OS: Linux
OS Version: #1 SMP Tue Sep 10 22:02:55 UTC 2024
Python Version: 3.12.7 (main, Oct 1 2024, 22:28:49) [GCC 12.2.0]

Package Information

langchain_core: 0.2.41
langchain: 0.2.14
langchain_community: 0.2.12
langsmith: 0.1.135
langchain_chroma: 0.1.4
langchain_huggingface: 0.0.3
langchain_openai: 0.1.22
langchain_text_splitters: 0.2.4

Optional packages not installed

langgraph
langserve

Other Dependencies

aiohttp: 3.10.10
async-timeout: Installed. No version info available.
chromadb: 0.5.13
dataclasses-json: 0.6.7
fastapi: 0.115.2
httpx: 0.27.2
huggingface-hub: 0.25.2
jsonpatch: 1.33
numpy: 1.26.4
openai: 1.41.1
orjson: 3.10.7
packaging: 24.1
pydantic: 2.9.2
PyYAML: 6.0.2
requests: 2.32.3
requests-toolbelt: 1.0.0
sentence-transformers: 3.0.1
SQLAlchemy: 2.0.36
tenacity: 8.5.0
tiktoken: 0.7.0
tokenizers: 0.20.1
transformers: 4.45.2
typing-extensions: 4.12.2

@langcarl langcarl bot added the investigate label Oct 17, 2024
@dosubot dosubot bot added the Ɑ: vector store Related to vector store module label Oct 17, 2024