Chroma similarity_search and similarity_search_with_score do not return any results #27273

Open
5 tasks done
guninder opened this issue Oct 11, 2024 · 2 comments
Labels
investigate Ɑ: vector store Related to vector store module

Comments

@guninder

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

Hi,
I am new to LangChain and Chroma. I am trying to insert data into ChromaDB and then search it. There is no issue with the data: I tried the same search against a knowledge base created from it in Bedrock. The insert raises no error and the database is created (data_level0.bin is about 6.3 MB), but a search returns empty results. The following code inserts the data.

```
import os

from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings  # missing from the original snippet
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.document_loaders import TextLoader

CHROMA_PATH = "data/chroma_wp"
os.environ["OPENAI_API_KEY"] = "sk-"

# Load the book as a single document.
loader = TextLoader("books/war_and_peace.txt", encoding="utf-8")
documents = loader.load()

# Split on newlines into chunks of about 1000 characters with 200 characters of overlap.
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200, separator="\n")
chunks = text_splitter.split_documents(documents)

# Embed the chunks and persist them under CHROMA_PATH.
embeddings = OpenAIEmbeddings()
vectorStore = Chroma.from_documents(documents=chunks, embedding=embeddings, persist_directory=CHROMA_PATH)
```

The following is the code I am using to search.
```
import chromadb
import os
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

os.environ["OPENAI_API_KEY"] = "sk-"  # key redacted
CHROMA_PATH = "data/chroma_wp"

# Reopen the store persisted by the insert script.
embeddings = OpenAIEmbeddings()
vectorStore = Chroma(persist_directory=CHROMA_PATH, embedding_function=embeddings)
#vectorStore.delete()
print(vectorStore)

results = vectorStore.similarity_search("Who is Andrew?", k=3)
#vectorStore.similarity_search_with_score("Who is Andrew?", k=3)

print(results)
```

I get empty results.

These are the packages I am using:

langchain 0.3.1
langchain-chroma 0.1.4
langchain-community 0.3.1
langchain-core 0.3.6
langchain-experimental 0.3.2
langchain-openai 0.2.1
langchain-text-splitters 0.3.0
chroma-hnswlib 0.7.6
chromadb 0.5.12

Error Message and Stack Trace (if applicable)

No exception. Just empty results.
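
When a search returns an empty list without raising, one way to narrow it down is to check whether the reopened store holds any records before querying it. The sketch below is only an illustration: `diagnose_store` is a hypothetical helper, not part of LangChain, and it assumes the store exposes `get(limit=...)` returning a dict with an `"ids"` key and `similarity_search(query, k=...)`, as the `Chroma` class in `langchain_chroma` does.

```python
from typing import Any, Dict, List


def diagnose_store(store: Any, query: str, k: int = 3) -> Dict[str, Any]:
    """Report whether a reopened vector store holds records and answers a query.

    Assumption: `store` behaves like langchain_chroma's Chroma, i.e. it has
    `get(limit=...)` returning a dict with an "ids" key and
    `similarity_search(query, k=...)` returning a list of documents.
    """
    # Peek at a handful of stored records instead of loading the whole collection.
    peek = store.get(limit=5)
    ids: List[str] = list(peek.get("ids", []))

    # Only run the query if the store is non-empty; an empty store cannot match.
    hits = store.similarity_search(query, k=k) if ids else []

    return {"has_records": bool(ids), "sample_ids": ids, "hit_count": len(hits)}
```

Calling `diagnose_store(vectorStore, "Who is Andrew?")` in the search script would show which case applies. If `has_records` is false, the search script may be opening a different directory or collection than the one the insert script wrote to; for example, a relative path such as `data/chroma_wp` is resolved against the current working directory of each script. If `has_records` is true but `hit_count` is 0, the problem is more likely in the query path itself.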

System Info

System Information

OS: Windows
OS Version: 10.0.22631
Python Version: 3.10.5 (tags/v3.10.5:f377153, Jun 6 2022, 16:14:13) [MSC v.1929 64 bit (AMD64)]

Package Information

langchain_core: 0.3.6
langchain: 0.3.1
langchain_community: 0.3.1
langsmith: 0.1.129
langchain_chroma: 0.1.4
langchain_experimental: 0.3.2
langchain_openai: 0.2.1
langchain_text_splitters: 0.3.0

Optional packages not installed

langgraph
langserve

Other Dependencies

aiohttp: 3.10.6
async-timeout: 4.0.3
chromadb: 0.5.12
dataclasses-json: 0.6.7
fastapi: 0.115.0
httpx: 0.27.2
jsonpatch: 1.33
numpy: 1.26.4
openai: 1.50.1
orjson: 3.10.7
packaging: 24.1
pydantic: 2.9.2
pydantic-settings: 2.5.2
PyYAML: 6.0.2
requests: 2.32.3
SQLAlchemy: 2.0.35
tenacity: 8.5.0
tiktoken: 0.7.0
typing-extensions: 4.12.2

@langcarl langcarl bot added the investigate label Oct 11, 2024
@dosubot dosubot bot added the Ɑ: vector store Related to vector store module label Oct 11, 2024
@iharshlalakiya

iharshlalakiya commented Oct 14, 2024

Hi,
I can help you with this problem.

@guninder
Author

@iharshlalakiya, thank you, I would appreciate the help. Please let me know if you need any other information.
