Retrieve Similarity Score for a Specific Document #30141

XariZaru · 2025-03-06T17:08:52Z

Checked other resources

I added a very descriptive title to this question.
I searched the LangChain documentation with the integrated search.
I used the GitHub search to find a similar question and didn't find it.

Commit to Help

I commit to help with one of those options 👆

Example Code

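from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import LocalFileStore, create_kv_docstore
from langchain_community.vectorstores import Pinecone
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# OPENAI_API_KEY, index_name, and dir are defined elsewhere.
embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY, model="text-embedding-3-small")
vectorstore = Pinecone.from_existing_index(index_name, embedding=embeddings)

# Parent chunks are what the retriever returns; child chunks are what gets embedded.
parent_splitter = RecursiveCharacterTextSplitter(chunk_size=2000)
child_splitter = RecursiveCharacterTextSplitter(chunk_size=400)

# Parent documents are persisted in a local file store.
fs = LocalFileStore(dir)
store = create_kv_docstore(fs)
retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=store,
    child_splitter=child_splitter,
    parent_splitter=parent_splitter,
)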

Description

I recently uploaded some documents containing text that I was certain a similarity search would pick up. I even used words taken directly from the document in my query. However, the similarity search returns other documents instead (which, I admit, have similar content as well).

I have a sneaking suspicion that something is wrong on my end, but I can't determine what. I've been adding documents to my vector store and processing them for months without issues, so this is quite strange.

I can't provide the document since it contains sensitive information, so I'd like to know whether there is a way to do the following:

Retrieve a specific document from the vector store (Pinecone allows metadata filtering).
Determine its similarity score against my query.
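
For illustration, the two steps above could be sketched roughly as follows. This is a minimal sketch, not a confirmed fix: it assumes the vector store exposes similarity_search_with_score with a filter argument (as LangChain's Pinecone wrapper does), and the helper name scores_for_filtered_docs and the metadata key used in the usage note are hypothetical. Since ParentDocumentRetriever indexes the child chunks, the scores returned would be per chunk rather than per parent document.

```python
from typing import Any, Dict, List, Tuple


def scores_for_filtered_docs(
    vectorstore: Any,
    query: str,
    metadata_filter: Dict[str, Any],
    k: int = 10,
) -> List[Tuple[str, float]]:
    """Return (text preview, score) pairs for chunks matching the filter.

    Hypothetical helper: `vectorstore` is assumed to be a LangChain vector
    store that supports `similarity_search_with_score(query, k=..., filter=...)`.
    """
    # The filter restricts the search to chunks whose metadata matches, so the
    # target document is scored even if it would not rank in the global top k.
    results = vectorstore.similarity_search_with_score(
        query, k=k, filter=metadata_filter
    )
    # Keep only a short preview of each chunk so sensitive text is not dumped.
    return [(doc.page_content[:80], score) for doc, score in results]
```

Calling it as scores_for_filtered_docs(vectorstore, "my query", {"source": "my_file.pdf"}) would list the scores for that one file, where "source" stands in for whatever metadata key was actually stored at upload time. How to read the score depends on the metric the index was created with.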

I have run a similarity comparison directly between the two strings (the query and the document text). The score is pretty good, and yet other documents with lower scores are being returned instead.
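
A direct comparison of this kind can be sketched as a plain cosine similarity between two embedding vectors. This assumes the vectors come from the same embedding model as the index (for example via embeddings.embed_query and embeddings.embed_documents) and that the index uses the cosine metric; neither is confirmed above. Note that embedding a whole document may give a different score than the 400-character child chunks that are actually stored.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Comparing the query vector against each stored chunk vector, rather than against the full document text, would be the closer match to what the vector store computes.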

System Info

python 3.11.1
langchain_openai 0.3.7

Replies: 0 comments
