Missing ID in VectorStores #7013

Open
5 tasks done
KevinZJN opened this issue Oct 17, 2024 · 2 comments
Labels
auto:bug Related to a bug, vulnerability, unexpected error with an existing feature

Comments

@KevinZJN
Contributor

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain.js documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain.js rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

Example code:

import { Chroma } from "@langchain/community/vectorstores/chroma";
import { OpenAIEmbeddings } from "@langchain/openai";

const embeddings = new OpenAIEmbeddings({
  model: "text-embedding-3-small",
});

const vectorStore = new Chroma(embeddings, {
  collectionName: "a-test-collection",
  collectionMetadata: {
    "hnsw:space": "cosine",
  },
});

const document1 = {
  pageContent: "The powerhouse of the cell is the mitochondria",
  metadata: { source: "https://example.com" },
};

const documents = [document1];
await vectorStore.addDocuments(documents, { ids: ["1"] });
const similaritySearchResults = await vectorStore.similaritySearch("powerhouse of the cell");

console.log(similaritySearchResults);

Error Message and Stack Trace (if applicable)

[
  Document {
    pageContent: 'The powerhouse of the cell is the mitochondria',
    metadata: { source: 'https://example.com' },
    id: undefined
  }
]

Description

This is the same issue as the one reported in the Python library.
There is already a pull request here that fixes one of the vector stores, but the issue also occurs in other vector stores.

Example fix for the Chroma vector store,
in libs/langchain-community/src/vectorstores/chroma.ts at line 351:

            ...
            results.push([
                new Document({
                    pageContent: firstDocuments?.[i] ?? "",
                    metadata,
                }),
                firstDistances[i],
            ]);

to

            results.push([
                new Document({
                    pageContent: firstDocuments?.[i] ?? "",
                    metadata,
                    id: firstIds[i],
                }),
                firstDistances[i],
            ]);

If possible, we would like to apply the same fix to the other vector stores.
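
For reference, the proposed change can be sketched outside the library as a standalone mapping function. This is only an illustration, not the actual chroma.ts code: the `Doc` and `QueryResult` types and the `toScoredDocuments` name are hypothetical stand-ins, and the response shape (one inner array per query for `ids`, `documents`, `metadatas`, `distances`) is an assumption about what the Chroma client returns.

```typescript
// Local stand-in for LangChain's Document (hypothetical, so the sketch needs no packages).
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
  id?: string;
}

// Assumed shape of a Chroma query response: one inner array per query.
interface QueryResult {
  ids: string[][];
  documents: (string | null)[][];
  metadatas: (Record<string, unknown> | null)[][];
  distances: number[][] | null;
}

// Map the first query's results to [document, distance] pairs, keeping the id.
function toScoredDocuments(result: QueryResult): [Doc, number][] {
  const firstIds = result.ids[0] ?? [];
  const firstDocuments = result.documents[0] ?? [];
  const firstMetadatas = result.metadatas[0] ?? [];
  const firstDistances = result.distances?.[0] ?? [];

  return firstIds.map((id, i): [Doc, number] => [
    {
      pageContent: firstDocuments[i] ?? "",
      metadata: firstMetadatas[i] ?? {},
      id, // the field that is currently dropped from search results
    },
    // NaN marks a missing distance; this fallback is a choice made for the sketch.
    firstDistances[i] ?? Number.NaN,
  ]);
}
```

With the `id` line in place, a result for the document added with `ids: ["1"]` would carry `id: "1"` instead of `id: undefined`.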

System Info

dependencies:
@langchain/openai: >=0.1.0 <0.4.0
@langchain/textsplitters: >=0.0.0 <0.2.0
js-tiktoken: ^1.0.12
js-yaml: ^4.1.0
jsonpointer: ^5.0.1
langsmith: ^0.1.56-rc.1
openapi-types: ^12.1.3
p-retry: 4
uuid: ^10.0.0
yaml: ^2.2.1
zod-to-json-schema: ^3.22.3
zod: ^3.22.4

dosubot bot commented Oct 17, 2024

Hey @KevinZJN! I'm here to help you with any bugs, questions, or contributions. Let's tackle this together!

I found a similar issue that was addressed in the LangChain.js repository. The issue titled "MemoryVectorStore does not support Document id property" was closed with a pull request that added support for storing and retrieving the Document ID field in the MemoryVectorStore [1].

For the Chroma vector store, the addVectors method already generates UUIDs for document IDs if they are not provided, ensuring that IDs are assigned to documents [2]. The fix you mentioned for the Chroma vector store involves modifying the code to include the id when creating a new Document instance during the similarity search. This ensures that the document ID is retained and accessible in the search results.

To apply a similar solution to other vector stores, you would need to ensure that the document ID is included when creating Document instances in their respective methods for adding documents and performing similarity searches. This might involve modifying the code to handle IDs similarly to how it's done in the Chroma vector store.
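
The round trip described above (keep the ID when adding, attach it again when searching) can be illustrated with a toy in-memory store. This is a minimal sketch under stated assumptions, not any real LangChain.js integration: `TinyVectorStore`, `StoredDoc`, and the `auto-N` fallback IDs are hypothetical, and the method names only loosely mirror the vector store interface.

```typescript
// Local stand-in for LangChain's Document (hypothetical, so the sketch needs no packages).
interface StoredDoc {
  pageContent: string;
  metadata: Record<string, unknown>;
  id?: string;
}

// Cosine similarity between two vectors; returns 0 if either vector has zero length.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i += 1) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Hypothetical toy store; real integrations delegate storage to their backend.
class TinyVectorStore {
  private rows: { id: string; vector: number[]; doc: StoredDoc }[] = [];
  private counter = 0;

  // Prefer caller-supplied ids, then the document's own id, then a generated one.
  addVectors(vectors: number[][], docs: StoredDoc[], options?: { ids?: string[] }): string[] {
    return docs.map((doc, i) => {
      const id = options?.ids?.[i] ?? doc.id ?? `auto-${(this.counter += 1)}`;
      this.rows.push({ id, vector: vectors[i] ?? [], doc });
      return id;
    });
  }

  // Return the k most similar documents, each with its stored id attached.
  similaritySearchVectorWithScore(query: number[], k: number): [StoredDoc, number][] {
    return this.rows
      .map((row): [StoredDoc, number] => [{ ...row.doc, id: row.id }, cosine(query, row.vector)])
      .sort((a, b) => b[1] - a[1])
      .slice(0, k);
  }
}
```

The point of the sketch is the `id: row.id` assignment in the search path: a store that records IDs on insert but omits that step on read reproduces the `id: undefined` behaviour reported in this issue.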


@dosubot dosubot bot added the auto:bug Related to a bug, vulnerability, unexpected error with an existing feature label Oct 17, 2024
@ausboss

ausboss commented Oct 30, 2024

I've been scratching my head trying to figure out why documents all said id: undefined despite following the guide. I need it to be able to delete the document. I hope this gets fixed soon.
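
To illustrate why the missing ID blocks deletion, here is a minimal sketch of deleting search hits by ID. The `HitDoc` and `DeletableStore` types and both helper names are hypothetical; the only assumption about the real store is that it exposes a `delete` method accepting `{ ids }`.

```typescript
// Hypothetical structural types covering only what this sketch touches.
interface HitDoc {
  pageContent: string;
  id?: string;
}

interface DeletableStore {
  delete(params: { ids: string[] }): Promise<void>;
}

// Collect ids from search hits, failing loudly if any hit has no id.
function collectDeletableIds(hits: HitDoc[]): string[] {
  const ids = hits
    .map((hit) => hit.id)
    .filter((id): id is string => typeof id === "string");
  if (ids.length !== hits.length) {
    const missing = hits.length - ids.length;
    throw new Error(`${missing} search result(s) have no id and cannot be deleted`);
  }
  return ids;
}

// Delete every hit from the store and return the ids that were removed.
async function deleteHits(store: DeletableStore, hits: HitDoc[]): Promise<string[]> {
  const ids = collectDeletableIds(hits);
  if (ids.length > 0) {
    await store.delete({ ids });
  }
  return ids;
}
```

With results like the one in the issue description, where `id` is `undefined`, `collectDeletableIds` throws, which is the situation described in this comment.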
