community[minor]: Add Upstash Embeddings Support #5150

Merged
merged 19 commits into from
May 25, 2024
Changes from 12 commits
7 changes: 7 additions & 0 deletions docs/core_docs/docs/integrations/vectorstores/upstash.mdx
@@ -2,6 +2,7 @@ import CodeBlock from "@theme/CodeBlock";
import CreateClientExample from "@examples/indexes/vector_stores/upstash/create_client.ts";
import IndexQueryExample from "@examples/indexes/vector_stores/upstash/index_and_query_docs.ts";
import DeleteExample from "@examples/indexes/vector_stores/upstash/delete_docs.ts";
import UpstashEmbeddingsExample from "@examples/indexes/vector_stores/upstash/upstash_embeddings.ts";
import IntegrationInstallTooltip from "@mdx_components/integration_install_tooltip.mdx";

# Upstash Vector
@@ -41,6 +42,12 @@ You can index the LangChain documents with any model of your choice, and perform

<CodeBlock language="typescript">{IndexQueryExample}</CodeBlock>

## Upstash embeddings

You can use the Upstash embeddings service, which uses the embedding model you selected when creating the vector database. You don't need to create the embeddings yourself, as the Upstash Vector service handles this for you.

<CodeBlock language="typescript">{UpstashEmbeddingsExample}</CodeBlock>

## Delete Documents

You can also delete the documents you've indexed previously.
65 changes: 65 additions & 0 deletions examples/src/indexes/vector_stores/upstash/upstash_embeddings.ts
@@ -0,0 +1,65 @@
import { Index } from "@upstash/vector";
import { OpenAIEmbeddings } from "@langchain/openai";
import { Document } from "@langchain/core/documents";
import { UpstashVectorStore } from "@langchain/community/vectorstores/upstash";

const index = new Index({
url: process.env.UPSTASH_VECTOR_REST_URL as string,
token: process.env.UPSTASH_VECTOR_REST_TOKEN as string,
});

// Initializing the UpstashVectorStore with the Upstash Embeddings configuration.
const UpstashVector = new UpstashVectorStore("UpstashEmbeddings", { index });

// Creating the docs to be indexed.
const id = new Date().getTime();
const documents = [
new Document({
metadata: { name: id },
pageContent: "Hello there!",
}),
new Document({
metadata: { name: id },
pageContent: "What are you building?",
}),
new Document({
metadata: { time: id },
pageContent: "Upstash Vector is great for building AI applications.",
}),
new Document({
metadata: { time: id },
pageContent: "To be, or not to be, that is the question.",
}),
];

// Adding the documents to the Upstash database. Upstash creates the embeddings from the page content.
await UpstashVector.addDocuments(documents);

// Waiting for the vectors to be indexed in the vector store.
// eslint-disable-next-line no-promise-executor-return
await new Promise((resolve) => setTimeout(resolve, 1000));

const queryResult = await UpstashVector.similaritySearchWithScore(
"Vector database",
2
);

console.log(queryResult);
/**
[
[
Document {
pageContent: 'Upstash Vector is great for building AI applications.',
metadata: [Object]
},
0.9016147
],
[
Document {
pageContent: 'What are you building?',
metadata: [Object]
},
0.8613077
]
]
*/
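
The logged output above is an array of `[document, score]` pairs. As a minimal sketch of consuming that shape, the snippet below formats each pair into one line. `DocLike`, `ScoredDoc`, and `formatHits` are hypothetical names introduced for illustration, and the metadata is left empty because the log only shows `[Object]`.

```typescript
// Local stand-in for LangChain's Document; only the fields used here.
interface DocLike {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Each search hit is a [document, score] pair, as in the logged output.
type ScoredDoc = [DocLike, number];

// Hypothetical helper: one line per hit, score rounded to three decimals.
function formatHits(hits: ScoredDoc[]): string[] {
  return hits.map(
    ([doc, score], rank) => `${rank + 1}. ${score.toFixed(3)} ${doc.pageContent}`
  );
}

// Sample data copied from the logged output; metadata contents were not shown.
const sampleHits: ScoredDoc[] = [
  [
    {
      pageContent: "Upstash Vector is great for building AI applications.",
      metadata: {},
    },
    0.9016147,
  ],
  [{ pageContent: "What are you building?", metadata: {} }, 0.8613077],
];

console.log(formatHits(sampleHits).join("\n"));
```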
@@ -6,7 +6,7 @@ import { EmbeddingsInterface } from "@langchain/core/embeddings";
import { UpstashVectorStore } from "../upstash.js";
import { sleep } from "../../utils/time.js";

describe.skip("UpstashVectorStore", () => {
describe("UpstashVectorStore", () => {
let store: UpstashVectorStore;
let embeddings: EmbeddingsInterface;
let index: Index;
@@ -17,8 +17,10 @@ describe.skip("UpstashVectorStore", () => {
token: process.env.UPSTASH_VECTOR_REST_TOKEN,
});

await index.reset();

embeddings = new SyntheticEmbeddings({
vectorSize: 1536,
vectorSize: 384,
});

store = new UpstashVectorStore(embeddings, {
@@ -119,4 +121,52 @@ describe.skip("UpstashVectorStore", () => {

expect(results2).toHaveLength(0);
});

test("UpstashVectorStore with Upstash Embedding configuration, the embeddings will be created by Upstash's service", async () => {
const vectorStoreWithUpstashEmbeddings = new UpstashVectorStore(
"UpstashEmbeddings",
{ index }
);

const createdAt = new Date().getTime();

const ids = await vectorStoreWithUpstashEmbeddings.addDocuments([
{ pageContent: "hello", metadata: { a: createdAt + 1 } },
{ pageContent: "car", metadata: { a: createdAt } },
{ pageContent: "adjective", metadata: { a: createdAt } },
{ pageContent: "hi", metadata: { a: createdAt } },
]);

// Sleeping for a second to make sure that all the indexing operations are finished.
await sleep(1000);

const results1 =
await vectorStoreWithUpstashEmbeddings.similaritySearchVectorWithScore(
"hello!",
1
);
expect(results1).toHaveLength(1);

expect([results1[0][0]]).toEqual([
new Document({ metadata: { a: createdAt + 1 }, pageContent: "hello" }),
]);

const results2 =
await vectorStoreWithUpstashEmbeddings.similaritySearchVectorWithScore(
"testing!",
6
);

expect(results2).toHaveLength(4);

await vectorStoreWithUpstashEmbeddings.delete({ ids: ids.slice(2) });

const results3 =
await vectorStoreWithUpstashEmbeddings.similaritySearchVectorWithScore(
"testing again!",
6
);

expect(results3).toHaveLength(2);
});
});
112 changes: 90 additions & 22 deletions libs/langchain-community/src/vectorstores/upstash.ts
@@ -1,9 +1,10 @@
import * as uuid from "uuid";
import { EmbeddingsInterface } from "@langchain/core/embeddings";
import { VectorStore } from "@langchain/core/vectorstores";
import { Index as UpstashIndex } from "@upstash/vector";
import { Index as UpstashIndex, type QueryResult } from "@upstash/vector";
import { Document, DocumentInterface } from "@langchain/core/documents";
import { chunkArray } from "@langchain/core/utils/chunk_array";
import { FakeEmbeddings } from "@langchain/core/utils/testing";
import {
AsyncCaller,
AsyncCallerParams,
@@ -31,12 +32,15 @@ export type UpstashQueryMetadata = UpstashMetadata & {
*/
export type UpstashDeleteParams =
| {
ids: string | string[];
deleteAll?: never;
}
ids: string | string[];
deleteAll?: never;
}
| { deleteAll: boolean; ids?: never };

const CONCURRENT_UPSERT_LIMIT = 1000;



/**
* The main class that extends the 'VectorStore' class. It provides
* methods for interacting with Upstash index, such as adding documents,
@@ -45,22 +49,31 @@ const CONCURRENT_UPSERT_LIMIT = 1000;
export class UpstashVectorStore extends VectorStore {
declare FilterType: string;

declare embeddings: EmbeddingsInterface;
Review comment (Collaborator): Should make this an optional prop

index: UpstashIndex;

caller: AsyncCaller;

embeddings: EmbeddingsInterface;
upstashEmbeddingsConfig?: boolean;
Review comment (Collaborator): Renaming to useUpstashEmbeddings instead

filter?: this["FilterType"];

_vectorstoreType(): string {
return "upstash";
}

constructor(embeddings: EmbeddingsInterface, args: UpstashVectorLibArgs) {
super(embeddings, args);

this.embeddings = embeddings;
constructor(
embeddings: EmbeddingsInterface | "UpstashEmbeddings",
args: UpstashVectorLibArgs
) {
if (embeddings === "UpstashEmbeddings") {
super(new FakeEmbeddings(), args);
this.upstashEmbeddingsConfig = true;
} else {
super(embeddings, args);
this.embeddings = embeddings;
}

const { index, ...asyncCallerArgs } = args;

@@ -78,10 +91,14 @@ export class UpstashVectorStore extends VectorStore {
*/
async addDocuments(
documents: DocumentInterface[],
options?: { ids?: string[] }
options?: { ids?: string[]; UpstashEmbeddings?: boolean }
Review comment (Collaborator): Small style nit - we generally use camel case for properties. I can push a rename
) {
const texts = documents.map(({ pageContent }) => pageContent);

if (this.upstashEmbeddingsConfig || options?.UpstashEmbeddings) {
return this.addData(documents, options);
}

const embeddings = await this.embeddings.embedDocuments(texts);

return this.addVectors(embeddings, documents, options);
@@ -128,6 +145,45 @@ export class UpstashVectorStore extends VectorStore {
return documentIds;
}

/**
* This method adds the provided documents to the Upstash database. The pageContent of each document is embedded by Upstash Embeddings.
* @param documents Array of Document objects to be added to the Upstash database.
* @param options Optional object containing the array of ids for the documents.
* @returns Promise that resolves with the ids of the provided documents when the upsert operation is done.
*/
async addData(documents: DocumentInterface[], options?: { ids?: string[] }) {
Review comment (jacoblee93, Collaborator, May 25, 2024): Going to make this protected since I think redundant vs. just calling addDocuments

const documentIds =
options?.ids ?? Array.from({ length: documents.length }, () => uuid.v4());

const upstashVectorsWithData = documents.map((document, index) => {
const metadata = {
_pageContentLC: documents[index].pageContent,
...documents[index].metadata,
};

const id = documentIds[index];

return {
id,
data: document.pageContent,
metadata,
};
});

const vectorChunks = chunkArray(
upstashVectorsWithData,
CONCURRENT_UPSERT_LIMIT
);

const batchRequests = vectorChunks.map((chunk) =>
this.caller.call(async () => this.index.upsert(chunk))
);

await Promise.all(batchRequests);

return documentIds;
}
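
The `addData` method above maps each document to an `{ id, data, metadata }` record, splits the records into chunks of at most `CONCURRENT_UPSERT_LIMIT`, and upserts the chunks concurrently. A self-contained sketch of that flow follows. It is not the PR's code: `DocLike`, `DataRecord`, `FakeIndex`, `toDataRecords`, and `upsertInBatches` are hypothetical stand-ins, the local `chunkArray` only approximates the helper imported from `@langchain/core`, the `AsyncCaller` wrapper is omitted, and the index-based id fallback replaces the `uuid.v4()` call.

```typescript
// Minimal stand-ins for the types used in the diff (assumptions, not the real classes).
interface DocLike {
  pageContent: string;
  metadata: Record<string, unknown>;
}

interface DataRecord {
  id: string;
  data: string; // raw text; the service is expected to embed it
  metadata: Record<string, unknown>;
}

// Hypothetical fake index that only records the size of each upsert call.
class FakeIndex {
  batchSizes: number[] = [];

  async upsert(records: DataRecord[]): Promise<void> {
    this.batchSizes.push(records.length);
  }
}

// Splits an array into consecutive chunks of at most `size` items.
function chunkArray<T>(items: T[], size: number): T[][] {
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Mirrors the mapping in addData: the text goes into `data`, and a copy is
// kept in the metadata under `_pageContentLC`. The `doc-<i>` fallback id is
// an assumption of this sketch.
function toDataRecords(docs: DocLike[], ids: string[]): DataRecord[] {
  return docs.map((doc, i) => ({
    id: ids[i] ?? `doc-${i}`,
    data: doc.pageContent,
    metadata: { _pageContentLC: doc.pageContent, ...doc.metadata },
  }));
}

// Starts one upsert per chunk and waits for all of them to finish.
async function upsertInBatches(
  index: FakeIndex,
  records: DataRecord[],
  limit: number
): Promise<void> {
  const requests = chunkArray(records, limit).map((chunk) => index.upsert(chunk));
  await Promise.all(requests);
}
```

With a limit of 1000, 2500 records would be sent as three upsert calls of 1000, 1000, and 500 records.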

/**
* This method deletes documents from the Upstash database. You can either
* provide the target ids, or delete all vectors in the database.
@@ -143,18 +199,30 @@ export class UpstashVectorStore extends VectorStore {
}

protected async _runUpstashQuery(
query: number[],
query: number[] | string,
k: number,
filter?: this["FilterType"],
options?: { includeVectors: boolean }
) {
const queryResult = await this.index.query<UpstashQueryMetadata>({
vector: query,
topK: k,
includeMetadata: true,
filter,
...options,
});
let queryResult: QueryResult<UpstashQueryMetadata>[] = [];

if (typeof query === "string") {
queryResult = await this.index.query<UpstashQueryMetadata>({
data: query,
topK: k,
includeMetadata: true,
filter,
...options,
});
} else {
queryResult = await this.index.query<UpstashQueryMetadata>({
vector: query,
topK: k,
includeMetadata: true,
filter,
...options,
});
}

return queryResult;
}
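
The two branches of `_runUpstashQuery` above differ only in whether the query is sent under `data` (a string, embedded by the service) or `vector` (an already computed embedding). One possible way to express that without repeating the shared fields is sketched below. `QueryPayload` and `buildQueryPayload` are hypothetical names, and the payload type is a simplified assumption rather than the SDK's actual type.

```typescript
// Simplified payload shape (an assumption for this sketch, not the SDK type).
interface QueryPayload {
  topK: number;
  includeMetadata: true;
  filter?: string;
  data?: string; // raw text query
  vector?: number[]; // precomputed embedding
}

// Chooses `data` or `vector` based on the runtime type of the query.
function buildQueryPayload(
  query: number[] | string,
  k: number,
  filter?: string
): QueryPayload {
  const shared = {
    topK: k,
    includeMetadata: true as const,
    // Only include `filter` when one was provided.
    ...(filter !== undefined ? { filter } : {}),
  };

  return typeof query === "string"
    ? { ...shared, data: query }
    : { ...shared, vector: query };
}
```

The trade-off is readability versus duplication: the explicit `if`/`else` in the diff is longer but keeps each `index.query` call fully visible.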
@@ -169,7 +237,7 @@ export class UpstashVectorStore extends VectorStore {
* maximum of 'k' and vectors in the index.
*/
async similaritySearchVectorWithScore(
query: number[],
query: number[] | string,
k: number,
filter?: this["FilterType"]
): Promise<[DocumentInterface, number][]> {
@@ -203,7 +271,7 @@ export class UpstashVectorStore extends VectorStore {
static async fromTexts(
texts: string[],
metadatas: UpstashMetadata | UpstashMetadata[],
embeddings: EmbeddingsInterface,
embeddings: EmbeddingsInterface | "UpstashEmbeddings",
Review comment (Collaborator): Oh, I mean instead of passing a string here, pass FakeEmbeddings

Reply (Contributor Author): The thing is, passing something explicit like "UpstashEmbeddings" makes it easier to understand that you have the option to either get the embeddings from Upstash or pass an embedding client to create the embeddings yourself. In the Vectara case, it seems that there's no option to pass an embedding, so all the embeddings will be created directly by Vectara itself. With Upstash, this is optional. Sorry for the delay here.
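
To make the design choice in this thread concrete, here is a minimal sketch of how a string literal in a union type can act as an explicit switch between service-side and client-side embedding. `EmbeddingsLike`, `EmbeddingMode`, and `resolveEmbeddingMode` are hypothetical names for illustration; the PR itself sets a boolean flag in the constructor instead.

```typescript
// Stand-in for LangChain's EmbeddingsInterface (only the two core methods).
interface EmbeddingsLike {
  embedDocuments(texts: string[]): Promise<number[][]>;
  embedQuery(text: string): Promise<number[]>;
}

// Either a real embeddings client or the explicit sentinel string.
type EmbeddingsArg = EmbeddingsLike | "UpstashEmbeddings";

// Hypothetical result type that makes the chosen mode explicit.
type EmbeddingMode =
  | { kind: "server" }
  | { kind: "client"; embeddings: EmbeddingsLike };

function resolveEmbeddingMode(arg: EmbeddingsArg): EmbeddingMode {
  if (arg === "UpstashEmbeddings") {
    // The service embeds raw text, so no client is needed.
    return { kind: "server" };
  }
  // TypeScript narrows `arg` to EmbeddingsLike on this path.
  return { kind: "client", embeddings: arg };
}

// Trivial client used only to exercise the client branch.
const constantEmbeddings: EmbeddingsLike = {
  embedDocuments: async (texts) => texts.map(() => [0, 0, 0]),
  embedQuery: async () => [0, 0, 0],
};

console.log(resolveEmbeddingMode("UpstashEmbeddings").kind);
console.log(resolveEmbeddingMode(constantEmbeddings).kind);
```

The sentinel keeps the choice visible at the call site, which is the argument made in the reply above. Passing a placeholder embeddings object, as the reviewer suggests, keeps the parameter type uniform but hides which side computes the vectors.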

dbConfig: UpstashVectorLibArgs
): Promise<UpstashVectorStore> {
const docs: DocumentInterface[] = [];
@@ -229,7 +297,7 @@ export class UpstashVectorStore extends VectorStore {
*/
static async fromDocuments(
docs: DocumentInterface[],
embeddings: EmbeddingsInterface,
embeddings: EmbeddingsInterface | "UpstashEmbeddings",
dbConfig: UpstashVectorLibArgs
): Promise<UpstashVectorStore> {
const instance = new this(embeddings, dbConfig);
@@ -244,7 +312,7 @@ export class UpstashVectorStore extends VectorStore {
* @returns
*/
static async fromExistingIndex(
embeddings: EmbeddingsInterface,
embeddings: EmbeddingsInterface | "UpstashEmbeddings",
dbConfig: UpstashVectorLibArgs
): Promise<UpstashVectorStore> {
const instance = new this(embeddings, dbConfig);