community[patch]: VoyageAI embedding with input_type parameter #5493

Merged
merged 15 commits into from
May 21, 2024
10 changes: 9 additions & 1 deletion docs/core_docs/docs/integrations/text_embedding/voyageai.mdx
@@ -1,11 +1,19 @@
# Voyage AI
## Voyage AI Integration

The `VoyageEmbeddings` class uses the Voyage AI REST API to generate embeddings for a given text.

The `inputType` parameter allows you to specify the type of input text for better embedding results. You can set it to `query`, `document`, or leave it undefined (which is equivalent to `None`).

- `query`: Use this for search or retrieval queries. Voyage AI will prepend a prompt to optimize the embeddings for query use cases.
- `document`: Use this for documents or content that you want to be retrievable. Voyage AI will prepend a prompt to optimize the embeddings for document use cases.
- `None` (default): The input text will be directly encoded without any additional prompt.


```typescript
import { VoyageEmbeddings } from "langchain/embeddings/voyage";

const embeddings = new VoyageEmbeddings({
  apiKey: "YOUR-API-KEY", // In Node.js defaults to process.env.VOYAGEAI_API_KEY
  inputType: "None", // Optional: specify input type as 'query', 'document', or omit for None
Collaborator

@jacoblee93 jacoblee93 May 20, 2024

None is weird here? Shouldn't this just be undefined?

Contributor Author

From what I read here:
https://docs.voyageai.com/docs/embeddings
"When the input_type is set to None, and the input text will be directly encoded by our embedding model. Alternatively, when the inputs are documents or queries, the users can specify input_type to be query or document, respectively. "
So this sounds good to me?

Collaborator

Oh, I think they mean None in the Python sense:

https://www.w3schools.com/python/ref_keyword_none.asp

In JS, the equivalent is undefined.

});
```
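The distinction raised in the review thread can be sketched in isolation. The snippet below is only an illustration, assuming the request body is serialized with `JSON.stringify`; the `SketchEmbeddingRequest` interface, the `serializeRequest` helper, and the `"voyage-01"` model name are hypothetical stand-ins, not part of this PR. It shows that a field set to `undefined` is dropped from the JSON payload, while the string `"None"` is sent literally.

```typescript
// Hypothetical request shape, loosely mirroring the fields shown in this PR's diff.
interface SketchEmbeddingRequest {
  model: string;
  input: string | string[];
  input_type?: string | undefined;
}

// Serialize a request body the way a JSON-based HTTP client typically would.
function serializeRequest(request: SketchEmbeddingRequest): string {
  return JSON.stringify(request);
}

// "voyage-01" is a placeholder model name used only for illustration.
const withUndefined = serializeRequest({
  model: "voyage-01",
  input: "Hello world",
  input_type: undefined, // JSON.stringify omits keys whose value is undefined
});

const withNoneString = serializeRequest({
  model: "voyage-01",
  input: "Hello world",
  input_type: "None", // a real string, so it is kept in the payload
});

console.log(withUndefined);
// {"model":"voyage-01","input":"Hello world"}
console.log(withNoneString);
// {"model":"voyage-01","input":"Hello world","input_type":"None"}
```

Under that assumption, leaving `inputType` unset is the JavaScript counterpart of Python's `None`, whereas the string `"None"` is transmitted as an actual value.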
19 changes: 14 additions & 5 deletions libs/langchain-community/src/embeddings/tests/voyage.int.test.ts
@@ -1,25 +1,26 @@
 import { test, expect } from "@jest/globals";
 import { VoyageEmbeddings } from "../voyage.js";

-test.skip("Test VoyageEmbeddings.embedQuery", async () => {
-  const embeddings = new VoyageEmbeddings();
+test.skip("Test VoyageEmbeddings.embedQuery with input_type", async () => {
+  const embeddings = new VoyageEmbeddings({ inputType: "None" });
   const res = await embeddings.embedQuery("Hello world");

   expect(typeof res[0]).toBe("number");
 });

-test.skip("Test VoyageEmbeddings.embedDocuments", async () => {
-  const embeddings = new VoyageEmbeddings();
+test.skip("Test VoyageEmbeddings.embedDocuments with input_type", async () => {
+  const embeddings = new VoyageEmbeddings({ inputType: "None" });
   const res = await embeddings.embedDocuments(["Hello world", "Bye bye"]);
   expect(res).toHaveLength(2);
   expect(typeof res[0][0]).toBe("number");
   expect(typeof res[1][0]).toBe("number");
 });

-test.skip("Test VoyageEmbeddings concurrency", async () => {
+test.skip("Test VoyageEmbeddings concurrency with input_type", async () => {
   const embeddings = new VoyageEmbeddings({
     batchSize: 1,
     maxConcurrency: 2,
+    inputType: "None",
   });
   const res = await embeddings.embedDocuments([
     "Hello world",
@@ -34,3 +35,11 @@ test.skip("Test VoyageEmbeddings concurrency", async () => {
     undefined
   );
 });
+
+test.skip("Test VoyageEmbeddings without input_type", async () => {
+  const embeddings = new VoyageEmbeddings();
+  const res = await embeddings.embedDocuments(["Hello world", "Bye bye"]);
+  expect(res).toHaveLength(2);
+  expect(typeof res[0][0]).toBe("number");
+  expect(typeof res[1][0]).toBe("number");
+});
17 changes: 17 additions & 0 deletions libs/langchain-community/src/embeddings/voyage.ts
@@ -14,6 +14,11 @@ export interface VoyageEmbeddingsParams extends EmbeddingsParams {
    * limited by the Voyage AI API to a maximum of 8.
    */
   batchSize?: number;
+
+  /**
+   * Input type for the embeddings request.
+   */
+  inputType?: string;
 }

 /**
@@ -32,6 +37,14 @@ export interface CreateVoyageEmbeddingRequest {
    * @memberof CreateVoyageEmbeddingRequest
    */
   input: string | string[];
+
+  /**
+   * Input type for the embeddings request.
+   * @type {string}
+   * @memberof CreateVoyageEmbeddingRequest
+   */
+
+  input_type?: string;
 }

 /**
@@ -61,6 +74,7 @@ export class VoyageEmbeddings
     fields?: Partial<VoyageEmbeddingsParams> & {
       verbose?: boolean;
       apiKey?: string;
+      inputType?: string; // Make inputType optional
     }
   ) {
     const fieldsWithDefaults = { ...fields };
@@ -78,6 +92,7 @@ export class VoyageEmbeddings
     this.batchSize = fieldsWithDefaults?.batchSize ?? this.batchSize;
     this.apiKey = apiKey;
     this.apiUrl = `${this.basePath}/embeddings`;
+    this.inputType = fieldsWithDefaults?.inputType;
   }

   /**
@@ -92,6 +107,7 @@ export class VoyageEmbeddings
       this.embeddingWithRetry({
         model: this.modelName,
         input: batch,
+        input_type: this.inputType,
       })
     );

@@ -119,6 +135,7 @@ export class VoyageEmbeddings
     const { data } = await this.embeddingWithRetry({
       model: this.modelName,
       input: text,
+      input_type: this.inputType,
     });

     return data[0].embedding;
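The diff above attaches the same `input_type` to every batched request. A minimal standalone sketch of that flow might look like the following. It is not the class's actual implementation: `chunkArray`, `buildBatchRequests`, the `SketchInputType` union, and the `"voyage-01"` model name are hypothetical names introduced for illustration, and the union type is a stricter alternative to the plain `string` used in the PR.

```typescript
// Hypothetical narrowed type; the PR itself types inputType as a plain string.
type SketchInputType = "query" | "document";

// Hypothetical per-batch request shape, modeled on the fields in the diff.
interface SketchBatchRequest {
  model: string;
  input: string[];
  input_type?: SketchInputType | undefined;
}

// Split a list into consecutive chunks of at most `size` items.
function chunkArray<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Build one request per batch, attaching the same input type to each.
function buildBatchRequests(
  texts: string[],
  model: string,
  batchSize: number,
  inputType?: SketchInputType
): SketchBatchRequest[] {
  return chunkArray(texts, batchSize).map((batch) => ({
    model,
    input: batch,
    input_type: inputType,
  }));
}

// "voyage-01" is a placeholder model name used only for illustration.
const requests = buildBatchRequests(
  ["Hello world", "Bye bye", "Hi again"],
  "voyage-01",
  2,
  "document"
);

console.log(requests.length); // 2
console.log(requests.map((r) => r.input.length)); // [ 2, 1 ]
```

Narrowing the type to `"query" | "document"` would let the compiler reject a value such as `"None"`, which is one possible way to address the concern raised in review.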