Add documentation for BatchingStrategy in vector stores #1424

It will not be initialized for you by default.
You must opt in by passing a `boolean` for the appropriate constructor argument or, if using Spring Boot, by setting the appropriate `initialize-schema` property to `true` in `application.properties` or `application.yml`.
Check the documentation for the vector store you are using for the specific property name.
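For example, for the PGvector store the Spring Boot property is `spring.ai.vectorstore.pgvector.initialize-schema`; the property prefix follows the store's name.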

== Batching Strategy

When working with vector stores, it's often necessary to embed large numbers of documents.
While it might seem straightforward to make a single call to embed all documents at once, this approach can lead to issues.
Embedding models process text as tokens and have a maximum token limit, often referred to as the context window size.
This limit restricts the amount of text that can be processed in a single embedding request.
Attempting to embed too many tokens in one call can result in errors or truncated embeddings.

To address this token limit, Spring AI implements a batching strategy.
This approach breaks down large sets of documents into smaller batches that fit within the embedding model's maximum context window.
Batching not only solves the token limit issue but can also lead to improved performance and more efficient use of API rate limits.

Spring AI provides this functionality through the `BatchingStrategy` interface, which allows for processing documents in sub-batches based on their token counts.

The core `BatchingStrategy` interface is defined as follows:

[source,java]
----
public interface BatchingStrategy {
    List<List<Document>> batch(List<Document> documents);
}
----

This interface defines a single method, `batch`, which takes a list of documents and returns a list of document batches.

=== Default Implementation: TokenCountBatchingStrategy

Spring AI provides a default implementation called `TokenCountBatchingStrategy`.
This strategy batches documents based on their token counts, ensuring that each batch does not exceed a calculated maximum input token count.

Key features of `TokenCountBatchingStrategy`:

1. Uses https://platform.openai.com/docs/guides/embeddings/embedding-models[OpenAI's max input token count] (8191) as the default upper limit.
2. Incorporates a reserve percentage (default 10%) to provide a buffer for potential overhead.
3. Calculates the actual max input token count as: `actualMaxInputTokenCount = originalMaxInputTokenCount * (1 - RESERVE_PERCENTAGE)`

The strategy estimates the token count for each document, groups them into batches without exceeding the max input token count, and throws an exception if a single document exceeds this limit.
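
With the defaults, the effective limit works out to `8191 * (1 - 0.10) ≈ 7372` tokens per batch (the exact value depends on rounding).

If the defaults don't suit your model, the strategy can be constructed with custom values. A minimal sketch, assuming the constructor that takes a token encoding type, a maximum input token count, and a reserve percentage:

[source,java]
----
import com.knuddels.jtokkit.api.EncodingType;

import org.springframework.ai.embedding.BatchingStrategy;
import org.springframework.ai.embedding.TokenCountBatchingStrategy;

// Sketch: tighter limits than the defaults
BatchingStrategy strategy = new TokenCountBatchingStrategy(
        EncodingType.CL100K_BASE, // encoding used to estimate token counts
        8000,                     // maximum input token count
        0.1                       // reserve percentage (10% buffer)
);
----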

=== Using the BatchingStrategy

The `BatchingStrategy` is used internally by `EmbeddingModel` implementations to optimize the embedding process.
It automatically batches documents when computing embeddings, which can lead to significant performance benefits, especially when dealing with large numbers of documents or APIs with token limitations.
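
For illustration, a minimal sketch of invoking a strategy directly (in normal use this happens behind the scenes; `documents` is assumed to be a previously loaded `List<Document>`):

[source,java]
----
BatchingStrategy strategy = new TokenCountBatchingStrategy();

// Each sub-list stays within the strategy's token budget and can be
// embedded in a single request
List<List<Document>> batches = strategy.batch(documents);
----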

=== Customizing Batching Strategy

While `TokenCountBatchingStrategy` provides a robust default implementation, you can customize the batching strategy to fit your specific needs.
This can be done through Spring Boot's auto-configuration.

To customize the batching strategy, define a `BatchingStrategy` bean in your Spring Boot application:

[source,java]
----
import org.springframework.ai.embedding.BatchingStrategy;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class EmbeddingConfig {

    @Bean
    public BatchingStrategy customBatchingStrategy() {
        // CustomBatchingStrategy stands in for your own implementation
        return new CustomBatchingStrategy();
    }
}
----

This custom `BatchingStrategy` will then be automatically used by the `EmbeddingModel` implementations in your application.
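
For example, here is a hypothetical implementation that batches by a fixed document count instead of token counts (`FixedSizeBatchingStrategy` is illustrative only, not part of Spring AI):

[source,java]
----
import java.util.ArrayList;
import java.util.List;

import org.springframework.ai.document.Document;
import org.springframework.ai.embedding.BatchingStrategy;

public class FixedSizeBatchingStrategy implements BatchingStrategy {

    private final int batchSize;

    public FixedSizeBatchingStrategy(int batchSize) {
        this.batchSize = batchSize;
    }

    @Override
    public List<List<Document>> batch(List<Document> documents) {
        List<List<Document>> batches = new ArrayList<>();
        for (int start = 0; start < documents.size(); start += this.batchSize) {
            int end = Math.min(start + this.batchSize, documents.size());
            // copy the sub-list so each batch is independent of the source list
            batches.add(new ArrayList<>(documents.subList(start, end)));
        }
        return batches;
    }
}
----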

NOTE: Vector stores supported by Spring AI are configured to use the default `TokenCountBatchingStrategy`.
The SAP HANA vector store is not currently configured for batching.

== Available Implementations

These are the available implementations of the `VectorStore` interface: