Add documentation for BatchingStrategy in vector stores #1424

It will not be initialized for you by default.
You must opt in by passing a `boolean` for the appropriate constructor argument or, if using Spring Boot, by setting the appropriate `initialize-schema` property to `true` in `application.properties` or `application.yml`.
Check the documentation for the vector store you are using for the specific property name.
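For example, for the PGvector store the Spring Boot property is `spring.ai.vectorstore.pgvector.initialize-schema`; the property prefix follows the store's name.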

== Batching Strategy

When working with vector stores, it's often necessary to embed large numbers of documents.
While it might seem straightforward to make a single call to embed all documents at once, this approach can lead to issues.
Embedding models process text as tokens and have a maximum token limit, often referred to as the context window size.
This limit restricts the amount of text that can be processed in a single embedding request.
Attempting to embed too many tokens in one call can result in errors or truncated embeddings.

To address this token limit, Spring AI implements a batching strategy.
This approach breaks down large sets of documents into smaller batches that fit within the embedding model's maximum context window.
Batching not only solves the token limit issue but can also lead to improved performance and more efficient use of API rate limits.

Spring AI provides this functionality through the `BatchingStrategy` interface, which allows for processing documents in sub-batches based on their token counts.

The core `BatchingStrategy` interface is defined as follows:

[source,java]
----
public interface BatchingStrategy {
    List<List<Document>> batch(List<Document> documents);
}
----

This interface defines a single method, `batch`, which takes a list of documents and returns a list of document batches.

=== Default Implementation: TokenCountBatchingStrategy

Spring AI provides a default implementation called `TokenCountBatchingStrategy`.
This strategy batches documents based on their token counts, ensuring that each batch does not exceed a calculated maximum input token count.

Key features of `TokenCountBatchingStrategy`:

1. Uses https://platform.openai.com/docs/guides/embeddings/embedding-models[OpenAI's max input token count] (8191) as the default upper limit.
2. Incorporates a reserve percentage (default 10%) to provide a buffer for potential overhead.
3. Calculates the actual max input token count as: `actualMaxInputTokenCount = originalMaxInputTokenCount * (1 - RESERVE_PERCENTAGE)`

The strategy estimates the token count for each document, groups them into batches without exceeding the max input token count, and throws an exception if a single document exceeds this limit.
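
With the defaults, the effective limit works out to `8191 * (1 - 0.10) ≈ 7372` tokens per batch (the exact value depends on rounding).

If the defaults don't suit your model, the strategy can be constructed with custom values. A minimal sketch, assuming the constructor that takes a token encoding type, a maximum input token count, and a reserve percentage:

[source,java]
----
import com.knuddels.jtokkit.api.EncodingType;

import org.springframework.ai.embedding.BatchingStrategy;
import org.springframework.ai.embedding.TokenCountBatchingStrategy;

// Sketch: tighter limits than the defaults
BatchingStrategy strategy = new TokenCountBatchingStrategy(
        EncodingType.CL100K_BASE, // encoding used to estimate token counts
        8000,                     // maximum input token count
        0.1                       // reserve percentage (10% buffer)
);
----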

=== Using the BatchingStrategy

The `BatchingStrategy` is used internally by `EmbeddingModel` implementations to optimize the embedding process.
It automatically batches documents when computing embeddings, which can lead to significant performance benefits, especially when dealing with large numbers of documents or APIs with token limitations.
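
For illustration, a minimal sketch of invoking a strategy directly (in normal use this happens behind the scenes; `documents` is assumed to be a previously loaded `List<Document>`):

[source,java]
----
BatchingStrategy strategy = new TokenCountBatchingStrategy();

// Each sub-list stays within the strategy's token budget and can be
// embedded in a single request
List<List<Document>> batches = strategy.batch(documents);
----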

=== Customizing Batching Strategy

While `TokenCountBatchingStrategy` provides a robust default implementation, you can customize the batching strategy to fit your specific needs.
This can be done through Spring Boot's auto-configuration.

To customize the batching strategy, define a `BatchingStrategy` bean in your Spring Boot application:

[source,java]
----
import org.springframework.ai.embedding.BatchingStrategy;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class EmbeddingConfig {

    @Bean
    public BatchingStrategy customBatchingStrategy() {
        // CustomBatchingStrategy stands in for your own implementation
        return new CustomBatchingStrategy();
    }
}
----

This custom `BatchingStrategy` will then be automatically used by the `EmbeddingModel` implementations in your application.
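
For example, here is a hypothetical implementation that batches by a fixed document count instead of token counts (`FixedSizeBatchingStrategy` is illustrative only, not part of Spring AI):

[source,java]
----
import java.util.ArrayList;
import java.util.List;

import org.springframework.ai.document.Document;
import org.springframework.ai.embedding.BatchingStrategy;

public class FixedSizeBatchingStrategy implements BatchingStrategy {

    private final int batchSize;

    public FixedSizeBatchingStrategy(int batchSize) {
        this.batchSize = batchSize;
    }

    @Override
    public List<List<Document>> batch(List<Document> documents) {
        List<List<Document>> batches = new ArrayList<>();
        for (int start = 0; start < documents.size(); start += this.batchSize) {
            int end = Math.min(start + this.batchSize, documents.size());
            // copy the sub-list so each batch is independent of the source list
            batches.add(new ArrayList<>(documents.subList(start, end)));
        }
        return batches;
    }
}
----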

NOTE: Vector stores supported by Spring AI are configured to use the default `TokenCountBatchingStrategy`.
The SAP HANA vector store is not currently configured for batching.

== Available Implementations

These are the available implementations of the `VectorStore` interface: