
GH-1199: Prevent timeouts with configurable batching for PgVectorStor… #1400


Closed
wants to merge 1 commit into from

Conversation

sobychacko
Contributor

…e inserts

Resolves #1199

  • Implement configurable maxDocumentBatchSize to prevent insert timeouts when adding large numbers of documents
  • Update PgVectorStore to process document inserts in controlled batches
  • Add maxDocumentBatchSize property to PgVectorStoreProperties
  • Update PgVectorStoreAutoConfiguration to use the new batching property
  • Add tests to verify batching behavior and performance

This change fixes PgVectorStore inserts timing out when a large number of documents is added at once. With a configurable batch size, users can bound the size of each insert to avoid timeouts while maintaining performance and reducing memory overhead for large-scale document additions.
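
The batching idea described above can be illustrated with a minimal sketch. This is not the code from the PR: `BatchingSketch`, `partition`, `addInBatches`, and the `insertBatch` callback are hypothetical names chosen for illustration, and the real PgVectorStore implementation may differ. The sketch only assumes that a list of documents is split into consecutive chunks of at most `maxDocumentBatchSize` elements, each handed to a separate insert.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper for illustration only; not the actual PgVectorStore code.
class BatchingSketch {

    // Splits items into consecutive sublists of at most maxBatchSize elements.
    static <T> List<List<T>> partition(List<T> items, int maxBatchSize) {
        if (maxBatchSize <= 0) {
            throw new IllegalArgumentException("maxBatchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += maxBatchSize) {
            int end = Math.min(start + maxBatchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    // Passes each batch to insertBatch in order. The callback stands in for
    // one bounded insert (an assumption; the real store issues SQL here).
    static <T> void addInBatches(List<T> items, int maxBatchSize, Consumer<List<T>> insertBatch) {
        for (List<T> batch : partition(items, maxBatchSize)) {
            insertBatch.accept(batch);
        }
    }
}
```

With this shape, no single insert ever carries more than `maxBatchSize` documents, so the cost of each statement stays bounded regardless of how many documents the caller passes in.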

@markpollack
Member

merged in 202148d

@markpollack markpollack added this to the 1.0.0-M3 milestone Sep 24, 2024

Successfully merging this pull request may close these issues.

When too much data is imported, timeouts may easily occur when executing the embedding model.
2 participants