- Replace join_all with stream::buffer_unordered(5) for page assignment
- Add bounded concurrency to TOC verification with buffer_unordered(5)
- Implement bounded concurrency for index repair with buffer_unordered(5)
- Use stream processing instead of collecting all futures at once
- Prevent rate limiting by limiting concurrent LLM requests

This change improves performance and reliability by preventing excessive concurrent API calls to LLM services.

fix(structure-extractor): optimize hierarchical structure extraction

- Process the first page group as the initial structure, then the remaining groups in parallel with bounded concurrency
- Add a static version of continuation generation for parallel use
- Improve error handling for failed continuation groups
- Add proper entry deduplication and sorting logic
- Maintain shared context from the initial entries for all continuations

The extraction now follows a phased approach: initial structure generation, followed by parallel continuation processing, which improves both accuracy and performance.
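As a rough illustration, the two ideas in these commits (bounded concurrency in the spirit of buffer_unordered(5), and phased extraction with shared context, per-group failure handling, deduplication and sorting) can be sketched in Python with asyncio. This is an analogue, not the PR's Rust code: generate_continuation, run_bounded and extract_structure are hypothetical names, the LLM call is faked with a sleep, and entries are assumed to be (title, page) tuples.

```python
import asyncio


# Hypothetical stand-in for one LLM call on a page group; a real
# implementation would send the group plus `context` to an LLM service.
async def generate_continuation(group, context):
    await asyncio.sleep(0.01)
    if group.get("fail"):
        raise RuntimeError(f"LLM call failed for group {group['id']}")
    return list(group["entries"])


async def run_bounded(items, job, limit):
    """Run `job` over `items` with at most `limit` jobs in flight.

    Like buffer_unordered, results arrive in completion order, and only
    `limit` jobs exist at any time (no upfront list of all futures).
    """
    it = iter(items)
    results = []

    async def worker():
        # All workers share one iterator, so each item is taken exactly once.
        for item in it:
            try:
                results.append((item, await job(item)))
            except Exception as exc:  # record the failure and keep going
                results.append((item, exc))

    await asyncio.gather(*(worker() for _ in range(limit)))
    return results


async def extract_structure(groups, limit=5):
    # Phase 1: the first group alone produces the initial structure.
    initial = await generate_continuation(groups[0], context=[])
    context = tuple(initial)  # shared, read-only context for every continuation

    # Phase 2: the remaining groups run with bounded concurrency.
    results = await run_bounded(
        groups[1:], lambda g: generate_continuation(g, context), limit
    )

    entries, failed = list(initial), []
    for group, result in results:
        if isinstance(result, Exception):
            failed.append(group["id"])  # skip the failed group, keep the rest
        else:
            entries.extend(result)

    # Deduplicate (title, page) pairs, then sort by page, then title.
    merged = sorted(set(entries), key=lambda e: (e[1], e[0]))
    return merged, sorted(failed)
```

A fixed pool of `limit` workers pulling from one shared iterator keeps at most `limit` calls in flight and never builds the full list of pending calls up front, which mirrors the move away from collecting all futures at once. Wrapping asyncio.gather in an asyncio.Semaphore would also cap concurrency, but it creates every coroutine immediately.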
Add a Python binding for IndexMetrics that exposes indexing pipeline metrics, including timing information, LLM usage statistics, and processing counts. The new PyIndexMetrics class provides access to:

- Total indexing time and individual stage durations
- Node processing and summary generation counts
- LLM call statistics and token usage
- Topic and keyword indexing metrics
- Summary failure tracking

Also expose metrics through the PyIndexItem interface and register the new class with the module.

feat(rust): track and expose indexing failure metrics

Extend IndexMetrics to track and report failed summary generations during indexing. Add a new summaries_failed field and an add_summaries_failed method to record failures caused by LLM errors, rate limits, or other processing issues. Update the example code to display failure statistics and improve error handling when the LLM configuration is missing.

refactor(rust): make metrics module public and update exports

Make the metrics module public to allow external access to metric types and functionality.
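A minimal Python sketch of the failure counting described in the feat(rust) commit, assuming each summary attempt either succeeds or raises. IndexMetricsSketch and summarize_nodes are hypothetical; only summaries_failed and add_summaries_failed come from the commit message, and the optional count argument to add_summaries_failed is an assumption about its signature.

```python
from dataclasses import dataclass


@dataclass
class IndexMetricsSketch:
    """Hypothetical mirror of a few IndexMetrics counters.

    Field names other than summaries_failed are assumptions,
    not the real PyIndexMetrics API.
    """
    nodes_processed: int = 0
    summaries_generated: int = 0
    summaries_failed: int = 0
    llm_calls: int = 0

    def add_summaries_failed(self, n: int = 1) -> None:
        # Assumed signature: add `n` failures to the running total.
        self.summaries_failed += n

    def summary_failure_rate(self) -> float:
        attempted = self.summaries_generated + self.summaries_failed
        return self.summaries_failed / attempted if attempted else 0.0


def summarize_nodes(nodes, summarize, metrics):
    """Record one outcome per node; failures are counted, not raised."""
    for node in nodes:
        metrics.nodes_processed += 1
        metrics.llm_calls += 1
        try:
            summarize(node)
            metrics.summaries_generated += 1
        except Exception:
            # e.g. an LLM error or a rate-limit response
            metrics.add_summaries_failed()
```

Counting failures instead of aborting lets indexing finish with partial summaries while still surfacing how many nodes were affected.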
…ent, error handling, and PDF indexing

Add four new example projects demonstrating core functionality:

- Batch Indexing Example: shows indexing multiple documents using from_paths, from_dir, and from_bytes, with cross-document querying
- Document Management Example: demonstrates CRUD operations, including the list(), exists(), remove(), and clear() methods for indexed documents
- Error Handling Example: illustrates proper VectorlessError exception handling across different error categories, and how to inspect errors
- PDF Indexing Example: showcases PDF file indexing with detailed metrics inspection and querying

Each example includes its own README.md file with setup instructions, environment variable documentation, and usage examples. All examples follow a consistent configuration pattern with proper async handling and cleanup.
…ysis

Add a new example demonstrating how to use IndexMetrics to inspect detailed indexing pipeline metrics, including timing breakdowns, LLM usage statistics, and reasoning index performance. The example includes:

- README with setup instructions and environment variables
- Main script comparing documents with and without summaries enabled
- Detailed metrics reporting for the parse, build, and enhance stages
- LLM call statistics and token usage analysis
- Node processing and indexing success metrics

This helps users understand how different IndexOptions affect pipeline performance and resource utilization.
Bump workspace package version from 0.1.25 to 0.1.26 in Cargo.toml to prepare for a new release.

chore(release): bump version from 0.1.4 to 0.1.5 in pyproject.toml

Bump Python package version from 0.1.4 to 0.1.5 in pyproject.toml to prepare for a new release.