Dev#60

Merged
zTgx merged 5 commits into main from dev
Apr 13, 2026

@zTgx zTgx commented Apr 13, 2026

No description provided.

zTgx added 5 commits April 13, 2026 18:41
- Replace join_all with stream::buffer_unordered(5) for page assignment
- Add bounded concurrency to TOC verification with buffer_unordered(5)
- Implement bounded concurrency for index repair with buffer_unordered(5)
- Use stream processing instead of collecting all futures at once
- Prevent rate limiting by limiting concurrent LLM requests

This change improves performance and reliability by preventing
excessive concurrent API calls to LLM services.
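The bounded-concurrency idea in this commit can be sketched in Python, the language of the project's bindings. This is only an illustration under stated assumptions: the commit itself uses Rust's `buffer_unordered(5)`, whereas the sketch below uses an `asyncio.Semaphore`, and `run_bounded` and `fake_llm_call` are hypothetical names that do not come from the repository.

```python
import asyncio


async def run_bounded(items, worker, limit=5):
    """Run worker(item) for every item with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item):
        # Each task waits here until one of the `limit` slots is free.
        async with semaphore:
            return await worker(item)

    # gather returns results in input order; Rust's buffer_unordered
    # instead yields them in completion order.
    return await asyncio.gather(*(guarded(item) for item in items))


async def demo():
    in_flight = 0
    peak = 0

    async def fake_llm_call(page):  # hypothetical stand-in for an LLM request
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)  # simulate network latency
        in_flight -= 1
        return page * 2

    results = await run_bounded(range(20), fake_llm_call, limit=5)
    return results, peak


if __name__ == "__main__":
    results, peak = asyncio.run(demo())
    print(f"{len(results)} results, peak concurrency {peak}")
```

Unlike an unbounded `gather` (the analogue of `join_all`), no more than `limit` requests are outstanding at any moment, which is what keeps the caller under a provider's rate limit.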

fix(structure-extractor): optimize hierarchical structure extraction

- Process first page group as initial structure, then remaining groups
  in parallel with bounded concurrency
- Add static version of continuation generation for parallel use
- Improve error handling for failed continuation groups
- Add proper entry deduplication and sorting logic
- Maintain shared context from initial entries for all continuations

The extraction now follows a phased approach: initial structure
generation followed by parallel continuation processing, which
improves both accuracy and performance.
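The phased approach described above might look roughly like the following Python sketch. It is not the project's Rust implementation: `extract_structure`, the two extractor callbacks, and the `{"title", "page"}` entry shape are all hypothetical, and deduplicating on `(page, title)` is an assumed rule chosen for illustration.

```python
import asyncio


async def extract_structure(groups, extract_initial, extract_continuation, limit=5):
    """Phase 1: first group alone. Phase 2: remaining groups in parallel."""
    if not groups:
        return []

    # Phase 1: the first page group yields the initial structure.
    initial = await extract_initial(groups[0])

    # Phase 2: every other group is processed concurrently (bounded),
    # each receiving the initial entries as shared context.
    semaphore = asyncio.Semaphore(limit)

    async def guarded(group):
        async with semaphore:
            return await extract_continuation(group, initial)

    outcomes = await asyncio.gather(
        *(guarded(group) for group in groups[1:]), return_exceptions=True
    )

    entries = list(initial)
    for outcome in outcomes:
        if isinstance(outcome, Exception):
            continue  # a failed group is skipped instead of aborting the run
        entries.extend(outcome)

    # Deduplicate on (page, title), keeping the first occurrence, then sort.
    seen = set()
    unique = []
    for entry in entries:
        key = (entry["page"], entry["title"])
        if key not in seen:
            seen.add(key)
            unique.append(entry)
    return sorted(unique, key=lambda entry: entry["page"])


async def demo():
    async def extract_initial(group):
        return [{"title": "Introduction", "page": 1}]

    async def extract_continuation(group, context):
        if group[0] == 5:
            raise RuntimeError("simulated rate limit")
        # Repeats a context entry on purpose to exercise deduplication.
        return [context[0], {"title": f"Section at page {group[0]}", "page": group[0]}]

    return await extract_structure(
        [[1, 2], [5, 6], [3, 4]], extract_initial, extract_continuation
    )
```

In the demo, the group starting at page 5 fails and is dropped, while the repeated "Introduction" entry is collapsed to a single entry before sorting.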
Add Python binding for IndexMetrics to expose comprehensive indexing
pipeline metrics including timing information, LLM usage statistics,
and processing counts.

The new PyIndexMetrics class provides access to:
- Total indexing time and individual stage durations
- Node processing and summary generation counts
- LLM call statistics and token usage
- Topic and keyword indexing metrics
- Summary failure tracking

Also expose metrics through the PyIndexItem interface and register
the new class with the module.

feat(rust): track and expose indexing failure metrics

Enhance the IndexMetrics system to track and report on failed
summary generations during the indexing process. Add new
summaries_failed field and add_summaries_failed method to record
failures from LLM errors, rate limits, or other processing issues.

Update example code to display failure statistics and improve
error handling for missing LLM configuration.
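As a rough illustration of the failure tracking described here, the sketch below models the counter in plain Python. It is not the actual `IndexMetrics` type or its Python binding: only `summaries_failed` and `add_summaries_failed` are named in the commit message, while `IndexMetricsSketch`, `summaries_generated`, and `failure_rate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IndexMetricsSketch:
    """Hypothetical stand-in for the failure-tracking part of IndexMetrics."""

    summaries_generated: int = 0  # assumed field name, not from the source
    summaries_failed: int = 0     # field named in the commit message

    def add_summaries_failed(self, count: int) -> None:
        # Record summaries that failed (LLM errors, rate limits, and so on).
        self.summaries_failed += count

    def failure_rate(self) -> float:  # hypothetical helper for reporting
        attempted = self.summaries_generated + self.summaries_failed
        return self.summaries_failed / attempted if attempted else 0.0


if __name__ == "__main__":
    metrics = IndexMetricsSketch(summaries_generated=8)
    metrics.add_summaries_failed(2)
    print(f"failed: {metrics.summaries_failed}, rate: {metrics.failure_rate():.0%}")
```

Keeping failures as a separate counter, rather than folding them into a total, lets an example script report how many summaries were lost without stopping the indexing run.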

refactor(rust): make metrics module public and update exports

Make the metrics module public to allow external access to metric
types and functionality.

…ent, error handling, and PDF indexing

Add four new example projects demonstrating core functionality:

- Batch Indexing Example: Shows indexing multiple documents using from_paths,
  from_dir, and from_bytes with cross-document querying capabilities

- Document Management Example: Demonstrates CRUD operations including list(),
  exists(), remove(), and clear() methods for indexed documents

- Error Handling Example: Illustrates proper VectorlessError exception
  handling with different error categories and inspection techniques

- PDF Indexing Example: Showcases PDF file indexing with detailed metrics
  inspection and querying capabilities

Each example includes dedicated README.md files with setup instructions,
environment variable documentation, and usage examples. All examples follow
consistent configuration patterns with proper async handling and cleanup
procedures.

…ysis

Add a new example demonstrating how to use IndexMetrics to inspect
detailed indexing pipeline metrics including timing breakdowns,
LLM usage statistics, and reasoning index performance.

The example includes:
- README with setup instructions and environment variables
- Main script comparing documents with/without summaries enabled
- Detailed metrics reporting for parse, build, and enhance stages
- LLM call statistics and token usage analysis
- Node processing and indexing success metrics

This helps users understand how different IndexOptions affect
pipeline performance and resource utilization.

Bump workspace package version from 0.1.25 to 0.1.26 in Cargo.toml
to prepare for new release.

chore(release): bump version from 0.1.4 to 0.1.5 in pyproject.toml

Bump python package version from 0.1.4 to 0.1.5 in pyproject.toml
to prepare for new release.

vercel bot commented Apr 13, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| vectorless | Ready | Preview, Comment | Apr 13, 2026 0:06am |

@zTgx zTgx merged commit b0aea5b into main Apr 13, 2026
2 checks passed