
Dev #39

Merged
zTgx merged 11 commits into main from dev
Apr 11, 2026



@zTgx zTgx commented Apr 11, 2026

No description provided.

zTgx added 11 commits April 10, 2026 21:40
The configuration documentation generation module was removed as it's
no longer needed in the codebase.

BREAKING CHANGE: The ConfigDocs module and related functionality have
been completely removed from the configuration system.
- Introduce QueryResultItem to represent individual query results
- Modify QueryResult to contain multiple QueryResultItem instances
- Update engine.query() to support single, multiple, or workspace scope queries
- Add QueryContext methods: with_doc_ids(), with_workspace(), resolve_scope()
- Refactor examples to use result.single() for accessing query results
- Update Python bindings to expose both QueryResult and QueryResultItem
- Enhance IndexContext to support multiple document sources and directories
- Add directory indexing functionality with from_dir() method
- Update documentation comments to reflect new multi-document capabilities
- Introduce FailedItem struct to track failed operations in batch processes
- Add has_failures() and failed() methods to IndexResult and QueryResult
- Implement incremental indexing with IndexMode (Default, Incremental, Force)
- Add check_skip_source() method to skip already indexed files based on mode
- Remove separate persist stage and integrate persistence into engine
- Add workspace.find_by_source_path() method for incremental indexing
- Update repr methods to show failure counts
- Include proper Python bindings for FailedItem class
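The batch behaviour described in the list above (mode-based skipping plus failure tracking) can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the field layouts of `FailedItem` and `IndexResult`, the `index_batch` helper, and the exact meaning of each `IndexMode` variant are assumptions made for the example.

```rust
use std::collections::HashSet;

/// Indexing modes named in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IndexMode {
    Default,
    Incremental,
    Force,
}

/// Hypothetical shape of one failed entry in a batch operation.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FailedItem {
    pub source: String,
    pub error: String,
}

/// Hypothetical batch result; only has_failures() and failed() come from the PR text.
#[derive(Debug, Default)]
pub struct IndexResult {
    pub indexed: Vec<String>,
    pub skipped: Vec<String>,
    failed: Vec<FailedItem>,
}

impl IndexResult {
    pub fn has_failures(&self) -> bool {
        !self.failed.is_empty()
    }

    pub fn failed(&self) -> &[FailedItem] {
        &self.failed
    }
}

/// Returns true when `source` should be skipped.
/// Assumed semantics: Default skips anything already indexed, Incremental
/// defers the decision to later change detection, Force never skips.
pub fn check_skip_source(mode: IndexMode, already_indexed: &HashSet<String>, source: &str) -> bool {
    match mode {
        IndexMode::Default => already_indexed.contains(source),
        IndexMode::Incremental | IndexMode::Force => false,
    }
}

/// Hypothetical driver: index every source, recording failures instead of aborting.
pub fn index_batch<F>(
    sources: &[&str],
    mode: IndexMode,
    already_indexed: &HashSet<String>,
    mut index_one: F,
) -> IndexResult
where
    F: FnMut(&str) -> Result<(), String>,
{
    let mut result = IndexResult::default();
    for &source in sources {
        if check_skip_source(mode, already_indexed, source) {
            result.skipped.push(source.to_string());
            continue;
        }
        match index_one(source) {
            Ok(()) => result.indexed.push(source.to_string()),
            Err(error) => result.failed.push(FailedItem { source: source.to_string(), error }),
        }
    }
    result
}
```

Collecting failures per item lets a caller inspect `failed()` after the batch instead of losing the successful items to the first error.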
… fingerprinting

- Add incremental indexing functionality that detects file content changes
  using content fingerprints and pipeline configuration changes using
  logic fingerprints
- Introduce three-layer change detection: file-level, logic-level, and
  node-level diffs for efficient updates
- Refactor Engine to use IndexAction enum for determining indexing behavior:
  Skip, FullIndex, or IncrementalUpdate based on change detection
- Add PipelineOptions with existing_tree support for incremental updates
- Implement reusable summaries computation for unchanged nodes during
  incremental updates
- Update indexer to handle both full indexing and incremental updates with
  existing tree parameter
- Add content and logic fingerprint storage in DocumentMeta for tracking
  changes between indexing sessions
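One way to picture the fingerprint-driven decision described above is the sketch below. It assumes a logic (pipeline configuration) change forces a full re-index while a content-only change allows an incremental update; the PR text does not spell out that mapping. `resolve_action`, the `u64` fingerprints, and the use of `DefaultHasher` are illustrative choices only.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Illustrative fingerprint. DefaultHasher is not guaranteed stable across
/// Rust releases, so a persisted fingerprint would need a stable hash instead.
pub fn fingerprint(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical subset of DocumentMeta holding the two stored fingerprints.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DocumentMeta {
    pub content_fingerprint: u64,
    pub logic_fingerprint: u64,
}

/// The three behaviours named in the commit message.
#[derive(Debug, PartialEq, Eq)]
pub enum IndexAction {
    Skip,
    FullIndex,
    IncrementalUpdate,
}

/// Decide what to do for one document (assumed decision order, see lead-in).
pub fn resolve_action(stored: Option<&DocumentMeta>, current: &DocumentMeta) -> IndexAction {
    match stored {
        // Never indexed before: nothing to reuse.
        None => IndexAction::FullIndex,
        // Pipeline configuration changed: earlier output may be stale.
        Some(old) if old.logic_fingerprint != current.logic_fingerprint => IndexAction::FullIndex,
        // Only the file content changed: diff against the existing tree.
        Some(old) if old.content_fingerprint != current.content_fingerprint => {
            IndexAction::IncrementalUpdate
        }
        // Both fingerprints match: nothing to do.
        Some(_) => IndexAction::Skip,
    }
}
```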
… sources

Add concurrent indexing capabilities when multiple sources are present,
while maintaining single-source efficiency. Introduces process_source
helper method and improves error handling for indexing operations.
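The single-source fast path and the concurrent multi-source path might look something like the sketch below. It swaps in `std::thread::scope` so the example stays standard-library only; the PR most likely uses an async runtime instead. The body of `process_source` and the `index_sources` wrapper are hypothetical stand-ins.

```rust
use std::thread;

/// Stand-in for the PR's process_source helper; the real one runs the pipeline.
pub fn process_source(source: &str) -> Result<String, String> {
    if source.is_empty() {
        Err("empty source path".to_string())
    } else {
        Ok(format!("indexed:{source}"))
    }
}

/// Index all sources, returning one result per source in input order.
pub fn index_sources(sources: &[&str]) -> Vec<Result<String, String>> {
    // Zero or one source: stay on the calling thread and avoid spawn overhead.
    if sources.len() <= 1 {
        return sources.iter().map(|s| process_source(s)).collect();
    }

    // Several sources: one scoped worker per source, joined in input order.
    thread::scope(|scope| {
        let handles: Vec<_> = sources
            .iter()
            .map(|&source| scope.spawn(move || process_source(source)))
            .collect();

        handles
            .into_iter()
            .map(|handle| {
                // A panicking worker becomes an error entry instead of aborting the batch.
                handle
                    .join()
                    .unwrap_or_else(|_| Err("worker panicked".to_string()))
            })
            .collect()
    })
}
```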

perf(pipeline): implement parallel stage execution for 2-stage groups

Enable true parallel execution of pipeline stages when safe (one
read-only stage like reasoning_index and one write stage). Add
handle_stage_result helper function and NopStage placeholder for
temporary stage swapping during parallel execution.

refactor(enhance): add concurrent LLM summary generation

Replace sequential LLM calls with concurrent processing using
buffer_unordered for better performance. Add PendingNode struct
and improve caching behavior with immediate application of cached
summaries before LLM processing.

feat(memoization): integrate memo store into summary generator

Automatically attach memo store to LlmSummaryGenerator and ensure
cached summaries are applied immediately during the enhancement stage.

BREAKING CHANGE: Pipeline execution now supports true parallelism
for compatible stage pairs, changing the execution model from purely
sequential.
…lel indexing

BREAKING CHANGE: Replace PipelineExecutor with IndexerClient in EngineBuilder
and Engine components. This change enables true parallel document indexing
by creating fresh pipeline executors per operation, removing mutex contention.

feat(rust): add LLM-enabled indexer client with API key support

Create indexer client with LLM-enabled factory when API key is available,
allowing summary generation and reasoning capabilities during indexing.

refactor(index): implement AccessPattern for safe parallel stage execution

Replace hardcoded stage name checks with AccessPattern declarations to
determine safe parallel execution of pipeline stages, improving code
maintainability and correctness.
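To illustrate how declared access patterns can replace stage-name checks, here is a minimal sketch. The resource flags, the `reads`/`writes` fields, and the conflict rule (no write/write or read/write overlap) are assumptions for the example; the PR's actual `AccessPattern` type may be shaped differently.

```rust
/// Hypothetical resource flags a stage may touch.
pub const TREE: u8 = 0b001;
pub const REASONING_INDEX: u8 = 0b010;
pub const METADATA: u8 = 0b100;

/// Hypothetical declaration of what a stage reads and writes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct AccessPattern {
    pub reads: u8,
    pub writes: u8,
}

/// Two stages may run in parallel when neither writes a resource
/// that the other reads or writes.
pub fn can_run_in_parallel(a: AccessPattern, b: AccessPattern) -> bool {
    let write_write = (a.writes & b.writes) != 0;
    let a_writes_b_reads = (a.writes & b.reads) != 0;
    let b_writes_a_reads = (b.writes & a.reads) != 0;
    !(write_write || a_writes_b_reads || b_writes_a_reads)
}
```

With declarations like these, an orchestrator can decide pairings from data rather than from hardcoded stage names, so adding a stage only requires declaring its pattern.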

feat(index): enhance IndexAction with existing_id for document cleanup

Add existing_id field to FullIndex action variant to properly handle
old document cleanup after successful re-indexing, implementing
atomic save-then-remove pattern.
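The save-then-remove ordering can be sketched as follows. The in-memory `Store`, its methods, and the `apply` function are hypothetical; only the `FullIndex` variant with its `existing_id` field comes from the commit message. The point shown is that the old document is removed only after the new one has been saved.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory document store used only for this illustration.
#[derive(Debug, Default)]
pub struct Store {
    docs: HashMap<String, String>,
}

impl Store {
    /// Save a document; an empty body stands in for any failure during saving.
    pub fn save(&mut self, id: &str, body: &str) -> Result<(), String> {
        if body.is_empty() {
            return Err(format!("refusing to save empty document {id}"));
        }
        self.docs.insert(id.to_string(), body.to_string());
        Ok(())
    }

    pub fn remove(&mut self, id: &str) {
        self.docs.remove(id);
    }

    pub fn contains(&self, id: &str) -> bool {
        self.docs.contains_key(id)
    }
}

/// Reduced IndexAction carrying the id of the document being replaced.
#[derive(Debug)]
pub enum IndexAction {
    Skip,
    FullIndex { existing_id: Option<String> },
}

/// Save the new document first, then clean up the old one.
pub fn apply(store: &mut Store, new_id: &str, body: &str, action: IndexAction) -> Result<(), String> {
    match action {
        IndexAction::Skip => Ok(()),
        IndexAction::FullIndex { existing_id } => {
            // If saving fails, `?` returns early and the old document stays intact.
            store.save(new_id, body)?;
            if let Some(old_id) = existing_id {
                if old_id != new_id {
                    store.remove(&old_id);
                }
            }
            Ok(())
        }
    }
}
```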

refactor(index): improve pipeline orchestrator stage execution logic

Enhance pipeline orchestrator to properly merge reader stage outputs
based on AccessPattern and correctly handle writer/reader relationships
between parallel stages.

refactor(index): add access patterns to enrich and reasoning stages

Declare proper access patterns for EnrichStage and ReasoningIndexStage
to enable correct parallel execution coordination by the orchestrator.
- Fix incremental indexing to properly track all types of node changes
  including modified, restructured, added and removed nodes instead of
  only tracking changed node IDs and removed titles
- Improve cycle detection algorithm in pipeline orchestrator by using
  index-based filtering instead of value-based filtering to avoid
  infinite loops

fix(config): handle missing OPENAI_API_KEY gracefully

- Replace unwrap() with proper error handling when reading OPENAI_API_KEY
  environment variable to prevent crashes when the variable is not set
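Replacing `unwrap()` with explicit handling could look roughly like this. `ConfigError`, `api_key_from`, and `read_api_key` are hypothetical names for the sketch; `std::env::var` and `std::env::VarError` are the standard-library APIs involved.

```rust
use std::env;
use std::fmt;

/// Hypothetical error type for configuration loading.
#[derive(Debug, PartialEq, Eq)]
pub enum ConfigError {
    MissingApiKey,
    InvalidApiKey,
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConfigError::MissingApiKey => write!(f, "OPENAI_API_KEY is not set or is empty"),
            ConfigError::InvalidApiKey => write!(f, "OPENAI_API_KEY is not valid Unicode"),
        }
    }
}

impl std::error::Error for ConfigError {}

/// Map the raw lookup result to a typed error instead of panicking.
pub fn api_key_from(raw: Result<String, env::VarError>) -> Result<String, ConfigError> {
    match raw {
        Ok(key) if !key.trim().is_empty() => Ok(key),
        // An empty value is treated like a missing one (an assumption of this sketch).
        Ok(_) | Err(env::VarError::NotPresent) => Err(ConfigError::MissingApiKey),
        Err(env::VarError::NotUnicode(_)) => Err(ConfigError::InvalidApiKey),
    }
}

/// Read the key from the environment; callers can fall back to a non-LLM path on Err.
pub fn read_api_key() -> Result<String, ConfigError> {
    api_key_from(env::var("OPENAI_API_KEY"))
}
```

Splitting out `api_key_from` keeps the decision logic testable without mutating the process environment.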

refactor(metrics): correct enrich time metric field name

- Rename enhance_time_ms to enrich_time_ms for consistency with the
  actual functionality being measured

refactor(optimize): add access pattern implementation

- Add proper access pattern definition for optimize stage indicating
  it reads and writes tree data during execution
…changes

- Add processing_version field to PipelineOptions to track indexing algorithm versions
- Bump processing_version to 1 by default to trigger initial reprocessing
- Update incremental resolver to use pipeline processing version instead of stored doc version
- This ensures existing documents are reprocessed when indexing algorithms change
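A minimal sketch of the version check, assuming documents indexed before this change carry a stored version of 0 and that any mismatch triggers reprocessing. `needs_reprocessing` and `CURRENT_PROCESSING_VERSION` are hypothetical names, and `PipelineOptions` is reduced to the one field mentioned in the commit.

```rust
/// Assumed default after the bump described in the commit message.
pub const CURRENT_PROCESSING_VERSION: u32 = 1;

/// Reduced PipelineOptions with only the field relevant here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PipelineOptions {
    pub processing_version: u32,
}

impl Default for PipelineOptions {
    fn default() -> Self {
        Self { processing_version: CURRENT_PROCESSING_VERSION }
    }
}

/// Compare the version a document was processed with against the pipeline's
/// version; a mismatch means the indexing algorithm changed since then.
pub fn needs_reprocessing(stored_version: u32, options: &PipelineOptions) -> bool {
    stored_version != options.processing_version
}
```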
…ionships

- Add DocumentGraph and related types (DocumentGraphNode, GraphEdge, etc.)
- Create graph module with builder, config, and type definitions
- Move graph builder from src/index to src/graph module
- Implement automatic graph rebuilding after document indexing
- Add graph-aware retrieval with boost factor support
- Integrate document graph into query pipeline context
- Add graph configuration options to main Config struct
- Implement keyword extraction from ReasoningIndex for graph building
- Add workspace methods to persist/load document graph
- Update retrieval strategies to utilize graph connections
- Add validation for graph configuration parameters
- Add new graph example demonstrating cross-document relationship retrieval
- Implement get_graph() method to access document relationship graph
- Register graph example in Cargo.toml
- Expose DocumentGraph type in public API
- Demonstrate graph visualization with nodes, edges, and Jaccard similarity metrics
- Add IndexMetrics struct to track performance metrics including
  parse, build, enhance, enrich, and optimize stage durations
- Add metrics for LLM usage, token generation, and node processing
  statistics
- Integrate metrics collection into the indexing pipeline stages
- Move IndexMetrics from pipeline module to centralized metrics
  module for better organization
- Add metrics field to IndexedDocument and IndexItem structs
- Implement with_metrics methods to set indexing metrics during
  document processing
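For the document graph commits above, the Jaccard similarity mentioned for the graph example is the size of the intersection of two keyword sets divided by the size of their union. The sketch below shows how such a score could turn per-document keyword sets into weighted edges. `keywords`, `build_edges`, the `GraphEdge` fields, and the threshold parameter are assumptions for illustration, not the PR's builder API.

```rust
use std::collections::BTreeSet;

/// Convenience constructor for a keyword set.
pub fn keywords(words: &[&str]) -> BTreeSet<String> {
    words.iter().map(|w| w.to_string()).collect()
}

/// Jaccard similarity: |A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty.
pub fn jaccard(a: &BTreeSet<String>, b: &BTreeSet<String>) -> f64 {
    let union = a.union(b).count();
    if union == 0 {
        return 0.0;
    }
    a.intersection(b).count() as f64 / union as f64
}

/// Hypothetical edge between two documents, identified by their positions.
#[derive(Debug, Clone, PartialEq)]
pub struct GraphEdge {
    pub from: usize,
    pub to: usize,
    pub weight: f64,
}

/// Connect every pair of documents whose similarity reaches the threshold.
pub fn build_edges(docs: &[BTreeSet<String>], min_similarity: f64) -> Vec<GraphEdge> {
    let mut edges = Vec::new();
    for i in 0..docs.len() {
        for j in (i + 1)..docs.len() {
            let weight = jaccard(&docs[i], &docs[j]);
            if weight >= min_similarity {
                edges.push(GraphEdge { from: i, to: j, weight });
            }
        }
    }
    edges
}
```

For example, the sets {rust, index, graph} and {rust, graph, query} share two of four distinct keywords, giving a similarity of 0.5.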
@zTgx zTgx merged commit 22bcaed into main Apr 11, 2026