Add tree cache to ASTAnalyzer and optimize vector projection #196

m1rl0k · 2026-01-24T18:52:48Z

Introduces a tree cache to ASTAnalyzer to avoid re-parsing unchanged files, improving performance for repeated analysis. Enhances vector projection in scripts/ingest/vectors.py by using numpy when available for significant speedup, with a fallback to pure Python. Updates Qdrant upsert logic to support async/sync modes via parameter or environment variable, and adds a flush_upserts utility to ensure data persistence after async upserts.

augmentcode · 2026-01-24T18:57:00Z

🤖 Augment PR Summary

Summary: Performance-focused update to reuse parsed ASTs, speed up mini-vector projection, and optionally run Qdrant upserts asynchronously.

Changes:

Added an optional tree cache to ASTAnalyzer to avoid re-parsing unchanged files.
Routed mapping-based tree-sitter analysis through a new _parse_with_cache helper and exposed cache stats.
Optimized project_mini random projection with a NumPy fast-path and a pure-Python fallback.
Extended Qdrant upsert_points to support sync/async behavior via a wait parameter and INDEX_UPSERT_ASYNC.
Added flush_upserts utility intended to help ensure persistence after async upserts.

Technical Notes: Default behavior remains synchronous unless async mode is enabled; projection matrices are cached and NumPy uses vectorized matmul + L2 normalization.

augmentcode

Review completed. 3 suggestions posted.

augmentcode · 2026-01-24T18:57:01Z

scripts/ast_analyzer.py

+        path = Path(file_path) if file_path else None
+
+        # Try to get cached tree (only for real files, not in-memory content)
+        if self._tree_cache and path and path.exists():


_parse_with_cache can return a cached tree even when analyze_file(..., content=...) is called with explicit content for an on-disk file_path, which risks analyzing a stale tree if the provided content differs from the file on disk.
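One way to avoid the stale-tree case is to consult the cache only when the source text is read from disk, and to bypass it whenever the caller supplies `content`. The sketch below is illustrative only: `TreeCache`, `parse_with_cache`, the `parse` callable, and the `(mtime_ns, size)` validity check are hypothetical stand-ins, not the code in scripts/ast_analyzer.py.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable, Dict, Optional, Tuple


class TreeCache:
    """Hypothetical cache keyed by resolved path, validated by (mtime_ns, size)."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[Tuple[int, int], object]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, path: Path) -> Optional[object]:
        st = path.stat()
        entry = self._entries.get(str(path.resolve()))
        if entry is not None and entry[0] == (st.st_mtime_ns, st.st_size):
            self.hits += 1
            return entry[1]
        self.misses += 1
        return None

    def put(self, path: Path, tree: object) -> None:
        st = path.stat()
        self._entries[str(path.resolve())] = ((st.st_mtime_ns, st.st_size), tree)


def parse_with_cache(
    parse: Callable[[str], object],
    cache: Optional[TreeCache],
    file_path: Optional[str],
    content: Optional[str] = None,
) -> object:
    """Parse a file, using the cache only when the text comes from disk."""
    path = Path(file_path) if file_path else None
    from_disk = content is None
    # Explicit content may differ from the file on disk, so never serve or
    # store a cached tree for it.
    use_cache = cache is not None and from_disk and path is not None and path.exists()

    if use_cache:
        cached = cache.get(path)
        if cached is not None:
            return cached

    if from_disk:
        if path is None:
            raise ValueError("either file_path or content is required")
        source = path.read_text(encoding="utf-8")
    else:
        source = content

    tree = parse(source)
    if use_cache:
        cache.put(path, tree)
    return tree
```

With this shape, a call that passes `content` always re-parses and leaves the cached entry for the on-disk file untouched.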

augmentcode · 2026-01-24T18:57:01Z

scripts/ingest/qdrant.py

+        return
+    try:
+        # Force a sync operation to ensure all pending writes are flushed
+        client.get_collection(collection)


flush_upserts() calls get_collection(), but Qdrant’s wait=False semantics are “confirmed received” rather than “applied”, so this may not actually guarantee prior async upserts are committed before reads.
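Since a later `get_collection()` call does not by itself prove that earlier writes were applied, callers that need read-after-write visibility can request `wait=True` on the upsert itself. The sketch below shows one way the `wait` parameter and `INDEX_UPSERT_ASYNC` could be combined. The function names, the set of truthy strings, and `RecordingClient` (a stand-in used so the example runs without a Qdrant server) are assumptions for illustration, not the code in scripts/ingest/qdrant.py. The only client call assumed is `upsert(collection_name=..., points=..., wait=...)`.

```python
import os
from typing import Any, List, Optional, Sequence, Tuple

# Assumption: which strings count as "async enabled" is not stated in the PR.
_TRUTHY = {"1", "true", "yes", "on"}


def resolve_wait(wait: Optional[bool] = None) -> bool:
    """An explicit argument wins; otherwise fall back to the environment."""
    if wait is not None:
        return wait
    raw = os.environ.get("INDEX_UPSERT_ASYNC", "")
    async_mode = raw.strip().lower() in _TRUTHY
    # Default stays synchronous unless async mode is enabled.
    return not async_mode


def upsert_points(
    client: Any,
    collection: str,
    points: Sequence[Any],
    wait: Optional[bool] = None,
) -> bool:
    """Upsert points and return the effective wait flag that was used."""
    effective = resolve_wait(wait)
    client.upsert(collection_name=collection, points=list(points), wait=effective)
    return effective


class RecordingClient:
    """Stand-in client that records calls instead of talking to Qdrant."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, int, bool]] = []

    def upsert(self, collection_name: str, points: list, wait: bool = True) -> None:
        self.calls.append((collection_name, len(points), wait))
```

Passing `wait=True` for the final batch of an ingest run is one option for code paths that read immediately afterwards; whether that fits this pipeline is a judgment call for the author.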

augmentcode · 2026-01-24T18:57:01Z

scripts/ingest/vectors.py

+                [scale * (1.0 if rnd.random() < 0.5 else -1.0) for _ in range(out_dim)]
+                for _ in range(in_dim)
+            ]
+            M = np.array(M_list, dtype=np.float32)


The NumPy path uses float32 for both the projection matrix and input vector, so results will differ from the pure-Python (float64) path despite the “reproducibility” comment; if determinism across environments matters, this could be surprising.
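If agreement between the two paths matters, one option is to build the sign matrix once from the seeded RNG and let the NumPy path use float64, so both paths see the same values and differ only by summation order. The sketch below is a simplified illustration: the function names, the default seed, and the `1/sqrt(out_dim)` scale are assumptions and do not reproduce `project_mini` from scripts/ingest/vectors.py.

```python
import math
import random
from functools import lru_cache
from typing import List, Sequence, Tuple

try:
    import numpy as np  # optional fast path
except ImportError:
    np = None


@lru_cache(maxsize=8)
def _sign_matrix(in_dim: int, out_dim: int, seed: int) -> Tuple[Tuple[float, ...], ...]:
    """Seeded +/-scale matrix, built once per (in_dim, out_dim, seed)."""
    rnd = random.Random(seed)
    scale = 1.0 / math.sqrt(out_dim)  # assumed scale; the PR's value is not shown
    return tuple(
        tuple(scale * (1.0 if rnd.random() < 0.5 else -1.0) for _ in range(out_dim))
        for _ in range(in_dim)
    )


def project_pure(vec: Sequence[float], out_dim: int, seed: int = 1337) -> List[float]:
    """Pure-Python projection followed by L2 normalization (float64)."""
    matrix = _sign_matrix(len(vec), out_dim, seed)
    out = [0.0] * out_dim
    for x, row in zip(vec, matrix):
        for j, w in enumerate(row):
            out[j] += x * w
    norm = math.sqrt(sum(v * v for v in out)) or 1.0
    return [v / norm for v in out]


def project_numpy(
    vec: Sequence[float], out_dim: int, seed: int = 1337, dtype: str = "float64"
) -> List[float]:
    """NumPy projection; float64 by default to stay close to the pure path."""
    if np is None:
        raise RuntimeError("NumPy is not installed")
    matrix = np.array(_sign_matrix(len(vec), out_dim, seed), dtype=dtype)
    out = np.asarray(vec, dtype=dtype) @ matrix
    norm = float(np.linalg.norm(out)) or 1.0
    return (out / norm).tolist()
```

Calling `project_numpy(..., dtype="float32")` trades that closeness for speed and memory, which is the discrepancy the comment describes.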

Refines ASTAnalyzer to avoid returning stale trees when content is provided in-memory, by tracking content provenance and skipping cache in such cases. Enhances flush_upserts in Qdrant ingest to clarify consistency semantics and add a minimal scroll for better write visibility. Adds comprehensive tests for chunk deduplication, ingest infrastructure, and tree cache integration to ensure correctness and robustness.

augmentcode bot reviewed Jan 24, 2026

m1rl0k merged commit 85de0dd into test Jan 24, 2026
1 check passed
